The Problem To Be Solved.
A good friend of mine had found that an Excel spread sheet no longer had a formulae in a particular cell. It had a number there instead. This meant that any changes in other cells associated with the missing formulae had no effect on the result in the TOTAL column. He wanted to go back to a time when the spreadsheet had the correct formulae in it. He had many backup files and feared he might have to examine each file until he found when the problem occurred. So I advised him of a quick way to examine his backups using a binary chop method to look at just a few critical files.
The Binary Chop Method
The Binary Chop method can only be operated on a sorted list. In my friend’s case all the backups were in a folder where they were sorted in DATE ORDER. This was good because it was the date when the change occurred that he wanted to find.
The Binary Chop method involves initially testing the item nearest to the middle of the list (not necessarily the middle date) and chopping the list to create two new sections of equal length. (One section can have one more item than the other if together they have an odd number of items in total. That will give:
- a lower section from the bottom of the list up to the tested item but not including it,
- an upper section from the tested item (and including it) to the top of the list.
NOTE: It’s important to continue including the tested item in one of the two sections. It must be consistently included in the same type of section, i.e. either the upper section or lower section. I have chosen here to include it in the upper section.
According to the result of the test either the lower section or the upper section is chopped in a similar way using a new test point in the middle of the appropriate section. This is a reiteration of the initial testing and chopping process.
The tests and chops are reiterated over and over on ever decreasing data as the upper and lower sections under test get smaller. When the next section to be tested is down to one data item the search is over and that item is seen as the result of the search.
My Friend’s Tests
My friend had seen that the latest spread sheet was in error and had checked the first use of the spread sheet to check that the formulae had been copied from the template used initially. So it just remained to check all the backups in between the first and the last.
When using the Binary Chop method the next thing is to choose a backup that is half way through the list and test it. Then if it is in error the search is confined to the oldest section of the backups (the lower section). Else if the formulae is found the search is confined to the latest section of the backups (the upper section).
The same principal is then applied to whichever section is searched next. i.e. choose a backup that is half way through that section and test it. Then if it is in error the search is confined to the oldest part of that section of the backups. Else if the formulae is found the search is confined to the latest part of that section of the backups, i.e. approximately a half of one of the two initial sections or roughly a quarter of all the backups.
The same principal must then be applied to whichever quarter must be searched. i.e. choose a backup that is half way through that quarter and test it. Then if it is in error the search can be confined to the oldest half of that quarter of the backups. Else if the formulae was found the search can be confined to the latest half of that quarter of the backups, i.e. half of a quarter or an eighth of all the backups.
This method will narrow down the backups that must be tested until there is either one or two left. Testing one or both as required will show when the error occurred.
The Binary Chop method reduces the number of tests which have to be conducted considerably. The proportion of tests required reduces as the number of files to be considered rises. See Example 1, Example 2, and Example 3 compared in the section Binary Chop – A Detailed Look below:
Binary Chop – A Detailed Look
The Binary Chop method is commonly used by computers to search sorted lists. Computers can have some very long lists to search and so it can save a lot of time. As I have mentioned the list to be searched must be sorted and if it is not it can take time to sort it into order. With a computer time can be saved if the list is sorted into order as items are added to the list. The overall sorting time may not be reduced but the time taken to sort one item as it is entered is very small compared to the time taken to make an entry. This is especially so if there would otherwise be some idle time. So the sorting time is not noticed by the user.
Each iteration of testing and chopping reduces the amount of data left to search according to the following binary series hence the term “Binary Chop”:
Let i = iteration number,
Let d = total number of data items to search,
Let r = remaining number of data items to search,
Then:
if i = 1 then r = d/21 = d/2,
if i = 2 then r = d/22 = d/4,
if i = 3 then r = d/23 = d/8,
if i = 4 then r= d/24 = d/16,
if i = 5 then r = d/25 = d/32,
if i = 6 then r = d/26 = d/64,
if i = 7 then r = d/27 = d/128.
So if d = 128 there will only be 1 item left to search on the 7th itteration and only 7 items will have had to be examined in the search.
If the search is for a unique item it may be found on the first iteration or it may take all possible iterations to find it. There is a maximum number of possible iterations ‘n’ needed to obtain the result for any given sorted list size ‘x’ as shown in Expression 1 below:
Expression 1.
2n-1 < x < 2n
Below are three examples using Expression 1.
Example 1.
x1 = 15, n1 = 4 gives:
24-1 < 15 < 24 = 8 < 15 < 16
Example 2.
x2 = 150, n2 = 8 gives:
28-1 < 150 < 28 = 128 < 150 < 256
Example 3.
x3 = 150,000, n3 = 18 gives:
218-1 < 150,000 < 218 = 131,072 < 150,000 < 262,144
NOTE: From the examples above:
x2 = 10x1
but n2 = 2n1
while x3 = 10,000x1
but n3 = 4.5n1
From these examples it can be seen that a 1,000 fold increase in ‘x’ from 150 to 150,000 only results in a 2¼ fold increase in ‘n’ from 8 to 18. So the testing and chopping task doesn’t get significantly harder when the list size grows enormously.
The example below is directly related to my friend’s problem:
Example 4.
Suppose there are 150 untested backup copies of a spread sheet all made on different dates and the first ‘a’ backups are set to use a formulae and the rest are not. Determine which backup (backup a) was the last to use a formulae.
In this example a = 48. (In an actual search this wouldn’t be known in advance but to know it here will allow a better understanding of the test results in each iteration.)
A Special Case
This type of search is a special case in that two results are required:
- the latest backup which contains the formulae, i.e. backup a.
- the oldest backup which doesn’t contain the formulae, i.e. backup a+1.
In most cases it’s only after the maximum number of iterations required have been conducted that the highest backup with a +ve result and the lowest backup with a -ve result can be known with certainty.
This comes about because the backups are sorted by date but it isn’t the date field that is being tested and the field being tested (the one holding the formulae) offers two possible results contained it two groups.
How To Perform Each Iteration
The equations used in each iteration are:
Equation 1.
current test item = current lowest item + quotient[(current highest item – current lowest item) , 2] + 1
In this equation the modulus (remainder) of each chop is removed, i.e 75÷2 = 37.5 whereas what is required is (75÷2) – mod[75÷2] = 37. An alternative form of notation used here is quotient[75, 2] = 37.
Initially:
- the lowest item = 0;
- the highest item = number of items in the list.
Once there is a result from the first iteration Equation 2 and Equation 3 below determine the lowest and highest item values.
Equation 2.
If result = -ve then next lowest item = previous lowest item, and next highest item = current test item – 1.
Equation 3.
If result = +ve then next lowest item = current test item, and next highest item = previous highest item.
The Style Used In This Example’s Iterations:
Testing Formulae & Next Group Selection Style Used When The Result Is Negative
This uses Equation 1 and Equation 2.
Iteration i. Test backup #, i.e. current test item = current lowest item + quotient[(current highest item – current lowest item), 2] + 1. Result = -ve. Select next group: (previous lowest item) to (current test item – 1) for iteration i+1.
Testing Formulae & Next Group Selection Style Used When The Result Is Positive
This uses Equation 1 and Equation 3.
Iteration i. Test backup #, i.e. current test item = current lowest item + quotient[(current highest item – current lowest item), 2] + 1. Result = +ve. Select next group: (current test item) to (previous highest item) for iteration i+1.
Iteration Notes
- Follow the colour scheme from one iteration to the next.-ve results don’t contain the formulae and the group below the tested backup is chopped next.
- +ve results contain the formulae and the group above the tested backup is chopped next.
- From Expression 1 above n = 8 when number of items x = 150, i.e. A maximum of 8 iterations are required.
The Iterations
Select first group 0 to 150 for iteration 1.
Iteration 1. Test backup 76, i.e. 0+quotient[(150–0), 2]+1. Result = -ve. Select next group: 0 to 75 for iteration 2.
Iteration 2. Test backup 38, i.e. 0+quotient[(75–0), 2]+1. Result = +ve. Select next group: 38 to 75 for iteration 3.
Iteration 3. Test backup 57, i.e. 38+quotient[(75–38), 2]+1. Result = -ve. Select next group: 38 to 56 for iteration 4.
Iteration 4. Test backup 48, i.e. 38+quotient[(56–38), 2]+1. Result = +ve. Select next group: 48 to 56 for iteration 5. This is the test that would produce the latest backup with a formulae.
Iteration 5. Test backup 53, i.e. 48+quotient[(56–48), 2]+1. Result = -ve. Select next group: 48 to 52 for iteration 6.
Iteration 6. Test backup 51, i.e. 48+quotient[(52–48), 2]+1. Result = -ve. Select next group: 48 to 50 for iteration 7.
Iteration 7. Test backup 50, i.e. 48+quotient[(50–48), 2]+1. Result = -ve. Select next group: 48 to 49 for iteration 8.
Iteration 8. Test backup 49, i.e. 48+quotient[(49–48), 2]+1. Result = -ve. All iterations completed. This is the test that would produce the oldest backup without a formulae.
Result
The maximum number of iterations have been completed.
- Iteration 4 produced the highest backup (Backup 48) with a +ve result.
- Iteration 8 produced the lowest backup (Backup 49) with a -ve result.
- Note the results above are adjacent.
So backup 48 must be the last one to have a formulae, i.e. a = 48. Q.E.D.
Tests To Prove The Formulae Work At Both Ends Of The List
Sometimes it is possible to devise a list searching/processing algorithm that works in the middle of the list but fails to either process the lowest or highest items in the list correctly. Below I have worked through four tests for my formulae where:
- the latest backup with a formulae is the lowest item on the list.
- the oldest backup without a formulae is the highest item on the list.
- all backups have a formulae.
- no backups have a formulae.
Test 1: Prove Formulae Work When Only Backup 1 Has A Positive Result
Select first group 0 to 150 for iteration 1.
Iteration 1. Test backup 76, i.e. 0+quotient[(150–0), 2]+1. Result = -ve. Select next group: 0 to 75 for iteration 2.
Iteration 2. Test backup 38, i.e. 0+quotient[(75–0), 2]+1. Result = -ve. Select next group: 0 to 37 for iteration 3.
Iteration 3. Test backup 19, i.e. 0+quotient[(37–0), 2]+1. Result = -ve. Select next group: 0 to 18 for iteration 4.
Iteration 4. Test backup 10, i.e. 0+quotient[(18–0), 2]+1. Result = -ve. Select next group: 0 to 9 for iteration 5.
Iteration 5. Test backup 5, i.e. 0+quotient[(9–0), 2]+1. Result = -ve. Select next group: 0 to 4 for iteration 6.
Iteration 6. Test backup 3, i.e. 0+quotient[(4–0), 2]+1. Result = -ve. Select next group: 0 to 2 for iteration 7.
Iteration 7. Test backup 2, i.e. 0+quotient[(2–0), 2]+1. Result = -ve. Select next group: 0 to 1 for iteration 8. This is the test that would produce the oldest backup without a formulae.
Iteration 8. Test backup 1, i.e. 0+quotient[(1–0), 2]+1. Result = +ve. This is the test that would produce the latest backup with a formulae.
Test 2: Prove Formulae Work When Only Backup 150 Has A Negative Result
Select first group 0 to 150 for Iteration 1.
Iteration 1. Test backup 76, i.e. 0+quotient[(150–0), 2]+1. Result = +ve. Select next group: 76 to 150 for Iteration 2.
Iteration 2. Test backup 114, i.e. 76+quotient[(150–76), 2]+1. Result = +ve. Select next group: 114 to 150 for Iteration 3.
Iteration 3. Test backup 133, i.e. 114+quotient[(150–114), 2]+1. Result = +ve. Select next group: 133 to 150 for Iteration 4.
Iteration 4. Test backup 142, i.e. 133+quotient[(150–133), 2]+1. Result = +ve. Select next group: 142 to 150 for Iteration 5.
Iteration 5. Test backup 147, i.e. 142+quotient[(150–142), 2]+1. Result = +ve. Select next group: 147 to 150 for Iteration 6.
Iteration 6. Test backup 149, i.e. 147+quotient[(150–147), 2]+1. Result = +ve. Select next group: 149 to 150 for Iteration 7. This is the test that would produce the latest backup with a formulae.
Iteration 7. Test backup 150, i.e. 149+quotient[(150–149), 2]+1. Result = -ve. All iterations completed. This is the test that would produce the oldest backup without a formulae.
Iteration 8. Not Required.
Test 3: Prove Formulae Work When All Backups Have A Positive Result
Select first group 0 to 150 for Iteration 1.
Iteration 1. Test backup 76, i.e. 0+quotient[(150–0), 2]+1. Result = +ve. Select next group: 76 to 150 for Iteration 2.
Iteration 2. Test backup 114, i.e. 76+quotient[(150–76), 2]+1. Result = +ve. Select next group: 114 to 150 for Iteration 3.
Iteration 3. Test backup 133, i.e. 114+quotient[(150–114), 2]+1. Result = +ve. Select next group: 133 to 150 for Iteration 4.
Iteration 4. Test backup 142, i.e. 133+quotient[(150–133), 2]+1. Result = +ve. Select next group: 142 to 150 for Iteration 5.
Iteration 5. Test backup 147, i.e. 142+quotient[(150–142), 2]+1. Result = +ve. Select next group: 147 to 150 for Iteration 6.
Iteration 6. Test backup 149, i.e. 147+quotient[(150–147), 2]+1. Result = +ve. Select next group: 149 to 150 for Iteration 7.
Iteration 7. Test backup 150, i.e. 149+quotient[(150–149), 2]+1. Result = +ve. All iterations completed. This is the test that would produce the latest backup with a formulae because the result is positive. Because it is the latest backup of all it would show that ALL spreadsheets have the formulae and that the original premise that some later ones don’t is false.
Iteration 8. Not Required.
Test 4: Prove Formulae Work When All Backups Have A Negative Result
Select first group 0 to 150 for iteration 1.
Iteration 1. Test backup 76, i.e. 0+quotient[(150–0), 2]+1. Result = -ve. Select next group: 0 to 75 for iteration 2.
Iteration 2. Test backup 38, i.e. 0+quotient[(75–0), 2]+1. Result = -ve. Select next group: 0 to 37 for iteration 3.
Iteration 3. Test backup 19, i.e. 0+quotient[(37–0), 2]+1. Result = -ve. Select next group: 0 to 18 for iteration 4.
Iteration 4. Test backup 10, i.e. 0+quotient[(18–0), 2]+1. Result = -ve. Select next group: 0 to 9 for iteration 5.
Iteration 5. Test backup 5, i.e. 0+quotient[(9–0), 2]+1. Result = -ve. Select next group: 0 to 4 for iteration 6.
Iteration 6. Test backup 3, i.e. 0+quotient[(4–0), 2]+1. Result = -ve. Select next group: 0 to 2 for iteration 7.
Iteration 7. Test backup 2, i.e. 0+quotient[(2–0), 2]+1. Result = -ve. Select next group: 0 to 1 for iteration 8.
Iteration 8. Test backup 1, i.e. 0+quotient[(1–0), 2]+1. Result = -ve. This is the test that would produce the oldest backup without a formulae because the result is negative. Because it is the first backup of all it would show that NO spreadsheets have the formulae and that the situation was worse than thought.
Reference:
- Binary search algorithm: http://en.wikipedia.org/wiki/Binary_search_algorithm
- Definition of Binary Chop: http://cplus.about.com/od/glossar1/g/binary_chop.htm
- Wolfram Mathematica: http://www.wolfram.com/mathematica/
Leave a Reply