For a long time I’ve had two ideas for software applications that I dreamed up, but since I’m not a good programmer, I could never build them to test them. I finally tried to work out one of my ideas on paper with my limited mathematical skills. The idea was a compression program that would compress any information down to extremely small sizes, like a single byte. I’m not very technical about computer languages or the fewest bits needed to hold compressed data, but basically the output would contain the level of the compression (so if the data had been compressed a thousand times, the level would be “1000”), plus the compressed data itself, which should be about two or three letters or symbols. That should be all the software would need to decompress the package back to its original size. The only other thing it would need is the library of symbols that were used to compress the data file.

The problem I ran into when I finally worked out my program mathematically was the number of combinations you would need to compress to the next level, and then an even bigger library of combinations for the level after that, until the data was finally down to about two or three letters or symbols.

Let’s say we are compressing data that contains only the English alphabet, with no other symbols (*we’ll also ignore spacing*). That means we have 26 letters total, and my compression program takes two-letter combinations and turns each one into a single symbol that represents the pair. Let’s say “aa” becomes the symbol “x”. If you go through the whole data file and replace all two-letter combinations with symbols, you will have reduced it to about half its size. The problem is that you have increased your library of symbols. With 26 letters, there are 26 × 26 = 676 possible two-letter combinations, and you also have to include a symbol for each of the 26 single letters, in case a letter ends up without a pair at the end of the compression cycle. So you end up with 702 symbols in the library.
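The pair-replacement pass described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the author’s actual program; the names and the use of integers as stand-in “symbols” are my own assumptions.

```python
# A minimal sketch of one pass of the pair-replacement scheme: every
# two-letter pair gets a symbol of its own, and each single letter also
# gets a symbol in case an unpaired letter is left at the end.
from itertools import product
import string

letters = string.ascii_lowercase  # the 26-letter alphabet

# Library: one symbol per two-letter pair (676) plus one per single letter (26).
pairs = ["".join(p) for p in product(letters, repeat=2)]
library = {pair: i for i, pair in enumerate(pairs)}                    # 676 entries
library.update({ch: len(pairs) + i for i, ch in enumerate(letters)})   # +26 singles

def compress_once(text):
    """Replace each two-letter chunk (and any trailing single letter)
    with its library symbol, roughly halving the length of the data."""
    return [library[text[i:i+2]] for i in range(0, len(text), 2)]

symbols = compress_once("helloworld")
print(len(library))   # 702 total library entries
print(len(symbols))   # 5 symbols for a 10-letter input
```

Note that the output symbols here are just integers; the point is only that ten letters become five symbols, at the cost of a 702-entry library.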

Now, the second level of compression has halved the size of the data file, but to go to the third level you have to create a new library for the 702 symbols. Compressing two symbols at a time means 702 × 702 = 492,804 two-symbol combinations, plus a symbol for each of the 702 previous-level symbols that might be left without a pair, for a total of 493,506 combinations at level three.

I think you’re starting to see the problem I saw when I worked out my compression program on paper: the library gets big fast with each level of compression. By level four, I am dealing with a library of about 243 billion symbols (493,506 squared, plus the 493,506 possible unpaired leftovers). The data file has shrunk by an immense amount at that point, but the library of symbols representing it has grown by billions.
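The growth rate the post walks through follows a simple recurrence: each new level needs a symbol for every pair of previous-level symbols plus one for each possible unpaired leftover, so the next library size is the current size squared plus the current size. A quick sketch to reproduce the numbers:

```python
# Library growth per compression level: next = size * size + size,
# starting from the 26-letter alphabet at level 1.
sizes = {}
size = 26                      # level 1: the raw alphabet
for level in range(2, 6):
    size = size * size + size  # all pairs, plus leftover singles
    sizes[level] = size
    print(f"level {level}: {size:,} library entries")
```

This reproduces the figures in the text: 702 at level two, 493,506 at level three, and roughly 243.5 billion at level four; level five already passes 10^22.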

I believe the data file could keep shrinking down to the size I mentioned, bytes, maybe even bits if computers can work with that little information: just the level of compression and the final symbols you are left with. So it’s my belief that you could compress the universe into bytes, but the library required to decompress the data file back to its full size would probably be bigger than the universe.

My other software application I know will work. It’s an encryption program that I believe no computer could break, even if you gave the computer a hundred pages, a thousand pages, of the encrypted data to decrypt. The only flaw is that you have to store the library somewhere, so getting at the library would be the only way to break the encryption.
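The post doesn’t spell out how the encryption works, so this is only a minimal sketch of the general “secret library” idea it describes: a randomly generated codebook maps each letter to a symbol, and only someone holding the codebook can invert it. All names here are illustrative. Worth noting: a fixed per-letter codebook like this one is breakable by frequency analysis, so the post’s stronger claim would need the library to be as large and random as the data itself, along the lines of a one-time pad.

```python
# Sketch of a secret-library (codebook) cipher. The codebook is the only
# thing an attacker needs, which matches the flaw described in the post.
import random
import string

rng = random.Random(42)          # fixed seed so the example is repeatable
letters = string.ascii_lowercase
symbols = list(range(len(letters)))
rng.shuffle(symbols)

codebook = dict(zip(letters, symbols))            # the secret library
inverse = {v: k for k, v in codebook.items()}     # needed to decrypt

def encrypt(text):
    return [codebook[ch] for ch in text]

def decrypt(nums):
    return "".join(inverse[n] for n in nums)

ciphertext = encrypt("secretmessage")
print(decrypt(ciphertext))   # prints "secretmessage"
```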

*Math is not my strong point, so if there are errors in calculations, my apologies.*
