A flawed random-number theory
IBM's Privacy Research Institute recently revealed techniques that aim to preserve individual privacy while giving e-businesses information to generate data models. These techniques scramble or "randomize" private information and reconstruct data distributions at an aggregate level to perform data mining. This means that Web site administrators and merchants can use scrambled data without knowing the underlying private information.
Let's say I enter 45 in a forthcoming Java application that uses the IBM techniques to provide a merchant with age information in return for a music sample. The Java app takes my age and adds or subtracts a random value. The value would differ with each user. Then it sends the new number to the merchant. So, my 45 years may be reduced to 32. This program may also increase my net worth in a single keystroke! I like it already, but what's the value to the merchant?
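The client-side step can be sketched roughly as follows, in Python rather than Java for brevity. The function name `randomize_age`, the uniform noise, and the default spread of 20 years are illustrative assumptions; nothing here describes IBM's actual implementation.

```python
import random

def randomize_age(true_age, spread=20, rng=random):
    """Return true_age plus a random integer offset in [-spread, +spread].

    Hypothetical helper: the uniform noise and default spread are assumptions.
    """
    offset = rng.randint(-spread, spread)
    return true_age + offset

# Each call draws a fresh offset, so every user's value is shifted differently
# and the merchant never receives the real age.
reported = randomize_age(45)
print(reported)  # some value between 25 and 65
```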
Although the numbers change, the allowed range of randomization does not. That range is linked to an acceptable range of data at an aggregate level and to a level of privacy. The merchant might not care about my exact age, but it might like to know I'm between 30 and 50. Large randomizations will increase the personal privacy for users but reduce accuracy for merchants. If my age were randomized to 17, that would hardly be valuable to a merchant if it were used in conjunction with the title of the music I requested. Not too many 17-year-olds are into Bob Dylan.
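The trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is made up, the noise is uniform, and only the average age is compared, whereas the techniques described above reconstruct a full distribution. The function `simulate` and its parameters are hypothetical.

```python
import random
import statistics

def simulate(true_ages, spread, seed=0):
    """Randomize each age by uniform noise in [-spread, +spread].

    Returns (error in the aggregate mean, average per-user shift).
    """
    rng = random.Random(seed)
    reported = [age + rng.uniform(-spread, spread) for age in true_ages]
    mean_error = abs(statistics.mean(reported) - statistics.mean(true_ages))
    per_user_shift = statistics.mean(
        [abs(r - a) for r, a in zip(reported, true_ages)]
    )
    return mean_error, per_user_shift

# Made-up population: 10,000 users aged 18 to 70.
population_rng = random.Random(42)
ages = [population_rng.randint(18, 70) for _ in range(10_000)]

for spread in (5, 20):
    mean_error, per_user_shift = simulate(ages, spread)
    print(f"spread=+/-{spread}: mean off by {mean_error:.2f} years, "
          f"typical user shifted by {per_user_shift:.1f} years")
```

With this many users the noise largely averages out of the mean, while each individual's reported age moves further from the truth as the spread grows. That is the privacy gain for the user and the loss of per-person accuracy for the merchant.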