Thursday, March 13, 2008

The data aggregator and you

In last week’s installment, I wrote about how much of a data trail each of us leaves on the Internet and in our daily lives. This week, it’s time to look at why that data trail has become so valuable.

If your personal data were scattered to the four corners of a nearly infinite Internet, then storing your personal data online would pose few problems. In fact, in the years B.G. (Before Google), putting the data clues together would have taken the skills of dozens of Sherlock Holmeses. Now, software technology has found a means to overcome that formerly mind-numbing and time-consuming task of data mining: the data aggregator.

The aggregator allows separate databases to be compiled together and the combined data to be parsed in whatever way the customer chooses. For instance, an aggregator allows a marketing firm to take the records of those who have filed for bankruptcy and compare them to a state’s licensing database. Why would that be valuable? That data would be priceless for a bank looking to push high-interest auto loans: judging by the age of each person’s current vehicle, the bank would know which individuals, who could not otherwise get credit, would soon be in the market for a new car.
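For the technically curious, the bankruptcy-and-vehicle match described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record types, field names, and sample rows are hypothetical, and it assumes both databases share a clean person identifier, which real aggregators rarely have (they typically have to match on names, addresses, and similar clues).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankruptcyRecord:
    person_id: str      # hypothetical shared identifier
    filing_year: int


@dataclass(frozen=True)
class VehicleRecord:
    person_id: str      # same hypothetical identifier
    model_year: int


def match_prospects(bankruptcies, vehicles, current_year, min_vehicle_age):
    """Return IDs of bankruptcy filers who own a vehicle at least min_vehicle_age years old."""
    filers = {b.person_id for b in bankruptcies}
    # Keep only vehicle owners who also appear in the bankruptcy data
    # and whose vehicle is old enough to suggest a replacement soon.
    return sorted({
        v.person_id
        for v in vehicles
        if v.person_id in filers
        and current_year - v.model_year >= min_vehicle_age
    })


# Made-up sample rows, for illustration only.
bankruptcies = [BankruptcyRecord("A", 2006), BankruptcyRecord("B", 2007)]
vehicles = [
    VehicleRecord("A", 1995),   # filer with a 13-year-old car
    VehicleRecord("B", 2006),   # filer with a nearly new car
    VehicleRecord("C", 1990),   # old car, but no bankruptcy on file
]

prospects = match_prospects(bankruptcies, vehicles, current_year=2008, min_vehicle_age=10)
print(prospects)  # ['A']
```

Neither list says much on its own; it is the overlap between them that produces the marketing lead.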

The more data saved in a database and the longer time span the data covers, the more valuable it becomes. This is one of the reasons Google is sitting on a marketing gold mine:

Because Google saves the results of consumer searches for a long time, perhaps forever, and because it has the search string and Internet address of many searchers, it can do real damage with database matching, which involves taking information from one context, like searching, and equating it to an unrelated venue, like product shopping on an e-commerce site or commenting on a blog. . . . Its g-mail product looks at the content of e-mails so that Google can serve up targeted banner ads. Google Desktop and related products index material stored on home and office computers. . . .Google provides no guarantee, contractual or legal that . . . misuse will never occur. In the meantime, it keeps collecting information. . . .[1]

Businesses are not the only entities to see the value in the aggregation of data. The U.S. government and a number of other national entities are using aggregators in their pursuit of those deemed “dangerous to national security.” In the United States, those aggregators are being called “fusion centers.”

Fusion centers are somewhat controversial and mysterious – the public does not know what goes on inside. . . .they [the intel and security agencies] share current threat information from each of their organizations. Multiple databases from different agencies bring gigabytes of law enforcement and intelligence information into the fusion center....there are now almost 70 fusion centers nationwide. . . .[2]

The scope of data collected by both governmental and corporate aggregators extends well beyond the obvious: demographic, purchasing, and net behavior tracking. “The data collected extends beyond [even] information about consumers’ views of products to information about the consumers themselves, often including lifestyle details and even a full psychological profile.”[3]

This aggregation of data is even extending beyond traditionally sacred barriers of privacy such as that of the physician-patient relationship. Both Google and Microsoft have recently announced that they will collect and aggregate medical, psychiatric, and pharmacy records in order to make them available, by subscription, via the Internet.

Google said it has signed deals with hospitals and companies - including medical tester Quest Diagnostics Inc, health insurer Aetna Inc, Walgreens and Walmart Stores Inc pharmacies. The password-protected Web service stores health records on Google computers, with a medical services directory that lets users import doctors' records, drug history and test results. . . . "We don't know how to suck it out of the brains of doctors, but we know how to suck it out of the computer systems of doctors," [Chief Executive Eric] Schmidt said in an interview after his speech. . . . [4]

Meanwhile, Microsoft has introduced a competing product called “HealthVault.” Both medical aggregation services will allow the data to be utilized by other platforms to search for specialized groups or even individuals within the database. Schmidt was also quoted as saying: “There are a lot of applications you can’t envisage today.” [5] Unfortunately, even those who are advocating a U.S. Government electronic medical records sharing system admit that the privacy of the patients’ records is not their highest priority as they move forward with the initiative.[6] In fact, the medical records which are already online have, according to the Department of Homeland Security, become a target of Chinese and Russian hackers.[7]

Medical aggregators are dwarfed in size and resources by other aggregators, such as the three major American credit bureaus, which spend their time and effort creating data profiles of every person in their sampling group. Many of these aggregators, such as Equifax and Experian, operate in the open and regularly deal with the individuals they profile. However, there are many data aggregators that are little known and prefer to keep it that way. Among them are ChoicePoint, which holds approximately 19 billion records covering almost every adult in the U.S., and Acxiom, whose Profiler program alone gathers data on over 95 percent of American homes.[8] There are a number of other large and specialty aggregators which operate in the public shadows:

Catalina Marketing Corporation maintains supermarket buying history databases on 30 million households. . . . Aristotle, Inc. markets a database of 150 million voters. . . . . Donelly Marketing Information Services of New Jersey keeps track of 125 million people. Wiland Services has constructed a database containing over 1,000 elements, from demographic information to behavioral data on over 215 million people.

There are around five database compilers that have data on almost all households in the United States. . . . Credit reporting agencies also prepare investigative consumer reports, which supplement the credit report with information about an individual’s character and lifestyle. . . . the Global Regulatory Information Database (GRID) gathers information from more than 20,000 sources around the world. GRID’s purpose is to help financial companies conduct background checks of potential customers for fraud, money laundering, terrorism, and other criminal activity.[9]

Aggregation gives companies and governments the opportunity to deliver personally targeted marketing. Before computing made such data compilation possible, marketers anticipated very low response rates. A direct-mail campaign, for instance, was considered a success if two percent or more of those who received the flyer in the mail responded.

Compiling the data allows marketers to target specific demographic groups, or groups based on any variable, depending upon the level of detail of the collected data. More data equals more precise marketing, even if it means invading one’s privacy to make certain that one is sent a coupon for the proper shade of lipstick. What is even more astonishing about this current situation is that average users are either blissfully unaware of the erosion of their privacy or have placed such a low value upon it that they have bartered it away for other digital narratives such as “security” or “a semblance of a relationship.” So, all of a sudden, you may be getting advertising that seems a lot more interesting to you. The downside is that the data doppelganger which has been created tells all your secrets, and it doesn’t care to whom it spills the beans.

Next week, a look at the illusory and changing face of privacy rights in the U.S.

[1] David Holtzman, Privacy Lost: How Technology is Endangering Your Privacy, (San Francisco: Jossey-Bass, 2006), 12-13.
[2] Ben Bain, “A New Threat, A New Institution: The Fusion Center,” Federal Computer Week, February 18, 2008, 18, 20.
[3] Daniel J. Solove, The Digital Person: Technology and Privacy in the Information Age, (NY: New York University, 2004), 17.
[4] Barbara Liston, “Google Unveils Personal Medical Records Service,” Scientific American, February 28, 2008.
[5] Richard Waters, “Google Reveals Plans For Health Databases,” Financial Times, February 28, 2008.
[6] Nancy Ferris, “Persona Non Grata?” Government Health IT, February 2008, 18-19.
[7] Nancy Ferris, “Foreign Hackers Prey on U.S. Health Records,” Government Health IT, February 2008, 6-7.
[8] Holtzman, 190.
[9] Solove, 20-21.

-B. Keith Murphy is the Associate Dean for The College of Arts and Sciences at Fort Valley State University.
