
Understanding the Ethical Limits of Using Public Data in Analytics

June 28, 2016

Business intelligence software that incorporates big data gives most businesses powerful and varied capabilities. Many important decisions that once rested on a basic level of information and gut instinct can now be informed by data that provides near-concrete evidence. However, it's important to consider where that data comes from. Companies can draw on many external data points to make better assessments, and public data is also available when it suits their operations. If a company chooses to use public data, though, it must be aware of the ethics and rules that govern it in order to avoid harming the people the data describes, as a recent incident demonstrated.

Public data is not free to use
PC Magazine reported in mid-May 2016 that a group of scientists in Denmark had publicly released the data of more than 70,000 users of the dating site OKCupid. The scrape included usernames, ages, genders, locations and the questions users answered on the site's extensive survey, which is used to find suitable matches. The release of the profiles served no purpose other than to provide data to anyone interested. The dataset was posted on the forum of the online journal Open Differential Psychology.

"Data being public doesn't grant anyone permission to use it."

News of the release sparked immediate controversy. The scientists claimed that because the data was already public on the dating site, they were free to scrape and release it. However, as Dr. Michael Zimmer noted in a column for Wired magazine, data being available to the public does not automatically grant an individual or group permission to use it. Research studies are bound by an ethical principle known as informed consent, under which a person must give express permission after being told what will be done. The people behind the release failed to abide by this rule.

Moreover, the scientists' release heavily implied that they had created an OKCupid profile as part of the process. Many profiles on the dating site are visible only to registered users, which suggests a further violation: they collected profiles that weren't available to the public at all. Many researchers said the Danish scientists had flagrantly violated research ethics in this regard. Businesses that want to use public data should treat this story as a cautionary tale about how to work with their sources.
