Privacy & Ethics Questions

1. What is Privacy?

The right to privacy is a basic human right and an established element of the legal systems of developed countries. As early as 1890, Warren & Brandeis wrote an article, “The Right to Privacy”, in which they defined privacy as the “right to be let alone” and focused on the protection of individuals. This right is often debated in the context of the tabloid press with regard to royals and celebrities.

The concept of privacy as a right to be let alone was further developed by Westin (1968), who made it clear that new technologies shift the balance of power between individual privacy and society’s capacity for surveillance. From this, Westin went on to specify privacy as the “right of informational self-determination” and as vital for restricting government surveillance in order to protect democratic processes.

According to Westin [ibid.], each individual is continually engaged in a personal adjustment process, balancing the desire for privacy with the desire for disclosure and communication, in light of environmental conditions and social norms. Flaherty (1989) took informational self-determination further and argued that networked computer systems pose a threat to privacy. He specified ‘data protection’ as an aspect of privacy, covering “the collection, use, and dissemination of personal information”. This concept forms the foundation of the fair information practices used by governments globally. Flaherty promoted the idea of privacy as information control. Roessler (2006) later operationalised the right to privacy along three dimensions: 1. informational privacy, 2. decisional privacy, 3. local privacy. It is important to note that privacy is not the same as anonymity or data security: these are related concepts that affect privacy, but do not constitute privacy as such.

2. What is Ethics?

Ethics is a moral code of norms and conventions that exists in society, external to a person, whereas privacy is an intrinsic part of a person’s identity and integrity. The understanding of what constitutes ethical behaviour varies strongly across time and cultures. Research ethics has become a pressing topic in recent years, first and foremost arising from discussions around codes of conduct in the biomedical sciences, such as research on the human genome, but also, more recently, in the shape of “responsible research and innovation” (RRI), which is being promoted by the European Commission.

The first basic written principles for ethical research originated from the Nuremberg trials, in which leading Nazi medics were convicted of their atrocities during the Second World War (Kay et al., 2012). The basic principles derived from the development of research ethics since the Nuremberg Code (cf. Kay et al., 2012) can be summarised as:

  • Voluntary participation in research;
  • Informed consent of the participants, and, with respect to minors, the informed consent of their parents or guardians;
  • Experimental results should serve the larger good of society;
  • Not putting participants in situations where they might be at risk of harm (either physical or psychological) as a result of participation in the research;
  • Protection of the privacy and confidentiality of participants’ information;
  • Option to opt out.

A recent example of an ethical debate about a Big Data experiment is the Facebook emotional contagion study (Kramer, Guillory & Hancock, 2014), in which a team of researchers manipulated the newsfeeds of over 650,000 Facebook users without notification or informed consent. The negative reaction to this manipulation was massive, among the user community and beyond. However, ethics is a volatile, human-made concept, and in the aftermath of the Facebook study researchers now discuss its pros and cons. Some argue that the study was indeed unethical but, at the same time, contributed new insights into human behaviour (Kleinsman & Buckley, 2015).

3. What data will I need to collect?

This depends on the aims for which the data are collected. If the data are collected in order to provide personal recommendations to the student, then rather fine-grained data collection will be needed. Such detailed data can be captured via metadata standards such as xAPI or IMS Caliper, which allow different analysis and feedback mechanisms (see the LACE report D7.4 Learning Analytics Interoperability: Requirements, Specifications and Adoption).
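
To make this concrete, here is a minimal sketch of such a fine-grained event expressed as an xAPI statement. The actor–verb–object structure and the ADL verb and activity-type IRIs follow the xAPI specification; the student, course and score are invented for illustration.

```python
import json

# A minimal xAPI statement ("actor verb object"): a student completing
# a quiz. The IRIs below are standard ADL vocabulary; the names and
# IDs are hypothetical.
statement = {
    "actor": {
        "objectType": "Agent",
        "name": "Alice Example",              # hypothetical student
        "mbox": "mailto:alice@example.edu",
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "objectType": "Activity",
        "id": "http://example.edu/courses/stats101/quiz/3",  # hypothetical
        "definition": {
            "name": {"en-US": "Quiz 3: Probability"},
            "type": "http://adlnet.gov/expapi/activities/assessment",
        },
    },
    "result": {
        "score": {"scaled": 0.85},  # 85% of available points
        "success": True,
        "completion": True,
    },
}

print(json.dumps(statement, indent=2))
```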

Such data can enable educational institutions to provide students with information about their own performance compared with a close neighbourhood of similar students, or with a larger group of students across the whole course. Institutions can also make predictions by comparing one student’s performance to that of previous clusters of similar students at the same performance level. In any case, according to EU Data Protection Directive 95/46/EC article 12, students as data subjects have the right to view the personal data collected about them, and to opt out of data collection at any moment. See the LACE Review Paper Is Privacy a Show-stopper for Learning Analytics? A Review of Current Issues and their Solutions.
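
As an illustration of the “close neighbourhood” comparison, the following sketch finds the k most similar students by Euclidean distance over a few invented behavioural features; in a real system the features would be derived from captured xAPI or Caliper events.

```python
import numpy as np

# Hypothetical per-student feature vectors: [logins/week, quiz average,
# forum posts]. All values are invented for illustration.
students = np.array([
    [12, 0.81, 5],
    [ 3, 0.55, 0],
    [ 9, 0.74, 2],
    [14, 0.90, 8],
    [ 7, 0.62, 1],
], dtype=float)

target = np.array([10, 0.78, 3])  # the student requesting feedback

# Standardise features so no single scale dominates the distance.
mu, sigma = students.mean(axis=0), students.std(axis=0)
z, zt = (students - mu) / sigma, (target - mu) / sigma

# Take the k nearest students and compare quiz averages.
k = 3
nearest = np.argsort(np.linalg.norm(z - zt, axis=1))[:k]
print(f"Your quiz average: {target[1]:.2f}")
print(f"Average of the {k} most similar students: "
      f"{students[nearest, 1].mean():.2f}")
```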

4. Should I be told what happens to the data that is collected?

Open online course providers have to specify what will be done with the collected data and who has access to them. This is true both for conventional providers and, for example, for those offering MOOCs. A study presented by Prinsloo and Slade at LAK 15, Student privacy self-management: implications for Learning Analytics, investigated and compared the terms and conditions (T&Cs) of three large MOOC providers: Coursera, FutureLearn and edX (Prinsloo & Slade, 2015). The authors analysed the T&Cs according to seven criteria: 1. length of the T&Cs, 2. types of data collected, 3. methods of data collection, 4. conditions for sharing data, 5. use of data, 6. user access to, responsibility for and control of data, and 7. institutional duty of care.

The authors conclude that, where learning analytics are used in MOOCs, students are not sufficiently informed of the ways their data are used to track progress, are offered only partial insight, and have no opportunity to opt out. They argue that such smart learning environments demand a more intelligent approach than a simple binary choice between opt-in and opt-out. Prinsloo and Slade (2015) therefore call for more pro-active engagement with students, to inform them and involve them more directly in the ways individual and aggregated data are used.

5. Who owns the collected data?

Data ownership is a very complex legal concept. In principle, the data traces someone leaves behind belong to the data subject. In practice, however, the data subject cannot manage the sheer quantity of data breadcrumbs they leave behind on a daily basis. Because of this, data subjects often appreciate the services of providers who take care of data storage and management. A data subject has the right to view their data according to EU Data Protection Directive 95/46/EC article 12.

Data ownership becomes even more complex, however, when we consider the processing of data. If a computational model is developed out of a collection of data traces from a system, do the data subjects also have information rights over this model? Can a student still opt out of a model that has been generated from their data traces? In this way, data ownership is largely determined by the technical power the service provider offers to the data subjects. There are proposals to change this power relationship by enabling individuals to curate their own datasets through personal data stores. If this idea were realised, it would fundamentally change the whole process of data movement in education. In practice, however, aggregated and processed data no longer belong to the data subject. A data model derived from a collection of data points belongs to the data client, provided this entity had the right to collect the original data. So it may be the case that a data client must remove individual data entries at a data subject’s request, yet can still hold the model computed from the collected data.
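
The last point can be illustrated with a toy sketch: a “model” (here reduced to a simple class average) computed from student records is a separate artefact, and deleting a raw entry does not by itself undo it. All numbers are invented.

```python
import numpy as np

# Invented student scores and a derived "model" (a class average).
scores = np.array([0.55, 0.72, 0.81, 0.64, 0.90])
model_average = scores.mean()

# A student exercises their right to have their entry removed ...
scores = np.delete(scores, 2)

# ... but the artefact computed earlier is untouched by the deletion.
print(f"Model computed before removal: {model_average:.2f}")
print(f"Average recomputed after removal: {scores.mean():.2f}")
```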

In the medium term, the ownership of data may be clarified by the General Data Protection Regulation (Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016). It will, however, take some time to establish how this works in practice. See also the LACE Review Paper Is Privacy a Show-stopper for Learning Analytics? A Review of Current Issues and their Solutions for a discussion of some of these issues.

6. Can data be anonymised?

Anonymisation is often seen as an ‘easy way out’ of data protection obligations. Data owners often consider the replacement of identifiers to be sufficient to make data anonymous, but many studies have shown that this kind of anonymisation is better described as pseudonymisation. Data that has been anonymised in this weak sense can rather easily be de-anonymised when it is joined with other data sources. A famous example is Sweeney’s paper Simple Demographics Often Identify People Uniquely (2000), which shows how a medical dataset can be de-anonymised: the names of patients were not included, but demographic information was, and linking it to voter registration lists allowed the retrieval of names and contact information for the medical records.

Alongside such relatively simple approaches there are more computational ones, as presented by Narayanan and Shmatikov in their 2008 paper Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset). They showed how some users could be re-identified from the dataset of the Netflix competition by combining it with data from the movie platform IMDb. These and other examples show that, as computational methods and processing power advance, robust anonymisation is difficult to achieve. The residual risk of re-identification therefore has to be taken into account for educational data and learning analytics as well.
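
The linkage attacks described above need nothing more sophisticated than a database join. The toy sketch below, with entirely invented records, shows how a “de-identified” dataset is re-identified by joining on quasi-identifiers (ZIP code, birth date, gender), the combination Sweeney highlighted.

```python
import pandas as pd

# "Anonymised" records: names removed, quasi-identifiers retained.
# All records are invented for illustration.
medical = pd.DataFrame({
    "zip":       ["02138", "02139", "02141"],
    "birthdate": ["1945-07-31", "1962-02-14", "1971-11-02"],
    "gender":    ["F", "M", "F"],
    "diagnosis": ["hypertension", "asthma", "diabetes"],
})

# A public register (e.g. a voter list) with the same quasi-identifiers
# listed alongside names.
voters = pd.DataFrame({
    "name":      ["A. Jones", "B. Smith", "C. Brown"],
    "zip":       ["02138", "02139", "02141"],
    "birthdate": ["1945-07-31", "1962-02-14", "1971-11-02"],
    "gender":    ["F", "M", "F"],
})

# The attack is a simple join: every unique combination of the
# quasi-identifiers re-identifies a "de-identified" record.
reidentified = medical.merge(voters, on=["zip", "birthdate", "gender"])
print(reidentified[["name", "diagnosis"]])
```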

A potential solution could be to attach a timestamp to data specifying when it must be deleted in order to protect privacy. There are also more interesting solutions, such as the concept of data degradation: Van Heerde’s 2010 dissertation, Privacy aware data management by means of data degradation: Making private data less sensitive over time, proposes a way of letting data decay over time while protecting the privacy and informational self-determination of data subjects.
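
A minimal sketch of the degradation idea (not Van Heerde’s actual scheme): as a record ages, its timestamp is progressively coarsened and eventually deleted, so old data carries less and less identifying detail.

```python
from datetime import datetime, timedelta
from typing import Optional

def degrade_timestamp(event_time: datetime, now: datetime) -> Optional[str]:
    """Coarsen an event timestamp as it ages (illustrative policy)."""
    age = now - event_time
    if age < timedelta(days=30):
        return event_time.isoformat(timespec="minutes")  # full detail
    if age < timedelta(days=365):
        return event_time.strftime("%Y-%m-%d")           # day only
    if age < timedelta(days=5 * 365):
        return event_time.strftime("%Y-%m")              # month only
    return None                                          # delete entirely

now = datetime(2016, 6, 1)
for t in [datetime(2016, 5, 20, 14, 35),
          datetime(2015, 9, 1, 9, 0),
          datetime(2009, 3, 10, 11, 15)]:
    print(t.isoformat(), "->", degrade_timestamp(t, now))
```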

See also the LACE Review Paper Is Privacy a Show-stopper for Learning Analytics? A Review of Current Issues and their Solutions.