Towards a Manifesto for Data Ownership


In this guest blog post Kirsty Kitto of Queensland University of Technology, Australia, asks who owns educational data. Her answer leads to an invitation to join her in building nothing less than a genuine data pathway for a lifelong learning ecosystem.

If it’s free then you are the product: that is what someone I was working with last year (Sue Walsh) always said while we were trying to rethink how my institution (QUT) could deliver fully online and flexible modes of learning to postgraduate students.

Each and every day we reveal an enormous amount of information about ourselves online. Websites send information all over the place about what sites we are visiting and what we do there (even with adblock installed and running, Lightbeam shows that I am sharing a scary amount of data with other sites when I am browsing). We leave these digital traces everywhere we go. You might have managed to stay away from social media… maybe you use some pretty strict security settings and think you are safe, but you could find yourself rather surprised if you go to this site and use it to find out how unique – and therefore trackable – your browser is.

This is a problem. Don’t get me wrong… I love data! Most of my job as an academic revolves around trying to get more of it and thinking about novel ways in which to use it. However, I do feel a certain amount of sympathy for the idea that: “algorithms have become a market unto themselves, in which the competition and the proprietary logic appear to seriously hinder attempts to understand, criticize or re-appropriate them” (Tyler Butler Reigeluth, p251). Let’s face it, most of us can’t even map out the ecosystem of data traces that we leave… which leaves us at the mercy of the large companies that do manage to do something like this. (And plenty of them are doing it already.) Most of us have a much less developed understanding of the value of our data than companies do. This means that we often just give away an incredibly valuable resource for a new toy that has a couple of pretty trivial bells and whistles. The companies most involved in collecting that data usually say that we could just opt out… but everyone knows that this is not really an option in the modern world. Sometimes the service being offered is a very good one. I remember the days before web search actually worked, and I would not like to return to them. However, the price I pay for a decent search engine (in data) is very high. Do we really have a choice not to accept the terms and conditions and miss out on the service? I think that this defence is disingenuous and rather missing the point.

What then of education?

In the educational sphere such issues become even sharper. Who owns the educational data that is generated by student interactions with EdTech and Learning Analytics? Here is an easier starting question: How many of your students see any of the data that you collect about them? If I ask our students to tell me what data we collect about them at QUT then I usually find out that they have a very shallow understanding of what is going on. They are also quite often shocked when I start telling them about the kinds of data that we do collect. In the Open University, with its world leading policy on the ethical use of student data, we see a genuine attempt to fix this problem, and I would very much like to see more institutions heading in this direction. Similarly, Jisc has released a code of practice for learning analytics which sets out principles around: Responsibility; Transparency and consent; Privacy; Validity; Access; Enabling positive interventions; Minimising adverse impacts; and Stewardship of data. Thus we can see that privacy and consent are key factors that are increasingly recognised as essential to Learning Analytics.

These policies are a great starting point, but I would like to see the emphasis moving even further on… Privacy is very important, but to me everything comes down to Data Ownership. In contrast to the above two educationally focussed policies, in this article by Christine O’Keefe we see a brief discussion of a detailed set of Guidelines for the Ethical Use of Digital Data in Human Research that were written by Health researchers. Health has been worrying about the ethical use of data for much longer than education, and I think that we should be taking their ideas very seriously. That report suggests that ownership is one of the key ethical issues that require consideration in the use of all digital data (in addition to four other issues that revolve around: consent; privacy; data sharing; and governance/custodianship). I am inclined to agree with this point, but I think that the other four issues can actually be subsumed into that of ownership, as long as we take one particular approach to the notion of data ownership: if the people generating data owned it, then consent to use that data would be implicit. They could share the data as they liked, or turn up their privacy settings as desired, in a process of data custodianship that was driven by the owner of the data (themselves).

By the way: to all you web scrapers, social media companies, solutions providers, and yes, even educational institutions out there – just to be clear – by talking about “the people generating the data” I do not mean you. I mean the people who actually create personal data via their actions in the world. Not you. You are just collecting it.

We need a Data Ownership manifesto

Personal data feels a bit like it is entering a tragedy of the commons phase. We are all implicitly letting a bunch of people who move first gobble up a resource that rightfully belongs to us all. I think that the only sensible way to get back some control is to take a hard line on data ownership… so I will reiterate my above point just to make sure you all catch it, but this time, I think I will suggest that it should be the first key principle of the manifesto for data ownership that we need to develop:

Manifesto for data ownership, Principle 1: it is the people who create personal data via their actions in the world who own that data.

And yes. I know. There are a whole heap of problems here. For example, on the philosophical front: what if those actions concern two people, and one wants to collect all the data that they can about themselves, but the other wants to maintain complete privacy? How would the data collector be able to store information about this interaction? I would like to see people with more expertise in ethics playing around here. (Yes – I mean you Sharon Slade and Paul Prinsloo – your LAK’15 paper on student privacy self-management was a great step forward here.. but what if you used ownership as the mechanism for giving agency and hence opt-in vs opt-out consent options?)

I can see some solutions for this idea emerging on the horizon too. The concept of a Digital Identity that can be managed by the person who owns it is entering the public discourse, although no technological solutions are proposed as to how this might actually be achieved. However, tools that will start to allow this kind of behaviour are emerging, such as the Hub of All Things (HAT) project, and myWave which allows consumers to sell data about themselves, rather than giving it away. However, the model for these tools is still somewhat naive. For example, I find it hard to imagine that Facebook, Google and Twitter are going to do anything but hold onto the valuable stuff that they get right now.. and for now many companies will opt to buy their data about me rather than myWave’s. I will be curious to see where these ideas go though, and I think that this kind of data market scenario is going to become markedly more sophisticated with a bit of help from some key players.

Learning Analytics could help here. In the EdTech space we have an opportunity to move significantly forward on this thorny set of problems, due mainly to the more restricted domain that we work in. We could lead in a data ownership revolution. Here’s how.

Data Pathways would enable both lifelong learning and data ownership

We need to create data pathways. In a data ownership model, a person should be able to take their learning data with them for life, and it needs to make sense throughout their life. We owe this to our students, as it would allow them to make use of their own learning data in ways that we cannot even envision yet. For that to happen we need the underlying data standards that describe learning data to make sense across multiple contexts. Right now we have two emerging modern educational data standards (xAPI and IMS Caliper), not to mention all the ad hoc formats that are used in research, legacy systems, and other random data sources, which leads to its own problems. My bet is that the two data standards will end up in their natural niches:

  • xAPI is really strong in the professional learning context, and for recording events that occur beyond the LMS (or in the wild, where most of our ongoing lifelong learning interactions occur). Many interesting xAPI solutions and applications are starting to emerge. Check out the past xAPI Camp videos on the Connections Forum webpages for many examples of how xAPI can be used. We are currently using xAPI in a project funded by the Australian Government’s Office for Learning and Teaching that is aimed at enabling connected learning in the wild. The Connected Learning Analytics (CLA) toolkit interfaces with standard social media APIs to store data about student participation in specific learning events (and only if they opt in by signing up). It uses xAPI as its data format to ensure that the data we collect will be portable across different social media stores, and also into the future once the data pathways that we need are created. And yes, we show as much of this data as we ethically can to our students, and we only collect a very small subset of their online data (that which pertains to pre-defined learning activities that they consent to data collection for), but we have many issues even proceeding like that. For example, the Twitter terms and conditions mean that we cannot legally let students download the text of their tweets which means that the data stored in the Learning Record Store we are using is not the complete data set that we collect. How is that for an immediate challenge to data ownership? I think that as a community we should be pushing for change to these kinds of terms, but this would be much easier if we had a data ownership manifesto :)
  • IMS Caliper will probably end up a pretty standard solution for the more defined parts of the learning ecosystem, by which I mean enterprise sized LMSs and large scale applications that want to interact with them easily. This would be really nice, as it would already go a long way towards unifying learning data for this narrow context, but it is a bit hard to tell for sure. IMS have yet to release the standard, and as someone who is not a member of the consortium I do not have access to the data description. This is a shame, as it would make the final part in this story (see below) much easier to get moving if I knew where they were going right now.
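To make the idea of a portable learning record a little more concrete, here is a minimal sketch of what an xAPI-style statement about a social media interaction might look like. It follows the general actor / verb / object shape of an xAPI statement, but it is not the CLA toolkit's actual code: the platform URL, the account name, the post URL and the `make_statement` helper are all hypothetical, chosen only for illustration. Note that the statement points at the post rather than storing its text, which is one way a record could respect the kind of terms and conditions mentioned above.

```python
import json
from datetime import datetime, timezone


def make_statement(platform_home, username, verb_id, verb_name, object_id, timestamp):
    """Build a minimal xAPI-style statement as a plain dict (illustrative helper)."""
    return {
        # The learner is identified by an account on the platform, not by email.
        "actor": {
            "objectType": "Agent",
            "account": {"homePage": platform_home, "name": username},
        },
        # Verbs are identified by an IRI plus a human-readable label.
        "verb": {"id": verb_id, "display": {"en-US": verb_name}},
        # The object is referenced by its URL only; no post text is stored.
        "object": {"objectType": "Activity", "id": object_id},
        "timestamp": timestamp.isoformat(),
    }


statement = make_statement(
    platform_home="https://social.example.org",         # hypothetical platform
    username="student42",                                # hypothetical learner account
    verb_id="http://adlnet.gov/expapi/verbs/commented",  # a verb IRI from the ADL vocabulary
    verb_name="commented",
    object_id="https://social.example.org/posts/123",    # hypothetical post URL
    timestamp=datetime(2015, 11, 2, 9, 30, tzinfo=timezone.utc),
)

print(json.dumps(statement, indent=2))
```

Because the statement is just structured data with globally unique identifiers, a learner could in principle carry it from one store to another without losing its meaning.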

We need common vocabularies that allow for the data created in one learning system to be sensibly compared with that created in another. xAPI recipes go some way towards achieving this, and the notion of a Caliper profile might.. but we need concerted efforts to ensure that this all makes sense in the emerging learning data ecosystem that we are headed towards. Aneesha Bakharia (who is working with me to develop the CLA toolkit) has worked out that any data source that takes its format from the Activity Streams Schema is likely to be similar enough that it will be possible to unify the data somehow, and we are working the details of that out now for a paper – watch this space.
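As a rough sketch of what a common vocabulary buys us, the snippet below maps platform-specific actions onto a small set of shared verbs so that activity from different systems can be counted together. Everything here is an assumption made for illustration: the vocabulary base URL, the verb names and the platform action names are hypothetical, and this is not a real xAPI recipe or Caliper profile.

```python
from collections import Counter

VOCAB = "https://vocab.example.org/verbs/"  # hypothetical shared vocabulary

# Hypothetical mapping from (platform, native action) to a shared verb IRI.
VERB_MAP = {
    ("twitter", "tweet"): VOCAB + "posted",
    ("twitter", "retweet"): VOCAB + "shared",
    ("facebook", "post"): VOCAB + "posted",
    ("facebook", "share"): VOCAB + "shared",
    ("forum", "reply"): VOCAB + "commented",
}


def normalise(platform, action):
    """Translate a platform-specific action into the shared vocabulary."""
    key = (platform.lower(), action.lower())
    if key not in VERB_MAP:
        raise KeyError(f"no shared verb for {platform}/{action}")
    return VERB_MAP[key]


def count_by_verb(events):
    """Count events per shared verb, regardless of which platform produced them."""
    return dict(Counter(normalise(platform, action) for platform, action in events))


events = [("Twitter", "tweet"), ("Facebook", "post"), ("Twitter", "retweet")]
counts = count_by_verb(events)
print(counts)
```

Once a tweet and a Facebook post both resolve to the same "posted" verb, a report built for one system makes sense for the other, which is the point of agreeing on vocabularies in the first place.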

Next? We need technology that can enable people to make use of their data pathways. The data needs to be easy to store in the cloud, and easy for data owners to both access, and control access to. They should be able to easily curate their learning data, and to create their own new reports about things that they are interested in. One of the xAPI providers, HT2, has been building a Personal Learning Record Store (PLRS) but they have not really got the killer application for this yet. I think that the killer app is the data pathway.
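What might owner-controlled access look like in practice? The following is a toy sketch, assuming a very simple model in which the owner tags each record and grants viewers access tag by tag. The `PersonalRecordStore` class and its methods are hypothetical names invented for this example; this is not how HT2's PLRS (or any real Learning Record Store) is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecordStore:
    """Toy store in which the data owner decides who can see which records."""
    owner: str
    records: list = field(default_factory=list)
    grants: dict = field(default_factory=dict)  # viewer -> set of permitted tags

    def add(self, record, tags):
        """Store a record together with the tags the owner curates it under."""
        self.records.append({"data": record, "tags": set(tags)})

    def grant(self, viewer, tag):
        """Owner allows a viewer to see every record carrying this tag."""
        self.grants.setdefault(viewer, set()).add(tag)

    def revoke(self, viewer, tag):
        """Owner withdraws a previously granted tag from a viewer."""
        self.grants.get(viewer, set()).discard(tag)

    def view_for(self, viewer):
        """Return only the records this viewer is allowed to see."""
        if viewer == self.owner:
            return [r["data"] for r in self.records]
        allowed = self.grants.get(viewer, set())
        return [r["data"] for r in self.records if r["tags"] & allowed]


store = PersonalRecordStore(owner="kim")
store.add("completed statistics module", tags=["coursework"])
store.add("commented in study group", tags=["social"])
store.grant("recruiter", "coursework")

print(store.view_for("recruiter"))  # only the coursework record
print(store.view_for("kim"))        # the owner sees everything
```

The default is that nobody but the owner sees anything, so sharing is always an opt-in decision made by the person the data is about.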

And this is the final part of the story that I pointed to above. With a proper data pathway, and a well developed data ownership model that was driven by our manifesto, we would be able to genuinely enable people to do things like: identify weaknesses and strengths in their skills and knowledge; find ways in which to progress towards identified goals; uncover hidden relationships in their behaviour data; and much, much, more. We would need a whole heap of learning analytics tools for this, but they are all on the horizon anyway. Data owners could then curate their learning data, sharing parts of it with recruiters, employers, Universities and other trainers as appropriate. This kind of option would allow them to enter into the lifelong learning relationships that George Siemens suggested we need at his HERDSA appearance in Australia earlier this year. Data owners could use their data to claim prior learning, reflect upon their interactions, identify jobs that they never even realised they were close to being able to apply for, and perhaps gain the necessary competency for some dream job if some educational providers ever learn how to provide this kind of modularised solution in a sensible format. The DeakinDigital approach is a very interesting first attempt towards this idea, but in my opinion we need to help the learner who is undergoing this kind of assessment driven educational format work out where to find the trusted knowledge and online help that they need to complete the task. Coupling an individual’s data pathway with a sophisticated recommendation system that linked to a curated set of open educational resources would be a pretty powerful way of achieving a far more interesting form of personalised learning than the stupid adaptive engines that we are currently getting foisted upon us.
This will become even more powerful for those individuals who have a rich PLRS as learning analytics starts to move into the workplace, and the LACE LAW manifesto starts to become a reality.

Want to join me in trying to build a genuine data pathway for a lifelong learning ecosystem? I reckon this area is going to get rather exciting for a couple of years… You could start by coming to the LAK’16 hackathon which is going to centre around the Jisc/Apereo analytics initiatives. I am going to be organising a session there on recipes and common vocabularies across systems, and I would really like to get everyone interested in the data ownership manifesto around the table. The recipes that we work on there will help to define the first data pathways that we start to build as a community… for all data users… and most especially for the data owners.

Get in touch with me if you don’t want to wait until then :)


Kirsty Kitto is a Senior Research Fellow at Queensland University of Technology, and models the ways in which humans interact with complex information environments, paying special attention to the interdependencies between language, attitudes, memory and learning. She is currently partially seconded to the Learning Futures Team, which is devoted to exploring and shaping the future of learning and teaching, at Queensland University of Technology and beyond.



    • Thanks for the links Dai! No, I do not know if anything has come of this, and the openPDS they talk about there is indeed very much the kind of thing that I think we need. Yes, I agree, the New Deal is proposing exactly the same thing as my Principle 1, but in a much broader context (which makes it far harder to achieve). The nice thing about working with a subset (i.e. educational data) is that we are much closer to being able to pull back control… if we work quickly. Definitely going to do a bit of sniffing around to see what happened with the New Deal though.

  1. How future learning is going to take place is not clear now, and as research continues to evolve and shape knowledge in the area, answers to your question will be very difficult. Just look at the role social media and its analytics are playing now, at how experts refer to the future web, at the role smartphones are playing in accessing educational sites, and at museum and game technologies. I actually don’t think we should jump to conclusions now, but I have the notion that in the near future learners will create and own their knowledge. But in terms of educational data it will depend on the kind of learning management system the institution opts for.
