If you’re a resident of Australia, you will be familiar with “Robodebt”. Robodebt was a scheme, put in place by the previous government, to identify and recover overpayments of welfare. It was later scrapped after growing controversy and thousands of incorrect debt recovery actions. The impact on people caught up in the scheme was profound, with associated claims of self-harm and suicide. Parts of the scheme were later deemed unlawful by a federal court.

A Royal Commission (public enquiry) into the scheme was announced in August 2022, which put it back into the news. Listening to the ensuing avalanche of old tropes and scapegoats, you’d have been forgiven for thinking you’d gone back to the 1990s. We thought the days of blaming IT for poorly conceived and executed solutions were over, but clearly not (the discussion in an episode of ABC’s The Drum was a personal favourite[1]).

Robodebt will remain a rich case study for years to come – its generosity knows no bounds. With so many threads to pull apart, it illustrates several themes that cry out for inspection through an ethical lens[2].

Data Ethics Starts at the Beginning

Robodebt was fundamentally a data project. At its simplest, it worked like this:

    • The Department of Human Services (DHS) calculates and issues welfare payments. The calculation is adjusted to account for any income declared by the recipient for the period in question.
    • Then DHS uses income data from the Australian Taxation Office (ATO) for a retrospective verification of welfare entitlements. It does this by re-calculating the payment using the ATO income data in place of the recipient-declared income.
    • If a discrepancy is found, revealing a DHS overpayment, then an automatic debt recovery process is triggered (a simplified sketch of this check follows the list). Recipients must then pay the debt or prove their legitimate entitlement. If anyone has information about how they handled underpayments, we’d love to know.
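To make the mechanics concrete, here is a minimal sketch in Python of that re-calculation and discrepancy check. The entitlement rule, names and figures are our own simplification for illustration, not the actual Centrelink calculation:

    # Illustrative only: a simplified model of the retrospective verification step.
    def entitlement(declared_fortnightly_income: float) -> float:
        """Toy entitlement rule: a base payment reduced by 50 cents per dollar of income."""
        BASE_PAYMENT = 600.00
        return max(0.0, BASE_PAYMENT - 0.5 * declared_fortnightly_income)

    def apparent_overpayment(paid_amount: float, ato_income_for_period: float) -> float:
        """Re-calculate the entitlement using ATO income and return any apparent overpayment."""
        recalculated = entitlement(ato_income_for_period)
        return max(0.0, paid_amount - recalculated)

    # Payment was based on a declared fortnightly income of $200;
    # the ATO figure substituted later is $500 for the same period.
    paid = entitlement(200.00)                 # $500 paid at the time
    debt = apparent_overpayment(paid, 500.00)  # $150 flagged, triggering recovery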

What you focus on is shaped by your values, and your values shape your focus. The focus of this project was clearly on transactional outcomes, not relationships, and a transactional focus with no emphasis on relationships erodes trust in both directions. Worse, the scheme appears from the outside to be premised on two assumptions:

    • A significant number of recipients (enough to cost-justify this project) understate their income to receive benefits they’re not entitled to.
    • On that basis, it is reasonable to shift the burden of proof onto them to defend themselves against Robodebt recovery actions.

Thinking through these assumptions at the start should have triggered a serious look at:

    • what is missing or incorrect about this narrative (people sometimes make mistakes, and so might Robodebt);
    • the power imbalance between the project owner and subjects of the scheme; and
    • the potential impact of these on trust in the community (with trust in government diminishing, this is more pressing than ever[3]).

Ethical goal setting isn’t antithetical to innovation or efficiency. It looks beyond the project’s goal to the beneficence and capability improvement of the solution, which are key to sustainable innovation. The project’s goal of minimising ineligible spending is not in itself controversial (though you could argue that there are richer veins to mine). The primary problem was its design and execution, driven by unchecked assumptions about the problem. The secondary problem was that it put a Band-Aid on the issue, clawing back overpayments after the fact, rather than improving the process.

Beware the Temptation of “Low Hanging Data”

The signifiers of human existence are messy. Despite the potential benefits of new technologies, we must be alert to the dangers of low-hanging fruit: that is, data chosen primarily for its accessibility rather than its accuracy and representational qualities.

There are many challenges with linking data across systems, including discrepancies in timing and granularity. Possibly to get around such challenges, Robodebt went for the low-hanging data. Rather than use actual income data from the ATO for the period in question, it used a calculated average: a mechanism called “income averaging” to derive an “apportioned fortnightly income” (see pages 4 and 5 of the Deanna Amato v The Commonwealth of Australia court order). This average was used to determine whether a recipient had been overpaid for the period and to instigate debt recovery actions[4], and, like any average, it was a poor substitute for the real number.
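To see why averaging is such a poor substitute, here is a small, purely illustrative sketch (re-using the same toy entitlement rule as the earlier sketch) of how income averaging can manufacture a “debt” for someone whose fortnightly payments were all correct:

    # Purely illustrative: a casual worker with lumpy income -- no work for half
    # the year, well-paid work for the other half. Figures and rules are invented.
    def entitlement(fortnightly_income: float) -> float:
        BASE_PAYMENT = 600.00
        return max(0.0, BASE_PAYMENT - 0.5 * fortnightly_income)

    actual_fortnights = [0.0] * 13 + [2000.0] * 13        # 26 fortnights in the year
    annual_income = sum(actual_fortnights)                # $26,000 reported to the ATO
    apportioned = annual_income / 26                      # $1,000 "apportioned fortnightly income"

    paid = sum(entitlement(i) for i in actual_fortnights)  # $7,800: correct against actual income
    recalculated = entitlement(apportioned) * 26           # $2,600: what averaging says was owed

    apparent_debt = max(0.0, paid - recalculated)          # $5,200 "overpayment"
    # Every fortnightly payment was correct for the income actually earned in that
    # fortnight, yet the averaged figure manufactures a debt to be recovered.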

It must be assumed that actual income information for the period in question was not easily available, so they used what they could get and made it fit. Ethically, a scheme with such major real-world impacts should use real-world (validated) information.

Plan to Fail

Beyond the design decisions that made Robodebt problematic, the biggest failure was the apparent lack of a plan for addressing failure itself. Without being fatalistic, it behoves us to approach every project with humility and assume that even our sophisticated risk management tools will not identify every risk. Planning to fail acknowledges that things can, and likely will, go wrong. When they do, this is the difference between a correctable mistake and abject disaster.

You get to your destination faster if the path in front is already marked and paved. Not to be confused with a risk management plan, a plan for failure paves the path for responding to failure by pre-defining the project’s tolerances, responsibilities and communications. To detect and address unintended outcomes, regular monitoring and verification of results is essential, especially if you’re using estimated values coupled with an automated process. So is a common understanding of what constitutes a “significant” issue and how it will be recognised, along with showing the community, as major stakeholders, how they can engage in the process. For example, given the scheme’s potential impact and the vulnerability of much of the community, there should have been a hotline and/or website for reporting problems; instead, affected citizens spent hours on the Centrelink helpline.
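What might pre-defined tolerances look like in practice? The sketch below is hypothetical – the threshold, sample size and escalation step are our own inventions – and is intended only to show how a failure plan can be made concrete enough for an automated scheme to be checked against:

    # Hypothetical sketch: a pre-defined tolerance gate for an automated scheme.
    FAILURE_PLAN = {
        "max_error_rate": 0.01,        # tolerated share of sampled debts found to be wrong
        "sample_size_per_cycle": 500,  # debts manually verified each monitoring cycle
        "on_breach": "pause automated debt notices and escalate to the accountable owner",
    }

    def automation_may_continue(debts_checked: int, debts_found_wrong: int) -> bool:
        """Return False when the pre-agreed error tolerance is breached."""
        return (debts_found_wrong / debts_checked) <= FAILURE_PLAN["max_error_rate"]

    # e.g. 14 of 500 sampled debts could not be substantiated: a 2.8% error rate.
    if not automation_may_continue(500, 14):
        print("Tolerance breached:", FAILURE_PLAN["on_breach"])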

Don’t forget the humans behind the data

There is a story we sometimes use in our workshops about a U.S. study conducted by Professor Bar-Yam of the New England Complex Systems Institute. This study used sentiment analysis on geo-located Tweets to create a sentiment map of Manhattan. The “saddest spot in Manhattan”, according to this study, turned out to be an elite public school in the city. The finding was picked up and reported by “local” papers, including the New York Times. Staff and students at the school were distressed and confused by this, and worried it would affect the school’s reputation. When the Professor took a closer look at the data, he

“realized that he had incorrectly interpreted a data map” and that “Closer analysis revealed that the posts had actually come from a single Twitter account ‘from a region just south of the school’ ”, according to a NYT follow-up article.

It is unlikely that this is the type of interest Prof. Bar-Yam or the Institute wished to generate from the study (the lure of low-hanging data again). When creating new “datafied” assets, it is important to remember that they are proxies for the real world. Like Robodebt, this case study offers lessons at many levels. For now, we want to focus on the need to design data systems that build in respect for, and feedback from, the people represented in the data.
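As an illustration of the kind of check that might have caught the problem, the sketch below (not Prof. Bar-Yam’s actual method; the records and threshold are invented) shows how a location-level sentiment aggregate can be flagged when a single prolific account dominates it:

    # Invented example: a location aggregate dominated by one prolific account.
    from collections import Counter
    from statistics import mean

    # (map cell, account, sentiment score in [-1, 1]) -- hypothetical records
    tweets = [
        ("school_block", "user_42", -0.90),
        ("school_block", "user_42", -0.80),
        ("school_block", "user_42", -0.95),
        ("school_block", "user_7",   0.30),
        ("park",         "user_1",   0.20),
        ("park",         "user_2",  -0.10),
    ]

    def location_sentiment(records, location):
        scores = [score for cell, _, score in records if cell == location]
        accounts = Counter(acct for cell, acct, _ in records if cell == location)
        top_account, top_count = accounts.most_common(1)[0]
        return {
            "mean_sentiment": mean(scores),
            "n_tweets": len(scores),
            "n_accounts": len(accounts),
            # Flag aggregates where one account contributes most of the signal.
            "dominated_by_single_account": top_count / len(scores) > 0.5,
        }

    print(location_sentiment(tweets, "school_block"))
    # {'mean_sentiment': -0.5875, 'n_tweets': 4, 'n_accounts': 2,
    #  'dominated_by_single_account': True}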

Create a two-way street

As Jer Thorp writes in his own discussion of this case (emphasis added):

It’s a world that flows in one direction: data comes from us, but it rarely returns to us. The systems that we’ve created are designed to be unidirectional: data is gathered from people, it’s processed by an assembly line of algorithmic machinery, and spit out to an audience of different people […]. This new data reality is from us, but it isn’t for us.

How can we build new data systems that start as two-way streets, and consider the individuals from whom the data comes as first-class citizens?

By inviting the faculty and students to review the pre-publication data as subject matter experts, we would shift from a perception of them as mere data sources to full data partners. This opens the opportunity to catch data errors before publication and to gain insights that might otherwise have been missed. Most importantly, this two-way street empowers the “subjects” to have a voice in the data’s legitimate and beneficial use.

We need to be more deliberate in our efforts to ensure that people and communities are appropriately represented in the data we work with. Imagining the pathway from person to proxy as a two-way street would go a long way to building more representative – and thereby better quality – data. Returning to the community earlier in the process would elicit richer understandings about any data collected. Involving them in the analysis of the data in some way also offers an opportunity to enhance their data literacies, empowering them to speak about data that represents them. At the very least, better documentation of context (for instance in the descriptive metadata associated with a data set) could disambiguate the data and support community feedback.
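As a sketch of what such descriptive metadata might record, consider something like the following (the fields are our own suggestion, not a formal metadata standard):

    # The fields below are our own suggestion, not a formal metadata standard.
    dataset_context = {
        "title": "Geo-located tweet sentiment, Manhattan (illustrative)",
        "collection_period": "to be stated by the publisher",
        "unit_of_analysis": "individual tweet, aggregated to a map cell",
        "known_limitations": [
            "geo-location reflects device position, not the place being discussed",
            "a single prolific account can dominate a sparsely tweeted cell",
            "sentiment scores are model estimates, not ground truth",
        ],
        "community_feedback_channel": "contact point for people represented in the data",
        "intended_uses": "research into area-level wellbeing patterns",
        "out_of_scope_uses": "naming or ranking identifiable institutions or individuals",
    }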

Working to create such two-way streets in our data work would make sure that the community from whom data was taken also benefits from the process. A good example of this approach is the MobileMe study[5]. We looked at the mobile phone activity of students to inform school policies on phone use at a time when the mobile phone was an emerging technology. We gave this data back to the students, and trained them in research methods, so that they could tell us what they thought the data revealed (with the added benefit of building their data literacy skills). The importance of a two-way street is all the more acute when dealing with young people, who have precious little power and influence over policy decisions that affect them.

“I just sell the weapons”

Many of us agree that if you market high-powered weapons to civilians in an irresponsible manner, you share some blame for a negative outcome. By the same token, if you’re in the business of publishing or sharing data, we don’t believe that your responsibility ends when the data leaves your servers.

Publicly available data is regularly used to generate what we call “clickable factoids” for attention-grabbing headlines and to promote clickthroughs. The high school Twitter story is a typical example of this. Of course, like weapons, we can’t control what someone does with our data once it leaves our servers, but we can control who we share it with and how we frame its ethical use. It is our responsibility to think about how our data could be used and how this could impact specific groups. It is important to know our ethical red lines and communicate them to the downstream users of our data products. For example, data publishers could have a clause about “frivolous use” in their terms of use, and a definition of what they consider “beneficial use” (note that Twitter does talk in a limited way about data use).

By the way, thanks for the metaphor, Cathy O’Neil. If you haven’t read “Weapons of Math Destruction”, stop reading this article and go get a copy now.

The “responsible serving of data”

In Australia, all employees who serve alcohol must have a Responsible Service of Alcohol (RSA) certificate. What would an RSA for data look like?

Data has a social life that influences the way it is made and used. How it is shaped by researchers and analysts will reveal specific aspects of the critical social networks the data inhabits. In the words of Brown and Duguid[6]:

“…to participate in that shaping and not merely to be shaped requires understanding such social organization, not just counting on (or counting up) information.” 

Prof. Bar-Yam would have surfaced and ranked many groups in his analysis, based on sentiment scoring. The press chose to name one data point (a high school) as the “saddest place” in their re-telling of the study (the “clickable factoid”). In this case the data point turned out to be incorrect. What responsibility would Bar-Yam and the Institute, alongside the press, have had if the data had in fact been “correct”? By extension, what was Twitter’s responsibility as the original supplier of the data?

The responsible use of data involves doing more than simply scratching the surface of the data you are working with; data’s value is more than skin deep. It means not just paying attention to regulated uses of the data, but appreciating that its meaning, and its power, are shaped by its social life. Responsible use also involves thinking beyond what you can do, and instead focusing on what you OUGHT to do.

Data Ethics is more than…

There are many dimensions to the ethical use of data. More often than not, discussions of data ethics pivot around data privacy and security, and specific applications such as Artificial Intelligence (AI) and automation. Yes, there are important ethical considerations in each of these areas, but, to be responsible servers of data, we should be weighing a much broader range of ethical questions from the moment the data comes into our orbit. Doing so will make us all better data practitioners, generating more useful and reliable data products, ethically.

We need to find ways to engage productively, both individually and collectively, with risk and adversity. We need to ensure oversight so outcomes can be assured of serving all sectors of our communities — particularly the most vulnerable. We need a culture of care.

This is the first in a number of articles we intend to publish over the coming months on the subject of data ethics. It dips a toe in the water but leaves many issues untouched. We hope to dig into these in some detail in future discussions.

What do you think it takes to be a responsible server of data?

About Us

 

We bring a mixed disciplinary approach to our work on the ethical use of data. We have worked in Information Technology and Data Science disciplines for 25+ years, and have both focussed on different aspects of ethical practice in Data Science and AI for the past decade. The strong socio-technical components of Theresa’s research in ethical data practices and Ruth’s experience in data science and experiential learning design round out our collaborative practice. We both have experience in the development of reference and actionable frameworks at national or international levels and are currently involved in the development of international standards for Data Usage and Trustworthiness (ISO JTC1/SC32/WG6 & JTC1/WG13).

Please leave a comment or reach out to us on LinkedIn:

 

Ruth Marshall – Director, Ethical and Privacy-Preserving Analytics here at Hocone Pty Ltd

linkedin.com/in/ruthemarshall

Dr. Theresa Anderson – Director and Social Informaticist at Connecting Stones

linkedin.com/in/theresadanderson

[1] ABC’s The Drum August 25th 2022 (39:49). Blaming the developers seems especially clumsy following Shane Wright’s analysis just beforehand “…this was always about dollars and the people involved in the welfare system were ignored completely…”

[2] This is an outsider’s perspective. We have no inside information on Robodebt or the other high-profile cases discussed in this article.

[3] “While Australia saw a record level increase in public trust in institutions during the pandemic, this ‘trust bubble’ has since burst, with societal trust in business dropping by 7.9% and trust in government declining by 14.8% from 2020-21.” CSIRO Future Report: Our Future World

[4] This part of the scheme was later deemed illegal by a federal court.

[5] An ARC-funded project conducted in partnership with the NSW Commission for Children and Young People (2006-2011). Using child-centred participatory research practices, the project brought together young people, educators, researchers and industry players in productive dialogue to explore the benefits and dangers of mobile phones in schools. See the following link for a little more discussion: http://informationr.net/ir/18-1/paper565.html#.Y0TmUHZBzD4

[6] Brown, J. S., & Duguid, P. (2000). The social life of information. Boston: Harvard Business School Press.