I was recently reading the article “How Economics lost its soul” by Jostein Hauge and found myself agreeing with all of its sentiments. Reading it reminded me of a conversation I’d had earlier where I realised Economics is indeed a social science. I have known it is a social science per the definition that literally includes it, and more so because I have a Bachelor’s degree in Economics. But only recently did it really hit me that, more than the statistical and mathematical parts of it, this subject/field is meant to intersect with and operate with recognition of human and social conditions. And I don’t know that my program placed that much focus on the more social courses, especially as they related to then-present-day issues. (However, it’s possible that they did and (1) I never knew it because I barely listened or (2) I missed it because my memory is clouded by the trauma of Advanced Microeconomics calculations and also by time—it’s been eight years!)
Similar to the thoughts Jostein shared in his article are my thoughts on data and the ways we approach and handle it, especially in the development of AI technology and even policies. I shared a bit about what I mean in my last article. With both data and Economics, we are likely to treat the topic with a more analytical or even scientific lens as opposed to a social one. This is why we have many data professionals with analytical skills but little or no awareness of the social components of data.
While building dashboards and processing complex data are very useful skills, especially in today’s world, it is increasingly necessary to have data (social) scientists who can also understand the humanity of the data before them, in order to extract relevant, inclusive insights that in turn shape the nature and behaviour of the tools they build and the policies that rely on them. This is where the relevance of thick data comes in.
A few things have been written generally about why big data needs thick data. This isn’t one of those. This is more focused on AI development.
Why big data needs thick data
Thick data, as defined by WordSpy, is “data related to qualitative aspects of human experience and behaviour, particularly when used as context for the analysis of a large data set.” Unlike big data, which is quantitative and distinguished by its size, thick data is smaller and meant to give context to patterns and behaviours in big data.
This matters in AI development because the focus has been on acquiring access to large datasets to help AI models and tools perform at ‘optimal’ levels on a variety of tasks. And even when we talk about the biases in these datasets being inherited by algorithms, the proposed solution is always to get more inclusive data or to reweight model behaviour based on the biases it outputs. As mentioned in the Amsterdam example from my last article, the latter is what they did, but it only ended up switching which demographics were affected by the algorithm’s biases.
Take, for example, public secondary school data from a certain district or local government showing that students who get school lunches usually also have better grades than their peers who don’t. A plausible response would be to remove the stipend guardians pay for school lunches so that every student can get one, because the assumption is that the students who don’t get school lunches are going hungry or aren’t eating well enough, and it’s affecting their concentration in class and subsequently their performance. This makes a lot of sense, right? But one major rule of data is that correlation doesn’t equal causation.
What if the real reason the students whose guardians can afford the school lunch stipend are performing better is that, for those families, the lunch isn’t a financial stretch in the first place? Maybe those students come from families where they are not required to work to contribute to daily upkeep, unlike peers whose guardians’ income is too little to cover everything. Or maybe their parents are better educated about learning challenges and able to afford certain healthcare costs, like an ADHD diagnosis and medication, while the other students’ parents can’t afford such. What if the reason students not on school lunches are performing poorly isn’t the lunch itself but other surrounding factors that the data doesn’t show?
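To make that concrete, here is a minimal, hypothetical sketch in Python (all numbers are made up) of how a hidden factor like household resources can drive both lunch access and grades, so the two correlate even though one doesn’t cause the other. Grouping by the hidden factor makes the apparent ‘lunch effect’ largely disappear.

```python
# A made-up version of the school lunch example: a hidden confounder
# (household resources) drives both who gets school lunches and who
# scores well, so lunch and grades correlate without causing each other.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical confounder: household resources (0 = strained, 1 = comfortable)
resources = rng.binomial(1, 0.5, n)

# Comfortable households are far more likely to afford school lunches
gets_lunch = rng.binomial(1, np.where(resources == 1, 0.9, 0.2))

# Grades depend on household resources, NOT on the lunch itself
grades = 55 + 15 * resources + rng.normal(0, 8, n)

df = pd.DataFrame({"resources": resources, "gets_lunch": gets_lunch, "grades": grades})

# Naive view: lunch recipients look like the better students
print(df.groupby("gets_lunch")["grades"].mean())

# Conditioning on the confounder: within each resource group,
# lunch makes almost no difference
print(df.groupby(["resources", "gets_lunch"])["grades"].mean())
```

An algorithm trained only on the lunch column would happily learn the spurious pattern; the thick-data question is what the hidden ‘resources’ column stands for in the real world.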
This is exactly what’s happening with AI data. Algorithms are missing the contexts and only learning the patterns in these datasets that say ‘every time A happens X follows, so that must be the spelling of axe’. The result is machine learning models that are biased and inaccurate. Sometimes those biases are not even racist or sexist but still harmful. Take the machine learning algorithm researchers at the University of Pittsburgh developed in the 1990s to predict the risk outcomes of pneumonia patients. They trained this model on a database of almost a million patients in 78 hospitals across 23 states. What the algorithm determined, at the end of the day, was that patients with asthma as an existing condition were less likely to suffer dire consequences from pneumonia than patients without asthma. What was missing from that data was the fact that, when admitted with pneumonia, asthmatic patients were immediately given urgent care, which improved their chances of survival and made them seem ‘low risk’ compared to those without asthma. Deployed in medical settings, this algorithm would have prescribed less care for asthmatic patients because that was the tool’s recommendation.
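The mechanism is easy to reproduce on synthetic data. The sketch below is a toy illustration, not the Pittsburgh model: I make up an urgent-care variable that the training data doesn’t record, and a simple classifier trained only on the asthma flag duly learns that asthma ‘lowers’ risk.

```python
# Toy illustration (not the Pittsburgh model) of how the asthma pattern is learned.
# Assumption baked into this synthetic data: asthmatic patients are fast-tracked
# to urgent care, which sharply lowers their death rate, but the model never sees
# the urgent-care variable, only the asthma flag and the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000

asthma = rng.binomial(1, 0.15, n)
# Hidden context: asthmatic patients almost always get urgent care on admission
urgent_care = rng.binomial(1, np.where(asthma == 1, 0.95, 0.30))

# True risk: asthma raises baseline risk, urgent care lowers it a lot
p_death = 0.15 + 0.05 * asthma - 0.12 * urgent_care
died = rng.binomial(1, p_death)

# The model only gets what the database records: the asthma flag
model = LogisticRegression().fit(asthma.reshape(-1, 1), died)
print("asthma coefficient:", model.coef_[0][0])  # negative, i.e. "asthma = lower risk"
```

A negative coefficient here isn’t a problem with how the data was sourced; it’s a missing piece of context that only someone who knows how hospitals triage patients would catch.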
A typical AI development process summarily looks like data -> training -> deployment -> adjustments/abandonment after failures or complaints. While there are so many conversations about the ethics of data in AI, from how it’s sourced to how representative it is, the focus here is on how inadequate handling of data, prioritising size over context, is an issue. Algorithms have no way of deciphering context or causality; they follow patterns. This is why it matters that we have data professionals who understand the social aspects of dealing with big data and, more than that, that they care.
Technologists and analysts need soul
There’s so much emphasis on gaining STEM skills in order to be ‘useful’ in the digital world. On several occasions, Reno Omokri, a Nigerian public figure, has made remarks about the ‘uselessness’ of certain degrees, majorly those in the humanities and social sciences. One reason he gives is that these courses and professions can and will be replaced by AI in a few years. I respect that argument, but I have earlier shared my own very strong thoughts on the value of humanities in AI, specifically AI in Africa.
Yes, AI is reshaping industries and redistributing value, especially in the workforce. The jobs they say will get replaced are usually humanities-related. I remember seeing one list that had translators at number one (shortly after completing this, I opened LinkedIn and found this series: AI Killed My Job: Translators). But I have also seen recent lists that referenced mathematical and computer-based jobs, many of which are also STEM courses, including data scientists and analysts.
AI can do some of the tasks expected of a data scientist, and at a faster rate, so I know there’s a founder somewhere who’s considering firing their entire data team. But no matter how much faster AI can clean and sort through data than humans and provide immediate insights, AI lacks critical thinking skills, empathy, and human understanding. Essentially, while AI can handle the numbers, it has no socio-cultural consideration or context. In the coming years, I believe these are the skills the data science field will require more than ever: people who can truly humanise data. And these skills can be built by reintroducing the soul factor into analytical courses. By acknowledging the role humanities and social sciences play in building better technologies. And by equally valuing the ‘social’ in heavily mathematical social science fields like data and Economics.