The Duty to Data Portability

Giovanni Magoga

January 27, 2024

Filed under “ideas”

TLDR: Despite gaining ground on government in the information asymmetry scale, the public cannot reach consensus on policy because our psychological and moral differences go unacknowledged. These latent traits can be inferred from digital trace data, made available via data portability rights. LLMs can bridge the communication gaps by crafting unanimous policy prescriptions tailored to these traits. Open source, confidentiality, and the rule of law can be leveraged to prevent abuse of this system.

I just finished reading The Revolt of the Public by Martin Gurri, which is a great analysis of the current sociopolitical climate in the West. I’m a software engineer in France and, given the self-immolation of Germany’s energy sector and Putin knocking on the door, I’ve started thinking in practical terms about the underlying information dynamics to help Europe get its act together.¹

Gurri argues that the public’s revolt, from the Arab Spring and Occupy Wall Street to Brexit and Trump, is not just a rebellion against specific policies or leaders, but a fundamental challenge to the top-down, hierarchical structures of the 20th century. This rebellion has been fueled by the explosion of digital information, which has democratized knowledge and destabilized traditional power structures. Yet in all its major instances the rebellion has struggled to propose cohesive and progressive policies.

A major factor in this lack of cohesion is the increasing polarization of the politically involved public, leading to artificial disagreements on prescriptions that are fundamentally super-partisan and would benefit everyone according to the shared set of Western values (utilitarianism, individual freedom, justice, equality, etc.). How descriptive consensus is reached in the first place is another problem; there are a few promising examples of how to solve it, such as Wikipedia’s collaboration model and Twitter community notes.


Computing center in charge of Интенсификация 90, Gorbachev’s plan to fix the communist economy. Source: Traumazone by Adam Curtis.

From a purely informational standpoint, the relationship between public and government can be better understood through the principal-agent model, where the principal (public) delegates to the agent (government) the authority to calculate the optimal policy on its behalf.

This model is generally used to illustrate the problem where the agent may pursue its own interests rather than those of the principals, resulting in policies that don’t necessarily reflect the public’s needs or desires. For the sake of argument, we’ll consider an idealized version of the principal that only has “good” self-interest: to have effective policy enacted for itself. The self-interest of the agent in this case is just the bias toward reactivity rather than proactivity, which is characteristic of rigid hierarchies.

principal-agent

The principal-agent relationship is necessary for effective governance due to the greater ability of the agent in handling complexity, leveraging specialized skills, improving efficiency, managing risks and establishing clear accountability structures. However, with the advent of the Internet, the public is proving equally capable of achieving some of these same competencies.

Throughout history, the asymmetry of information has always been tilted in favor of the agent, with a few notable events like the printing press, which was at the core of the success of the Protestant Reformation, progressively challenging this imbalance.


Don’t quote me on this.

This progression has been captured by a tradition of thought, from Adam Smith through Friedrich Hayek and Yochai Benkler (The Wealth of Nations, The Use of Knowledge in Society, The Wealth of Networks), culminating in an ideal of perfect information where all principals share the same comprehensive state of the world, drastically reducing the purview of the agent, much like in a blockchain.

There are a few major caveats to consider in this analogy, as it’s only relevant to knowledge work and the politically active public:

1. States retain the monopoly on violence, which hardly maps 1-to-1 with blockchain consensus mechanisms.
2. Not all node operators in a blockchain are necessarily maintainers of the code, and not all citizens necessarily want to be involved in policymaking.

What the chart above doesn’t tell us is how effectively this information is processed and acted upon. In contrast to a centralized agent, which operates with a streamlined hierarchical structure, the democratic principal comprises numerous independent actors, each driven by their unique self-interests and complex psychologies.

Consent

Maybe the Chinese (and Plato and Osho) are right in arguing that this is an unrealistically hard problem to solve, and we should just rely on enlightened leaders. However, given recent advances in the understanding of human psychology, we should be optimistic about “decentralized enlightenment” that starts with a better understanding of the individual.

In The Righteous Mind, Jonathan Haidt suggests that our polarization is influenced by various psychological foundations, such as care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation, bucketing the population into groups that are more influenced by one or the other.

Some individuals lean towards the left, emphasizing care and fairness, while others lean towards the right, valuing loyalty, authority, and sanctity. Then there are those in whom the moral foundations balance out, so that they’re less likely to adhere to a specific ideology.

While the exhaustiveness and exclusiveness of these dimensions are an area of active research,² the core of the polarization problem is that, for many people, the instinct to identify with a group sharing the same moral foundations is stronger than the instinct for broader inter-group coordination. This is possibly due to genetic roots that transmit predispositions from previous generations in a world of low information availability.³

MFT

What Haidt tells us is not that one side is generally right and one side is generally wrong. We should still uphold pluralism since all MFT traits are “positive” adaptations,⁴ but we now have access to so much empirical data to guide our decision-making that we do not need to hinge on our basic moral intuitions as much as we needed to in the past.⁵

Take the current internal security situation in Sweden. Over-indexing on the care and fairness foundations has led to a very welcoming and ambitious immigration policy, whose mismanagement has led to a surge of immigrant gang violence and a drastic increase in the rate of gun crime deaths.⁶⁷ As a counterexample, over-indexing on authority, in-group, and purity laid the foundations of Japan’s immigration policy, which has proven inadequate in the face of the country’s aging workforce crisis and consequent economic decline.⁸


Moralist politics as a gradient descent. The surface represents the ground truth and is being navigated by government to reach its bottom, the optimal policy. Source: Dall-e

Obviously, for most people, it’s hard to give up on these strongly held beliefs. However, there still is an opportunity to leverage MFT traits to show how a contentious but positive policy (in utilitarian terms) ends up satisfying both sides, showing that there is more agreement among people who hold different beliefs than not.

In our quest for a practical implementation of this solution, how can we discover MFT traits at scale?

The Great Data Powers Redistribution

As revolutionary as cookie banners are (the democratization of cookie clicker!), there’s another very important piece of the GDPR that has gotten little love over the years: the right to data portability.

The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided⁹

Of course the article (coming soon to a jurisdiction near you) doesn’t provide any specific guidance on the implementation and, given the current absence of incentive structures, none of this exists with interoperable standards or any semblance of a user experience.

That said, for the first time, we now have a legal right to demand the most comprehensive digital representations of our identities, from personal Google search histories to Instagram messages and Outlook emails. All this data, collected passively over the years, is the best proxy to pinpoint the underlying state of the principals.

Research has shown that in this data we manifest the most comprehensive phenotype of our personalities. The most recent models are able to predict with reasonable accuracy many of our private characteristics, from more immediately observable attributes (like interests and habits) to latent invariant traits (like OCEAN and 16PF, including our moral foundations).¹⁰¹¹
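To make the idea of inferring moral foundations from trace data concrete, here is a deliberately naive sketch. It counts hits against a tiny word list per foundation and returns the share of hits for each. The lexicon, the function name, and the sample searches are all made up for illustration; the studies cited above rely on validated dictionaries and trained models rather than anything this crude.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: a few illustrative words per foundation.
# This is an assumption for the sketch, not a validated dictionary.
TOY_LEXICON = {
    "care": {"protect", "suffer", "compassion", "harm"},
    "fairness": {"equal", "rights", "justice", "cheat"},
    "loyalty": {"nation", "family", "betray", "together"},
    "authority": {"law", "order", "respect", "tradition"},
    "sanctity": {"pure", "sacred", "disgust", "clean"},
}


def foundation_profile(texts):
    """Return the share of lexicon hits per foundation across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only.
        for token in re.findall(r"[a-z]+", text.lower()):
            for foundation, words in TOY_LEXICON.items():
                if token in words:
                    counts[foundation] += 1
    total = sum(counts.values())
    if total == 0:
        # No signal at all: return a flat zero profile.
        return {foundation: 0.0 for foundation in TOY_LEXICON}
    return {foundation: counts[foundation] / total for foundation in TOY_LEXICON}


# Made-up search history standing in for a data portability export.
searches = [
    "how to protect refugees from harm",
    "equal rights petition",
    "respect for the law",
]
profile = foundation_profile(searches)
```

In this made-up history, care, fairness, and authority each get a third of the hits, while loyalty and sanctity get none. A real pipeline would need far more data and a model that handles context, negation, and irony.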

Large Language de-Moralists

Possibly due to the lexical hypothesis,¹² LLMs have proven capable of understanding and integrating a spectrum of psychological and moral alignments as well as generating explanations and arguments that resonate more profoundly with one end or the other.¹³¹⁴


Eager to witness these strange outcomes. Source

Imagine a system where all the principals are presented with the same set of utilitarian policy decisions but with rationalizations tailored to their unique moral traits. Just as targeted ads are more effective because they are based on the consumer’s own interests and online behavior, policy explanations so generated are likely to be less divisive, enabling super-partisan consensus on the ground truth:


An example of GPT4 steering an argument towards a specific set of moral foundations.
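A rough sketch of the tailoring step might look like the following. It keeps the policy text identical for everyone and only changes which moral foundations the explanation is asked to emphasize. The hint wording, the `build_prompt` helper, and the example policy are hypothetical, and the sketch stops at building the prompt: which model receives it, and how, is left open.

```python
# Hypothetical framing hints, one per moral foundation.
FRAMING_HINTS = {
    "care": "how the policy reduces suffering and protects the vulnerable",
    "fairness": "how the policy treats people equally and rewards contribution",
    "loyalty": "how the policy strengthens the community and the nation",
    "authority": "how the policy upholds law, order, and institutions",
    "sanctity": "how the policy preserves what people hold sacred",
}


def build_prompt(policy, profile, top_k=2):
    """Build an LLM prompt that keeps the policy fixed and tailors the framing.

    `profile` maps foundation names to weights; only the `top_k` strongest
    foundations shape the requested explanation.
    """
    ranked = sorted(profile, key=profile.get, reverse=True)[:top_k]
    hints = "; ".join(FRAMING_HINTS[foundation] for foundation in ranked)
    return (
        f"Policy (do not alter its substance): {policy}\n"
        f"Explain why this policy is worth supporting, emphasizing: {hints}."
    )


# Hypothetical policy and two made-up moral profiles.
policy = "Expand vocational training for newly arrived immigrants."
left_prompt = build_prompt(policy, {"care": 0.5, "fairness": 0.4, "authority": 0.1})
right_prompt = build_prompt(policy, {"care": 0.1, "loyalty": 0.3, "authority": 0.6})
```

Both prompts carry the same policy sentence; only the requested emphasis differs, which is the property the system depends on.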

In more practical terms, when it’s about morality, anything short of the biggest and most resource-intensive model would be inadequate to the task. Such a system will necessarily end up being centralized if we want it to be feasible, bringing up once again the problem of the self-interest of whoever controls the training of the model and the inference infrastructure.

Accountable government

One of the three pillars of modern government according to Fukuyama, along with state capacity and rule of law.

Drawing another analogy from advertising, the primary concern is whether a malicious actor can disrupt the market by biasing either the delivery or targeting algorithms of the ad exchange to implement some kind of agenda. In our scenario these two directly map to biases in the LLM and the integrity of the outputs (assuming the supply of inventory is not compromised).

This is where open-source and confidentiality become fundamental. An LLM whose code integrity is attestable can be deployed inside a Confidential VM (only generally available on Azure since December ‘23¹⁵) so that each principal can verify that the code running inside the remote machine is loading exactly that LLM, that the code is fairly matching the user with tailored message outputs, and that the data in transit is client-side encrypted.
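The last step of such a verification can be sketched as a digest comparison: the client hashes the published open-source artifact and checks it against the measurement the remote machine reports. This is a heavy simplification. Real confidential-VM attestation involves a hardware-signed report and a certificate chain, none of which is modeled here, and the function names and byte strings below are placeholders.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact (weights, code bundle, ...)."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(reported_digest: str, published_digest: str) -> bool:
    """Compare the digest reported by the remote machine with the published one.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(reported_digest, published_digest)


# Placeholder bytes standing in for the open-source model artifact.
published = sha256_hex(b"open-source model artifact v1")

# What an honest and a tampered deployment would report (simulated locally).
honest_report = sha256_hex(b"open-source model artifact v1")
tampered_report = sha256_hex(b"silently modified artifact")

honest_ok = matches_published_digest(honest_report, published)
tampered_ok = matches_published_digest(tampered_report, published)
```

Any change to the artifact, however small, changes the digest, so a principal who trusts the published value can detect a swapped model, provided the reported measurement itself can be trusted, which is what the hardware attestation is for.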

Yet, even with these integrity constraints in place, a system like this poses arguably a bigger psychological threat if anything goes wrong, as one single actor can use this unprecedentedly powerful representation of identity for surveillance and coercion, also known as “the chilling effect.” This not only compromises personal freedom but also hinders creativity, innovation, and the free flow of ideas, which are crucial for a thriving and progressive society.


Meta’s chilling Luleå Data Center

At present this isn’t as much of a concern. Despite Big Tech’s attempts to monopolize online activities, their datasets remain distinct: Google has your search intentions, Facebook your communications, and Amazon your shopping habits, and these are shared with each other only partially in order to preserve their respective competitive edges. While third-party data providers like Nielsen and LiveRamp do have insights into broader aspects of our lives, such as family makeup and income, no single organization has both a broad and deep understanding of our digital selves.

Confidential environments can be rearranged to reflect the same type of separation, also potentially incorporating fancier techniques like federated learning and differential privacy, but the fundamental legal aspect remains. Our society is underpinned by a strong (at times too strong) and reasonably decentralized judicial system, which is ready to spring into action where technical guardrails fall short and to provide a fundamental deterrent to their deliberate breach by the controlling entities.

Privacy

I’m not advocating for ignoring privacy concerns, but I suspect that the polarization on privacy postures might reflect a form of group behavior as described by Haidt. Such extremes can lead to consequences like the ICMR and Equifax breaches on one side, and Norway’s ban on targeted ads¹⁷ or the media outrage over Cambridge Analytica (whose effectiveness was oversold by its employees¹⁸ and overblown by the journalists) on the other.

I want to conclude with the etymology of privacy¹⁹:

From the Latin privatus: set apart from what is public, personal and belonging to oneself, and not to the state.

Despite the semblance of an electoral process in the Roman Republic, the state back then was hardly about collaborative governance, with the majority of the population excluded from the democratic mechanisms. Given these historical differences, it seems appropriate to reconsider the applications of privacy in the modern context: as with the widespread frustration with poor targeting in online ads, dating apps, and recruitment emails, a democratic state that doesn’t know much about its constituents will inevitably fail to engage them and will instead alienate them.

Privacy utility curve

The image above is usually found in differential privacy contexts, yet it generalizes to how we should perceive our broader relationship with personal data. The further we go backwards from the blue dot, the further we go backward on the principal-agent sigmoid shown before, indirectly justifying the centralization of power.
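For readers who haven’t met this trade-off in its original setting, here is a small sketch using the Laplace mechanism on a counting query. A smaller epsilon means stronger privacy and noisier answers. The true count, trial count, seed, and function names are arbitrary choices for illustration, and the noise is sampled with the standard library only, as the difference of two exponential draws.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_abs_error(epsilon, trials=2000, seed=0):
    """Average absolute error of the private answer: a crude utility measure."""
    rng = random.Random(seed)
    true_count = 100  # arbitrary value for the illustration
    total = sum(
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


# Stronger privacy (small epsilon) costs utility (large error), and vice versa.
errors = {epsilon: mean_abs_error(epsilon) for epsilon in (0.1, 1.0, 10.0)}
```

The expected error is roughly 1/epsilon, so each tenfold tightening of privacy costs about a tenfold loss in accuracy on this query.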

Wrapping up

(See TLDR above)

I’ve often found myself questioning my life path from my cushy roles across tech startups and FAANG companies, and I’ve found this same sentiment in many of my early-career peers, who find themselves in the Search for Meaning outside of their jobs.

This essay was an attempt to directly search my professional experience for opportunities to advance the society of Homo Informaticus. In summary, I believe the new personal data legislation and the latest innovations in privacy tech, psychometrics and NLP offer an unprecedented launch ramp for a revolution in collective coordination.

I’ve started a little open-source project to take a stab at this. Shoot me an email if you’re interested :)

It is the engineers who make true democracy possible

1. https://www.axios.com/2024/01/17/alex-karp-davos-ai-us-advantage
2. https://psycnet.apa.org/record/2020-82672-001
3. https://pubmed.ncbi.nlm.nih.gov/33420605/
4. https://www.sciencedirect.com/science/article/abs/pii/B9780124072367000024
5. https://people.brandeis.edu/~teuber/Singer_Ethics_and_Intuitions.pdf
6. https://link.springer.com/article/10.1007/s12115-019-00436-8
7. https://ourworldindata.org/grapher/homicide-rates-from-firearms?tab=chart&country=~SWE
8. https://books.google.com/books?id=izlpBQAAQBAJ
9. https://gdpr-info.eu/art-20-gdpr/
10. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7395458/
11. https://www.sciencedirect.com/science/article/pii/S1877050917320537
12. https://en.wikipedia.org/wiki/Lexical_hypothesis
13. https://arxiv.org/abs/2209.12106
14. https://r2hcai.github.io/AAAI-23/files/CameraReadys/49.pdf
15. https://techcommunity.microsoft.com/t5/azure-confidential-computing/new-innovations-in-confidential-computing-from-azure-at-ignite/ba-p/3982146
16. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2726905
17. https://www.politico.eu/article/facebook-instagram-norway-ban-track-users-ads/
18. https://ico.org.uk/media/action-weve-taken/2618383/20201002_ico-o-ed-l-rtl-0181_to-julian-knight-mp.pdf
19. https://www.etymonline.com/word/privacy