Whose History Trains the Model
Issue 07 · Reflection · Black History Month, American Heart Month

The Monthly Intelligence Report


On the slow bias of perspective in the language models now educating Caribbean children, why the Berbice uprising is a paragraph in their training data, and what the Caribbean intellectual tradition has to say about it.

Hector Ramirez-Diaz·February 2025

Note from the President

February in our region carries a particular weight. It is the shortest of months, and it is the one we spend reflecting on the longest of histories. Black History Month invites a kind of attention that is not natural to a busy person, and that is the point. This month, we are asked to slow down enough to see ourselves whole.

I asked Hector Ramirez-Diaz, whose work on Caribbean philosophy and technology has helped me think more clearly than I would have on my own, to write our feature this month. His piece resists the comfortable conclusion. I trust you to sit with it.

A short note on activity from this office. The Charter Working Group has published our draft regional AI Charter for member comment. Comments are open until March 31. Read it at caribbeanaiassociation.com/charter and tell us where we are wrong. February is also American Heart Month, and our Working Group on AI in Health is convening a short series on cardiac AI in the second half of the month. Details with the next mailing.

Adrian Dunkley, Founder and President, Caribbean AI Association


Feature

Whose History Trains the Model

By Hector Ramirez-Diaz

There is a question I have begun asking the students in my seminar at the University of Havana, where I have been a visiting lecturer this academic year. The question is simple, and I confess I do not yet know what to do with most of the answers. The question is this. When a language model learns to speak, whose voices does it learn to speak with, and whose voices does it learn to speak about, and what is the relationship between the two.

This is not, in the way I am asking it, a technical question. The technical answer is well known. Large language models are trained on enormous corpora of text drawn predominantly from the public web, supplemented by digitized books, journals, and licensed datasets. The web is overwhelmingly English. The books are overwhelmingly published in the global north. The journals are overwhelmingly written by faculty in the same universities that have shaped the world's intellectual life for two centuries. The model learns to speak in the voice that wrote the texts. That voice, when we are honest, is the voice of a small and historically specific community.

For the Caribbean in Black History Month, this is not an abstract observation. It is a continuation, in a new medium, of a problem we know intimately. The libraries that record us have rarely been ours. The texts that describe our history have rarely been written by us. The languages we have made, against the grain of every condition imposed on us, have rarely been treated by the rest of the world as proper languages. When a Caribbean child today asks a language model about the Berbice slave uprising of 1763, the model will answer. It will answer with what is in its training data. What is in its training data is, in the main, what the colonial archive recorded and what historians outside our region later reconstructed. The voice the child hears, when the machine speaks, is not the voice of Cuffy. It is the voice that wrote down what Cuffy did, and the voice that wrote down what Cuffy did was not Cuffy's.

I do not say this to indict the technology. I say it because the technology is now in the room with our children, and the question of whose voice it carries is therefore a question we have to think about with the seriousness we would bring to any other instrument of education.

A great deal of the contemporary discussion about bias in AI focuses on its harms in immediate, practical settings. The hiring algorithm that under-rates Black candidates. The facial recognition system that misidentifies darker-skinned women. The medical model that has not been validated on patients who look like us. Each of these is real, and the Association's work on each is serious. But there is a slower bias that I find more difficult to name, and I think Black History Month is the right moment to name it.

It is the bias of perspective. The slow shaping of what a generation considers ordinary, by repeated exposure to the assumptions, the references, the rhythms, and the silences of a particular intellectual tradition. The Caribbean has spent four hundred years pushing back against that bias when it came through the church, through the schoolbook, through the British Broadcasting Corporation, through the academic journal, and through the international newspaper of record. The pushback was rarely glamorous. It happened in the kitchens of grandmothers, in the cane fields, in the small magazines that nobody read outside our region, in the calypso, in the patois that would not be ground down. We did the work, and we kept ourselves.

The arrival of the language model is the same situation in a new key. The model does not declare its perspective. It speaks in an open, friendly, accommodating tone, and it offers itself as the answer to whatever you ask. The friendliness is not a moral failure of the technology. The friendliness is the surface. Underneath the surface is a particular weighting of which sources were read and which were not, which questions were taken seriously and which were assumed already answered, which dialects of which language were treated as standard and which were treated as colour. The Caribbean intellectual tradition has long understood the difference between what is said and what is assumed in the saying. We will need that understanding intact in this season.

I want to be specific about what this means in practice.

The Haitian Revolution is, in the language models I have tested, treated as a sub-topic of the French Revolution. The Berbice uprising is, in most of them, a paragraph appended to a section on Caribbean slavery in general. The figure of Toussaint Louverture appears more often as a footnote to Napoleon than as a sovereign actor in his own history. The intellectual lineage from Aimé Césaire through Édouard Glissant through Sylvia Wynter through David Scott is, in most of the major models, thin where it appears at all. The names show up. The arguments do not. The thinking the names represent has not made the leap from the journals to the training data, because the journals it lived in are not the journals these models were trained on.

Some of this will be repaired. The Brazilian Institute of Pure and Applied Mathematics, the African Institute for Mathematical Sciences, and a handful of African and Latin American digitization projects have begun publishing materials that the next generation of models will absorb. The University of the West Indies has, slowly, increased the share of its journals available in machine-readable form. CAIRA's own Library Initiative, which has been working since October to digitize Caribbean intellectual material under appropriate licence, will contribute. The arc of repair is real.

But repair is not the same as origination. There is a deeper question, which is whose hands shape the model in the first place. Today, the answer remains a small number of laboratories in three countries, staffed predominantly by people whose intellectual upbringing did not include the Caribbean tradition. There are exceptions, and the exceptions deserve our support. But the structural answer is what it is.

What follows from this, for us in February of 2025?

A Caribbean child should not learn the Berbice uprising from a chatbot. They should learn it from a teacher, from a book, from a grandmother if they have one who knows the story, from a museum if their country has built one, and from the chatbot only as a fifth source after the first four. This is not a Luddite position. It is the same position responsible educators have always taken on encyclopaedias, on newspapers, and on the internet. Trust the source. Triangulate. Insist on plurality.

A Caribbean researcher should be alert to the way these tools will, by their nature, surface the established narrative more readily than the contested one. The work of the historian, the literary critic, the political scientist, is still the work it was. The tool changes the speed. It does not change the responsibility.

A Caribbean policymaker should fund the libraries, the archives, the museums, and the journals that hold our intellectual life. Public investment in those institutions is now also AI investment, because what is preserved and published is what will, in a decade, train the models that explain us to our grandchildren.

And every Caribbean reader, in this month, should remember that the question of whose voice is in the room has never had a quiet answer in our part of the world. It is a question our parents lived. It is a question their parents lived. It is a question our writers have made their work for a hundred years. It will be a question of this generation as well, in a new instrument and in a new tongue. The instrument is more friendly than the ones that came before. The instrument is still an instrument. We were never the instrument. We were always the people who insisted on being heard.

Black History Month is the month for remembering that. The remembering is the work.


Hector Ramirez-Diaz writes on Caribbean philosophy, technology, and cultural studies. He is a visiting lecturer at the University of Havana and serves on the CAIRA Editorial Board.

Originally published in The Monthly Intelligence Report, February 2025.
