This is the only official webpage (i.e., maintained by the paper's authors) for the life2vec model described in Nature Computational Science. The paper has gone viral, and much of the viral coverage misrepresents the model's capabilities, including claims that the model can predict the time of death, financial status, or health condition. Additionally, many sources falsely claim that the model is publicly available online.
We, the paper's authors, aim to clarify certain aspects of our model and prevent the public from being misled by counterfeit services, such as the so-called AI Doom Calculator, Telecharger, Intelligent Death AI, and Death Predictor. In the FAQ below, we try to explain what the paper actually says.
But first, a warning: We are aware of social media accounts and at least one fraudulent website claiming to be associated with the life2vec model. We are not affiliated with these or any other entities that claim to use our technology.
We represent human lives in a way that shares structural similarity to language, and we exploit this similarity to adapt natural language processing techniques to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on a comprehensive registry dataset, which is available for Denmark across several years, and that includes information about life-events related to health, education, occupation, income, address and working hours, recorded with day-to-day resolution.
Our model, life2vec, allows us to predict diverse outcomes ranging from early mortality to personality nuances. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions. Our framework allows researchers to discover potential mechanisms that impact life outcomes as well as the associated possibilities for personalized interventions.
No! Let us explain. First, here is what the widely reported figure of 78.8% accuracy actually means.
So far so good.
Some important consequences.
No! The dataset and model contain sensitive data and both are safely stored at Statistics Denmark. They cannot be accessed via the internet. Some follow-ups:
Websites claiming to offer life2vec (e.g. deathcalculator[dot]ai, life2vec[dot]io, life2vecai[dot]com) are fraudulent and have nothing to do with us or our work, so be careful.
Transformer models (the technology we use) were developed to find patterns in language. By structuring lives as sequences, we can identify very complicated patterns in life-events. Just as the ordering of words is very important in language, so is the ordering of events in human lives. In a US context, for example, it matters whether you get a job with healthcare and then get sick, or get sick first without having healthcare.
Those patterns and relationships between life-events are encoded in mathematical spaces (we call them embedding spaces). We learn the structure of those spaces by processing life-sequences.
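To make the idea of life-sequences and embedding spaces more concrete, here is a minimal Python sketch. It is not the life2vec implementation: the event tokens are hypothetical, the vocabulary is tiny, and the embedding vectors are random placeholders rather than learned values. It only illustrates two points from the text: the same events in a different order form a different sequence, and each event token is mapped to a point in a vector space.

```python
import random
from collections import Counter

# Hypothetical event tokens, invented for illustration only.
life_a = ["JOB_WITH_HEALTH_INSURANCE", "DIAGNOSIS", "INCOME_CHANGE"]
life_b = ["DIAGNOSIS", "JOB_WITH_HEALTH_INSURANCE", "INCOME_CHANGE"]

# Vocabulary: map each distinct event token to an integer id.
vocab = {tok: i for i, tok in enumerate(sorted(set(life_a + life_b)))}


def encode(events):
    """Turn a list of event tokens into a list of integer ids."""
    return [vocab[e] for e in events]


ids_a = encode(life_a)
ids_b = encode(life_b)

# An order-blind view (just counting events) cannot tell the two lives apart...
same_bag = Counter(ids_a) == Counter(ids_b)
# ...while the sequence view can, because the ordering differs.
same_sequence = ids_a == ids_b

# Toy embedding table: one random 4-dimensional vector per token.
# In a trained model these vectors would be learned from data, not sampled.
EMBED_DIM = 4
rng = random.Random(0)
embedding = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]


def embed(ids):
    """Look up the vector for every token id in the sequence."""
    return [embedding[i] for i in ids]


vectors_a = embed(ids_a)
print(same_bag, same_sequence, len(vectors_a), len(vectors_a[0]))
```

In this toy setup, a sequence model reading `ids_a` and `ids_b` sees two different inputs even though both lives contain exactly the same events.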
We know it may sound strange, but we chose the topic of predicting death because it is a problem so many people have worked on (insurance companies, for example). That means that we know more about what to expect … and that if our model does very well, it does so in competition with many other algorithms.
The arguments in the paper come in the following order:
It is this last part that we are really excited about: working to understand what new things we can learn about human beings, human behavior, and societies based on the structure of the embedding spaces.
Yes. This work should never be used for insurance, for example. Insurance is built on the idea of sharing risk across many people. If a million people get together, they don’t know who is going to get sick, so everyone can pay a small amount into a large shared pool, and the few who are unlucky enough to get seriously ill early on can draw on the pool to get help. Since we don’t know who is going to get sick, it’s a good deal. If we could tell who would get sick ahead of time, it would undermine the whole idea of insurance.
There are many other reasons our algorithm should not be used yet. For example, there are many issues related to privacy and bias that need to be worked out before using it in practice.
That said, there are many places where this algorithm could be very helpful when applied (after additional work). It is most clear within healthcare and medicine. Earlier diagnoses could lessen the severity of many diseases.
There are also some areas in between healthcare and insurance where we are less sure. For those areas we need to have a public discussion about the use of such an algorithm. Should we identify people who are predicted to struggle in school to help them? Maybe it’s a good idea, maybe not – honest people could disagree.
What we hope is that our algorithm helps start a discussion about these technologies and how we should use them. Predictions like these are already happening inside large tech companies. There are reasons why Meta, Google, Microsoft, etc., collect so much data about us. But right now those predictions are happening behind closed doors with the intention of predicting (and sometimes manipulating) our behavior. For now, it’s mostly to make us stare longer at our screens or to sell products, but that will likely change. Either way, predictions are happening and will likely become more and more common.
This is why we wanted to create something open and public to bring these topics out of the secret rooms inside billion dollar corporations – to start a discussion around prediction of human behavior. Probably mostly within science to begin with, but hopefully soon in society more generally.
If you can't find an answer, contact us on GitHub Discussions.
[1] Germans Savcisens, Tina Eliassi-Rad, Lars Kai Hansen, Laust Hvas Mortensen, Lau Lilleholt, Anna Rogers, Ingo Zettler, and Sune Lehmann. ‘Using Sequences of Life-Events to Predict Human Lives’. Nature Computational Science 4, no. 1 (1 January 2024): 43–56. https://doi.org/10.1038/s43588-023-00573-5
[2] Germans Savcisens. “Life Trajectories as Symbolic Language.” PhD Thesis. Technical University of Denmark, 2023.