Life2vec is an AI model introduced in the Nature Computational Science paper 'Using Sequences of Life-events to Predict Human Lives'. It was developed to analyze complex sequences of life events and to gain insights into early mortality and personality traits.

What is the paper about?

In the paper, we represent human lives in a way that shares structural similarity to language. Using a comprehensive registry dataset available in Denmark, we analyze life events related to health, education, occupation, income, and working hours with day-to-day resolution. Our model, life2vec, predicts diverse outcomes ranging from early mortality to personality nuances and helps explore potential mechanisms for personalized interventions.

Can life2vec predict a person's day of death, age at death, or anything like that?

No, the life2vec model does not predict the exact day of death. The widely reported 78.8% accuracy refers to predicting, on a balanced dataset, whether people in a cohort aged 35-65 die within the next 4 years. The model can identify patterns in life trajectories that help explore the factors behind early mortality, but it does not predict the time of death.

Can you download or try out the life2vec model?

No. The original model is not publicly available. The model and related data are safely stored at Statistics Denmark and cannot be accessed via the internet. So far, only Denmark-based researchers can request access to the model or data. We are researching secure ways to share the model with the research community.

Many websites claim to have access to the life2vec model. Is that true?

No, the original life2vec model is not publicly available for people to try. Fraudulent websites claiming to offer life2vec are not associated with us.

What about discrimination and ethical implications? Could the model or the data be misused?

Yes, there are ethical concerns. Privacy and bias issues must be addressed before applying such models in practice. We hope life2vec sparks discussions on ethical AI use and transparent prediction algorithms, encouraging public and scientific debate.

Is life2vec crypto connected to the life2vec research?

No, life2vec crypto has nothing to do with our research or team. The claims that life2vec crypto funds our research are false.

life2vec - Official Model and Publication Source


Abstract, or what is our paper about?

We represent human lives in a way that shares structural similarity to language, and we exploit this similarity to adapt natural language processing techniques to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on a comprehensive registry dataset, available for Denmark across several years, that includes information about life events related to health, education, occupation, income, address, and working hours, recorded with day-to-day resolution.

Our model, life2vec, allows us to predict diverse outcomes ranging from early mortality to personality nuances. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions. Our framework allows researchers to discover potential mechanisms that impact life outcomes as well as the associated possibilities for personalized interventions.

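Is your algorithm really able to predict people's day of death, age at death, or anything like that?

No, the life2vec model does not predict the exact day of death. Here is what the widely reported 78.8% accuracy actually refers to: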

We look at a subset of individuals aged between 35 and 65. This age range is particularly challenging for survival prediction: the vast majority of people who die are older, while younger people have an extremely low probability of dying.
We split this dataset into two parts:
- Training data: used to teach the model which correlations are present in the data. The training data makes up the vast majority of the dataset.
- Test data: used to measure how well the model performs.
We then train the model on the training data: using information from the years 2008-2015, the model learns to distinguish between people in the training data who survived and those who died during 2016-2020.
The trained model is then run on the test data (100,000 individuals). Here the model sees only the 2008-2015 data and makes a prediction, which we then check against the actual outcomes.
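The temporal setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Event` record, `make_example`, and `split_people` are hypothetical names, and the random per-person split is an assumption, since the FAQ does not say how the split was drawn. The key point is that the model's inputs are restricted to 2008-2015, while the label only asks whether death occurs during 2016-2020.

```python
import random
from dataclasses import dataclass
from datetime import date


# Hypothetical record: one dated life event (the label format is invented).
@dataclass(frozen=True)
class Event:
    day: date
    label: str


# Input and outcome windows taken from the FAQ text.
INPUT_START, INPUT_END = date(2008, 1, 1), date(2015, 12, 31)
OUTCOME_START, OUTCOME_END = date(2016, 1, 1), date(2020, 12, 31)


def make_example(events, death_date):
    """Return (model inputs, label) for one person.

    Inputs are the person's events from 2008-2015, in chronological order.
    The label is 1 if the person died during 2016-2020, else 0.
    """
    inputs = [e for e in sorted(events, key=lambda e: e.day)
              if INPUT_START <= e.day <= INPUT_END]
    died = death_date is not None and OUTCOME_START <= death_date <= OUTCOME_END
    return inputs, int(died)


def split_people(person_ids, test_size, seed=0):
    """Assumed random per-person split: hold out `test_size` people for testing."""
    ids = list(person_ids)
    random.Random(seed).shuffle(ids)
    return ids[test_size:], ids[:test_size]  # (training ids, test ids)


# Usage: the 2017 event falls in the outcome window, so the model never sees it.
events = [Event(date(2010, 5, 1), "job:nurse"), Event(date(2017, 3, 2), "diagnosis")]
inputs, label = make_example(events, death_date=None)
print(inputs, label)
train_ids, test_ids = split_people(range(10), test_size=3)
print(len(train_ids), len(test_ids))  # 7 3
```

Keeping the input window strictly before the outcome window is what makes the test meaningful: the model has to predict an outcome from information that precedes it.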

So far, so good! There is one final wrinkle. Accuracy is defined as (number of correct guesses) / (total number of guesses). Because our cohort is relatively young, almost everyone in it survives (more than 95%).

This means that if we created an algorithm that always predicted “survive”, it would get a very high accuracy (over 95%).
To address this, we balance the dataset to the equivalent of 50,000 individuals with a survival outcome and 50,000 with a death outcome. On this balanced dataset, a random guess would achieve 50% accuracy.
When we run our algorithm on that balanced dataset, we get 78.8% accuracy.
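The arithmetic behind the balancing step can be checked with a small sketch. The class counts below are made up (4,000 deaths among 100,000 people, consistent with "more than 95%" surviving), and reweighting each outcome class so that it counts as 50,000 people is an assumption about what "the equivalent of" means here; the FAQ does not spell out the exact balancing method. The sketch shows that an "always survive" baseline looks excellent on the raw data but drops to 50% once both outcomes count equally.

```python
def accuracy(predictions, outcomes, weights=None):
    """(Weighted) correct guesses divided by (weighted) total guesses."""
    if weights is None:
        weights = [1.0] * len(outcomes)
    correct = sum(w for p, y, w in zip(predictions, outcomes, weights) if p == y)
    return correct / sum(weights)


def balancing_weights(outcomes, per_class=50_000):
    """Assumed reweighting: each outcome class counts as `per_class` people in total."""
    counts = {c: outcomes.count(c) for c in set(outcomes)}
    return [per_class / counts[y] for y in outcomes]


# Hypothetical test set: 100,000 people, 4,000 deaths (1) and 96,000 survivors (0).
outcomes = [1] * 4_000 + [0] * 96_000
always_survive = [0] * len(outcomes)  # the trivial "always survive" predictor

print(accuracy(always_survive, outcomes))  # 0.96 on the imbalanced data
weights = balancing_weights(outcomes)
print(round(accuracy(always_survive, outcomes, weights), 6))  # 0.5 once balanced
```

This is why the 78.8% figure should be read against a 50% baseline rather than the 95%+ that a trivial predictor would score on the raw data.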

Here are some important consequences:

We do not make predictions for everyone in Denmark, only for the individuals in the test data.
We are not predicting how long people will live. Rather, we predict mortality over the next 4 years for a relatively young cohort of individuals. A key theme of the paper is the factors that contribute to such early mortality.

Can you download the software and try this out?

No! The dataset and model contain sensitive data, and both are safely stored at Statistics Denmark. They cannot be accessed via the internet. A few follow-up points:

There are websites that claim to implement life2vec (e.g. deathcalculator[dot]ai, life2vec[dot]io, life2vecai[dot]com). These are fraudulent and have nothing to do with us or our work, so be careful.
We are working on ways to share the model with the wider research community, but because models such as LLMs are known to potentially leak their training data, we need to do further research before we can do this.
We have not yet studied how our results generalize to other countries/contexts, but are actively investigating this topic.

If you are not focused on predicting death, what is the aim of the study?

Transformer models (the technology we use) have been developed to find patterns in language. By structuring lives as sequences, we can identify very complicated patterns in life events. Just as the ordering of words is very important in language, so is the ordering of events in human lives. In a US context, for example, it matters whether you get a job with health care and then get sick, or get sick first without having health care. These patterns and relationships between life events are encoded in mathematical spaces (called embedding spaces). We learn the structure of those spaces by processing life sequences.
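A toy illustration of why ordering matters: the two hypothetical lives below contain the same events (a job with health care and an illness) in opposite orders. The event names and the integer vocabulary are invented for this sketch and are not the life2vec tokenization, which this FAQ does not describe. A bag-of-events view cannot tell the two lives apart, while the ordered token sequences that a sequence model such as a transformer reads are different.

```python
from collections import Counter

# Two hypothetical lives with the same events in a different order.
life_a = ["job_with_health_care", "illness"]
life_b = ["illness", "job_with_health_care"]

# Toy vocabulary: map each event type to an integer token id.
vocab = {event: i for i, event in enumerate(sorted(set(life_a + life_b)))}


def tokenize(life):
    """Map an ordered list of events to token ids, preserving the order."""
    return [vocab[event] for event in life]


# An order-free "bag of events" makes the two lives look identical...
print(Counter(life_a) == Counter(life_b))    # True
# ...whereas the ordered token sequences differ.
print(tokenize(life_a) == tokenize(life_b))  # False
```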

We know it may sound strange, but we chose the topic of predicting death because it is a problem that many people have worked on (for example, insurance companies). That means we know more about what to expect, and that if our model performs very well, it does so in competition with many other algorithms.

The arguments in the paper come in the following order:

First, we show that the algorithm is very good at making diverse predictions (death, personality).
Since it is very good, we know that it is capturing interesting patterns in the data.
That means it makes sense to study the embedding spaces that capture those patterns in the data.

It is this last part that we are really excited about: working to understand what new things we can learn about human beings, human behavior, and societies from the structure of the embedding spaces.

But having access to this information could be dangerous. What about discrimination, such as being rejected for a bank loan or insurance? Have you considered these ethical implications?

Yes. This work should never be used for insurance, for example. The whole idea of insurance is based on sharing risk across many people. If a million people get together, none of them knows who is going to get sick, so everyone can pay a small amount into a large shared pool, and the few who are unlucky enough to get seriously ill early on can draw on the pool to get help. Since we don't know who is going to get sick, it's a good deal. If we could tell who would get sick ahead of time, it would undermine the whole idea of insurance.
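The pooling argument can be made concrete with made-up numbers: one million members, 1% of whom get seriously ill, at a cost of 200,000 per illness (all hypothetical). When nobody knows who will get sick, each member pays the same share of the expected cost. With perfect prediction, each person's "fair" premium collapses to either the full cost or nothing, so there is no shared risk left to pool.

```python
MEMBERS = 1_000_000          # hypothetical number of people sharing the risk
EXPECTED_ILL = 10_000        # hypothetical: 1% get seriously ill early on
COST_PER_ILLNESS = 200_000   # hypothetical cost of one serious illness


def pooled_premium(members, expected_ill, cost):
    """Premium when nobody knows who will get sick: everyone shares the expected cost."""
    return expected_ill * cost // members


def premium_with_perfect_prediction(predicted_ill, cost):
    """If illness were known in advance, each person's 'fair' premium is their own cost."""
    return cost if predicted_ill else 0


premium = pooled_premium(MEMBERS, EXPECTED_ILL, COST_PER_ILLNESS)
print(premium)                                                # 2000
print(premium * MEMBERS == EXPECTED_ILL * COST_PER_ILLNESS)   # True: pool covers expected claims
print(premium_with_perfect_prediction(True, COST_PER_ILLNESS))   # 200000
print(premium_with_perfect_prediction(False, COST_PER_ILLNESS))  # 0
```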

There are many other reasons our algorithm should not be used yet. For example, there are many issues related to privacy and bias that need to be worked out before using it in practice.

That said, there are many areas where this algorithm could be very helpful once applied (after additional work). The clearest case is healthcare and medicine, where earlier diagnoses could lessen the severity of many diseases.

There are also some areas in between healthcare and insurance where we are less sure. For those areas we need to have a public discussion about the use of such an algorithm. Should we identify people who are predicted to struggle in school to help them? Maybe it’s a good idea, maybe not – honest people could disagree.

What we hope is that our algorithm helps start a discussion about these technologies and how we should use them. Predictions like these are already being made inside large tech companies; there is a reason why Meta, Google, Microsoft, and others collect so much data about us. But right now those predictions happen behind closed doors, with the intention of predicting (and sometimes manipulating) our behavior. For now, it is mostly to make us stare longer at our screens or to sell products, but that will likely change. Either way, such predictions are happening and will likely become more and more common.

This is why we wanted to create something open and public: to bring these topics out of the secret rooms inside billion-dollar corporations and start a discussion around the prediction of human behavior. That discussion will probably happen mostly within science to begin with, but hopefully soon in society more generally.

How can I stay safe online while exploring or trying tools like life2vec?

It's important to protect yourself online, especially when it comes to sharing your personal data with predictive algorithms. Here are some essential tips on how to stay safe online:

Look for secure HTTPS connections (the padlock icon in your browser).
Avoid sharing personal, financial, or sensitive information on unverified platforms.
Be cautious of emails or messages claiming to represent any official entity, especially if they ask for personal data or payments.
Use strong passwords and enable two-factor authentication for online accounts.

Trusted resources for staying safe online:

Borger.dk - Online safety tips from the Danish authorities (DK).
StaySafeOnline.org - Online safety tips from the US National Cybersecurity Alliance.
NCSC.gov.uk - Online safety tips from the UK National Cyber Security Centre.
EFF.org - Guides for protecting your online privacy.

By staying informed and cautious, you can protect yourself from potentially fraudulent or unsafe online activities.

Is life2vec crypto connected to your project or team?

No! life2vec crypto is not connected to our project, and any claims suggesting a connection are false. Likewise, life2vec crypto does not fund any research connected to our project or team. We are not affiliated with life2vec crypto.

Where can I learn more about this research besides the published paper?

The best way to learn more about the life2vec model is to read the published paper. However, you can also explore the following resources:

Check out the thesis [2], which provides a more detailed description of the algorithm and methods.
Check out a recent talk about the technical details of the project: "Life trajectories in high-dimensional spaces."
Contact the authors directly.

LIFE2VEC

Our Statement

Reliable Sources

University of Copenhagen

Nature Portfolio

Northeastern University

Technical University of Denmark

Frequently Asked Questions

Metrics

References
