Meet the computer scientist trying to solve the mystery of long COVID
So much about long COVID, more formally known as the post-acute sequelae of SARS-CoV-2 infection, or PASC, is still unknown.
Why do some patients, weeks or months after their initial infection, still have symptoms such as “brain fog,” shortness of breath, or heart problems? What types of underlying conditions, or demographic characteristics, are associated with these persistent symptoms? And how many people are still suffering?
The National Institutes of Health aims to answer these and other questions by launching an initiative that brings together experts to collect data on thousands of PASC patients.
Announced in February of this year, the PASC Initiative, also known as RECOVER, is a multimillion-dollar project to coordinate and support nationwide analysis of medical information on “long-haul” COVID-19. In doing so, the agency hopes to improve understanding of how to treat the disease’s long-term complications.
Dr. Shawn Murphy, Research Information Officer at Mass General Brigham, is one of the leaders of the team serving as PASC’s Data Resource Core (DRC). Over the course of the project, the DRC will facilitate the collection and analysis of standardized data across different cohort studies, while also contributing to the design of those studies.
Murphy spoke with Healthcare IT News this week about the need to study PASC, the challenges of collecting and standardizing data, and his hopes for the future of the project.
Q: Tell me a bit about the PASC Initiative. How did it start, and where does it stand now?
A: The government put out a call for applications. What they wanted to do was set up a pretty complicated system of data feeds from hospitals, which would identify patients who had this really nasty outcome from COVID.
For whatever reason, the virus leaves many of us with breathing difficulties, heart problems, or neuropsychiatric issues. By far one of the worst is chronic fatigue syndrome, or myalgic encephalomyelitis. You get that brain fog where it’s hard to focus. And of course, depression comes along with that. It’s a really difficult thing. And these symptoms can show up long after COVID, more than 30 days later. So that’s what we’re trying to figure out.
The starting point is the patients. The data comes from 20 adult sites, 10 pediatric sites, and seven autopsy sites, since some people do not survive the syndrome or die from something else.
It is important to make sure that we get enough diversity in our populations; we have found that COVID affects different populations differently. We think it’s the same mechanism, residual inflammation, but is each of these symptoms a different type of inflammation? Or does a person’s health history make general inflammation worse?
It’s something we need to figure out, from soup to nuts.
Q: How do you plan to collect the data?
A: We’re trying to get roughly 20,000 patients from those 37 sites, depending on how many people the studies need.
That’s sort of the starting point, and then we collect three different types of data: data entered manually by doctors or by patients themselves, data from electronic health records, and imaging data, such as MRI scans of both the living and the deceased.
Q: Can you say more about this manually entered data? Where does it come from, and how do you extract it?
A: There will be two classes of manually entered data. First, providers will have case report forms, and they follow a set program: when the patient arrives, the provider asks a lot of questions and fills out the form with the patient’s specific answers. We try to make the questions as consistent as possible.
The second class comes from the patients themselves, who are often much more active in entering data. You can ask a patient to write down how they are feeling each day, and they are often willing to record details daily, for example, that they drank less water one day. These things are important.
Figuring out which of those details actually helps is like finding a needle in a haystack.
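To illustrate what keeping the questions consistent can mean in practice, here is a minimal sketch that defines a question bank once, with fixed answer choices, and checks each submitted form against it. The question IDs, wording, and choices are hypothetical; this is not RECOVER’s actual case report form or software.

```python
from dataclasses import dataclass


# Hypothetical question bank: every site asks the same questions
# with the same allowed answers, so responses can be pooled later.
@dataclass(frozen=True)
class Question:
    qid: str
    text: str
    choices: tuple[str, ...]


QUESTIONS = (
    Question("fatigue", "Have you felt unusually tired in the past week?", ("yes", "no")),
    Question("brain_fog", "Have you had trouble concentrating?", ("never", "sometimes", "often")),
    Question("dyspnea", "Are you short of breath during light activity?", ("yes", "no")),
)


def validate_form(answers: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the form is consistent."""
    problems = []
    known = {q.qid for q in QUESTIONS}
    for q in QUESTIONS:
        if q.qid not in answers:
            problems.append(f"missing answer: {q.qid}")
        elif answers[q.qid] not in q.choices:
            problems.append(f"invalid answer for {q.qid}: {answers[q.qid]!r}")
    # Flag questions that are not part of the shared bank.
    for qid in answers:
        if qid not in known:
            problems.append(f"unknown question: {qid}")
    return problems


form = {"fatigue": "yes", "brain_fog": "often", "dyspnea": "maybe"}
print(validate_form(form))  # ["invalid answer for dyspnea: 'maybe'"]
```

Fixing the answer choices up front means a "maybe" from one site cannot silently become a third category that other sites never offered.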
Next, we’ll use an app to capture the data.
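As a sketch of how app-captured daily entries might be sifted, the snippet below tallies which self-reported details appear on days a given symptom was present. The `DiaryEntry` fields, symptom labels, and sample entries are made up for illustration and do not describe the study’s app.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DiaryEntry:
    """One day's self-reported entry from a hypothetical patient app."""
    patient_id: str
    day: date
    symptoms: set[str] = field(default_factory=set)
    details: set[str] = field(default_factory=set)  # free-form flags, e.g. "low_water"


def details_on_symptom_days(entries: list[DiaryEntry], symptom: str) -> Counter:
    """Count how often each detail was reported on days the symptom was present."""
    counts: Counter = Counter()
    for entry in entries:
        if symptom in entry.symptoms:
            counts.update(entry.details)
    return counts


# Made-up sample entries for one patient.
entries = [
    DiaryEntry("p1", date(2021, 9, 1), {"brain_fog"}, {"low_water"}),
    DiaryEntry("p1", date(2021, 9, 2), set(), {"exercise"}),
    DiaryEntry("p1", date(2021, 9, 3), {"brain_fog", "fatigue"}, {"low_water", "poor_sleep"}),
]
print(details_on_symptom_days(entries, "brain_fog"))
```

A raw tally like this is only a first filter: a detail that shows up every day, symptomatic or not, says little, so a real analysis would need to compare against symptom-free days and across many patients.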
Everything goes to the Data Resource Core, the DRC. That’s what I’m leading, along with Andrea Foulkes, chief of biostatistics at Massachusetts General Hospital, and Dr. Elizabeth Karlson, director of rheumatic disease epidemiology at Brigham and Women’s Hospital.
If you really want something awesome, get a biostatistician, a computer scientist, and an epidemiologist together. These skill sets combine well to form a cohesive plan.
Q: Are all of your study sites using the same EHR vendor?
A: No. The plan is to bring all this filtered data to the DRC, where it can all be made interoperable.
The way we do this is to put it into a data meta-model called i2b2, or Informatics for Integrating Biology and the Bedside, a project that has been running for more than 15 years. This creates a place where all the data can fit together. Then you can query it with web-based query tools to see what kinds of data you have and which patients have which symptoms.
Usually what you need to do is take the data out of the EHR, transform it to fit into i2b2, and transport it to the DRC. And we do all of this without keeping patients’ names.
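The core of i2b2 is a star schema in which every observation, whatever its source, becomes a row in a central `observation_fact` table keyed by patient and concept code, with dimension tables describing those concepts. The sketch below uses Python’s built-in `sqlite3` to build a heavily simplified subset of that schema (most real columns and the other dimension tables are omitted) and runs the kind of patient count the query tools return. The concept codes and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified subset of the i2b2 star schema: one fact table plus a concept dimension.
CREATE TABLE concept_dimension (
    concept_cd TEXT PRIMARY KEY,
    name_char  TEXT
);
CREATE TABLE observation_fact (
    patient_num INTEGER,
    concept_cd  TEXT REFERENCES concept_dimension(concept_cd),
    start_date  TEXT
);
""")
conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
    ("SYM:FATIGUE", "Fatigue"),
    ("SYM:BRAIN_FOG", "Brain fog"),
    ("SYM:DYSPNEA", "Shortness of breath"),
])
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", [
    (1, "SYM:FATIGUE", "2021-09-01"),
    (1, "SYM:BRAIN_FOG", "2021-09-01"),
    (2, "SYM:FATIGUE", "2021-09-03"),
    (3, "SYM:DYSPNEA", "2021-09-05"),
    (2, "SYM:FATIGUE", "2021-09-10"),  # repeat observation for patient 2
])

# "Which patients have which symptoms": distinct patient counts per concept.
rows = conn.execute("""
    SELECT c.name_char, COUNT(DISTINCT f.patient_num) AS n_patients
    FROM observation_fact f
    JOIN concept_dimension c ON c.concept_cd = f.concept_cd
    GROUP BY c.name_char
    ORDER BY n_patients DESC, c.name_char
""").fetchall()
for name, n in rows:
    print(f"{name}: {n}")
```

Because every source lands as rows of the same shape (patient, concept, date), adding a new kind of data mostly means adding concepts rather than new tables, which is what lets data from different EHR vendors share one structure.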
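The extract-and-transform step can be sketched as follows, assuming each site maps its local EHR codes to shared concept codes and replaces medical record numbers with study patient numbers before anything leaves the site. The code mappings, field names, and `PatientMapper` class are hypothetical, and real de-identification also has to handle dates and other identifiers; this sketch only drops the name and record number.

```python
from dataclasses import dataclass

# Hypothetical per-site mapping from local EHR codes to shared concept codes.
LOCAL_TO_SHARED = {
    "SITE_A:FTG": "SYM:FATIGUE",
    "SITE_A:BF": "SYM:BRAIN_FOG",
    "SITE_B:0042": "SYM:FATIGUE",
}


@dataclass(frozen=True)
class Fact:
    """An i2b2-style observation row: no name, only a study patient number."""
    patient_num: int
    concept_cd: str
    start_date: str


class PatientMapper:
    """Assigns each (site, MRN) pair a sequential study number.

    In this sketch the lookup table is assumed to stay with the sending site.
    """

    def __init__(self) -> None:
        self._ids: dict[tuple[str, str], int] = {}

    def patient_num(self, site: str, mrn: str) -> int:
        key = (site, mrn)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def extract_transform(site: str, ehr_rows: list[dict], mapper: PatientMapper) -> list[Fact]:
    """Turn raw EHR rows into facts, dropping identifiers and unmapped codes."""
    facts = []
    for row in ehr_rows:
        concept = LOCAL_TO_SHARED.get(f"{site}:{row['code']}")
        if concept is None:
            continue  # unmapped local code: needs curation before it can be shared
        facts.append(Fact(mapper.patient_num(site, row["mrn"]), concept, row["date"]))
    return facts  # row["name"] is never copied into a Fact


mapper = PatientMapper()
ehr_rows = [
    {"mrn": "123", "name": "Jane Doe", "code": "FTG", "date": "2021-09-01"},
    {"mrn": "123", "name": "Jane Doe", "code": "BF", "date": "2021-09-02"},
    {"mrn": "456", "name": "John Roe", "code": "XYZ", "date": "2021-09-02"},
]
for fact in extract_transform("SITE_A", ehr_rows, mapper):
    print(fact)
```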
Q: I know this is a four-year study. What do your timeline goals look like?
A: They are aiming to recruit the first patient by September 1. That’s extremely aggressive. Normally it would take a year to recruit your first patient.
We meet almost every day. It’s a very aggressive timeline. But that’s the goal, because we have to figure out what we can do for our patients. The longer this goes on, the more disabled patients will become. You can see the impact something like this is going to have on our entire economy.
As far as we can tell, 10-15% of people who have had COVID show these kinds of symptoms. We are really talking about a huge number of patients who will have this problem.
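To make the 10-15% figure concrete, the sketch below applies that range to a cohort size. The one-million figure is a hypothetical round number chosen for illustration, not a count from the interview.

```python
def estimated_cases(people_infected: int, low_pct: int = 10, high_pct: int = 15) -> tuple[int, int]:
    """Apply the 10-15% range from the interview to a number of people who had COVID."""
    # Integer arithmetic avoids floating-point rounding in the percentages.
    return people_infected * low_pct // 100, people_infected * high_pct // 100


# Hypothetical cohort of one million people who have had COVID.
low, high = estimated_cases(1_000_000)
print(f"{low:,} to {high:,} people with persistent symptoms")  # 100,000 to 150,000 people with persistent symptoms
```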
This is going to have an incredible impact. There’s a lot of angst about going in there and doing something.
This interview has been edited and condensed for clarity.