
10.12-13.22-DigitalTech-Syntegra

CARTER PRINCE: Thank you. Hi, everyone. I'm Carter Prince. I'm going to tell you a little bit about Syntegra. Here we go. So we're a synthetic data company specifically working in health care, helping make health care data more flexible, private, rapidly accessible, and really just lower friction overall for more advanced use cases.
And I will say our connection to MIT is our co-founder Dr. Michael Lesh is an MIT grad. So why is this an issue? Why is this something that's needed today? And so really in health care, data is an enormous challenge for a couple of different reasons. One, there's a ton of friction in data use, a lot of which is very rightfully so, so there's lots of regulations around privacy.
For example, HIPAA here in the US. GDPR in Europe. Data silos within organizations. And reduced fidelity that comes from bringing that data set together or deidentifying data in traditional means to enable that privacy. There's often a lack of representation in the data that is available. That may just be because the data that's made available is not all the data that's there. It's a very limited set.
But it also may be in certain cases the data is collected in a way that favors certain groups. A classic case that's talked about is if you look at clinical trial data, it's predominantly white men. And the volume of data available is often a limiting factor. So traditionally, things like randomized clinical trials, very small patient populations may be sufficient.
But as organizations are starting to use more modern techniques like predictive modeling, something like 50 patients is no longer even close to sufficient. And so the volume of available data becomes a big blocker as well. So synthetic data becomes a new approach to address these issues. And so what is synthetic data?
Well, synthetic data the way we are talking about it at Syntegra is algorithmically generated data where we train on real patient data-- for example, data coming out of hospitals, data coming from claims records from payers, patient registries, et cetera-- and using a transformer-based language model approach, we're able to generate entirely new patient records that maintain all of the statistics of the real data.
And so this allows us to have a level of fidelity that matches the real data. So you can actually use it for research cases, predictive modeling-- really, any of those cases you would use the real data set itself-- while having complete privacy protection. So you're no longer actually working with real patient records.
So for example, things like GDPR where you have complicated audit trails needed for things like the right to be forgotten are no longer required. You can also start to go beyond the original data-- and I'll show an example of that on the next slide-- where, given this generative approach, you can start to generate more patient records, so generate more patients either just to increase the total number as a whole or generate more specific populations, maybe to increase a certain rare disease or generate more of a certain patient group that's underrepresented.
And so like I mentioned, one of those cases that's really interesting is around how we can improve predictive modeling performance through data augmentation. And it was actually mentioned earlier. A big limiting factor in a lot of models that are put into place in health care is there isn't enough data or there isn't an ideal amount of data to actually train these models.
And so this is a case we did with one of our partners where you can see the blue line here on the curve was trained with the initial real data set where we had 50 patients with chronic kidney disease. That was the outcome of interest we were trying to predict for. And as you might guess, it was a pretty bad model. It's just not enough data.
But with our synthetic data engine, we actually generated an additional 1,500 patients of chronic kidney disease and used that as an augmented data set, which is what you see in the green line. So seeing a large increase in the performance of the model, both in its predictive power but actually interestingly in its clinical effectiveness-- or actually, the amount that made sense if you look at it compared to literature that's been published.
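[Editor's sketch] The augmentation setup described here, 50 real chronic kidney disease patients plus 1,500 generated ones, can be illustrated roughly as follows. This is a minimal sketch and not Syntegra's pipeline: the `PatientRecord` fields, the `augment_training_set` and `summarize` helpers, and the provenance flag are all hypothetical names introduced for illustration. The talk does not say how the model was trained or evaluated, so the sketch only shows how an augmented training set might be assembled while keeping real and synthetic rows distinguishable.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical minimal record; a real schema would carry many more fields."""
    patient_id: str
    has_ckd: bool        # outcome of interest: chronic kidney disease
    is_synthetic: bool   # provenance flag so generated rows stay identifiable


def augment_training_set(
    real_train: Sequence[PatientRecord],
    synthetic: Sequence[PatientRecord],
) -> List[PatientRecord]:
    """Return the real training records followed by the synthetic ones.

    Raises ValueError if the provenance flags do not match the argument
    each record was passed in.
    """
    if any(r.is_synthetic for r in real_train):
        raise ValueError("real_train contains records flagged as synthetic")
    if not all(r.is_synthetic for r in synthetic):
        raise ValueError("synthetic contains records not flagged as synthetic")
    return list(real_train) + list(synthetic)


def summarize(records: Sequence[PatientRecord]) -> Dict[str, int]:
    """Count records by provenance and by outcome."""
    n_synthetic = sum(r.is_synthetic for r in records)
    return {
        "total": len(records),
        "real": len(records) - n_synthetic,
        "synthetic": n_synthetic,
        "ckd_positive": sum(r.has_ckd for r in records),
    }


# Counts taken from the talk: 50 real CKD patients, 1,500 generated ones.
real_ckd = [PatientRecord(f"real-{i}", True, False) for i in range(50)]
synthetic_ckd = [PatientRecord(f"syn-{i}", True, True) for i in range(1500)]
augmented = augment_training_set(real_ckd, synthetic_ckd)
print(summarize(augmented))
```

Keeping the `is_synthetic` flag on every row is a design assumption: it would let a held-out evaluation set be restricted to real patients only, so that any performance gain is measured against real outcomes rather than generated ones.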
And so we're doing work right now and partnering and would love to partner with others across health care. So in the pharmaceutical industry, often working with real world evidence or advanced analytics teams, with health systems that are looking to do more data collaboration, sharing with some of their other collaborators, or even just accelerating internal research or education.
With health tech companies, so early stage companies that are really struggling to access the data they need to actually put their ideas in place. And then data providers as well just looking to expand their reach. Maybe they're overcoming privacy concerns or representation concerns. And so if you have any questions, I'd love to talk during lunch. Feel free to come over and chat or reach out. My information's right here. Thank you very much.
SPEAKER: Thank you, Carter.