
5.5.22-Efficient-AI-MosaicML

JULIE CHOI: Good afternoon, everyone. My name is Julie Choi, and I'm a vice president and chief growth officer at MosaicML.
MosaicML is a new startup that launched out of stealth late last year. We are a pandemic startup. The thing to remember about MosaicML is that we are here to partner with anyone in this room who is on a mission to train best-in-class neural networks with their volumes of data. We do this by accelerating the training [INAUDIBLE] algorithmically, as well as through system optimizations.
Before I get into the methodology and some use cases, I'd like to introduce you to our team, who are really at the core of the MosaicML advantage. Earlier today, we heard a keynote from Professor Michael Carbin. And we are so lucky to have Mike Carbin as our founding advisor, along with his advisee, Jonathan Frankle, a principal author of the Lottery Ticket Hypothesis paper.
So Professor Carbin and Jonathan really give us the scientific and research edge. And we also have our CEO, Naveen Rao, and our CTO, Hanlin Tang, from Intel, where I worked with them. So we have this compute optimization part of the team as well. It's just an incredible team, probably about 50% MIT, actually, at this time.
So let's talk about why we started MosaicML. We started the company because it is our belief that best-in-class, world-class machine learning models should not be limited to only five companies. I've been working in machine learning for the past 10 years or so. I started at HP, launching their platform.
And we've seen a lot of progress over the past decade; indeed, larger models like Megatron have come into being. But it is absolutely untenable for enterprises and corporations to be paying $15 million per training run. And so MosaicML was founded to make best-in-class, world-class machine learning training accessible to all corporations.
So let me talk you through a use case, and maybe that will contextualize how we're doing this. We are working with a decently sized, publicly traded corporation in the fintech arena. They have asked us to help them train an NLP model, and we're at the beginning of their training development cycle, which is probably the best time for us to start helping.
We're helping them size what type of BERT model they need to train, and on which kind of hardware back end. And we're helping them estimate what the actual cost and time might be to train their NLP model. This is the type of partnership that we're here to offer to anyone in this room who is solving an NLP type of problem. As you can see, the benefit of our partnership here is that we're speeding up NLP training by roughly 2x. And that translates to less time in expensive GPU clouds, which translates to significant cost savings.
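To make the arithmetic behind that claim concrete, here is a back-of-the-envelope sketch in Python. All of the specific numbers (GPU count, run length, hourly rate) are hypothetical illustrations, not figures from the talk; the only sourced quantity is the roughly 2x speedup.

```python
# Back-of-the-envelope estimate of GPU cloud cost for a training run.
# All numbers below are hypothetical illustrations, not MosaicML figures.

def training_cost_usd(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Total cloud cost is GPU-hours consumed times the hourly rate."""
    return gpu_hours * price_per_gpu_hour

baseline_gpu_hours = 8 * 24 * 7   # e.g. 8 GPUs running for one week (hypothetical)
price_per_gpu_hour = 3.00         # hypothetical $/GPU-hour cloud rate
speedup = 2.0                     # the roughly 2x NLP speedup cited above

baseline = training_cost_usd(baseline_gpu_hours, price_per_gpu_hour)
accelerated = training_cost_usd(baseline_gpu_hours / speedup, price_per_gpu_hour)

print(f"Baseline run:    ${baseline:,.2f}")
print(f"With 2x speedup: ${accelerated:,.2f} (saves ${baseline - accelerated:,.2f})")
```

Because cost scales linearly with GPU-hours, a 2x training speedup halves the cloud bill for the same run, which is the savings mechanism described above.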
How do we do this? Well, at MosaicML, our technology at this time is pretty simple; there are two components. One is a bespoke cloud that we are engineering, built from the ground up just for ML training. From the compute, meaning the chip, all the way up to the algorithm, the MosaicML cloud is optimized. So you can see these multipliers in terms of speed that we can deliver to the person training the neural network.
And the other component is, of course, our speed-ups, our algorithms. As Professor Carbin mentioned earlier, it is very interesting indeed that we can speed up machine learning training algorithmically. And so a month ago, we launched our open source library, Composer, which has 25 methods for speeding up vision and speech models by roughly 2 to 5x.
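For a sense of how those methods are used, here is a minimal sketch of Composer's documented usage pattern, assuming the library's 2022-era API (Trainer, composer.algorithms, and ComposerClassifier); exact class names and signatures may vary by version, and the data and model choices here are illustrative.

```python
# A minimal sketch of training with Composer's speed-up methods, assuming
# the library's 2022-era API; names and signatures may differ by version.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Synthetic stand-in data so the sketch runs without downloading a dataset.
images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, 10, (512,))
train_dataloader = DataLoader(TensorDataset(images, labels), batch_size=64)

# Wrap a standard torchvision model so Composer can train it.
model = ComposerClassifier(torchvision.models.resnet18(num_classes=10))

# Each algorithm is a drop-in modification to the model or training loop.
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="2ep",  # two epochs, just to exercise the loop
    algorithms=[
        BlurPool(),                     # anti-aliased downsampling
        ChannelsLast(),                 # NHWC memory format for faster convs
        LabelSmoothing(smoothing=0.1),  # softened targets for regularization
    ],
)
trainer.fit()
```

The design point is that each speed-up method is expressed as a composable object passed to the trainer, so methods can be stacked without rewriting the training loop.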
And today, our ask is, again, for anyone here who has large volumes of data and wants to tap into the unfair advantage of machine learning, please come stop by. We do believe world-class models should be attainable for anyone. I'm here with my colleague, Niklas Nielsen, our head of product, back there. And we would love to talk to you at our table. Thank you so much.
[APPLAUSE]