Cerebras Systems builds the largest chip in the computer industry, 46,000 square millimeters in size, and uses it in systems that deliver high compute performance for both training and inference. The chip's repeated-tile design includes redundant tiles so that manufacturing flaws can be mapped out, and keeping more work on one large chip produces results in less time with lower power consumption. Cerebras has made significant advances in training and inference, outperforming competitors in speed and efficiency. Partnering with organizations globally, they have established a strong presence in the AI industry, focusing on high...
>> Welcome back, everyone, to theCUBE's live coverage here in Atlanta for Supercomputing '24, SC24, as it's called. I'm John Furrier, host of theCUBE, with my co-host, Dave Vellante, also the co-host of theCUBE Pod every Friday. Check it out. We're recording it this Wednesday because we're traveling. Our next guest is Andrew Feldman, entrepreneur, co-founder and CEO of Cerebras Systems. Andrew, welcome to theCUBE. We've been talking for three years now about this whole systems culture, this systems revolution. You're not only living it, you're building it and rolling it out. You have the big chip.
Andrew Feldman
>> Yes, that we do.
>> Thanks for coming on theCUBE.
Andrew Feldman
>> Thank you for having me. I really appreciate it.
>> So first of all, before we get started, show the device, because I think this is super important. It's a chip. Explain what this is.
Andrew Feldman
>> This is the largest chip in the history of the computer industry. This chip is 46,000 square millimeters. As you guys can see, it's the size of a dinner plate. Traditionally, chips are the size of a postage stamp. A chip this big processes more information in less time using less power, and that's the goal. What we do is, we don't sell the chip-
>> Can you flip it around?
Andrew Feldman
>> Yeah, sure. We don't sell the chip. We use it in a system we build to deliver extraordinary compute performance, both for training and for inference.
>> Awesome. I bring this up because I want to set the table with the size of the chip first, because traditionally, smaller, faster, cheaper was the old way. Now we've got bigger and better with these clustered systems. We've been writing about it on SiliconANGLE. Dave's heading up our research team, and I've written in research notes that it's about the system: servers together, Ethernet, networking components, interconnect. You're seeing bigger chips because they're optimizing ...
Andrew Feldman
>> That's right.
>> ... core performance that's needed. GenAI requires massive resources. That's a fact now; there's no debate. Okay, so the next-level question is: what software runs on this? How does it change the existing configurations of what I call the Holy Trinity of the compute industry: storage, networking and compute? And now you've got connected cloud, on-prem and edge, a full distributed computing environment at scale. This is the internet.
Andrew Feldman
>> This is.
>> This is where we are. What's the significance of this? How does the chip size matter? What kind of software is coming? I know it's a big question, but-
Andrew Feldman
>> Well, it is a big question. It's an everything question. Obviously, big chips aren't right for everything. They're wrong for your cell phone, and they're probably wrong for your car. But in AI, we're not interested in the behavior of one chip. We're interested in the behavior of tens of thousands of chips, and tying tens of thousands of tiny chips back together creates tremendous complexity. Remember, chips start as a wafer. The wafer is cut up into little pieces, the pieces are put in different machines, and then we tie them back together to get them to behave as one. We put Humpty Dumpty back together again. Our view was: why are we cutting them up? What if we could yield the largest chip in history? What if we could keep more information on the chip? We would use less power, we would produce results in less time and we would make it vastly easier to program. That was our vision eight and a half years ago at Cerebras, and it's been a phenomenal run since. We have deployments around the world. It's been an extraordinary run.
Dave Vellante
>> I remember first hearing about Cerebras in the news, but then Anastasi In Tech did a deep dive. She's amazing.
Andrew Feldman
>> She is amazing.
Dave Vellante
>> And she talked about the power efficiency, the performance, the number of transistors, the more efficient software capabilities that you have. And she also talked about the yields and you just mentioned yields. And I thought, oh, maybe the yields are going to be worse, but you had-
Andrew Feldman
>> The yields are better.
Dave Vellante
>> Better yields. Explain how you achieve that with such a large form factor.
Andrew Feldman
>> Sure. So many people told us this couldn't be done. And the world is filled with people who say, "It'll never work, it can't be done." We have no interest in doing business with them. Those aren't our people. Our people are the engineers who say, "If he says it can't be done, I want to do it," all right. Those are our people. The first thing those with a mouthful of "It'll never work" said was that we'd never yield. That's because every wafer has some flaws inherent to it. Traditionally, we put little chips on the wafer, cut the wafer up into little dies and tested each die. If a die had a flaw, we threw it away. And everybody said, "Well, you'll never have a full wafer this big that doesn't have a flaw. How are you going to yield a part?" What we knew was that there were other ways to manage flaws. In fact, memory makers manage them very differently. In DRAM, they have almost perfect yields, and they have almost perfect yields because they have a repeated tile design. They have hundreds of thousands of bit cells, each identical, and they have redundant rows and columns. When they find a flaw, they map it out, use one of the redundant cells and keep going. So we came up with the idea, mostly my co-founders, JP and Michael and Sean and Gary, came up with the idea, that if we built a repeated tile design with hundreds of thousands of identical tiles and layered in some redundant ones, then if one tile had a flaw, we could shut it down and use a redundant tile instead. The whole design was built to withstand flaws, not to need to eliminate them. We yielded it right away. It took us about 18 months and about $10 million to solve a problem everybody in the industry said could never be done.
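The repair idea described above, spare rows and columns in DRAM and spare tiles on the wafer, can be illustrated with a small remapping function. This is a minimal sketch under simplifying assumptions, not Cerebras's actual design: tiles are numbered in a single list, the last few physical tiles are spares, and `build_tile_map` is a hypothetical helper that points each defective primary tile at the next healthy spare.

```python
from typing import Dict, Optional, Set


def build_tile_map(num_primary: int, num_spares: int,
                   defective: Set[int]) -> Optional[Dict[int, int]]:
    """Map each logical tile to a working physical tile (hypothetical model).

    Physical tiles 0..num_primary-1 are primary tiles; the next num_spares
    physical tiles are redundant spares. Returns None when there are more
    defective primary tiles than healthy spares, i.e. the part cannot be repaired.
    """
    # Healthy spares, in physical order, available to replace flawed tiles.
    spares = [t for t in range(num_primary, num_primary + num_spares)
              if t not in defective]
    mapping: Dict[int, int] = {}
    for logical in range(num_primary):
        if logical in defective:
            if not spares:
                return None  # ran out of spares: this part does not yield
            # "Map out" the flaw by routing this logical tile to a spare.
            mapping[logical] = spares.pop(0)
        else:
            mapping[logical] = logical  # healthy tile is used as-is
    return mapping


if __name__ == "__main__":
    # 8 primary tiles, 2 spares, one manufacturing flaw on tile 3.
    print(build_tile_map(8, 2, {3}))  # {0: 0, 1: 1, 2: 2, 3: 8, 4: 4, 5: 5, 6: 6, 7: 7}
    # Three flaws but only two spares: the part cannot be repaired.
    print(build_tile_map(8, 2, {1, 3, 5}))  # None
```

The point matches the one made in the interview: a flaw no longer costs the whole part as long as the number of flaws stays within the spare budget. A real wafer-scale design would also have to route the on-chip interconnect around a disabled tile, which this sketch ignores.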
Dave Vellante
>> I mean, I think about log-structured file systems back in the storage days. This is obviously in real time.
Andrew Feldman
>> That's exactly right. I think that was some of the magic of doing work that other people say can't be done.
>> Talk about some of the examples you guys have in production right now, because the research we're seeing in the marketplace on your value proposition is that technical people love this. There are the MLOps folks, people who are grinding right now, and then you've got an onboarding wave of developers coming in.
Andrew Feldman
>> We do.
>> You guys had great benchmarks; we just covered the news on SiliconANGLE this week. Talk about the news, the performance gains and the record numbers, and talk about the efficacy of the benchmark. And if you can, comment on, I won't say whitewashing the benchmarks, but you could fudge the benchmarks if you throw more power at it. Anyone can get more of anything. So how should we evaluate benchmarks? Talk about the news and the benchmarks.
Andrew Feldman
>> So at the show we launched inference support for Llama 405B. This is the largest open source model. And we launched performance numbers of 969 tokens per second. To give you an idea, Nvidia running under Azure is at 13 tokens per second. All right? So we're more than 75 times faster than hyperscalers offering Nvidia product. This is the fastest in the industry, bar none, and we're really proud of that. Since we launched our inference service in August, we have been the fastest every single day on Llama 8B and Llama 70B and, as of yesterday, Llama 405B. We're so fast that it changes the way you can use Llama 405B: you can use it to compete effectively with the largest closed source models. So that's the first answer to your question. We do both inference and training. We have partners around the world. Our largest strategic partner is a group called G42, and with them we've built supercomputers measured in exaflops. We built them in Santa Clara, California, we built them in Dallas, Texas, and now we're building them in Minneapolis, Minnesota. We've trained leading models. Right now, the premier Arabic-English LLM is a model that we trained together with G42, and it's now being used by hundreds of millions of native Arabic speakers. We've trained models in Catalan, in Kazakh and in Hindi, all of which is part of our training business. In our inference business, business is exploding. Remember, training makes AI and inference uses AI. Right now, people want to use AI like crazy, and what we're seeing is overwhelming demand for inference.
>> And the technical people like it. Talk about inference as the killer app, because training is like going to school. I don't go back to fourth grade. I train, and then maybe I reinforce my learning, but I infer-
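The throughput comparison above is simple arithmetic. The sketch below reproduces it from the two figures quoted in the interview (969 and 13 tokens per second, taken as stated, not independently measured) and converts them into the time needed to emit a hypothetical 1,000-token answer. It assumes a steady decode rate and ignores time to first token and batching.

```python
def speedup(fast_tps: float, baseline_tps: float) -> float:
    """Ratio of two decode throughputs, both in tokens per second."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return fast_tps / baseline_tps


def generation_time(num_tokens: int, tps: float) -> float:
    """Seconds to emit num_tokens at a constant decode rate."""
    return num_tokens / tps


if __name__ == "__main__":
    CEREBRAS_TPS = 969.0  # Llama 405B figure quoted in the interview
    AZURE_TPS = 13.0      # Nvidia-on-Azure figure quoted in the interview
    ANSWER_TOKENS = 1000  # hypothetical answer length

    print(f"speedup: {speedup(CEREBRAS_TPS, AZURE_TPS):.1f}x")  # speedup: 74.5x
    print(f"{ANSWER_TOKENS} tokens: "
          f"{generation_time(ANSWER_TOKENS, CEREBRAS_TPS):.1f} s vs "
          f"{generation_time(ANSWER_TOKENS, AZURE_TPS):.1f} s")
```

The two quoted figures work out to about 74.5x, close to the 75x cited; in practical terms, a long answer arrives in about a second instead of well over a minute.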
Andrew Feldman
>> You're trying to solve for X.
>> I go to school, I get trained, and then I graduate and I infer in the real world. That's like the AI brain, and then I reinforce it. This is AI. AI is very brain-like. How do you guys do on inference, and how do you sell to your customers? I mean, I can see Amazon and Azure saying, "I want to build my own system." So where are you guys targeting? I know you're going right after Nvidia. You guys are pretty clear on that. The big green machine, I call it. What is the killer app for inference, and how does that compare?
Andrew Feldman
>> Look, Nvidia is a great company, and nobody's done better over the last 10 years. I think in 2014 they were worth $10 billion, and today they're worth, what, $3 trillion? They've had an extraordinary run. But this is a very big market, and there's room for a lot of winners. We're going to do our best to put ourselves out in front so we're one of those winners. I think high-performance inference is a game changer. What OpenAI recently showed is that you can use performance to get better accuracy. What we all want is more accurate models. And what they've shown is that through techniques like agentic models and chain of thought, you can use speed to ask the model to improve itself in a train-of-thought flow, and the accuracy of the model improves. When you're 10, 20, 50, 70 times faster than the competitors, you can use some of that time to improve the accuracy of the model. You can give the user a better answer, and they won't even notice the extra time. So what we've pioneered is the fastest inference, bar none.
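The trade Feldman describes, spending a speed advantage on extra reasoning without making the user wait longer, can be put in numbers. This sketch assumes a hypothetical baseline of 50 tokens per second and a hypothetical 500-token pass (one draft, critique or revision), and fixes the latency budget at the time the baseline needs for a single answer; `passes_within_budget` is an illustrative helper, not any vendor's API.

```python
def passes_within_budget(budget_s: float, tps: float, tokens_per_pass: int) -> int:
    """Number of complete generation passes that fit in a latency budget."""
    return int(budget_s * tps // tokens_per_pass)


if __name__ == "__main__":
    BASELINE_TPS = 50.0    # hypothetical competitor decode rate
    TOKENS_PER_PASS = 500  # hypothetical length of one reasoning pass
    # Budget = time the baseline needs to produce one answer (10 s here).
    BUDGET_S = TOKENS_PER_PASS / BASELINE_TPS

    # Speedup factors mentioned in the interview, plus the 1x baseline.
    for factor in (1, 10, 20, 50, 70):
        n = passes_within_budget(BUDGET_S, BASELINE_TPS * factor, TOKENS_PER_PASS)
        print(f"{factor:>2}x faster: {n} pass(es) in {BUDGET_S:.0f} s")
```

Whether extra passes actually improve accuracy depends on the model and the prompting technique; the arithmetic only shows how much room a given speedup buys at constant latency.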
Dave Vellante
>> So I want to come back to a couple of things. So it seems to me that, well, of course the economics of training LLMs are just horrendous, right?
Andrew Feldman
>> It's an expensive process.
Dave Vellante
>> If you have to adhere to the scaling laws and you look at the price per token, it's just painful.
Andrew Feldman
>> It's expensive.
Dave Vellante
>> So you help with that problem?
Andrew Feldman
>> We do help with that problem.
Dave Vellante
>> Okay. It's still a huge problem even after-
Andrew Feldman
>> It's a huge problem. There are two parts of the problem. There's the capital that goes into buying the equipment, and then there's the operating expense, which is almost all power, right?
Dave Vellante
>> Right.
Andrew Feldman
>> The power used in the data center. We cost less to buy, and we use less power to generate FLOPS. So we help on both of those counts, but even then, training giant models is expensive.
Dave Vellante
>> Okay. And then the other thing is I've talked to financial institutions that say they don't want to use a closed LLM, proprietary LLM. They don't want to use Llama because they're afraid of the fine print. And so they said, "We're going to build our own." Now we'll see. They used to say that about the cloud ...
Andrew Feldman
>> They did.
Dave Vellante
>> But they have money and they're talented. And a couple of them have said, "Well, we're bringing in Cerebras, and we're going to build our own model." So that's another opportunity. I know you can't name names, but is that a trend?
Andrew Feldman
>> I'm happy to name names. Customers like GlaxoSmithKline have trained their own models on our machines. Customers like Mayo Clinic, as we've announced, trained their own models on our machines. We've announced customers like TotalEnergies. We've signed an MOU with Aramco. And as I said, there's our work across the G42 companies, including Core42, MBZUAI and a collection of others in the United Arab Emirates. They're leaders in their fields, and all of them are using open source models as well as training their own, using their own data to create advantage through AI.
Dave Vellante
>> And I know of at least two others that you didn't mention, so I won't. The third piece is on the inference side. Our vision of agentic AI is that these agents... people think these agents are like God agents. No, they're worker bees, but they will learn from human reasoning traces.
Andrew Feldman
>> They will.
Dave Vellante
>> And they're going to need really powerful inferencing to do that. And that's how we're going to automate that long tail of processes that are unautomated today. That's a multi-trillion dollar opportunity.
Andrew Feldman
>> Yeah. That's right. In 2017, the guys at OpenAI published a paper. I think Ilya Sutskever, one of the founders, published it. And what he said was that they were able to identify a scaling law. A law that said, as you added compute to a training problem, all right, the accuracy would improve and they could see no end. Now over the last five years, the amount of compute we've needed to train frontier models increased by 40,000X. Now a few months ago, they announced they found the same thing for inference. As you add more compute, exactly as you said, as you ask it for an answer and then ask it again and ask it to improve it again, it continues to improve as you add more compute. And this is what's led many to say inferencing by itself, separate from training, is going to add millions of X of compute requirements.
Dave Vellante
>> I want to ask you, because people think it's an either/or, and it's not an either/or-
Andrew Feldman
>> It's not.
Dave Vellante
>> But correct me if I'm wrong. I remember that paper, but aren't there diminishing returns? You not only need compute, you need data and you need parameters. All three have to scale together.
Andrew Feldman
>> All true.
Dave Vellante
>> And is it true that basically they're running out of data, and synthetic data is maybe an answer to that? But synthetic data is not going to be able to replicate JPMorgan Chase's proprietary data, so that's a huge opportunity.
Andrew Feldman
>> I think there's a tremendous opportunity in the creation of synthetic data. But I think there's also an opportunity for those companies that have spent the last several decades marshaling and husbanding their data. Mayo Clinic has one of the great repositories of medical data: patient records, MRIs, fill, tissue samples, genetic data. There's a huge amount of insight there. Now, these aren't companies that scrape the internet for their data. There's a place for that too. But these are companies like Total, which has seismic data, all right, or Aramco or ADNOC, which have billions of dollars' worth of collected data. GlaxoSmithKline and Novo Nordisk have spent unbelievable amounts of money gathering data over the years, and now there's an opportunity to use these tools to find insight in that data. I think there's a tremendous opportunity there, and we're just beginning to scratch the surface.
>> We wrote a post on SiliconANGLE, authored with George Gilbert and the team, and what we said was, the title's a little bit salacious to get the... we don't really do clickbait-
Andrew Feldman
>> Say it's not true. Say it's not true. The headline wasn't designed for-
>> No, we usually don't design for salacious headlines, but in this case, it was designed to get attention. And he said in the headline, Jamie Dimon and Sam Altman's new competitor, basically to make the premise: hey, the enterprise has data, and they're not going to go to OpenAI to use it. They're going to build their own OpenAI for themselves, meaning their own language model, to your point. This is what you're squarely going after, from what I can tell, right?
Andrew Feldman
>> We are going after people who have interesting data sets and who wish to find insight in them through models they design, through models we help them design or through third-party models they pay for. Those are absolutely the bullseye for us, and we've done really well with them.
>> Okay, so you've sold me, and I love the big chip. I'm a believer; obviously I have been for a while. I think that's the right way to go. And I think the enterprise is a huge opportunity.
Andrew Feldman
>> So does Nvidia, by the way. In 2015, Nvidia's chip was 400 square millimeters. Five years later, their chip's 800 square millimeters.
>> They're big chips. Absolutely.
Andrew Feldman
>> They're big chips. Now they've gone from one to two, trying to recreate what we've done with our big chips.
Dave Vellante
>> Big chips with big , which brings other challenges.
Andrew Feldman
>> They're big chips.
>> Big chip game. It's a big game with big chips.
Andrew Feldman
>> That's right.
>> Okay, so I'm sold on the big chips. I buy that and the architecture. Now the conversation goes to the power law of big, and then you have medium and small chips and small language models. So you start to see the evolution with distributed computing. I can't put a big chip on a camera to do computer vision. So how do you see it, as you look at the architecture emerging, where I'm going to have to do inference at the edge? How does that system work? Take us through the vision, because, again, you mentioned earlier that things are connected. What's the vision there?
Andrew Feldman
>> I think there will be a tremendous opportunity for little chips at the consumer electronic edge, in your phone. We're going to put a little bit of inference there.
Dave Vellante
>> Those are big chips too, by the way.
Andrew Feldman
>> These are bigger chips.
Dave Vellante
>> Apple. That's big chips.
Andrew Feldman
>> We're going to do a little bit of inference here.
>> Compared to the big chip, they're small.
Andrew Feldman
>> That's right. Maybe in the car, you can have a little bigger chip because you have a bigger battery.
Dave Vellante
>> Okay.
Andrew Feldman
>> But what we saw in the Arm era was that the rise of compute in the phone didn't diminish the rise of compute in the data center. In fact, it accelerated it.
Dave Vellante
>> Absolutely.
Andrew Feldman
>> People wanted to use apps on their phone, but when heavy work needed to be done, the call went back to the data center. And the exact same thing is going to happen with the car. We're going to do inference in cars, as we already are with self-driving cars. But that's not where the model's going to be trained. That's not where the QA is going to be done. That's not where new things are going to be developed. Those are all going to be worked on in the data center. So I think they work together, and far from nibbling at each other's pie, they make the pie bigger. The more consumer applications use AI, the more need there is to train models in the data center, and the more need there is for inference in the data center as well. It's a flywheel.
>> Yeah.
Andrew Feldman
>> And that's why one other thing is really important: being able to do both training and inference. I think if you want to be a player in this market, it simply isn't enough just to do inference. Once your customers have their model going, they're going to want to fine-tune it. They're going to want to improve it via training with new data. So you need both.
Dave Vellante
>> And that's why it's not an either/or. By the way, Jensen agrees with you.
Andrew Feldman
>> I think we agree on many things. Some, we don't.
>> Yeah, and one thing you both agree on is that to be on the winning side of history with GenAI software, you've got to have the machines and the architecture. And this is a big point around systems thinking.
Andrew Feldman
>> This is a system problem. I mean in 2015, 2016 when we started the company, we were going to build systems from day one. This is my fifth startup and all previous companies were system companies. We love that. We're system builders. I don't want to sell chips, we want to sell systems. Now at that time, all right, everybody else was selling chips or chips on PCI boards. Nvidia saw the light. They moved to building systems like the DGX. Recently, what did AMD do? AMD acquired ZT, a system company. Because once you build a race car engine, you don't want to just hand it to some random car maker and say, "Make me a race car." You want to build the entire race car. And if you do, you can make it fast.
Dave Vellante
>> Yeah. But you can optimize it.
Andrew Feldman
>> You can optimize every aspect of it, every aspect.
>> Clustered systems: welcome to the new era. Andrew, thank you so much. I wish we could do an hour on the podcast. We'll have you back for a podcast appearance on our Friday show.
Andrew Feldman
>> Happy to be back anytime. It's really fun talking to you guys. Thank you so much.
>> Yeah, we are living in a systems revolution. The new era is here. The old era is gone. The new way is coming: architecture, systems, software, and it's going to repeat itself in a virtuous circle. We'll have all the coverage here on theCUBE. Thanks for watching, live from Supercomputing '24. I'm John Furrier with Dave Vellante. We'll be right back.