theCUBE + NYSE Wired: Zero Trust Cyber Series
Groq specializes in AI inference, offering ultra-low latency and affordability. They cater to diverse customer profiles and use cases, from control systems to real-time applications like stock trading. Groq's LPU chip sets new industry benchmarks for performance speed. They offer deployment options from cloud access to on-prem solutions. Their business model includes SaaS options with pay-as-you-go pricing. Groq emphasizes innovation and experimentation in AI applications, focusing on real-time conversational AI. They believe in enhancing human capabilities…
>> Hello, everyone. Welcome to theCUBE, back here at the NYSE. I'm John Furrier, your host of theCUBE. We're here all week. Dave Vellante was here; he went back to Boston, had to take care of some business there. I'm breaking it down. Trump was here today kicking off the opening bell. It pushed our schedule back, but we're back and we're going to keep going all day, 12 more straight interviews back to back. Mark Heaps is here from Groq, the chief technology evangelist. Jonathan Ross was on theCUBE at SC23. You guys have been on a rocket ship of growth since. Congratulations.
Mark Heaps
>> Thank you so much.>> And thanks for coming in. Not bad seeing you here.
Mark Heaps
>> My first time here, it's really, really impressive. A little overwhelming.>> Well, today was a pretty busy day because of Donald Trump coming in but.
Mark Heaps
>> Right.>> In general, the New York Stock Exchange is where the center of capitalism is. You guys are on a rocket ship, and just the overall gen AI inference game, you guys were really early, if not first, to really come out and declare inference as the killer app. In fact, I think it was on theCUBE, we talked about that two years ago. Now everyone's talking about inference. Okay, training, yes, check. There'll be reinforcement learning, I got that. Still a lot of people doing the training, but as more data comes in, it's going to be reinforced. The inference kicks in, it's going to be integrated. You guys have been winning the game on benchmarking speed.
Mark Heaps
>> Yes.>> Doing a lot. Give us the update on Groq because, again, it's been a while since you've been on theCUBE. About a year and a half.
Mark Heaps
>> Yeah.>> About a year? I think it's been a year. Yeah, about a year. Give us the update. I know you got a lot of racks. I saw the social media.
Mark Heaps
>> Yes.>> The big irons moving out the door, the big clustered systems. Give us the update on Groq.
Mark Heaps
>> Well, I think it's been a really exciting year. We always knew that we had something special as far as the chip, the LPU itself, as a technology. But as you said, we spent a lot of time trying to educate the world that inference was going to become a new market category for technology. We were looking at the numbers yesterday. Actually, in February, we had 60 people that had signed up for inference on the LPU via a cloud service. As of yesterday, we had 650,000.>> Explain LPU for the folks who don't know what that means.
Mark Heaps
>> Yeah, sure. Most people are familiar with a CPU in a computer. You've got a GPU that also gets used. LPU stands for language processing unit, and it was a chip that was originally created by the team with our CEO, Jonathan Ross, as you mentioned. What it specializes in is inference for AI applications. There are two categories in this space: you usually have training for machine learning, and then you also have inference, which is when someone's actually deployed a model and they're really using it in production and running their business on it. Jonathan had this vision eight years ago to say, "I think there's going to be a next level of compute that will put a demand" that could not be delivered at that time. Because of that, now AI applications are getting so much more complex. People are looking for novel solutions, and we were in the right place at the right time.>> There's been a lot of dialogue around NVIDIA's moat.
Mark Heaps
>> Yes.>> People, you're starting to see that you don't actually need GPUs all the time.
Mark Heaps
>> Not all.>> And it's the combination of the right chips and system elements. Explain this important dynamic, because I think this is going to be the key to the system architecture for these new clustered systems, whether it's powered in the cloud with some application or on-prem, which we see connected to the cloud in one scenario for sure. But the growth of on-premise activity is at an all-time high, because, hey, if I'm JPMorgan Chase, although they were on stage at Amazon re:Invent, I've certainly got a lot of stuff on-prem in a regulated industry.
Mark Heaps
>> Sure.>> And they got a 17 billion dollar annual IT budget, so I'm sure they're going to be a customer if not already a customer.
Mark Heaps
>> Yeah, I think there's a really great ramp to deployment optionality right now. I mean, obviously GPUs are great and they've been the industry standard, but you're seeing a lot of folks look at novel chip architectures like Groq to say, "Hey, we have a small business startup that might just need cloud access with a simple API key for their application," all the way up to those people that have to deal with regulations where they need on-prem solutions. And then there's everything in between that's somewhat hybridized, whether they want to have dedicated provisioned access to Groq systems, or if they say, "Hey, we want to put something on-site in a data center, but you manage it." You can cover that entire spectrum today.>> What are the use cases that you guys have? I know, again, I love that image on social media. The big boxes moving out, that's moving off the docks, so to speak.
Mark Heaps
>> Yeah, so exciting.>> You guys are shipping a lot of gear.
Mark Heaps
>> Yes.>> A lot of systems.
Mark Heaps
>> Yes.>> What's the customer profile? What's the use cases? What are you seeing on the market that's leveraging the Groq technology?
Mark Heaps
>> Well, ultimately, the advantage for people that are building applications with Groq is for folks that want ultra-low latency. So now you're starting to talk about this movement towards control of systems with conversational AI, something that's real-time. Obviously stock trading and streaming data are going to be key to things like that. You were just showing me one of your latest tools with transcription from video. All of that is going to be happening in real-time today, whether you're talking about autonomous vehicles down to the mobile app that you have. This is where Groq is advantaged in that ultra-low latency. But if you're going to have that sort of speed, it also has to be affordable, and we have an advantage there as well.>> Yeah. What are the latest speeds and feeds? Because I know for a lot of people, there are no standards and benchmarks yet, so I'm trying to get experts like yourself to lay out how customers should look at performance, because it's not like the old-school days of TPC benchmarks back in the day.
Mark Heaps
>> Right. Definitely not.>> No, because the apps are, I mean, they're almost custom systems. In fact, custom systems every time. There's no one vanilla benchmark.
Mark Heaps
>> No.>> How should people think about benchmarks? Obviously power is important.
Mark Heaps
>> Yes.>> Context windows, token speeds. Where's the inference benchmarking landing now in terms of what's settling in as at least de facto standards?
Mark Heaps
>> Yeah, it's normalizing right now. I think most people in the industry have moved away from the standard benchmarks, especially for this space. Really the first one that, if you're not familiar with this space, you should be familiar with is tokens per second. And then the second after that is time to first token. These are areas where Groq really helped define that new benchmark and set that standard. Other players are coming along to that sort of standard today, but ultimately you want to be able to see what the performance speed is as your applications get more complex, and every time you introduce a new layer to your application, you're introducing latency. So the speed at which you can generate tokens across the end-to-end application really matters for the experience your customers are going to have.>> Can you talk about the aspect of scale? Because I talk to a lot of folks that have been kicking, I don't want to say kicking the tires, that's an old-school term, but they're playing around. They're really evaluating. They're running some evaluations, they're doing some workload testing, but there are some use cases that you can put together pretty quickly. RAG's one of them, we see that all the time.
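The two de facto metrics Mark describes, tokens per second and time to first token, can both be computed from per-token arrival timestamps. A minimal sketch, using hypothetical timing values rather than measurements from any real system:

```python
# Sketch: the two de facto inference benchmarks, computed from a list of
# per-token arrival timestamps (seconds, relative to the same clock as the
# request time). The timing values below are hypothetical illustration data.

def inference_metrics(request_time, token_times):
    """Return (time_to_first_token, tokens_per_second) for one completion."""
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_time       # latency before the first token
    duration = token_times[-1] - request_time  # end-to-end generation time
    tps = len(token_times) / duration          # overall throughput
    return ttft, tps

# Hypothetical stream: request at t=0, five tokens arriving over 0.30 s.
ttft, tps = inference_metrics(0.0, [0.05, 0.10, 0.18, 0.25, 0.30])
print(f"time to first token: {ttft:.2f} s, throughput: {tps:.1f} tokens/s")
```

Note that, as Mark points out, both numbers degrade as each new application layer adds latency, so they are most meaningful when measured end to end rather than per chip.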
Mark Heaps
>> Sure.>> And things of that nature. It could be computer vision and other things, but then you hit a scale point where things change, and then the dynamics of the benchmark change. You mentioned application latency as you add more applications. This brings up: where do you start to think about where the benchmarks matter, and where does it change? Does the scale of my app or the scale of my infrastructure change the requirements for those needs on a price-performance basis? I don't know how to ask the question, but I guess what I want to ask is: when scale hits, what's different that might be missed if you're looking at a benchmark or evaluation of what looks like a steady state, but it's really not scaled yet?
Mark Heaps
>> Actually, you've hit the nail on the head of why there is a change in benchmarking. Traditionally, when you were benchmarking in the semiconductor industry, you were looking at a single chip instance, and what were the speeds and feeds on that singular chip. But when you look at someone like Jonathan, our CEO, who comes from a background of building massive distributed systems at Google with their chip, he understood that performance and efficiency change at scale. Whether you're talking about the volume of users or the power draw that you have at scale, that's obviously on everybody's mind right now: how sustainable is this with this demand for compute? The LPU system was actually designed to be utilized at scale. In fact, we tell people, if you want one or two chips from us, you shouldn't buy from us; you're going to go with a standard in the industry, an incumbent. But when you start talking about serving tens of millions of users and not needing to create lots of micro systems to serve that volume, you're going to want to be on a chip that was designed to handle that scale.>> When you talk about deployment optionality, I like that phrase you mentioned before, you guys are hitting that use case on the large scale. See the systems, it's not a onesie-twosie deal. What about enterprises that might be diverse or have different optionality on their architecture? It could be maybe distributed. Does Groq have a play there? Is it just large scale? Is there a version where I can come in and then scale up and add more Groq machines? Take me through it, because I might say, "Hey, I got this workload here."
Mark Heaps
>> Right.>> I might not need the full system, what do you guys say to that?
Mark Heaps
>> I think this is why we're seeing so many people actually start looking more towards cloud access and having their own API. I mean, you could start today with Groq completely for free, get an API key. And like I said, of the 650,000 developers, so many of them are startups that are just building apps and trying it, right? But now as they start reaching out to us saying, "Hey, I've hit my rate limit, I've hit my ceiling."
Well, there's another tier level, eventually you're going to get to a tier level where you say, "Well, maybe I should provision a dedicated system. Maybe we should actually put something on-prem."
Our biggest customer is over in Saudi Arabia right now, and we've had lots of press announcements about that. We're shipping racks, we're doing all this work over there because they intend to deploy for these large enterprise services in the region.>> The business model is: come in with the SaaS, play within the rate limits and use pay-as-you-go, or at large scale, we'll send you systems.
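The entry tier Mark describes, a free API key against a cloud endpoint, would look roughly like the sketch below. The endpoint URL, model name, and key are assumptions for illustration (Groq's cloud exposes an OpenAI-compatible chat completions API); the request is constructed but deliberately not sent:

```python
# Hedged sketch of the "start with an API key" tier: building a chat request
# to an OpenAI-compatible inference endpoint. URL, model name, and key are
# placeholders/assumptions; the request is built here but not actually sent.
import json
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint
API_KEY = "YOUR_API_KEY"  # free key from the provider's developer console

payload = {
    "model": "llama-3.1-8b-instant",  # example open source model name
    "messages": [
        {"role": "user", "content": "Explain inference in one sentence."}
    ],
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; per-tier rate limits and
# pay-as-you-go pricing apply, as described above.
```

When a developer hits the rate-limit ceiling of a tier, the same request shape carries over to dedicated provisioned or on-prem deployments; only the endpoint changes.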
Mark Heaps
>> Correct. That's exactly right. So this is why we tell people when they say, "Well, are you this company or are you this company?"
And I just say, "Yes.">> Okay, so when you said to me we could use Groq, you didn't mean get a system. You meant use your cloud.
Mark Heaps
>> Basically use the cloud and go from there. We've got some really great announcements that are going to be coming in the first quarter next year of some of these larger enterprises and what they're building on us. We're very lucky as well to be partnering with Meta and the open source community with the models that the enterprises are finally finding some real value in.>> I've been saying for a couple of years, obviously, about clustered systems. This year I've been shouting at the top of my CUBE lungs, I could say, that the open source growth has just been so phenomenal, unbelievable. And you've been seeing the developer frenzy. It's a bit of a frenzy.
Mark Heaps
>> Yes.>> And then you see the rapid pace of performance, price-performance on tokens, token sizes, context windows, price per token. Your system is getting faster, so we're getting to this crescendo where the tipping point is coming.
Mark Heaps
>> Oh yeah, for sure.>> Once those developers are unleashed.
Mark Heaps
>> Yes.>> I think there's going to be a Cambrian explosion of just action. Take me through how you see that happening, because you guys are in the middle of it. What's the developer appetite for Groq? What are some of the cool things you're seeing that could support connecting that dot? Because to me, they're just holding back. I mean, they're definitely working, and I see work for sure, but it's going to be unleashed.
Mark Heaps
>> Yes.>> When I say unleashed, I mean full unleashing of new stuff.
Mark Heaps
>> Yeah, I think we're->> It seems a little bit restricted because of the costs, because of some of the cloud costs and getting hands-on gear has been a constraint.
Mark Heaps
>> Well, I think Andrew Ng famously was talking about this earlier this year. He had a viral moment where he said the cost of tokens has gone down by orders of magnitude even since a year ago. Jonathan always had this vision, from when I first met him and we were discussing Groq, that he wanted anyone with a credit card to be able to create applications and get started building today. That wasn't the case five or 10 years ago; you had to provision hundreds of thousands of dollars of cloud compute. And today, someone can literally dive in and start building an application and bring it to market. So when we think about what Groq has done, we stood up our cloud instance earlier this year and we gave everyone access for free just to see how they would build on it. What's really happened in that unleashing, as you say, is people pushed it to the limit rapidly, and then they came back and said, "We need more of this, we need more of that." So really, the speed of iteration from them has been the speed of innovation for us.>> And it's good. You get to see what they're building, because we want feedback.
Mark Heaps
>> Absolutely.>> It's a product requirement.
Mark Heaps
>> Greatest product requirement ever. Totally.>> Yeah, it sounds like, I mean, that's what made Amazon Web Services famous: put your credit card down and then go.
Mark Heaps
>> Right, and launch.>> All right, so what's the coolest thing you're seeing so far? If you had to lay your top three coolest things you've seen with some of the Groq, I won't say experiments, but the freedom to play and build, what are some cool things, Mark?
Mark Heaps
>> The thing that I'm getting really excited about, and Scott Belsky from Adobe was actually recently talking about this, is how we're moving into the control era. I think the way that people are interfacing with their applications is changing dramatically. Watching people who a year ago were building a chat app that was simply type-in, and today you can actually have voice and conversational AI with a real-time conversation with data, is so exciting. And then we see some really abstract ones.>> What's the weirdest thing you've seen?
Mark Heaps
>> Okay, so this is the one.>> Cool, weird.
Mark Heaps
>> Cool, weird, so this one. We did a hackathon in San Antonio maybe a month ago. I was one of the judges there. A student group actually built an app that allows you to find a date in your general area based upon recipe cards and workout programs. You talk to your agent and you tell it what your favorite foods are, and it builds a recipe card and it builds a social profile. And if anyone in your area likes the same recipes and the same workouts, it will connect you and say, "Maybe you should get together and cook." And I thought, I don't know how that's going to work, but I love the creativity.>> I do like it. I mean.
Mark Heaps
>> And it was fully powered by Groq, which was great.>> You know you have a social network when there's dating and money making, so that's the sign of success.
Mark Heaps
>> Yeah, absolutely. I think the other exciting part is that a lot of folks are scared of AI. There's a lot of intimidation about what it's going to do, what it's going to take away from people. But I think it's important to notice that you're having really great advancements in drug discovery, cancer screenings, language services, that all require this level of real-time inference. And because of that, I just think it's a really exciting future for everyone.>> So I got to ask you about the human in the loop piece, because I was talking to someone about regulating tech on our CUBE pod, which Dave and I do every Friday. Plug for theCUBE pod: check it out every Friday, Dave and I riff and share. We are so anti-regulation, let tech flourish. There's all this talk, certainly outside Silicon Valley, around DC politics; certainly politics is coming into the tech scene. I said to this big impact investor out here, "You want to make a change? Change the consumer behavior." Because the companies will respond to that, rather than trying to put handcuffs on people.
Mark Heaps
>> That's right.>> Change the user behavior, enable the developers to build the cool app. I wouldn't say weird app; that's an innovative idea.
Mark Heaps
>> Absolutely.>> Dating based upon preferences.
Mark Heaps
>> Sure.>> I mean, concept that makes sense.
Mark Heaps
>> Yeah.>> That takes a lot of energy to do that; that's a recommendation engine, basically. Why not change the consumer behavior? Because if you focus your energy on regulating, the innovation could get stalled in a critical period. Let chaos reign and then rein in the chaos, as Andy Grove used to say. And so what's your thoughts on that? Because we are in this human era where, if AGI is coming down the pike, and we can debate when or whether that happens, it's software driven for sure, but where's the human element, right? The human element. Is that your heart, your mind?
Mark Heaps
>> Right.>> What's your thoughts on this?
Mark Heaps
>> I think this goes back to the value of the open source community, right? Because an open source community doesn't work for one logo and doesn't represent one company. So what happens is you've got a shared trust, a shared mind, of what is the right way to use technology, being built by people that best understand the technology. Because of those voices, like you said, the user behavior, you see companies like Meta AI and their team that recently released Llama Guard as part of this open source model work, which puts some safety in place for these systems. But that's heavily influenced by the open source community and what they believe should be integrated into the technology as policy.>> Mark, great to have you on theCUBE. We'd certainly love to do more of our videos with you guys. I know you're super busy spreading the word and also implementing some of those requirements that you're getting from the open source community developers, as well as feedback from builders. Final question for you, more of a thought exercise as we think about the future: what does it mean to you to preserve your humanity in a digital age?
Mark Heaps
>> Well, I think if anyone's seen my LinkedIn profile, my mission statement is to keep the heart of humanity and the soul of AI.>> Explain.
Mark Heaps
>> For me, I think there are two ways of looking at AI right now, and a lot of people are talking about how it replaces humanity. I actually believe, like all advancements in technology, this becomes a tool to extend human capability and human agency. Actually, I have a paper about this on the Groq blog called HumanPlus. And ultimately, we need to be asking the tools, and the makers of the tools, for what would make our lives better. So again, we hear people right now talking about how AI can write poetry, but we don't want to replace the humans that write poetry; that's a beautiful part of humanity. What we do want is for it to work faster on our spreadsheets so that we don't have to do that.>> So you write more poetry.
Mark Heaps
>> Yeah, we want it to do our taxes for us, right? We want it to do the monotonous tasks.>> Yes.
Mark Heaps
>> And that has been true all the way back to Henry Ford working on the factory line. We just need to really make sure that humans are speaking to what they want AI to do for them.>> Yeah, heart and mind in the software.
Mark Heaps
>> Absolutely.>> Some data input. I mean, if we're using synthetic data to fill the gap from the corpus of current data, how's the new data coming in? What is that data? So it's a really good point. I really appreciate your opinion, because something that we're really going to ask a lot of questions around and start thinking about is, as we move into software and taking more inputs.
Mark Heaps
>> Right.>> What is the humanity angle? Certainly technology works for us.
Mark Heaps
>> Yes.>> Not the other way around.
Mark Heaps
>> That's the way it should work.>> Thanks so much for coming on theCUBE.
Mark Heaps
>> It's a pleasure, thanks so much for having us.>> We got Groq inside theCUBE. We are here at the NYSE for Media Week. We'll be up and running full time starting in January, February timeframe as our studio gets continually built out here. This is theCUBE action, I'm John Furrier, your host. Thanks for watching.