Sunghyun Park of Rebellions.ai joins theCUBE Research and NYSE Wired to examine
how purpose-built memory-centric silicon enables always-on real-time artificial
intelligence inference in next-generation data centers. Park outlines
Rebellions' approach to optimizing inference workloads for energy and cost
efficiency, describes rack-level system design and networking and highlights
commercial deployments with Korean telecommunication operators that validate the
architecture. They explain how production deployments with SK Telecom and Korea
Telecom demonstrate high API throughput and improved operational efficiency.
Park argues that the market is shifting from training to inference, where performance
per watt and per dollar matter most. They emphasize that memory-centric
architectures, photonics-enabled networking and rack-level co-design are
critical to lowering total cost of ownership. TCO reduction enables diverse
deployments in cloud, edge and on-device environments. Rebellions reports
production collaborations with SK Hynix and Samsung Foundry on silicon and
manufacturing roadmaps and describes how rack-scale co-design and networking
choices reduce power consumption and operating expense. The discussion also
covers open-source software stacks such as PyTorch for inference optimization
and practical factors to consider when deploying AI infrastructure at scale.
Sunghyun Park, Rebellions
>> Palo Alto studio, connecting Silicon Valley and Wall Street. I'm John Furrier, here with Dave Vellante, my co-host. Hello, I'm John Furrier, host of theCUBE, here in theCUBE's NYSE studio. Of course, we have our Palo Alto studio, connecting Silicon Valley and Wall Street. This is our AI factory series, where we interview the leaders who are building out the AI infrastructure that's enabling all the innovation, societal change and, of course, economic advantages across the globe. It's a global opportunity. And we're here talking about purpose-built silicon with Sunghyun Park, co-founder and CEO of Rebellions, a hot company building large-scale systems, deploying them and providing the products that we need to be successful. Thanks for coming on theCUBE, on our AI factory series.
Sunghyun Park
>> Thank you for inviting me here. It's a historic place here. Yeah.
John Furrier
>> The AI infrastructure is continuing to grow. With the build-out demand we're seeing, obviously there are some supply chain challenges, but that's not going to stop the momentum. And it's changed. So, my first question is, what's been the big changeover that's made you successful in this market? Has it been the paradigm shift to large-scale systems? Has it just been AI in general? What's been the big driver for your success?
Sunghyun Park
>> Right now, in one word, it's inference. The market is moving from training to inference. And from an AI perspective, we're moving from occasional AI to always-on AI, real-time AI, running at scale, with billions of inferences per day. That's the key difference. In training, raw FLOPS and interconnect are much more important. However, in inference, efficiency, especially cost and energy efficiency, matters most.
John Furrier
>> Yeah. There's a lot of inference talk going around. I want to get into some of the systems you have and the strategy you've been executing on. But first, you have purpose-built silicon that's the engine of the AI factory. Explain the purpose-built chip definition and the importance of the role of purpose-built silicon.
Sunghyun Park
>> Basically, compare it to a general-purpose GPU, a GPGPU. Our custom, purpose-built silicon is 100% optimized for a single workload. In Rebellions' case, that's inference. For the KV cache, it's not a software solution at all. Right now, we are highly focused on a memory-centric architecture. It's 100% optimized for inference workloads, including the KV cache. Also, to enable scale-up and scale-out, we need specific hardware primitives, and those primitives are also hardwired inside the chip. That's what we call purpose-built, not general-purpose. So, compared to general-purpose chips, we compromise a little bit, sacrificing flexibility and functional diversity. However, in terms of intelligence per watt and per dollar, we can offer way higher efficiency.
John Furrier
>> I love the AI factories because it's like everyone's cheering for you back here. They love that.
Sunghyun Park
>> What's going on right there? Yeah.
John Furrier
>> Big trades are going down. A lot of them have heard our AI factory narrative. But the AI factory is fascinating because it's almost an inversion of the old generation's technical architecture mindset. You used to build a server, with everything on the board. Now you have density. You have a lot of chips working in concert. It's a collection of systems operating as one system. So, you can almost say an AI factory is a super server. It's a supercomputer. What does that mean architecturally for inference? Because with training, we saw people throw everything at it and train everything. Now you have inference, which is a much broader, bigger opportunity.
Sunghyun Park
>> Exactly.
John Furrier
>> It's different use cases. How does the architecture need to be in place for inference? Is there a rule of thumb? Is there a general principle? Take us through the thinking around how to architect for an inference world.
Sunghyun Park
>> Exactly. From the market perspective, inference has totally different physics and tokenomics. In training, performance is much more important than any other metric. However, in inference, as long as you meet some minimum requirements, such as the SLO for time to first token and tokens per second, nobody cares about raw performance. They care about performance per watt and per dollar, because inference is being commoditized right now. Efficiency matters most, compared to training. In training, basically, we build the biggest possible system and interconnect all the chips using the [inaudible]. Inference is a little bit different: we sacrifice, we compromise a little bit. Efficiency is the most, most important metric in the inference era.
John Furrier
>> You mentioned energy in the intro.
Sunghyun Park
>> Yes.
John Furrier
>> Energy is on everyone's mind. The entire scale is bounded by energy. How does the energy equation change when you start to look at the diversity of deployments? You're going to have the big mega centers. You're going to have smaller footprints, maybe still doing 10 kilowatts. Then you're going to have even smaller footprints. You're going to need to do inference everywhere.
Sunghyun Park
>> Exactly. That's the key point. Basically, we distinguish edge data centers from backbone data centers, the gigawatt-scale ones. Right now, most AI inference runs in the data center. But in the near future, some of it will run in edge data centers and some on device, depending on the model size, the parameter count and the system requirements. If you're running a very latency-critical application, you don't need to go to a backbone data center; an edge data center or on-device is good enough. So, depending on the system requirements and the model size, inference diversifies across all kinds of data centers, depending on ... That's why inference has totally different physics and tokenomics.
John Furrier
>> Yeah. I mean, there's so many things that jump out. First, we talked about data. Got to get the data close to the processing units. That's key. Memory. Energy, always the discussion. Now, the new conversation, and this has been in the inner circles of the leaders, but now it's going mainstream, is the role of networking.
Sunghyun Park
>> Exactly.
John Furrier
>> What's your vision on how the importance of networking... Because when you add energy, moving things around requires energy. Data is being moved around. Networking seems to be something that's not talked about a lot. What's your vision and role or your view of the role of networking?
Sunghyun Park
>> Yes. Networking is super, super, super important, but it's sometimes ignored right now. Some people talk about the [inaudible]. So, we need scale-up, scale-out, even between data centers, not just rack to rack, not just chip to chip. We definitely need that kind of networking. That's why everybody talks about photonics and optics. It's not the distant future. We're almost, almost there in terms of commercialization. So, we need lower latency, higher throughput and lower energy to move the data.
John Furrier
>> I ask dumb questions at these events. I look at the big Nvidia rack and I say, wow, that's a lot of product in there. Why so many switches? If you look at the number of GPUs, the chips, the switches take up a large portion of the rack. You guys are in the rack-scale business with Rebel. You have a lot of developers. What's your vision on how racks will change? Because again, we're going to need to squeeze as much energy efficiency and footprint out of the racks and their deployment as we can.
Sunghyun Park
>> Yeah. Basically, as I mentioned, we are preparing a rack-scale solution, not just a server, not just a PCIe card. Why? We need system-level co-optimization. As you mentioned, there are a lot of switches and storage and a lot of optical interconnect in there. So, it's not just the memory pool. We need a new memory architecture combined with the networking. To deliver the lowest cost and lowest energy consumption for a given normalized token throughput, we definitely need co-optimization at rack scale. NVL72 from Nvidia is one example. We also need Ethernet-based standards. And each layer must be customized, co-customized. That's why we need to prepare a rack-scale solution, not just [inaudible].
John Furrier
>> And you're working on that now?
Sunghyun Park
>> Exactly. Yes. Right now, our rack-scale solution is working at SK Telecom and Korea Telecom, the top two data center companies in Korea, which have proved our lower TCO: lower CapEx and OpEx simultaneously.
John Furrier
>> Talk about the role Korea is playing in the global scale. You have investors. You can name them. I think they're big memory players. Talk about the role of Korea in the global AI infrastructure.
Sunghyun Park
>> Yes. Things are starting to change here. The transformer, at the heart of the AI era right now, is a very memory-hungry architecture. It's all about memory. For example, as you mentioned, Rebellions is strategically backed by SK Hynix and Samsung Foundry. That's a good thing, especially with the really, really weird supply chain right now. But the point is, without the memory and without the economics of wafers from the foundry, we would not be able to deliver custom silicon at all. Why do we need custom silicon? It's cheaper: a lower cost per token. But not without that kind of collaboration with the memory and foundry partners.
John Furrier
>> Well, it's cheaper, but also energy efficient.
Sunghyun Park
>> Exactly.
John Furrier
>> You get a lot of integration with that chip. And more will be custom, not less, in the future.
Sunghyun Park
>> Yes. Memory right now is not a commodity at all. Memory also needs to be customized for each customer. That's the future of memory. That's why Rebellions is focused on a memory-centric architecture, collaborating with memory vendors to optimize our logic and co-design the memory and logic.
John Furrier
>> We were having a conversation on X and I think maybe LinkedIn about this. When you have these constraints, engineers like to work around constraints. Have there been some new engineering directions? Because we don't want to be constrained on waiting for memory. Are there new architectures emerging that you're seeing that are promising to help facilitate faster deployment during a constrained environment like memory?
Sunghyun Park
>> That's cool. Maybe that's one possible source of new innovation right now. We need new materials for scale-up and scale-out solutions, and also some innovation in logic. I think that's a good starting point for innovation, though not a very good story for memory vendors. But again, let's see what happens.
John Furrier
>> Well, people who have good experience with servers and rack-scale systems can certainly make changes. I think there's opportunity there. We'll keep an eye on that. I wanted to get that out there, because we see entrepreneurs all the time. That's what they do. They innovate. What's the constraint? I'll get around it. All right. Let's talk about your business. How's that going? Share your momentum, because I think it's important to point to. You're starting to see a global distributed computing network. It's global in scale. Sovereignty is a big discussion across cloud, infrastructure and AI. Talk about some of the momentum. How's the business going?
Sunghyun Park
>> Sure. Basically, we're a Korean company, headquartered in Korea. We spent five years in Korea collaborating with SK Hynix and Samsung Foundry to deliver a memory-centric architecture. Now we have finally commercialized our product, RebelServer and the pod, at SK Telecom and Korea Telecom, the top two data center companies. It's not a thesis. It's not just paper research. It's commercialized. We've deployed more than 100 server racks.
John Furrier
>> You have product commercialized now, RebelServer?
Sunghyun Park
>> Yes, exactly. Right now, we are serving actual end users, commercially... at SK Telecom. Up to 50 million API calls per day run on our chip. It's more than-
John Furrier
>> It's battle tested.
Sunghyun Park
>> Yes. Based on those references, we'd like to expand our business globally across telco, enterprise and sovereign AI. There are a lot of definitions of sovereign AI. However, from a higher perspective, it's a heterogeneous computing platform where Nvidia and non-Nvidia chips co-exist, depending on the requirements. Sovereign nations don't want to be locked in by Nvidia. So, Nvidia and non-Nvidia, TSMC and non-TSMC, all co-exist on a heterogeneous platform in order to avoid supply chain issues in the geopolitical landscape.
John Furrier
>> Yeah. Interoperability is a concern. Open systems is another big one. What are your thoughts and perspective on open?
Sunghyun Park
>> Open, especially on the software side, means sacrificing a little bit of performance optimization. We are fully relying on the open source software ecosystem, like PyTorch, torch.compile, vLLM and Hugging Face, instead of building our own proprietary software stack. That's the direction for our future and our second generation. It's exactly the same as the Linux moment in CPUs. Why did Intel beat IBM 20 years ago? Because of the Linux open source ecosystem. Exactly the same thing is happening in the inference era: vLLM, PyTorch and Hugging Face. We are focused on the open source ecosystem.
John Furrier
>> Well, congratulations on the great success with the server. Rack-scale, you mentioned, is coming soon. What does the demand curve look like for you? Because right now, you're looking up and to the right and you haven't even shipped your rack-scale system yet. What are your thoughts on how the business will unfold? Assuming some constraints, but those will get fixed quickly. Sooner rather than later, we hope, but it's happening. What's your view of the future?
Sunghyun Park
>> Demand-wise, as the CEO of an AI chip company, I can tell you demand is huge: "Okay, Sunghyun, bring your chip. I don't need to evaluate your chip. Just bring your chip, as long as you can supply it for a long time." But the current issue, as you mentioned, is the supply chain. How do you manage the whole supply chain, not just DRAM, not just HBM? Even PMIC chips, everything is struggling right now. But assuming the supply chain issues ease in the next two years or so, the real game is how to efficiently map all kinds of inference workloads onto a single platform.
John Furrier
>> My final question for you is around engineering and developing, getting close to the systems. Again, density is the feature. Tightly coupled interoperable systems, low latency, get that data fast as possible in the place it needs to be. We hear that all the time. What are developers and architects thinking about? The ones who are going low level coding, what are they thinking about? How do you guys help them? Is that something that you're doing a lot of work around? How do people build and integrate in Rebellions?
Sunghyun Park
>> Yes. Basically, that's why I'm talking about the open source ecosystem. Even for low-level programming, we will eventually open up our entire low-level programming layer, something like CUDA, so that developers can optimize for our chip. And it's not just the chip, not just the PCIe card, as you mentioned. Our rack-scale solution is our single product. So, we always work with partners. We can't do anything by ourselves. We need partners for optics and for switches, all optimized together in the rack-scale solution. That's the vision of Rebellions.
John Furrier
>> All right. Well, I got you here since you're such an expert. I have to ask you. I'm curious. What's your view on scale-up and scale-out? How much work can be done to make them better? I see scale-up. I mentioned some of the switches. I see opportunities to put more in a rack, scale across. I mean, scale-out, obviously connecting racks. Some people have multi-rack systems, some want one rack. What's your view on scale-up and scale-out? Is there more work to do?
Sunghyun Park
>> Basically, this is my personal opinion. It's just the beginning of research in this entire domain. Everybody talks about the KV cache. But as far as I know, there is no de facto standard solution yet, even on the software side. First, we need a software-side solution. Then it's going to be hardwired into hardware. And then we need a system-level solution. Right now, [inaudible]. There is no de facto standard at all. First, we need to find a software solution. Then we go to hardwiring it into hardware, and eventually co-optimization, bringing optics and all the materials into a single part, and then actual system-level optimization. From this perspective, we haven't started anything yet. It's just the beginning of that journey.
John Furrier
>> Yeah. And there's a lot more to change. I mean, that's just on custom silicon, getting things on the chip-
Sunghyun Park
>> Exactly....
John Furrier
>> as tight as possible.
Sunghyun Park
>> Exactly. But combined with agentic AI, there are a lot of CPU workloads. Things have changed a lot and are going to change a lot. That's my view.
John Furrier
>> We're just at the beginning.
Sunghyun Park
>> Just the beginning of the beginning, combined with agentic AI.
John Furrier
>> I love it when you hear that there's no standard yet for one of the most important things in these clusters: the networking.
Sunghyun Park
>> Yes. Yes.
John Furrier
>> Thank you so much. Congratulations on your success. A lot more going on. Obviously, the supply chain will get fixed. The rack-scale era is upon us. Thanks for coming on theCUBE and NYSE Wired.
Sunghyun Park
>> Thank you.
John Furrier
>> All right. I'm John Furrier, here for our AI factory series, talking to the leaders, because like we heard, this is just the beginning. Things are going to get smaller at the system level. Software standards will emerge in open source. All this will enable more and more change, faster agents, physical AI. The headroom in society is limitless, of course. We're doing our part here at theCUBE to bring that to you. I'm John Furrier, the host of theCUBE. Thanks for watching.
>> Palo Alto Studio Connections, Silicon Valley and Wall Street. I'm John Furrier here, talking to you here with Dave Vellante, my co-host. Hello, I'm John Furrier, host of theCUBE here in theCUBE's NYSE studio. Of course, we have our Palo Alto studio, connecting Silicon Valley and Wall Street. This is our AI factory series where we interview the leaders who are building out the AI infrastructure that's enabling all the innovation and societal change and of course, economic advantages across the globe. It's a global opportunity. And we're here talking about purpose-built silicon. Sunghyun Park, Co-Founder and CEO of Rebellions here, a hot company, building large scale systems and deploying them and providing the products that we need to be successful. Thanks for coming on theCUBE, on our AI factory series.
Sunghyun Park
>> Thank you for inviting me here. It's historical place hear. Yeah.
John Furrier
>> The AI infrastructure is continuing to grow. The demand we're seeing build out, obviously we're seeing some of the supply chain challenges, but that's not going to stop the momentum. And it's changed. So, my first question is, what's been the big changeover that's made you successful in this market? Has it been the paradigm shift to large scale systems? Has it just been AI in general? What's been the big driver for your success?
Sunghyun Park
>> Right now, so one word, it's inference. Right now, market is from training to inference. And in AI perspective right now, moving forward from occasional AI to always on AI, real-time AI, and also running at scale, that inference billions of per scale per day. That's the key difference. Training wise, wall tops, interconnect is much more important. However, in inference, efficiency, especially cost and energy efficiency matters most.
John Furrier
>> Yeah. A lot of inference been going around. I want to get into some of the systems that you have and some of your strategy you guys have been executing on. But first, you guys have a purpose-built silicon that's the engine of the AI factory. Explain the purpose-built chip definition and the role and the importance of the role of purpose-built silicon.
Sunghyun Park
>> Basically, compared to general purpose GPU, for example, GPGPU, general purple GPUs. Our custom silicon and purpose build is 100% optimized a single workload. For example, our Rebellion case is its kind of inference. In terms of the caving cache, it's not a software solution at all. Right now, we are highly focused on memory centric architecture. It's 100% optimized for inference workload that include the caving cache here. Also, in terms of enable scale-up and scale-out, we need a specific hardware primitive. That kind of primitive also hardwired inside chip. We call that purpose build, not general purpose. So, compared to the general purpose, we a little bit compromised of sacrifice the flexibility functional diversity. However, in terms of intelligence per watts, per dollars, we can offer a way higher efficiency.
John Furrier
>> I love the AI factories because it's like everyone's cheering for you back here. They love that.
Sunghyun Park
>> What's going on right there? Yeah.
John Furrier
>> Big trades are going down. A lot of them heard our AI factory narrative. But the AI factory is fascinating because you have almost an inverse of the technical architecture mindset of the old generation. You build a server, everything's on the board. Now you have density. You have a lot of chips working in concert. It's a collection of systems operating as one system. So, you can almost say an AI factory is a super server. It's a super computer. What does that mean for architecturally for inference? Because we saw the training, throw everything at it, train everything. Now you have inference, which is much more broader opportunity, bigger opportunity.
Sunghyun Park
>> Exactly.
John Furrier
>> It's different use cases. How does the architecture need to be in place for inference? Is there a rule of thumb? Is there a general principle? Take us through the thinking around how to architect for an inference world.
Sunghyun Park
>> Exactly. From the market perspective, we have totally different physics and tokenomics. In training, the performance is much more important than any other metric. However, in inference, as long as you can meet some minimum requirements, such as SLO, minimum first token generation time, the token per second, nobody care about performance. They care the performance per watts, per dollars, because inference right now is being commoditized. Efficiency matter most compared to training wise. Training, basically, we have biggest, biggest part interconnect all the chip using the . However, in inference, we little bit different, little bit sacrifice. We compromise a little bit. However, efficiency is the most, most important metric in the inference era.
John Furrier
>> You mentioned on the intro, you talk about energy.
Sunghyun Park
>> Yes.
John Furrier
>> Energy is on everyone's mind. The entire scale is bounded by energy. How does the energy equation change when you start to look at the diversity of deployments? You're going to have the big mega centers. You're going to have maybe smaller footprints still doing maybe 10 kilowatts. Then you're going to have smaller footprints. You're going to need to do inference everywhere.
Sunghyun Park
>> Exactly. That's the key point. Basically, we call it edge data center and the backbone data, something like the gigawatts or something. Right now, most of the AI inference running at the data center. But in the future, near future, some of the edge data center and some of the on device, depending on the model size, the parameter size, also some system requirement. If you're running very latency critical application, you don't need to go to backbone data center, just edge data center on device, good enough. However, depending on the system recover and the model size, it diversify all the data center, depending on ... That's why inference is totally different physics and tokenomics.
John Furrier
>> Yeah. I mean, there's so many things that jump out. First, we talked about data. Got to get the data close to the processing units. That's key. Memory. Energy, always the discussion. Now, the new conversation, and this has been in the inner circles of the leaders, but now it's going mainstream, is the role of networking.
Sunghyun Park
>> Exactly.
John Furrier
>> What's your vision on how the importance of networking... Because when you add energy, moving things around requires energy. Data is being moved around. Networking seems to be something that's not talked about a lot. What's your vision and role or your view of the role of networking?
Sunghyun Park
>> Yes. Network is super, super, super important, but sometimes ignored because too much attack it right now. Somebody talk about the . So, scale-up, scale-out, even between the data center, not just the rack to rack, not just chip to chip. We definitely need such kind of networking. That's why everybody talks about photonics and optical. It's not a future. We almost, almost there in terms of commercialization. So, we need to get some low latency and the higher throughput, even lower energy to communicate the data, even at energy.
John Furrier
>> I ask dumb questions at these events. I look at the big Nvidia rack and I say, wow, that's a lot of product in there. Why so many switches? If you look at the number of GPUs, the chips, the switches take up a large portion of the racks. You guys are in the rack-scale business with Rebel. You have a lot of developers. Just what's your vision on how the racks will change? Because again, we're going to need to squeeze as much energy and footprint out of the racks and their deployment.
Sunghyun Park
>> Yeah. Basically, as mentioned, we are preparing the rack-scale solution, not just server, not just a please check, but why? We need the system level optimization, costly optimization. As you mentioned, a lot of Sweden storage, a lot of optical interconnect right there. So, it's not just the memory pool. We need some new memory architecture combined with the networking. So, in order to deliver lowest cost and lowest energy consumption, given the normalized throughput, the token throughput, we definitely costly optimization at the rack-scale. NVL72 from Nvidia is one example right there. We also need internet based standard. Also, each layer must be customized, costly customization. That's why we need to prepare rack-scale solution, not just .
John Furrier
>> And you're working on that now?
Sunghyun Park
>> Exactly. Yes. Right now, our rack-scale solution is working at SK Telecom and at Korea Telecom, the top two data center companies in Korea, proving our lower TCO: lower CapEx and lower OpEx simultaneously.
John Furrier
>> Talk about the role Korea is playing in the global scale. You have investors. You can name them. I think they're big memory players. Talk about the role of Korea in the global AI infrastructure.
Sunghyun Park
>> Yes. [inaudible] Right now, the transformer is at the heart of AI, and the transformer is a very memory-hungry architecture. It's all about the memory. For example, as you mentioned, Rebellions is strategically backed by SK Hynix and Samsung Foundry. That's a good thing, especially with the really, really weird supply chain right now. But the point is this: without the memory, and without the economics of the wafers from the foundry, you're not able to deliver custom silicon at all. Why do we need custom silicon? It's cheaper: a lower cost per token. But that's not possible without this kind of collaboration with the memory vendor and the foundry.
John Furrier
>> Well, it's cheaper, but also energy efficient.
Sunghyun Park
>> Exactly.
John Furrier
>> You get a lot of integration with that chip. And more will be custom, not less, in the future.
Sunghyun Park
>> Yes. So, memory right now is not a commodity at all. Memory also needs to be customized for each customer. That's the future of memory. That's why Rebellions is focused on a memory-centric architecture, collaborating with memory vendors, optimizing our logic, and co-designing the memory and the logic.
John Furrier
>> We were having a conversation on X and I think maybe LinkedIn about this. When you have these constraints, engineers like to work around constraints. Have there been some new engineering directions? Because we don't want to be constrained on waiting for memory. Are there new architectures emerging that you're seeing that are promising to help facilitate faster deployment during a constrained environment like memory?
Sunghyun Park
>> That's cool. Maybe that's one possibility for new innovation right now. Then we need new materials for the scale-up and scale-out solutions, and also some innovation on the logic. I think that's a good starting point for innovation, but not a very good story for the memory vendors. But again, let's see what happens.
John Furrier
>> Well, people who have good experience with servers and rack-scale will certainly be able to make changes. I think there's opportunity there. We'll keep an eye on that. I wanted to get that out there, because we see entrepreneurs all the time. That's what they do. They'll innovate. What's the constraint? I'll get around it. All right. Let's talk about your business. How's that going? Share your momentum, because I think it's important to point to. You're starting to see the global distributed computing network. It's global in scale. Sovereignty is a big discussion across cloud, infrastructure and AI. Talk about some of the momentum. How's the business going?
Sunghyun Park
>> Sure. Basically, we're a Korean company, headquartered in Korea. We spent five years in Korea collaborating with SK Hynix and Samsung Foundry to deliver the memory-centric architecture. Right now, we have commercialized our product: RebelServer and the pod, at SK Telecom and Korea Telecom, the top two data center companies. This is not a thesis. It's not just paper research. It's commercialized, deployed in more than 100 server racks.
John Furrier
>> You have product commercialized now, RebelServer?
Sunghyun Park
>> Yes, exactly. Right now, we are serving actual end users, commercially. At SK Telecom, up to 50 million API calls per day run on our chip. It's more than-
John Furrier
>> It's battle tested.
Sunghyun Park
>> Yes. Based on that reference, we'd like to expand our business globally, focused on telco, enterprise and sovereign AI. There are a lot of definitions of sovereign AI, but from a higher-level perspective, it's a heterogeneous computing platform where Nvidia and non-Nvidia chips coexist, depending on the requirements. Also, sovereign customers don't want to be locked in by Nvidia. So, Nvidia and non-Nvidia, TSMC and non-TSMC: a heterogeneous computing platform where they all coexist, in order to avoid supply chain issues in the geopolitical landscape.
John Furrier
>> Yeah. Interoperability is a concern. Open systems is another big one. What are your thoughts and perspective on open?
Sunghyun Park
>> Open, especially on the software side, even at a small sacrifice in performance optimization. We are fully relying on the open source software ecosystem, like PyTorch, torch.compile, vLLM and Hugging Face, instead of building our own proprietary software stack. That's our direction for the future and for our second generation. It's exactly the same as the Linux moment in CPUs. Why did Intel beat IBM 20 years ago? Because of the Linux open source ecosystem. Exactly the same thing is happening in the inference era: vLLM, PyTorch and Hugging Face. We are focused on the open source ecosystem.
John Furrier
>> Well, congratulations on the great success with the server. Rack-scale, you mentioned, is coming soon. What does the demand curve look like for you? Because right now, you're looking up and to the right and you haven't even shipped your rack-scale system yet. What are your thoughts on how the business will unfold? Assuming some constraints, but those will get fixed quickly. Sooner rather than later, we hope, but it's happening. What's your view of the future?
Sunghyun Park
>> Demand-wise, as the CEO of an AI chip company, I can say demand is huge. "Okay, Sunghyun, bring your chip. I don't need to evaluate the chip. Just bring your chip, as long as you can supply it for a long time." But the current issue, as you mentioned, is the supply chain. How to manage the whole supply chain, not just DRAM, not just HBM. Even the PMIC chip, everything is struggling right now. But assuming the supply chain issues ease in the next two years or so, the real game is how to efficiently map all kinds of inference workloads onto a single platform.
John Furrier
>> My final question for you is around engineering and developing, getting close to the systems. Again, density is the feature. Tightly coupled interoperable systems, low latency, get that data fast as possible in the place it needs to be. We hear that all the time. What are developers and architects thinking about? The ones who are going low level coding, what are they thinking about? How do you guys help them? Is that something that you're doing a lot of work around? How do people build and integrate in Rebellions?
Sunghyun Park
>> Yes. Basically, that's why I'm talking about the open source ecosystem. Even at the low-level programming layer, we are opening up our entire low-level programming stack, something like CUDA, eventually. That's so all developers can optimize for our chip. And it's not just the chip, not just the PCIe card, as you mentioned. Our rack-scale solution is a single product. So, we always work with partners. We can't do anything by ourselves. We need partners for optics, and we also need switches, all optimized together in the rack-scale solution. That's the vision of Rebellions.
John Furrier
>> All right. Well, I got you here since you're such an expert. I have to ask you. I'm curious. What's your view on scale-up and scale-out? How much work can be done to make them better? I see scale-up. I mentioned some of the switches. I see opportunities to put more in a rack, scale across. I mean, scale-out, obviously connecting racks. Some people have multi-rack systems, some want one rack. What's your view on scale-up and scale-out? Is there more work to do?
Sunghyun Park
>> Basically, this is my personal opinion: it's just the beginning of the entire research domain here. Everybody talks about KV cache. But so far, there is no de facto standard solution yet, even on the software side. First, we need a solution on the software side. Then it's going to be hardwired into hardware. And then we need a system-level solution. Right now, [inaudible]. There is no de facto standard at all. First, we need to find a software solution. Then go to hardwiring it into the hardware, eventually co-optimizing the optics and all the materials into a single part, and then do the actual system-level optimization. From this perspective, we haven't started anything yet. It's just the beginning of that journey.
John Furrier
>> Yeah. And there's a lot more to change. I mean, that's just on custom silicon, getting things on the chip-
Sunghyun Park
>> Exactly...
John Furrier
>> as tight as possible.
Sunghyun Park
>> Exactly. But combined with agentic AI, there is a lot of CPU workload. Things have changed a lot, and they're going to change a lot more. That's my view.
John Furrier
>> We're just at the beginning.
Sunghyun Park
>> Just the beginning of the beginning, combined with agentic AI.
John Furrier
>> I love hearing that there's no standard yet for one of the most important things in these clusters: the networking.
Sunghyun Park
>> Yes. Yes.
John Furrier
>> Thank you so much. Congratulations on your success. A lot more going on. Obviously, the supply chain will get fixed. The rack-scale era is upon us. Thanks for coming on theCUBE and NYSE Wired.
Sunghyun Park
>> Thank you.
John Furrier
>> All right. This is our AI Factories series, talking to the leaders, because as we heard, this is just the beginning. Things are going to get smaller, down to the system level. Software standards will emerge in open source. All of this will enable more and more change: faster agents, physical AI. The headroom is limitless in society, of course. We're doing our part here at theCUBE to bring that to you. I'm John Furrier, the host of theCUBE. Thanks for watching.