Exploring the Role of Graph Databases in AI Advancement
Jim Webber, the chief scientist at Neo4j, engages with John Furrier of theCUBE during the NYSE Wired Robotics and AI Media Week. This insightful discussion delves into the innovative application of graph databases in the realm of artificial intelligence and highlights key perspectives from experts within the field.
Webber shares his extensive experience with graph databases, emphasizing their pivotal role in AI application development. According to Webber, graph databases provide the ...
>> Hello, welcome to theCUBE
here in our New York City NYSE, New York Stock Exchange, studio. I'm John Furrier, host
of theCUBE. It's our East Coast location. This is where our access
point is on the East Coast. Of course, we've got Silicon
Valley and Palo Alto. We've got East and West connected, tech and Wall Street together. We've got a great week this week. The NYSE Wired community and theCUBE are together doing a Robotics and AI Media Week with
all the leaders in AI. A lot of great action happening. Jim Webber is here,
chief scientist at Neo4j, a company we've covered
extensively on SiliconANGLE.com as well as here on theCUBE. Jim, thanks for coming
on remotely from the UK. >> Absolutely. My pleasure,
John. Thank you for inviting me. >> So, obviously, people know
SiliconANGLE and theCUBE. They know we cover Neo4j extensively and, of course, graph databases, which are really powering
this next generation of AI applications. Of course, the underlying
infrastructure behind it, data layers are super important. A big part of the
innovation cycle is the data and how data is structured and
graphs are a big part of it. You guys are doing that and leading that. And so, as the chief scientist, I got to first ask you, are you not surprised? Of course. I mean, you got to be pretty excited right
now about what's going on with graph and all the innovation. >> I think, not to be
hyperbolic, I'm not surprised. I've been plugging away
at graphs for a long time. And I think, like yourself,
I've realized the utility of graphs both for kind of
traditional what we think of as kind of database workloads
and analytic workloads, but more recently, with the emergence of particularly generative AI
using graphs as an underlay to provide extremely high-quality context. And then so that sort
of accuracy hasn't come as a surprise, and that's not... definitely not meant to be
egotistical or anything. I just think graphs are amazing. When people find them, they just get put to such good powerful uses that all of this seems just much more pleasing than surprising in a way. I think where I will be
surprised, yeah, I think where subsequent generations take this because I think I've reached
as just a computer scientist, not an AI wizard, I think
I've reached the limits of my current knowledge, and
I'm really excited to see what other people do with this stuff now. >> Yeah, and what's exciting is, too, we were chronicling this
on theCUBE about a decade ago. We saw kind of the shift
around multiple databases. We've heard the phrase, "No
database rules the world." Obviously, now there's a diversity of databases because the environments changed. You have different
architectures rolling out. But specifically generative
AI is very much graph central because things are generative. Things have to be structured
differently to get the latency, to get the security and also the context because graphs provide
those great pathways. And most people think of neural
pathways, neural nets, yeah, they could vector embeds on one side, but at the end of the day, you're starting to see graphs dominate. Could you share your vision on how people are using that today? Because the hype is high on GenAI. OpenAI just closed a 40
billion dollar financing. It's pretty massive. I
mean, just every day, you're seeing more and
more activity around that. So that's a big funding round. So that means there's more work going on, there's more investments. How are people... And there's enthusiasm is high and hype as well. But where's the confidence levels now? Could you share your vision
on where graphs are being used and kind of where they extend to? >> Yeah, of course. I mean, look, if you... you mentioned about the diversity of data. And this has been, I realize now, looking at the screen
here, that I'm an old man. I've got bits of gray in my beard and my hair, which is shocking. When I started this, it felt
like we were on the cusp of something really interesting in data. A decade and a half, two
decades ago, data started to become really prominent
with this idea of NoSQL and then big data. And that's kind of rolled in, I think, to today's wave in terms of AI. But I really think there's
a couple of things going on broadly in the use of AI. And it's really a tale
of kind of two cities. There are a bunch of folks out there who are really pushing
the envelope with AI and they are very keen to
take leading technology and push that into production
to get value from it. And according to IBM, that's about 30% of folks working with AI. And then there's the
other 70% of people in AI who are probably just
as keen to experiment and get value from this, but who currently aren't
getting that value. They're not getting dependable systems that they can push into
production and get value from. And that's a huge
difference, right, the kind of 70/30 split. And by no means all, but certainly, I think some of the reason that folks are getting
value from their AI is because they're able to
improve its accuracy. I converse with various AIs
on a daily basis in my role as a computer scientist. And a lot of the times, those interactions are very valuable. I kind of get very good rubber duck kind of response from a lot of AIs, but sometimes they absolutely tell fibs. So I asked... a few weeks
ago, I asked one of the famous AIs to summarize a paper that I know well and it actually gave me a
180-degree wrong answer. So it sort of makes you think,
right. I mean, if I wasn't, you know, sort of very
familiar with that literature, I would have believed it as well. I mean, I coined the term last year, the Boris Johnson problem,
which is something that's expensively trained,
speaks very confidently, and doesn't know when it's
lying which is in reference, of course, to the former
British Prime Minister Boris Johnson, who was caught out, shall we say. >> Yeah. >> Now, I think one
of the ways that we can try to solve that problem, and
if you look at organizations, for example, like Microsoft,
who are doing well at this, is to provide better up-to-date context so that our LLMs can produce better answers. And right now, it seems that
the best way of doing that is by providing a context of a
knowledge graph, which is chock-full of semantically rich information, business semantically rich
information, your inventory, your customers, your
products, your contracts, your invoices, all of that kind of stuff might be thrown in there. But importantly, all of that
stuff is connected together. So compared to the previous
attempts at injecting context into LLMs, which was done
by vectorization, you'd say, "Are these two records near
to each other," which is kind of high school level trigonometry. Now we can actually do
something a bit richer, and we say, "Look, how are
these things connected? Are they in the same neighborhood? How far away are they
from each other in terms of hops in the graph?"
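The contrast drawn here, vector nearness versus connectedness in a graph, can be sketched in a few lines of Python. This is an illustrative sketch only, not Neo4j's implementation: the toy graph, the node names, and the helper functions `cosine_similarity` and `hop_distance` are all hypothetical, and a plain breadth-first search stands in for a real graph query.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Vector-style question: are these two records near each other?"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hop_distance(graph, start, goal):
    """Graph-style question: how many hops separate two nodes?

    Breadth-first search over an adjacency mapping. Returns the number
    of hops on a shortest path, or None if the nodes are not connected.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy knowledge graph (undirected, stored as adjacency sets).
graph = {
    "customer:acme": {"invoice:1001", "contract:7"},
    "invoice:1001": {"customer:acme", "product:widget"},
    "contract:7": {"customer:acme"},
    "product:widget": {"invoice:1001", "inventory:bin-3"},
    "inventory:bin-3": {"product:widget"},
}

# The customer reaches the inventory record via invoice and product.
hops = hop_distance(graph, "customer:acme", "inventory:bin-3")
```

The similarity function only reports how close two points are, while the hop count comes from explicit, inspectable connections between business records, which is the extra structure being described.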
And it turns out by using that structure, using the connections or the associativity in the
data as well as the data itself, you get much, much better context and much, much higher accuracy from this. So it seems to me, and I
probably wouldn't have been brave enough to say this a year ago, but it seems to me now that all
RAG is tending towards Graph RAG and that Graph RAG is kind of eating this whole ecosystem. >> Yeah, I mean, that's a great point. I mean, in the computer
science side of it, vector embeds get everyone's attention because they do really good search. Again, to your point, not 100% accurate, but it's better than keyword search. It's math-based. So as
you see graphs emerge, can you share specifically
what graphs are bringing to the table on that next step? Because RAG is certainly, people can see the benefits
immediately: better search, great for data. >> Yeah. >> It makes more compute
available and more storage. That's good for everyone in the industry. But where do the graphs shine? Can you... Are there use cases now, and
what's the... where does it go? >> Yeah, I mean, if you
think about vectors, not to at all put the boot into vectors because they're extremely useful, but vectors kind of give
you points in a vector space, and that's useful. And then we can kind of
noodle on that a little bit and think, "Well, what can I... how can I semantically
reason about these vectors?"
And I hear people say, "Well, for example, if you've got the vector for king and you minus the vector for man, and you add the vector for woman, you get the vector for queen." And I think that's not
quite convincing to me. I think that depends an
awful lot on the kind of embedding function that you're using. It really probably isn't the case. Really in a vector space, all we've got is that two points are near to each other. So if I ran approximate
nearest neighbors, the kind of famous ANN algorithm
to find my context tokens for a RAG arrangement,
I get nearby tokens. And not to labor the example too much, but how do I know when I look for Apple that I'm not getting apple, the fruit, Apple, the tech company, >> Yep.
>> I'm here in London, England, Apple, the music company, if you're old enough to remember them? Because vectors... there's no guarantee that
vectors will give you that in that kind of sub-symbolic relationship. Meanwhile, over here in graph world, that stuff's very clear, right. Because apple, the fruit will
be connected into a sub-graph that's very much about
farming and agriculture and fruits and all that stuff. Maybe its neighbors are lemons or oranges and they're connected in the
graph explicitly in a way. Meanwhile, over here, Apple,
the tech company, is going to be connected into places like Palo Alto and connected with its founders and it's going to be
connected with its products. And Apple, the music company, again, will be connected explicitly
with perhaps the Beatles and the albums they produced. All of this stuff becomes
a lot clearer when you stop messing around with big long numbers and start thinking about
symbolic representations of your data, which are
both transparent for me, the engineer, the database
person, as well as my... the people I have to do
compliance and audit with. And very straightforward
for an AI to work with. >> Yeah, the computer
science behind that's great. You're really talking
about context, right. Apple's a fruit, and so if
there's words around it, cues, if you will, this is kind of what ontologies couldn't
crack the code on back in the 90s, if you remember. And this as you look at now, okay, you got scale, and you got
compute, what's the impact? Because, obviously, that makes total sense because the graphs will have
all this extra information around Apple or whatever the word is or whatever the context is. What's the... How does that
factor into some of the compute? Because now we're
hearing things like first token out on inference. Okay, I can reason vectors. I get that. Check. That's really elementary. But when you start getting into
really narrow path accuracy, context is important, but compute's a factor. The underlying infrastructure
has to support it. Are there advantages to
graphs besides the context? Does it have a compute benefit? >> I think it has a... I think, and this is slightly kind of informed opinion rather
than established facts, but I think because graphs
tend to be very quick to query and very accurate, again, because of that symbolic
representation, they tend to be quite lightweight
in terms of their compute. So in a given interaction with an LLM, you can actually explore an awful lot of your knowledge graph
and really do a good job of picking out the best connected context to then inject into the
interaction with your LLM. If you're going to compare that
to other kinds of databases in terms of compute, they
tend to be more heavyweight. For example, relational databases have to do quite heavyweight joins. Whereas in graph world, we're just following those relationships, which in our implementation
is just pointers. It turns out that
pointers are the one thing that modern computers, that all digital computers are good at. They're terrible at everything else, but in terms of following
pointers around the network, they're really fast at that. So, we can do a really fast,
really high-quality traversal through a graph because
it's just brilliant context to inject into the LLM, and it raises the quality of the overall system. >> That's a great point on the
pointers there. Pun intended. I got to ask you, let's go
hypothetical for a second. Let's just say, for
instance, I'm a customer. I want to implement a
graph database on theCUBE. I got all these interviews. I
want to put them in a graph. I got security videos, I
got all kinds of context, or I'm a big corporation. What does the deployment look
like? How do you guys engage? How do people deploy and manage graphs if they
already got all the SQL, NoSQL, unstructured data lying around? >> Well, you'd be following a reasonably well-trodden path, John. Thankfully, it turns out that you're not alone in
thinking about building knowledge graphs of the lessons learned and the institutional
knowledge that you have there. I've seen other companies do this. I've seen NASA do this in the US. I've seen militaries do
this around the world because people want to learn and improve. I've even seen universities
do this in terms of a citation graph and people who worked with people who wrote papers in subjects and so on. So the data model is naturally a graph. But of course, if you've
got a bunch of data that's already existing and it's spread around your existing ecosystem, then you've got a couple of choices. One choice, which is very
non-invasive, would be to use a graph kind of as a fancy index. So you leave the data where it is, but you create a curation
layer on top of that data and you use the graph and graph queries to be able to explore into it very quickly. And then the leaf nodes
of those graphs point to existing records in
your document database or a row in your relational
database or so on. And, of course, the flip side of that would be a kind of rip and replace. You decide that this
knowledge graph is so valuable that you want to host some or all of that data directly
within the graph itself because you intend for it
to be both a transactional and analytical workhorse. And you're going to,
of course, in this day and age, use the knowledge in that graph to feed the interactions with your LLMs. >> Awesome. Jim, in the five
minutes we have left, I want to talk about what you're working on. What are the cool things you're
working on from a computer scientist's perspective? What's getting you excited?
Where's the cutting edge? What's next on your plate?
What are you optimizing for? >> Well, I'm a fault-tolerance
person by training, but actually the thing that
I'm most excited about in my team here at Neo4j is we're
looking at completely novel ways of building databases. So, for example, if
you look at Neo4j today and you look at, say, other
good databases like MongoDB or Oracle or those kind of things, and you squint a little bit, you'll find they're
basically built from the same kind of patterns. It's because we've all read
the same academic papers on how to build databases. So we have a query planner
optimizer, we move tables around, would you believe, internally
and that kind of stuff. But at Neo4j research,
we're looking at taking programming languages as our inspiration because programming languages sort of have similar things going on. You take strings in at the top, and you produce executables
out of the bottom. But instead of using
planners and optimizers and all that kind of stuff, we
have compilers and runtimes. So one of the things I'm
very excited about is that we're looking at how to build a completely
novel database architecture based on effectively leaning on compilers to do the equivalent of planning, replanning, and query optimization. And we're in the early stages of that. It's a UK government-funded project. They gave us the equivalent
of a couple of million bucks to figure out this problem. And our early work is
indicating that we might be able to build something that's
an order of magnitude or maybe even a few orders of magnitude faster than the traditional way of building things. Still a few years to go in
the actual research work, but we hope to make a
significant impact down the line. >> Final question for you on the AI infrastructure advancements. We're seeing a lot of semiconductor work. Nvidia just had GTC, AI
factories being talked about, so there's a lot of work going on below you guys in terms of performance. What are you excited about? What's ready for prime time, as they say? What's the goodness going on? And where's the work areas
that the infrastructure hardware guys have to work on?
What would you say to that? >> That's a very fair question, and it actually ties into the
thing I'm most excited about because in our world now that
we lean on compilers to do all of the hard work for us, what I want from the
infrastructure guys is simply good compilers for all of their
new fancy electronics. And then the system that
we're researching will be immediately able to take advantage of any of the novel exciting
silicon that comes out. It becomes a very straightforward
proposition to dispatch to a fancy GPU or a fancy FPGA, or even if the cost is
right, a plain old CPU. And we're looking forward to
being able to have that multi-silicon capability in Neo4j. >> Well, being the fault-tolerant
guy that you are, you got the coherency challenges. I got a compiler for the
compiler, I got AI for AI. I mean, is there a complexity
loop there that you see, or is that not an issue at all? >> I'm hoping that's not
an issue at all, I think. In the later stages, I hope to see that the AIs will actually help us to develop particularly some of our efficient intermediate formats to help feed these compilers. So there's a lot of cool kind
of nerdy stuff to be done. I'm very much looking
forward to solving it. >> Great to have you on.
Good point there. I love it. I mean, I think AI will do that. You're seeing already code
assistants being adopted, agents with authority and agency to do things. I mean, you just learn the compiler and speak compiler to each other, right. >> Absolutely.
>> I mean, that's what's happening. Jim, thank you so much for coming in. I know it's a different
time zone over there. Appreciate you taking the time to come in to theCUBE here in our new
NYSE studio here, our build-out on the East Coast, our access point where we're kind of
building a network here. I've obviously got a
network effect going on, and we might have to put some graphs there around those date... around those nodes.
So, thanks for sharing. >> Sure thing. Thanks for having me, John. >> All right. I'm John
Furrier here at theCUBE. We are at the NYSE... theCUBE at the NYSE Wired
having ongoing programs around the key topics. This week, it's robotics. It's AI leaders who are doing the work, who are plowing the
fields, blazing the trails for this next generation,
infrastructure data, data platforms, databases, all
kinds of activity happening that will flip the script on AI and hopefully usher in robotics and all the goodness that AI will enable. Thanks for watching.