
Uma Roy on Succinct, STARKs, and zkVM architecture

October 2024

Listen

Spotify | Apple Podcasts | YouTube

Show notes

This is my conversation with Uma Roy, cofounder and CEO of Succinct.

Timestamps:

  • (00:00:00) intro
  • (00:00:57) origin story
  • (00:02:19) SP1 architecture
  • (00:09:43) STARKs, FRI, and hash-based cryptography
  • (00:15:09) recursion
  • (00:21:12) upgrading the proof system
  • (00:33:11) sponsor: Splits
  • (00:33:54) security in ZK systems
  • (00:37:46) converting optimistic rollups into zk rollups
  • (00:43:39) zkVM vs custom circuits
  • (00:48:48) ZK for scaling and interoperability
  • (01:00:24) the lifecycle of a proof
  • (01:06:26) hardware
  • (01:10:57) outro

Transcript

Sina: So, as a first question, I’m curious how you ended up working on Succinct. If you think back to that period in time, were there particular moments where you had that “aha” moment of, this is what I want to focus on?

Uma: Yeah, so I’ve always been into math. I remember the founder of Zcash, who I believe you’ve had on your show, Zooko Wilcox, right? He’s a very good friend of mine, and he was telling me about zero-knowledge proofs and how you can prove a statement without revealing all the inputs to that statement. I thought that was incredibly magical. Then I dug into the math, reading all this material on the internet, and I found it super cool.

One day, I realized that ZK is not only technically fascinating with all this math that I love, but it also has a huge impact on crypto. It feels like the future of scaling blockchains depends on ZK. The combination of those two things—technical interest and real-world impact—was very powerful and got me really into the space.

Sina: I’d love to dive into SP1 and what you’re building at a lower level of abstraction than one might usually go. This could be an opportunity to bring in some of the math without getting us lost in the details. There are people who might use SP1 as a library, but also folks who are generally interested in ZK and have been interfacing with it more as a black box. They want more of an intuition about what’s actually going on, the ground truth. So, this is a big question, but can you help me understand how SP1 works? What’s happening under the hood?

Uma: Yeah, I think at its core, it’s pretty simple in some ways. Implementing it is very difficult and complex, but what’s going on isn’t so hard to understand. Maybe it’s helpful to talk about what ZK used to be like and then what ZK is like with SP1.

Previously, when you wanted to do ZK, what does it let you do? It lets you prove that some function, when run with a certain set of inputs, results in some output. Optionally, you can hide the inputs to the function, which is really useful for privacy. Generally, the types of statements you’re trying to prove are things like, “I have a bunch of transactions, and when I run the EVM, the Ethereum Virtual Machine, and process all those transactions, the result is that Uma has this many dollars, and Sina has this many dollars.” These are the balances, the state of the world. That’s the type of function you’re trying to prove.

Historically in ZK, to make your function ZK-provable and generate a proof of it, you had to write it as what’s called a “circuit.” Basically, you had to write your function in this very specific form so that we could generate ZK proofs of it. In the circuit world, it’s complicated to describe, but it’s kind of like writing a very low-level assembly version of your function. You break it down into add gates and multiplication gates. So, you take your function and decompose it into pieces where the only sub-components you can use are addition and multiplication. That’s really brutal because usually, you’re trying to express some very complex statement, and the only tools you have are plus and times. That was the old model of ZK. When I put it that way, it sounds really bad, but in practice, you can express any computation using plus and times. You can have libraries and string everything together, but it was very difficult.
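To make the “only plus and times” model concrete, here is a toy sketch in Rust (illustrative only, and not how SP1 actually represents circuits): a function like f(x) = x³ + 2x + 5 decomposed into add and multiply gates, where checking a computation means checking that every gate constraint holds over a claimed assignment of wire values.

```rust
// Toy illustration of the "circuit" model: the only gates are addition
// and multiplication, and checking a computation boils down to showing
// that a claimed wire assignment satisfies every gate constraint.
// Pedagogical sketch only, not SP1's actual circuit representation.

#[derive(Clone, Copy)]
enum Gate {
    Add(usize, usize, usize), // wires: out = in1 + in2
    Mul(usize, usize, usize), // wires: out = in1 * in2
}

// Check that an assignment of values to wires satisfies all gates.
fn satisfies(gates: &[Gate], wires: &[u64]) -> bool {
    gates.iter().all(|g| match *g {
        Gate::Add(a, b, c) => wires[a].wrapping_add(wires[b]) == wires[c],
        Gate::Mul(a, b, c) => wires[a].wrapping_mul(wires[b]) == wires[c],
    })
}

// f(x) = x^3 + 2x + 5, decomposed into add/mul gates.
fn circuit_for_f() -> Vec<Gate> {
    vec![
        Gate::Mul(0, 0, 2), // w2 = x * x
        Gate::Mul(2, 0, 3), // w3 = x^2 * x
        Gate::Mul(1, 0, 4), // w4 = 2 * x   (w1 holds the constant 2)
        Gate::Add(3, 4, 5), // w5 = x^3 + 2x
        Gate::Add(5, 6, 7), // w7 = w5 + 5  (w6 holds the constant 5)
    ]
}

fn main() {
    let x = 3u64;
    // wires: [x, 2, x^2, x^3, 2x, x^3 + 2x, 5, f(x)]
    let wires = [x, 2, 9, 27, 6, 33, 5, 38];
    assert!(satisfies(&circuit_for_f(), &wires));
    println!("f(3) = {}", wires[7]); // prints "f(3) = 38"
}
```

Even this tiny function takes five gates; the pain Uma describes is doing this by hand for entire programs.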

With SP1, what you can do is just write normal code. You write normal Rust, and then you can generate proofs of it. When I contrast it like that, it seems like a hundredfold improvement, and it is. Now, what’s actually going on under the hood is that you have Rust code. In the SP1 world, as a user, you take the function you want to prove, write it as normal Rust, and it gets compiled to RISC-V. RISC-V is this reduced instruction set—RISC stands for Reduced Instruction Set Computing—and it has fewer than a hundred, probably fewer than fifty, opcodes. Things like add, multiply, load, store to memory—very normal instruction set stuff.

SP1 proves the execution of the RISC-V ISA. What does that mean? When you have a program running in RISC-V—or any instruction set—you have this notion of the program counter, or PC, which tells you which instruction in your program you’re going to run. For every instruction, there’s a valid state transition for that particular instruction. For example, if I’m doing an add instruction, I need to load two operands from registers, add them, and store that value in a destination register. In our world, with SP1, we prove for every single instruction in the ISA that the transition was done properly. We prove that the program counter was incremented correctly, that all the register values that should have been updated were updated correctly with the right result after computing it, and so on.
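The per-instruction transition Uma describes can be sketched in plain Rust. This is a minimal, assumed model (not SP1's actual data structures): one CPU state, one add instruction, and a check that a claimed (current, next) state pair obeys exactly the transition the instruction dictates, which is what the prover's constraints enforce.

```rust
// Minimal sketch of the state transition a zkVM constrains for a single
// RISC-V-style "add" instruction: read two source registers, write the
// sum to the destination register, and advance the program counter.
// Illustrative only; SP1's real representation differs.

#[derive(Clone, PartialEq, Debug)]
struct CpuState {
    pc: u32,
    regs: [u32; 32],
}

// One instruction: rd = rs1 + rs2 (wrapping, as in RV32).
struct Add { rd: usize, rs1: usize, rs2: usize }

fn step(state: &CpuState, inst: &Add) -> CpuState {
    let mut next = state.clone();
    next.regs[inst.rd] = state.regs[inst.rs1].wrapping_add(state.regs[inst.rs2]);
    next.pc = state.pc.wrapping_add(4); // each RV32 instruction is 4 bytes
    next
}

// The "constraint" view: given a (current, next) pair of states, check
// that the transition is exactly what the add instruction dictates.
fn add_transition_valid(cur: &CpuState, nxt: &CpuState, inst: &Add) -> bool {
    *nxt == step(cur, inst)
}

fn main() {
    let mut cur = CpuState { pc: 0x1000, regs: [0; 32] };
    cur.regs[1] = 7;
    cur.regs[2] = 35;
    let inst = Add { rd: 3, rs1: 1, rs2: 2 };
    let nxt = step(&cur, &inst);
    assert!(add_transition_valid(&cur, &nxt, &inst));
    assert_eq!(nxt.regs[3], 42);
    assert_eq!(nxt.pc, 0x1004);
}
```

The prover's job is essentially to show `add_transition_valid` holds for every pair of adjacent states in the execution, for whichever opcode was executed at each step.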

Behind the scenes, the way we actually do that proof is by breaking down proving every single RISC-V instruction into these circuits that are just add and multiply operations. So, behind the scenes, we are still doing the old circuit-based stuff, which is very difficult and painful for us. But for our users, it’s really simple because they just write Rust, it goes to RISC-V, and then we can prove it.

Sina: That makes sense. I feel like instruction sets and virtual machines might seem kind of mystifying from the outside, but they’re actually very simple. There’s a list of instructions, you compile a program down into a set of instructions, which is basically what the binary or the bytecode is, and then you start stepping through them one at a time.

Sina: Yeah, and so let’s dig into how these instructions are transformed into circuits and the underlying proof system that you’re using, which is from the Plonky3 toolkit. How does that work?

Uma: Yeah, so basically, the particular protocol we’re using is known as the FRI protocol. In general, we’re using the STARKs line of work. STARKs are a way of proving these kinds of statements using hash-based cryptography. There’s also SNARKs, which people might have heard of. STARKs are a type of SNARK, but other popular SNARK protocols include Groth16 and PLONK with KZG commitments. Historically, a lot of proving systems have used elliptic curve cryptography. Halo 2, or the Halo 2 fork that many in the Ethereum ecosystem use, also does this.

So, there are two major families commonly used: elliptic curve-based cryptography and hash-based cryptography. We do all our stuff using hash-based cryptography, so we use STARKs and FRI under the hood.

What basically happens is, for each instruction, we have a main CPU circuit. This circuit looks like a bunch of rows in a table. Each instruction is a single row, and you can think of a row as approximately the current state of the world when you execute that instruction. The next row is the next state of the world after executing that instruction. For every single instruction, we have constraints that say the transition from this current row to the next row needs to follow certain properties. We write a circuit to constrain those properties.

For example, for addition, we constrain that the next row’s destination register value is equal to the addition of the previous row’s input register operand values. We write a circuit to constrain that addition opcode. So, we say something like, when the instruction is an addition opcode, you apply these kinds of constraints. We do that for every single opcode.

Now, this is a really oversimplified version of what’s going on. In reality, behind the scenes, we have our CPU, which we call a table. If you had to put all the logic for all the opcodes in that one table, your table would be super wide with a lot of information, and that’s very slow to prove. What’s interesting is you’re usually only dealing with one opcode at a time. If you have all the logic in every single row to deal with every possible opcode but you’re only using a subset of it, it’s not super efficient.

So, in practice, we actually have different tables for every opcode. For example, we have a different table or circuit for addition. That table’s pretty small; it’s not that wide. It just constrains that if I have two operands and I add them, it results in a third operand. It has all the logic and related information you need to do that constraint. Then, we use this lookup argument called the log-derivative lookup argument, informally called the LogUp argument. Basically, in our CPU circuit, we just look up that our two operands plus the third operand are actually additions. We look that up in our addition table.
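The log-derivative idea can be shown in a few lines. This is a hedged sketch of the underlying identity, not SP1's or Plonky3's implementation: a batch of looked-up values matches a table with claimed multiplicities iff, at a random challenge α, the sum of 1/(α − vᵢ) over lookups equals the sum of mₜ/(α − t) over table entries, working in the BabyBear prime field.

```rust
// Sketch of the identity behind log-derivative (LogUp) lookups:
// lookups match a table (with multiplicities m_t) iff, at a random
// challenge alpha,
//   sum_i 1/(alpha - v_i)  ==  sum_t m_t/(alpha - t)   (mod p).
// Real systems commit to these sums inside the proof; here we just
// evaluate both sides directly over the BabyBear prime field.

const P: u64 = 2013265921; // BabyBear prime: 2^31 - 2^27 + 1

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = (acc as u128 * b as u128 % P as u128) as u64; }
        b = (b as u128 * b as u128 % P as u128) as u64;
        e >>= 1;
    }
    acc
}

fn inv_mod(a: u64) -> u64 { pow_mod(a, P - 2) } // Fermat's little theorem

fn sub_mod(a: u64, b: u64) -> u64 { (a % P + P - b % P) % P }

// LHS: one term 1/(alpha - v) per lookup occurrence.
fn lookup_side(alpha: u64, lookups: &[u64]) -> u64 {
    lookups.iter().fold(0, |s, &v| (s + inv_mod(sub_mod(alpha, v))) % P)
}

// RHS: one term m/(alpha - t) per distinct table entry.
fn table_side(alpha: u64, table: &[(u64, u64)]) -> u64 {
    table.iter().fold(0, |s, &(t, m)| {
        (s + (m as u128 * inv_mod(sub_mod(alpha, t)) as u128 % P as u128) as u64) % P
    })
}

fn main() {
    let alpha = 123456789; // in a real protocol, a random verifier challenge
    let lookups = [5, 9, 5, 5, 9]; // values the CPU looked up
    let table = [(5, 3), (9, 2)];  // table entries with multiplicities
    assert_eq!(lookup_side(alpha, &lookups), table_side(alpha, &table));
    // A wrong multiplicity breaks the identity (w.h.p. over alpha).
    assert_ne!(lookup_side(alpha, &lookups), table_side(alpha, &[(5, 2), (9, 2)]));
}
```

In the real system the looked-up values are tuples like (operand a, operand b, result) checked against the addition table, but the algebra is the same.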

The real benefit of this is that we save a lot of area. Our CPU can be pretty thin because it’s just doing these lookups to the particular table for the specific opcode at hand. That’s much closer to the reality of what’s going on. Then, under the hood, you take all these tables and do this variant of FRI called multi-table FRI, where you can generate one FRI proof for a bunch of these different circuits at once. At the end, you get one STARK proof for all these different tables that proves your statement. Does that make sense?

Sina: Yeah, that makes sense. And then you guys also do this sharding thing, right, for programs that are particularly large?

Uma: Exactly. If you want to run a program that is really long, there’s a limit to how long your table can be. You couldn’t have a billion rows in your table because, in the protocol, there’s an FFT step that’s n log n. You’ll run out of memory; you’ll hit very practical limits very quickly on your CPU and stuff like that.

Sometimes, we want to prove statements that have a billion rows, like a billion instructions being executed. That’s actually pretty common. For example, if you want to prove the execution of an Ethereum block, it can generally be a couple hundred million instructions to even a couple billion instructions in some cases. We want to be able to prove programs of arbitrary length.

So, what we do is break up each program into shards. We first execute the program, get a billion instructions, and chunk them into shards of about four million instructions each. Then, we generate proofs of each shard’s execution. It’s pretty simple: the program counter at the beginning of this shard was this, we run all the instructions, and at the end, the program counter was this. We do that for every single shard.

Then, we combine all the shard proofs together. Each shard gets its own proof. We take two shard proofs and use this recursion system where we verify those two proofs in a STARK itself. We verify that the last program counter of your first shard is equal to the starting program counter of your second shard. If those two proofs are both true, then you have a proof that covers from the first program counter of the first shard to the last program counter of the second shard, covering the whole range.
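The boundary condition being checked can be sketched in a few lines. This is a toy model under assumed names: each shard proof is reduced to the (start, end) program-counter range it attests to, and two claims only combine when the ranges chain. Verifying the child STARK proofs themselves is omitted.

```rust
// Sketch of the shard-stitching check: each shard proof attests to a
// (start_pc, end_pc) range, and two adjacent proofs can be combined iff
// the first shard's ending program counter equals the second shard's
// starting program counter. Real recursion also verifies the two child
// STARK proofs; here we only model the boundary condition.

#[derive(Clone, Copy, Debug)]
struct ShardClaim {
    start_pc: u32,
    end_pc: u32,
}

// Combine two adjacent shard claims into one covering the whole range.
fn combine(a: ShardClaim, b: ShardClaim) -> Option<ShardClaim> {
    if a.end_pc == b.start_pc {
        Some(ShardClaim { start_pc: a.start_pc, end_pc: b.end_pc })
    } else {
        None // boundary mismatch: the shards don't chain
    }
}

fn main() {
    let s1 = ShardClaim { start_pc: 0x0000, end_pc: 0x4000 };
    let s2 = ShardClaim { start_pc: 0x4000, end_pc: 0x9000 };
    let whole = combine(s1, s2).expect("shards should chain");
    assert_eq!(whole.start_pc, 0x0000);
    assert_eq!(whole.end_pc, 0x9000);
    // Non-adjacent shards are rejected.
    let s3 = ShardClaim { start_pc: 0x5000, end_pc: 0x6000 };
    assert!(combine(s1, s3).is_none());
}
```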

At the end, you get one proof. Basically, you take two STARK proofs, verify them both in a STARK, and then you get one STARK proof. We do this for many proofs.

Uma: For example, sometimes in certain programs, you’ll have like 100 shards or a thousand shards. You can kind of do this in a tree-like structure. So, you just do two at a time, get the next layer of your tree, do it again, and at the end, you end up with one STARK proof that represents the full execution of the entire program. It’s wild that this stuff is real. It was a lot of pain to implement; it’s very difficult.
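The tree-shaped aggregation Uma describes can be modeled as a pairwise fold. In this assumed sketch each claim is just a (start, end) instruction range, whereas in SP1 each combine step is itself a STARK proving the verification of two child proofs.

```rust
// Sketch of tree-shaped aggregation: repeatedly combine adjacent shard
// claims two at a time until one claim covers the full program. Here
// the "proofs" are plain (start, end) ranges; in the real system each
// combine is a recursive STARK verifying two child STARKs.

fn combine(a: (u32, u32), b: (u32, u32)) -> (u32, u32) {
    assert_eq!(a.1, b.0, "adjacent shards must share a boundary");
    (a.0, b.1)
}

fn aggregate(mut layer: Vec<(u32, u32)>) -> (u32, u32) {
    while layer.len() > 1 {
        let mut next = Vec::with_capacity((layer.len() + 1) / 2);
        for pair in layer.chunks(2) {
            match pair {
                [a, b] => next.push(combine(*a, *b)),
                [a] => next.push(*a), // odd one out carries up a layer
                _ => unreachable!(),
            }
        }
        layer = next;
    }
    layer[0]
}

fn main() {
    // Five shards, each covering a contiguous instruction range.
    let shards = vec![(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)];
    assert_eq!(aggregate(shards), (0, 50));
}
```

With 1,000 shards this tree is about ten layers deep, which is why the layer-by-layer structure matters: each layer halves the number of outstanding proofs.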

But what’s interesting about the recursion thing is, normally you might think, “Okay, how would you implement recursion?” The most naive way is to write a Rust program that verifies two STARKs, run that program in SP1, and get a new proof. You’re basically using SP1 on itself recursively. But that actually doesn’t work because when you try to do that, there’s way too much overhead. The RISC-V instruction set ends up using way too many cycles to make it practical.

So, in practice, what we did was create a new VM. We made our own instruction set that’s specialized for recursion, specifically for verifying STARK proofs. We built a recursion VM where we verify two of these STARK proofs. We wrote a program in our own VM, so we had to build a basic compiler to write a STARK verifier in our own DSL, compile it to our own recursion ISA, and then write a bunch of constraints and logic to constrain our recursion VM.

So, not only did we build a RISC-V VM and all the circuits for that, we also had to come up with our own recursion ISA, build a DSL to write programs in it, build a compiler to compile to the recursion ISA, and then write a whole bunch of circuits to constrain the recursion VM and ISA to handle recursion. It’s actually very, very difficult, but that’s the whole system.

Sina: That’s wild. So, you’re also using Plonky3, though. Where is the interface between what you don’t do and what you’re pulling in from an external library?

Uma: Plonky3 is this really great library built by the Polygon team—in particular, Daniel Lubarov and the Polygon Zero team. It’s an implementation of FRI, specifically the multi-table FRI. We use Plonky3 as our implementation of multi-table FRI. We did all the constraining and recursion stuff ourselves, and then when we actually want to run the FRI protocol, we call Plonky3’s methods for that.

Sina: One thing about this general field is that it’s evolving very quickly. The performance characteristics are continuously improving. So, what’s involved with upgrading your proof system over time if you want to do that?

Uma: The field of zero-knowledge proofs does move really quickly, which, if you’re building a system, makes it hard to balance taking your existing system and squeezing all the performance you can out of it versus making a bold bet on upgrading to the latest stuff. I will say, within the field, there has been a consolidation around small-field, hash-based schemes.

Historically, a lot of teams worked on elliptic curve-based schemes, and the challenging part with elliptic curve stuff is that you generally need to work over a very large field, like a 252 or 254-bit field, which is pretty large. The reason a large field is bad is that, on a normal CPU or laptop, arithmetic happens over 32 or 64 bits. When you’re trying to run operations over a 252-bit field, it’s very slow on a normal CPU or even a GPU.

So, there has been a trend where people are making the field smaller and smaller, which results in really great performance. For example, going from a 252-bit field to a 31-bit field, you can think of the field elements as roughly eight times smaller, and your proving performance goes up by a similar factor. It’s not exactly correct, but that’s the intuition. Our proof system is over the BabyBear field, which is a 31-bit field.
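The hardware-friendliness of a 31-bit field is easy to see in code. In this sketch, a BabyBear element fits in one 32-bit word and a multiplication needs only a single native 64-bit product before reduction, versus multi-word "bignum" arithmetic for a 252-bit field.

```rust
// Why a 31-bit field is hardware-friendly: BabyBear elements fit in a
// single 32-bit word, so addition and multiplication reduce to one or
// two native machine operations plus a modular reduction.

const P: u64 = 2013265921; // BabyBear prime: 2^31 - 2^27 + 1

#[derive(Clone, Copy, PartialEq, Debug)]
struct Fp(u64); // invariant: 0 <= value < P

impl Fp {
    fn new(v: u64) -> Self { Fp(v % P) }
    fn add(self, o: Fp) -> Fp { Fp((self.0 + o.0) % P) }
    // One u64 x u64 -> u128 product, then a single reduction.
    fn mul(self, o: Fp) -> Fp { Fp((self.0 as u128 * o.0 as u128 % P as u128) as u64) }
}

fn main() {
    assert_eq!(P, (1 << 31) - (1 << 27) + 1);
    let a = Fp::new(2013265920); // = -1 mod P
    assert_eq!(a.add(Fp::new(2)), Fp::new(1)); // wraps around the modulus
    assert_eq!(a.mul(a), Fp::new(1)); // (-1) * (-1) = 1
}
```

A 252-bit field element, by contrast, needs four 64-bit limbs, and every multiplication becomes a schoolbook product of limbs plus carries, which is the slowdown Uma is pointing at.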

You can’t do elliptic curve stuff over a 31-bit field, but thankfully, the StarkWare people and others have been working on hash-based schemes. You can do hash-based cryptography over a small field, and that’s the technique that we and a lot of other people in the industry are converging on. For example, zkSync, Polygon zkEVM, RISC Zero, Valida—a lot of people have been using these small-field, hash-based schemes. The Plonky2 library and others also use small fields with hash-based cryptography.

I guess I didn’t fully answer your question. The general architecture will remain the same, even though there’s this tension of squeezing as much as you can from what you’re building around the proof system. But even if you want to upgrade, it seems like the field is generally converging on one architecture, which might make it easier to swap things out in the future.

Another thing is, we use Plonky3, which is an open-source library, and there’s also a bunch of consolidation around that. SP1 itself is very open source; this proof system is fully open source. We’ve tried to make it a pretty nice developer experience. We’ve actually had a bunch of external teams fork it. For example, the Lurk team—I think they have a new name now, maybe Argument—they forked it, added their own precompiles, which we might talk about in a bit, and they’ve contributed back to SP1.

You get these open-source Schelling points around Plonky3 and SP1, and that also helps with upgrading. For example, there’s this new hash-based proof system called Circle STARK that uses a different field than the current one we use. Plonky3 is implementing it, and when they do, you can imagine that we’d just be able to pull it into SP1 and upgrade to it.

I find something really magical about crypto: everything’s open source, and you can have these really positive open-source collaborations that you don’t see happening in Web2 or other industries. I’m hopeful that this can help us stay at the bleeding edge and collaborate across ecosystems to keep everyone at the forefront. If a lot of teams are using SP1 in their applications, then a new proof system wanting to get distribution would do the work themselves to get integrated into SP1.

Sina: Exactly. I was talking with my friend Zach Obront earlier, who I think worked with you on a project. Before we get into the ZK rollup stuff, which I do want to talk about, one of the things he said was that it’s been really impressive to see how you’ve improved the performance by orders of magnitude. How have you been able to do that, and maybe this ties into some of the precompile stuff? What’s the intuition around that?

Uma: Yeah, so one of SP1’s biggest innovations was the system of pre-compiles. Philosophically, you can think of a pre-compile as a way to have what I like to call “ZK assembly.” It’s a very specialized circuit for certain expensive operations. For example, with the Keccak hash function, the SHA hash function, or verifying an ECDSA signature, you have a specialized circuit for doing that operation. Instead of paying for a bunch of instructions in your CPU, which could take, say, a million cycles to verify an ECDSA signature in normal Rust, our system uses this specialized circuit. It does a lot of the heavy lifting, inlining memory access and other specialized tasks. This turns those million cycles into just 40,000 cycles—a 25x improvement.

Intuitively, it’s very similar to assembly, which is why I made that analogy. Normally, when you and I write programs, we use high-level code. But underneath, especially in cryptographic libraries like Rust Crypto, there’s often pretty ugly, specialized assembly code tailored to the CPU architecture. It uses vector instructions and is hand-written to squeeze out every last bit of performance for computations that are called frequently. Our pre-compile system does exactly that. We hand-wrote circuits very specialized to these operations, and it’s been incredibly effective.

In practice, the reason SP1 is often an order of magnitude faster is that these pre-compiles are super effective. For example, a program that would normally take two billion cycles in SP1, when using pre-compiles, might take only 300 million cycles. That’s roughly a 7x reduction, and generally, fewer cycles means proportionally less proving time.

One of our key insights—and I think some of the best insights sound obvious in retrospect—wasn’t clear to us at first. We thought, let’s implement pre-compiles for the use cases we’re interested in, like bridging and rollups, where most CPU cycles are spent on hash functions and signature verification. The rest of the business logic, like checking if a nonce equals the previous nonce plus one, is relatively less expensive in terms of instructions. So, we implemented these pre-compiles, and I remember a day when we were testing them. We saw the cycle count for one program drop from 800 million to 30 million. We were like, wow, this is incredibly effective.

That’s when we decided to build our architecture this way. It wasn’t obvious how effective it would be. We thought it might be a 2x or 3x improvement, but in many cases, it was 10x to 20x. It exceeded our wildest expectations. In hindsight, it seems obvious. Even on a normal CPU, when you run Ethereum blocks, a lot of time is spent on the Keccak hash function. So, it makes sense in a ZK context to write this ZK-specific assembly and hook it up to the main RISC-V part. But I don’t think this was apparent to many people because no one had built a system like this before.

Getting it working from a cryptography perspective was also pretty difficult. We had to come up with novel algorithms and techniques. For instance, we developed a global shared randomness system that’s pretty innovative, something no one else has. We also have a two-phase prover: in phase one, we go through the whole computation, generate the execution trace, and hash all of it to get a shared global challenge that we use across the entire computation in phase two. That’s necessary to make everything cryptographically sound. A lot of our engineering time was spent getting this pre-compile system to work efficiently—it’s not easy.

Now, you’ll see a lot of people talking about pre-compiles and adding them to their systems. It’s become almost obvious that this is the way zkVM performance will match circuit-based systems and become practical. But even when we started, it wasn’t clear to us. It’s been magical to see it play out and work so well.

Sina: Yeah, that makes sense. You’re tuning the VM to the type of use cases you expect to see. If you benchmark the performance of these programs and certain functions are getting hit over and over again, it makes sense to optimize them. How do you think about security and testing of these systems? Hearing you describe it, it feels like there’s some crazy deep math involved, and you’re implementing these things. Ultimately, this machinery will be used to prove the execution of, say, a ZK rollup. You want to know it’s happening correctly. How are you thinking about building that security over time? What are your mental models around that?

Uma: Security is always a super important area, and it’s hard. It requires a lot of effort. There are a couple of things we focus on. First, by being fully open source and using other open source libraries, we get a compounding effect. For example, we use Plonky3, which the Polygon team got audited, and other people are using it as well.

Uma: And hopefully, the security story around that compounds. I also think security is a process. Saying something is 100% secure at a given point in time is kind of meaningless because systems are always changing. We’re optimizing stuff, coming up with new algorithms, and deploying them. Security itself is a living process. You have many people looking at the codebase, and then even more people reviewing it over time. We recently hired an amazing auditor to join us full-time, and his job is just to look at the codebase constantly.

So yeah, I think security is definitely more of a process than just getting an audit report and saying, “Okay, it’s secure now,” or it’s not. You never know for sure. For example, with SP1, we had a public competition for our core RISC-V VM. It was an open forum where people across the world participated and found bugs in the SP1 RISC-V VM. As the system gets used more, and more people are using it and forking it, I view a lot of our forks very positively. It means more eyes on the code, which can help find new problems. With more diverse teams and people looking at it and using it in production, you can feel better and better about the security.

Take Ethereum, for example. The EVM has been used for a long time by a lot of people, and now people feel more confident about its security properties. I view it as very similar. These things will get used more and more, get looked at more, and we’ll fix stuff as it comes up. In five years, hopefully, we’ll feel very, very good about the system. It’s like the passage of time with value at stake—something just gradually becomes more Lindy as time goes on.

Sina: Yeah, 100%. Okay, so maybe let’s dig into this rollup stuff now, because this was so cool for me to learn about. I saw your presentation about it at Greenfield and Frontier years. But basically, using SP1, WASM, and Kona, along with a couple of Rust libraries that can execute Ethereum blocks, you built, apparently without much complexity, a way to turn any OP Stack optimistic rollup into a ZK rollup. Can you explain how that works?

Uma: Yeah, so to cover the OP Stack—basically, Optimism runs its own Layer 2, Optimism Mainnet, but they also have the OP Stack, which is open-source technology for anyone to deploy their own rollup. I think the Optimism team deserves a lot of credit for building the OP Stack in a super modular way from day one. They made it fully open-source and designed it so you can swap in and out certain smart contracts or modify the rollup system. The code is really well-documented to make it easy for other teams to do this.

That’s actually what we did. We took the OP Stack and swapped out one smart contract. Normally, transactions get posted in blobs on-chain, and there’s a smart contract holding the state root of the rollup, against which withdrawals and bridging are done. The correctness of that state root is ensured by their fault proof system, an optimistic, interactive challenge game. What we did was replace that smart contract with a ZK version of it. The difference is pretty small—it’s like 50 lines of code. Now, whenever you post a state root, you also post a proof alongside it, verify that proof, and then you know the state root is correct.

So, we replaced that smart contract with a ZK version that verifies proofs. Then, we took the off-chain agent that’s running and updating that smart contract and modified it a bit so it generates the proofs using SP1 and our proof API. It keeps that smart contract updated. It’s pretty awesome because, in the end, you can take any OP Stack rollup, deploy one smart contract on-chain, spin up one very lightweight service—since the service itself isn’t generating proofs, it’s dispatching out to our API to generate them—and it keeps that smart contract updated. Boom, you’ve fully converted your rollup into a full ZK rollup.

The beauty of using SP1 is that historically, if you wanted a full ZK rollup—like the ZK rollups that exist today such as zkSync, Polygon zkEVM, or Scroll—you’d need a very sophisticated team of, say, 30 cryptography PhDs coding a specialized circuit of the EVM for your ZK rollup. That’s a really long, complicated codebase, hundreds of thousands of lines of code, to implement the state transition function logic of your rollup. Those teams did it, and it’s possible, but it’s very difficult. It takes a long time, it’s hard to maintain, and when Ethereum upgrades, you have to change all your circuits. Overall, it’s not a fun time. Also, a lot of those rollups aren’t even Type 1, meaning they have differences with the EVM, often in how they compute the state root and other things.

With SP1, what’s really nice is you just take WASM, you take Kona—a library that Optimism open-sourced that implements their state transition function in Rust—and we run all those programs in SP1 to generate a proof of the state transition function. It was literally just Zach and one engineer on our team who got it done in like a month and a half. They took Kona, put it in SP1, wrote about 500 lines of code, and boom, you get a ZK proof of Optimism’s state transition function that you can use to turn any OP Stack rollup into a ZK rollup.

By the way, it’s fully Type 1, so it’s fully EVM-compatible. It uses the same code, and when things update or change, we just run cargo update for Kona. It’s all very simple and nice. It’s really maintainable, really testable. If you want to customize your rollup, add new precompiles, or do something magical like make the balances yield-farmable or whatever, you can do that too. So, it’s kind of the best of all worlds. You get the amazing developer experience, the customizability, the maintainability, but you also get all the benefits of ZK. The withdrawal window isn’t seven days; it goes down to like one to two hours. Interoperability is a lot nicer. Yeah, it’s crazy. The fact that these pieces of software are just composable in this way is incredible.

Uma: Yeah, I think the custom circuit people might argue that a zkVM is still a bit expensive. To put some numbers behind that, if you do the things I described, the proving cost per transaction ends up being less than 1 cent. It really depends on your rollup and other factors, but it’s between 0.5 and 1 cent. SP1 has a lot of improvements in store, and there are optimizations we can do in the Kona program and even in the Optimism protocol itself. There are small design decisions Optimism made—not really their fault since they weren’t thinking about this use case—but very small tweaks to their protocol could make all this 2 to 5 times more efficient.

I believe all these improvements will stack up to bring the cost down by 10x. In the next six months, we could have a 0.1 cent proving cost per transaction for a fully ZK rollup. That’s as cheap as Solana, which is very cheap. So, I’d say it’s already very affordable and so much more maintainable. I really think the era of doing custom circuits for this stuff is over.

One important thing to understand is the intuition behind why a zkVM can outperform custom circuits. I never thought that would be possible; it seems crazy. From a theoretical lens, if you hand-write a custom circuit for exactly your computation, it can have better performance than SP1. That’s almost definitionally true. But from a practical perspective, because SP1 is such a broad platform, we can always spend the engineering time to squeeze every last bit of performance out of it.

For example, we wrote an SP1 GPU prover and really optimized our cloud deployment of SP1 to pipeline efficiently. You run into weird bottlenecks—like downloading files sometimes becoming your bottleneck—and we’ve optimized every single corner of SP1 to be as fast as possible, and we’re continuing to do that. In a circuit system, it’s really difficult to do that because you have a circuit for one specific use case. Are you really going to put 10 engineers on writing a GPU prover for it if it only benefits your one company and one use case? And then, by the way, your circuit might need to be redone when a new proof system comes out.

For us, it’s a different calculation. When we make SP1 10% better, every single one of our customers benefits—any rollup team using SP1, any bridge using SP1. There are dozens of people experimenting and building with SP1 at this point, so all those people simultaneously benefit. For us, it was worth the significant engineering resources to build a GPU version of SP1. It was a no-brainer; we had to do it. Whereas for a custom application-specific circuit, the ROI just doesn’t make sense.

So, in terms of practical reality, it’s very difficult for custom circuits to keep up with a general-purpose zkVM. The platform is so general that, in the end, it can be much more optimized. I think that’s only going to become more true over time. More people are going to use SP1, we’re going to hire more people, and really optimize it further. It’s a flywheel that compounds over time. Already, running Rust in SP1 is faster and cheaper than certain current ZK circuits specialized for the EVM, and I think that trend is only going to continue.

Sina: Yeah, that set of arguments definitely makes sense to me. How are you thinking about where SP1 gets used? How big of a focus within that are rollups, and what are you pushing on there? What are the other areas, let’s say in the next one or two years, where you think there’s real promise for ZK to make a difference?

Uma: Honestly, rollups needed ZK four years ago. It's a really burning, urgent need for rollups. If you think about it, decentralization has an inherent overhead because everyone has to repeat the same computation. This is why Ethereum is a world computer, but I like to say it's a very bad world computer because it's slow and expensive. ZK naturally solves that. The verifiability of ZK is the perfect complement to the overhead of decentralization. It just makes sense.

So, I think ZK is really the only way Ethereum is going to scale in a sustainable manner. You can do a lot of hacks—and I feel like our current approach to scaling Ethereum is just a bunch of hacks layered together. It's like, we do this optimistic thing, but your money's locked up for seven days. Or the optimistic teams are trying to do interoperability, but it only supports five chains, and it's very permissioned. It's kind of a mess. I think ZK is the only way we're actually going to scale blockchains. It's time to get serious about scaling blockchains, and ZK is the way to do that.

Sina: How do you think about interoperability? I think Rotem mentioned that you have some spicy takes there about ZK being the only way interoperability is going to work.

Uma: Yeah, I think that's kind of what I was saying. If you look at the optimistic rollup ecosystems today, a whole suite of infrastructure has grown up around the seven-day withdrawal window, which is totally unacceptable to most people. You don't get finality, and your money's locked up. So then you have these liquidity layers for fast bridging in and out, but if you try to fast bridge out $100 million, they're not going to be able to do that for you. That already doesn't work for people who have a lot of money on-chain or for institutions that need that kind of capital available immediately.

Even if that kind of works, I’d argue it’s still pretty bad. If I want to build my own rollup and deploy it, now I have to go lobby all these bridging and liquidity companies to deploy on my rollup. Usually, they’ll say supporting a new rollup is very complicated and difficult. They’ll ask how much liquidity and TVL I’m going to have, and I’m like, well, I’m just starting my chain, so there’s this chicken-and-egg problem.

Uma: So, I think today deploying a rollup really sucks because there's this whole soup of infrastructure that's needed to make this stuff work. Fundamentally, that's because optimistic rollups are a clever way to securely bridge back to Ethereum, but they're not actually scaling much of anything. All these actors running infrastructure still have to run a full node for the optimistic rollup. They're not scaling anything; you're still repeating the computation.

With ZK, you don’t have that. You just verify a ZK proof, and you eliminate the need for most of these actors. It just works. It’s actually scaling. You can verify the ZK proof, confirm the computation, know the new state root, and that’s it. So, each of these L2s, these rollups, will have state roots for all the other rollups, and those state roots will be computed as ZK proofs.

I think the endgame for all of this is that the Ethereum folks implement single-slot finality. This might take a while, but there’ll be real-time ZK proving for all the rollups. Then, it’s like you have a verifiable database settling to Ethereum with a real-time ZK proof that anyone else in the world, including other rollups, can just verify. It really feels like one chain where you can instantly move money across any ecosystem you want. I think the only way we’re going to get there is with ZK, done in a scalable, sustainable way.

Sina: That makes sense. So, what do you think the ask is for that? For generating, collecting all the transactions, and generating the proofs, some period of time still needs to pass, right? You’re not doing that instantaneously. What does that trend to over time?

Uma: I think we will definitely get to real-time proving. Obviously, we're not super close to that today. But as I mentioned earlier in the podcast, our system shards the computation: we take a really long computation, split it into shards, and one nice aspect is that you can start generating proofs for all those shards in parallel. So, you distribute your proof generation across machines while using roughly the same total amount of compute.

From a cost perspective, it’s slightly more expensive because there’s an overhead to distributing the computation, but from a latency perspective, your latency goes down a lot. We already have situations where the total end-to-end time is like 30 minutes to generate a proof for one block. But if we fully distribute it, you could imagine getting the proof in a couple of minutes. And you could imagine that going down even further, to like 30 seconds, or at some point, maybe 12 to 15 seconds.
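As a rough illustration of that latency-versus-cost tradeoff, here is a toy model. All of the numbers below (shard count, per-shard time, distribution overhead) are made up for illustration; they are not SP1 benchmarks.

```python
import math

def sharded_latency(n_shards, n_provers, shard_time_s, overhead_s):
    """Wall-clock latency when shards are proven in parallel.

    overhead_s models the fixed cost of distributing the work and
    aggregating the shard proofs afterwards.
    """
    rounds = math.ceil(n_shards / n_provers)  # provers work in waves
    return overhead_s + rounds * shard_time_s

# One machine, 180 hypothetical shards at 10 s each: 30 minutes end to end.
print(sharded_latency(180, 1, 10, 0))     # 1800 seconds
# Fully distributed across 180 provers, with 60 s of distribution overhead:
print(sharded_latency(180, 180, 10, 60))  # 70 seconds
```

The total prover-seconds go up slightly because the overhead is paid on top, but the wall-clock time drops from half an hour to about a minute—matching the "slightly more expensive, much lower latency" framing.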

So, I think there are solutions to really lowering the latency that will come about as the proof systems get better and faster. For us, once costs are cheap enough, we can laser-focus on latency. There's a lot we can do on the algorithmic side to make it better. I'm not saying it'll happen in the next six months. I'm very confident SP1 will get 5 to 10x cheaper in the next six months—that's basically a given. But I'm not claiming we'll have real-time proving in that timeframe. I don't think it's 10 years out, though.

Sina: That makes sense. So, you were saying there’s rollups, there’s bridging—these are two areas you’re really focused on. Is there anything else internally that you’re really trying to push on right now?

Uma: Yeah, I think there are two categories of ZK broadly that I’m really interested in. One is what I like to call “upgrade the stack.” We’ve built crypto, it’s been kind of crazy, and there’s a lot of technical debt. I view optimistic rollups as technical debt. Now, finally, ZK is really easy to use, it’s fast, it’s cheap. So, let’s go actually use it. Let’s make every rollup a ZK rollup, every bridge a ZK bridge. Let’s remove all the multisigs and bad stuff that’s happening that shouldn’t be, and actually fix these things that should use ZK, make them verifiable, and live up to the promise of crypto.

So, that’s upgrading the stack—taking existing things, adding ZK to them, making them work the way they should have originally worked, but we didn’t have the tools back then. I’d put rollups and bridges into that category. Every rollup should be a ZK rollup, every bridge should be a ZK bridge. There’s a bunch of other on-chain applications too—everything that can use ZK should use ZK. Now, integrating a lot of this stuff is almost a weekend project, I like to joke.

Sina: What are the other big categories of this “upgrade the stack”?

Uma: Good question. I think rollups and bridges are the biggest ones, but there are many different types of rollups, right? Different VMs for rollups, different constructions. Then, for example, there’s AggLayer, which is Polygon’s new interoperability protocol. It’s not really a rollup, not really a bridge, but an interoperability protocol. There are also multi-settlement rollup things—is that a rollup, is that a co-processor? Co-processors are another big one, where it’s off-chain compute, not quite a rollup, but similar to scaling. I think a lot of those applications should use ZK, and often already do.

The other category I’m really interested in is real-world ZK, where it’s like, how can we use ZK in a blockchain-adjacent context? For example, there’s been a lot of recent excitement around web proofs or ZK-TLS, where you can take arbitrary data provided by various internet companies, attest to it, and then selectively reveal information about it using ZK. There’s also self-sovereign identity, credentials, and attestations that I think are interesting. For example, you can prove that you have certain credentials and reveal a complex statement about it, like, “My age is over 18, but I’m not going to tell you my actual age.”

I also think the notion of a verifiable database is super important. For example, you have these credit scoring agencies that just update your score. There’s regulatory accountability, but these entities get hacked all the time. That’s an area that would really benefit from verifiability. When they update your score, they should show that it’s following some reasonable function, doing something reasonable, not just giving random information. So, those are the kind of use cases outside of crypto or blockchain-adjacent that I’m excited about.

Uma: I’m pretty excited about seeing some of these ideas flourish. I don’t know if it’ll happen in the next three months, but I think the rollup stuff could come together in a relatively short time frame. It’s really important for people to work on this longer-term stuff and explore how zero-knowledge proofs can help in the real world beyond just our crypto ecosystem.

This blends into other primitives like MPC and fully homomorphic encryption. Those are on their own improvement curves and unlock entirely different sets of applications. It’s very cool to see how this all ties together.

Sina: For sure. As a last question, I’m curious about the lifecycle of a rollup case. Let’s say I decide to apply zero-knowledge proofs to my state transition function. Can you walk me through the lifecycle of the proof being generated and verified? Who are all the players involved, from beginning to end, so I can visualize this process?

Uma: Generally, you have some end-user application that needs a zero-knowledge proof. This could be a rollup team, a bridge team, or, if they're decentralizing their sequencer, whoever is responsible for posting the proof. So, you have a user or proof requester making a request.

Our dream is that they make one API call with their proof request information. They pass in their program and inputs, and then they get a proof in the shortest time possible for the minimal cost. Right now, we have an API where we handle that. We have our own prover, and we provide the proofs.

But we’re also working on a decentralized prover network. One day, it won’t just be us generating these proofs. It’ll be a protocol where anyone in the world can plug in their GPU or computer, provide capacity to the network, and serve as an actor in this ecosystem.
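The pipeline Uma sketches—requester, prover, settlement-layer verifier—can be mocked in a few lines. Everything here is a hypothetical stand-in: a real SP1 proof is a succinct proof, not a hash, and a real verifier checks it without re-running the computation, unlike this toy.

```python
import hashlib

def request_proof(program: bytes, inputs: bytes) -> dict:
    """The requester's single 'API call': a proof request for the network."""
    return {"program": program, "inputs": inputs}

def prove(request: dict) -> bytes:
    """A prover (today: one operator; later: any network participant)
    fulfills the request. A hash stands in for the actual proof here."""
    return hashlib.sha256(request["program"] + request["inputs"]).digest()

def verify(request: dict, proof: bytes) -> bool:
    """The settlement layer (Ethereum, Base, even a phone) checks the proof.
    NOTE: a real ZK verifier is succinct; this mock recomputes, which is
    exactly what ZK lets you avoid."""
    return proof == hashlib.sha256(request["program"] + request["inputs"]).digest()

req = request_proof(b"state-transition-elf", b"block-1234")
proof = prove(req)
print(verify(req, proof))  # True
```

The names (`request_proof`, `prove`, `verify`) are illustrative, not the SP1 SDK; the point is only the shape of the three roles in the lifecycle.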

Sina: Why go for a decentralized network instead of just handling everything yourselves?

Uma: That's a good question. First, generating proofs at a low cost is a different skill set from building the virtual machine or developing the algorithms, which is what we do. As a concrete example, you often need a data center with low-cost energy, where you set up prover nodes and manage them. I don't particularly want our company to get into the business of setting up and maintaining data centers.

You see this in AI too. Companies like OpenAI and Anthropic rent data centers from others instead of doing it all in-house. So, there’s a notion of having an open protocol where people can compete and provide capacity to the network. It’s not just bottlenecked on us having everything together.

Another interesting aspect is the competition dynamic. With Bitcoin, for example, the mining protocol created a global competition for the best mining ASICs. You saw tremendous gains moving from CPU mining to GPU, FPGA, and then ASIC mining. You could imagine a similar system for SP1, where global competition drives efficiency and optimization. If people can make it more efficient, they’ll earn rewards for it.

Finally, the thesis of crypto is about open protocols and systems where anyone can participate. Because it’s open, there’s less rent-seeking from a single entity, and you get better outcomes for everyone. Users get cheaper costs, and others can invest capital, time, and effort to earn rewards. Instead of us being the only ones generating proofs in a walled garden, we want to create an open ecosystem. It turns into a market with competition, supply and demand, price transparency, and all those benefits.

Since we’re building in crypto and many of our customers are in this space, I feel like that’s the right way to architect it. Having a network of provers participating in a marketplace is better for everyone from an end-outcome perspective. That’s why we want to build it this way. It aligns with the thesis of crypto as well.

Sina: So, the developer calls this API. Right now, you’re generating the proofs, but in the future, there will be this network. Basically, there are node operators, which currently is just you running some hardware?

Uma: Exactly. Right now, it’s a network of size one. We’re the only operator, and we fulfill all the proofs. But one of our big focuses is to make it so anyone in the world can participate. This also brings decentralization and liveness properties, which are important for many of our customers. They care a lot about that.

After the network generates your proof, you get it and verify it on some settlement layer. It could be Ethereum if you’re doing L3s, it could be Base, Bitcoin, Solana, or even your user’s phone. You verify the proof somewhere useful, and that’s the pipeline.

Sina: Do you see any promise in developing the software and hardware together, or custom hardware for the software, as has happened in other industries?

Uma: That’s essentially asking if we should build our zero-knowledge virtual machine in a way that’s super hardware-friendly. I’d argue we already kind of do that. For example, the small field we use is very hardware-friendly, and the reasoning behind it is tied to the hardware itself. So, we’ve co-designed SP1 for the hardware we use, which is CPUs and GPUs.

There’s also the question of FPGA or ASIC design. I do think, in the fullness of time, that will come about. But there are advantages to sticking with more general-purpose hardware in the short term. For instance, the algorithms are changing pretty frequently, and we’re coming up with new algorithmic ideas. If you’ve ever tried to write FPGA code or develop hardware, it’s really difficult. It’s nice to prototype and iterate quickly for now.

Uma: Working on general-purpose, flexible hardware lets us iterate quickly on the algorithms. At some point, a lot of this stuff will consolidate and converge, and then it makes sense to move to FPGAs or ASICs. There's a question of the time horizon—will things still be evolving super quickly for the next two years, or five years, or just six months? I'm not entirely sure.

There’s also the issue of data movement. For example, a lot of the processing right now isn’t even necessarily compute-bound. The hard part is getting the data onto the device and then being able to run compute on it. If you look at something like Bitcoin mining, it’s perfectly compute-bound. You’re just hashing something, there’s no state. You put your initial seed onto the device, and then boom, you just do as many gates as possible. So for mining, an ASIC makes perfect sense.

Proving, on the other hand, is much more complex. There are many different phases and stages with a bunch of bottlenecks. There’s a lot of data being moved and passed around. For those reasons, it’s not as much of a no-brainer to go with custom hardware.

To give an example, in the machine learning world, a lot of people have tried to compete with NVIDIA and make custom chips for ML. Very few, if any, have succeeded. You could imagine a similar situation playing out with zero-knowledge proofs for at least a long time. Now people are talking about AI inference chips, and there’s this new crop of startups. From talking to my friends, some are excited about certain startups, but I think people are still very far from catching up. AI is also at a different stage of maturity—NVIDIA’s market cap is about $3 trillion. It might be a while in the ZK world before we get to that level, and that makes sense.

On the other hand, maybe in three months, there will be trillions of proofs and trillions of cycles, and we’ll be like, “Oh, we actually really need this.” I don’t have a super strong view on it myself.

Sina: Great. Is there anything else you want to jam on? I feel like this has been a full tour from the architecture to the use cases to how everything fits together.

Sina: I feel like we covered a lot of stuff. Thanks for the mini-lesson on how STARKs and FRI and everything work.

Uma: Oh, for sure. It’s fun to talk about because I think you’re one of the few podcast hosts who really did your research.

Uma: Well, I think at some point we’ve got to just pull up a whiteboard and work through the math. I’m not sure if this medium is very conducive to it, but it would be nice to do that.

Sina: Yeah, for sure. Thanks so much, Uma.

Uma: Thank you for having me.