YouTube’s first-generation Argos video chip made its data centers way more efficient, freeing up expensive processors for demanding tasks. If one is good, two is better.
YouTube has at least two new versions of the custom video transcoding chip in the works, suggesting the company is committed to producing the piece of silicon for the foreseeable future. The video coding unit, or VCU, came into being after Google figured out that Moore’s law — the predictable doubling of chip performance at a lower cost — had become an unreliable way to plan its data center construction.
A tech lead for infrastructure and a Google Fellow, Partha Ranganathan was at the heart of the effort to design and deploy the Argos chip: he co-founded the project and served as its chief architect. He sits on the board of the Open Compute Project Foundation, and before joining Google he worked at HP Labs for more than a decade.
Ranganathan recently discussed with Protocol why Google decided to make custom chips, how it elected to pursue one focused on the compute-intensive workload of video transcoding and the future of hardware-accelerated video.
This interview has been edited and condensed.
Maybe a good place to start is at the beginning. Can you tell us in as much visceral detail as possible how this idea of a chip for video — for YouTube — came into being?
So in the role that I am in, I constantly look at how our infrastructure is evolving, and, about six or seven years back, we realized that Moore’s law — this notion that performance doubles for the same cost every 18 months or so — was dead. Every two years, we used to get double the performance for the same cost. Now that doubling is showing up every four years, and it looks like it’s going to get slower. And we said, “Well, I think we need to do something different.” We decided to embrace custom silicon hardware accelerators.
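The gap Ranganathan describes compounds quickly. A back-of-the-envelope sketch (illustrative numbers only, not Google data) shows what six years of an 18-month doubling cadence delivers versus a four-year cadence:

```python
# Back-of-the-envelope comparison of performance-per-dollar growth
# under an 18-month doubling cadence versus a 4-year cadence.
# Illustrative only; not based on any actual fleet measurements.

def growth_over(years: float, doubling_period_years: float) -> float:
    """Performance-per-cost multiplier after `years`, doubling every period."""
    return 2 ** (years / doubling_period_years)

fast = growth_over(6, 1.5)   # historical cadence: 2x every 18 months
slow = growth_over(6, 4.0)   # observed cadence: 2x every 4 years

print(f"6 years at 18-month doubling: {fast:.1f}x")  # 16.0x
print(f"6 years at 4-year doubling:   {slow:.1f}x")  # ~2.8x
```

Under the old cadence, the same dollar buys 16 times the performance after six years; under the new one, less than 3 times — which is why waiting for general-purpose CPUs to catch up stopped being a viable planning strategy.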
We built this accelerator for machine learning called the [tensor processing unit, or TPU], which was our very first effort, and I was privileged to help out a little bit with that as well. And so we were doing that, and that project was really doing very well. And we were realizing that things that were not possible earlier were magically happening. We could go on Google Photos, [and] some amazing things that used to take months to train were [now] taking minutes to train, and we were creating new product categories.
So I was coming at it from the point of view of: What is the next big killer application we want to look at? And then we looked at the fleet, and we saw that transcoding was consuming a large fraction of our compute cycles. So we started off saying, “Hey, look, is there something here? This looks like an incredibly compute-intensive workload and it’s fairly well defined.”
Building an accelerator is not an easy undertaking; you need a strong stomach for it. One of the big things you saw in the 13-page paper is this: It’s all about co-design. So the hardware is really kind of the tip of the iceberg. It’s the entire surrounding ecosystem.
I have this quote that my colleagues find very humorous. You know the philosophy quote: If a tree falls in the forest, and nobody heard it fall, did it fall? So I have an equivalent to that, which is, if an accelerator lands in the fleet, and nobody uses [it], it didn’t really land. And the point really is you can build hardware: It’s not that complicated. You can build amazing hardware. But if you don’t build it in a way that our software colleagues can use it, and it can actually work and there is compilation and tools and debugging and deployment and so on — it’s a pretty big undertaking, right?
Were there any important “eureka” moments along the way to designing the Argos chip?
The first aha moment was that we needed another accelerator, and video seems to be growing. But one of the big epiphanies we had was that accelerators are not about efficiency. I think it’s kind of very contradictory, or counterintuitive. Because most people draw a pie chart of where the cycles go and say, “Hey, here’s 30% of my cycles, I’m going to accelerate it with hardware.”
What we realized was that accelerators are all about capabilities. Not only are we going to make all these [tasks] faster, much like machine learning, we’re going to create magical experiences that otherwise didn’t exist.
I’m thankful we used Google video conference [for this interview], because it’s running on the hardware we developed, it’s running on a VCU. And so this blurring of my image behind me is running on a VCU. And so you can do some really nice stuff with image processing. You could do 8K video, you could do immersive video, you could do 360-degree video, you could compress video. So the bandwidth became faster, and you could get quality of service. You could do YouTube TV, you could do cloud gaming, right? So the capabilities, not the efficiencies — the new things that you could do — that’s when we realized we were sitting on something really interesting.
One of the unique things about the Argos chip is that it had, at least compared with a chip company, an unusual amount of collaboration between the hardware designers and software engineers. Can you talk a bit about how that worked?
I still remember the really exhilarating — but sometimes very challenging — conversation about, “Where do we draw the line between hardware and software?” I know the paper was very dry and technical. But to me, it’s super exciting because some of the trade-offs we made were firsts: This is the first time on the planet we have warehouse-scale transcoding. And we’ve never done large-scale distributed transcoding. So what do we put in hardware? What do we put in software? How do we do schedulers? How do we do testing? How do we do high-level synthesis?
High-level synthesis is kind of an emerging technique. But this notion of using a software-centric approach to designing hardware was something that we really pushed on very hard in Argos. There’s a whole bunch of software things that are associated with it. So hardware is pretty hard. It’s complicated. And it takes a long time. So I work at Google, which is a software company, and we do some of the world’s largest software platforms — something like web search or Android is a huge, complex code base — but we still kind of make updates to them fairly frequently.
When I go to my software colleagues and I tell them, “Hey, look, I have a hardware idea — I’m going to change your business model. I’ll come back in two years and I will get you something at that time,” they look at me and say, “Two years? That’s a long lead time for me to get something.” And they’ve often asked me, “Why don’t we do it the same way software people do? Why don’t you kind of do incremental things? Why don’t you do agile development?” And I always tell them, “Hey, look, hardware is hard. It’s different.”
And so one of the approaches, I think, in the post-Moore’s law world, is this notion of: How can you have software-defined hardware? The idea really is, can you use high-level synthesis techniques [or HLS]? And what was very notable was we actually did some nice innovations in that space as well, which we didn’t talk about too much.
What did using high-level synthesis for your designs achieve?
So what it does is we got hardware — there’s a term called PPA, which stands for power, performance and area; that’s how you look at hardware. So we got similar power, similar performance, similar area, with maybe a little bit of trade-offs here and there. But what it allowed us to do was iterate much faster, and so the paper actually talked about [an] example of how you could look at a much broader design space. Because we could quickly learn it and simulate it, and see what happens. So you could do a much more systematic design-space exploration.
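The design-space exploration Ranganathan describes can be caricatured in a few lines. In this sketch the cost model, parameters and area budget are all invented purely for illustration; the point is the loop itself — when each candidate can be compiled and simulated quickly, you can sweep the whole space and score every design on PPA instead of hand-tuning one:

```python
# Toy design-space exploration loop. With a fast HLS-style compile/simulate
# cycle, every candidate configuration can be evaluated and scored on
# PPA (power, performance, area). The cost model below is made up.
from itertools import product

def evaluate(cores: int, pipeline_depth: int) -> dict:
    """Hypothetical PPA model for one candidate transcoder configuration."""
    performance = cores * pipeline_depth          # throughput proxy
    power = cores * 1.5 + pipeline_depth * 0.5    # watts (invented)
    area = cores * 2.0 + pipeline_depth * 0.25    # mm^2 (invented)
    return {"cores": cores, "depth": pipeline_depth,
            "perf_per_watt": performance / power, "area": area}

# Sweep the space, keep designs under an area budget, pick the best.
AREA_BUDGET = 12.0
candidates = [evaluate(c, d) for c, d in product([1, 2, 4], [2, 4, 8])]
feasible = [x for x in candidates if x["area"] <= AREA_BUDGET]
best = max(feasible, key=lambda x: x["perf_per_watt"])
print(best)
```

With hand-written RTL, each point in that grid might cost weeks of engineering; with a software-defined flow, it is one more compile.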
But you could also start to be very nimble about late additions, and we actually had an example in the paper of a last-minute addition — the algorithm changed, and we decided we needed to add a little bit. And so we said, no worries, we can go ahead and recompile the hardware. That’s what the whole HLS is about.
Were there any stand-out moments once the chip launched? For example, what happened during the early days of the pandemic when many people spent time at home? Did it push the Argos chip to the limit?
When the pandemic hit, usage just went up — like a 25% increase in watch time across the world in a 30-day period. The fact that we had an accelerator lying around that could really stand up to all the demand was pretty useful as well. So that was a very memorable moment for us.
The final step in the chip design process is called a tape out, a name dating back to the last century, when designers literally taped a design together before it went off to the fab. It’s an important moment, even today. How did your team celebrate?
If you’ve designed hardware — so maybe the right analogy is to think about the most exhausting project that you have done. And it was an adrenaline roller coaster, and so on. And then you finished it, what happened? My suspicion is you’re going to sleep right after that. I was so exhausted. But at Google, we have the tradition that we always celebrate with ice cream and kind of have a party, so we did have all of that. But it wasn’t like that one moment where we pressed the magic button and confetti rained all over like a NASA launch. We hugged each other and did all of that stuff. I wish I could say there was that nice movie-worthy moment.
And so the person who did the tape out sent an email saying, “Hey, look, this is done,” and we had a flurry of congratulatory emails, and then everybody’s exhausted. Nowadays it’s just a bunch of FTP files getting uploaded, and there is not that nice epochal moment where there is literally a transition of physical hardware. So you have to make do with emails being sent out and ice creams being consumed.
What does the future for accelerator chips look like? Does Google want the likes of Intel, AMD or Nvidia to start making custom video accelerator chips?
I’ve been working in this area for multiple decades, and this is by far the most exciting time that I’ve had — the number of opportunities that we’ve had: video, ML, network acceleration and security, data processing, there’s so many things to be done. And so when the dust settles there are going to be a bunch of big accelerators. Now, to me, video is easily going to be one of those category accelerators, and we’re just touching the tip of the iceberg. So video transcoding is this one small block — something we know for sure is very important, and we want to do it.
But if you start looking at how much video is central to our lives — and I think, for good or bad, the pandemic has made video even more central — you saw how many kids used YouTube and cloud gaming during the first few months of the pandemic, how videoconferencing is [the] default. I was at a conference last week. And it’s now the default to have a hybrid, right? And then I look at the number of IoT devices like cameras that are capturing images, cameras in manufacturing that are doing quality checks. Video is going to be teaching computers to see, and the computers are going to be everywhere. That’s why I think video will be an integral part of our lives.
So all of this is a long-winded way of saying there’s plenty of opportunities. And I really see a very, very vibrant cottage industry of us all figuring out how to use and accelerate video in the future. Is it OK if Intel or AMD does that? I think part of the reason why we published the paper is we would love for the entire industry to understand the importance of this problem and kind of build on top of that, because that is how research works. That’s how innovation works: People build on top of each other. And again, at the end of the day, I’m looking for us to deliver magical experiences. And if somebody else delivers hardware — we use a lot of Intel, AMD and Arm in our fleet, and if somebody delivers something magical, I will use it.
Because we are in the business of using hardware to build even bigger, magical experiences. On top of that, if it turns out we have awareness of a problem that needs hardware, and we think we can do it well, like we did with Argos, we will continue to do that. And I see a very, very rich road map of ideas in the future that we can continue to accelerate.