The Generative Biology Revolution Podcast


Generative biology is a revolutionary approach to drug discovery and development that leverages AI and machine learning to design novel protein therapeutics. It holds the potential to enhance the speed and efficiency of drug discovery.

In this four-part series, Ray Deshaies, senior vice president of Global Research at Amgen, discusses how generative biology is transforming drug discovery to make it more predictable, shorten timelines, and increase success rates of bringing life-saving medicines to patients who need them most.

Released Episodes

The Cresting Wave of Transformational Science with Alan Russell, Ph.D., Vice President of Biologics

Transcript

Episode 1: Generative Biology: The Cresting Wave of Transformational Science

Intro: Welcome to The Generative Biology Revolution, a special edition podcast series produced by The Scientist's Creative Services Team.

This series is brought to you by Amgen, a pioneer in the science of using living cells to make biologic medicines. They helped invent the processes and tools that built the global biotech industry and have since reached millions of patients suffering from serious illnesses around the world with their medicines.

Generative biology is a revolutionary approach to drug discovery and development that leverages machine learning and AI to design novel protein therapeutics. It holds the potential to enhance the speed and efficiency of discovery. In this series, Ray Deshaies, senior vice president of Global Research at Amgen, discusses how generative biology is transforming drug discovery to make it more predictable, shorten timelines, and increase success rates of bringing life-saving medicines to patients who need them most.

Ray Deshaies
Biologic drugs are revolutionizing disease treatment. They are made from living cells and include proteins such as antibodies. Identifying and optimizing biologics is a slow, iterative process, where scientists must constantly tweak potential therapeutics to improve their activity and safety. In 2021, the world changed for drug research and discovery when researchers published advances that used AI and machine learning to predict the structure of every human protein. With discoveries like this, scientists are launching the generative biology revolution where they strive to leave the guesswork behind and instead use computers to quickly tailor biological molecules for therapeutic purposes. 

In this episode, I speak with Alan Russell, Vice President of Biologics at Amgen. Together, we review what generative biology is and how it helps scientists understand proteins from their amino acid building blocks to their folded, three-dimensional structures. We also discuss how this new field improves the quality and complexity of biologic drug candidates and the speed with which researchers generate them.

Ray: Hey, Alan, it's really great to be with you here today. I look forward to having a really stimulating conversation. Why don't you tell us a bit about your background?

Alan Russell: I'm a protein engineer by training—started out right at the very birth of the field learning about how to engineer the structure of proteins and ended up in academia. And I spent a long time in the university trying to push the edge of science forward in protein engineering. I was at Carnegie Mellon University, and at the beginning of COVID, Amgen called me and talked about whether it was time to think about taking all of that experience and using it to help patients and make drugs. And there was just something so phenomenally exciting about that that I took this leap of faith and came to join the team.

Ray: We're here today to discuss a new and exciting field that is revolutionizing drug discovery, but let me first set the stage.

Proteins are made by linking amino acids together into a chain. The chain can be short, say 50 amino acids, or as long as a few thousand amino acids. Once formed, the chain folds up into a specific shape that enables the protein to carry out a function or activity. Drugs act by binding to proteins and changing their activity. Let's say that there is a genetic variant that increases the activity of a protein and that increase in activity triggers a disease process. If we can make a drug that binds to that protein specifically and reduces its activity, that drug could be used to treat the disease.

The biologics that your team makes include proteins such as antibodies that influence the activity of other proteins that cause disease. Biologics represent a powerful class of therapies because they can be very potent and very specific, so they can treat many different serious illnesses such as cancer and asthma. 

A big part of biologic drug research is developing proteins that bind their targets in optimal ways. We're now at the point where computational approaches are giving researchers insights into protein structure and function that they can leverage to make better drugs. These approaches are part of a field called generative biology. Alan, can you tell us exactly what generative biology means?

Alan: Generative biology is a field that seeks to extract generalized principles by which biological systems function. What we're trying to do is understand these general principles that allow you to figure out why a biological system actually works the way it works.

Ray: So generative biology then applies not just to protein design and protein folding, but across all functions that occur in biology from the cell to the molecule.

Alan: Yes. Biology works by proteins acting as machines that do things. And we're trying to understand how the sequence of those proteins—if you think of a pearl necklace, and each of the beads—how the sequence of the beads causes the necklace to fold on itself and how that leads it to be able to do its job. So, if we can extract those generalized principles by which biology works and functions, we should then be able to connect the sequence to the structure and the function. That now lets us use what computational scientists call generative models to predict new sequences with functions that we want them to have.

Ray: You've intimated in that description that in going from sequence to structure to function, the idea would be to use methods of AI or machine learning to achieve that. I'm not a computer scientist. So, give me a brief layman's description of what type of AI or machine learning is used to link protein sequence to its structure, to its function, and how do the algorithms work?

Alan: Computer science was very good at doing what are called discriminative models. A discriminative model is something that says, if I know what x is, then I can predict y. If we think about important properties of drugs, one of them is viscosity. So, is it a liquid like water? Or does it flow like honey? That affects greatly whether it's going to be a really good drug. A discriminative model that might be used in artificial intelligence and in machine learning would allow you to take a huge set of data concerning that molecule and predict its viscosity, which would tell us whether or not, for instance, we could inject it easily.

Ray: So could you take us a little bit further under the hood of these generative algorithms. How do they differ from discriminative models and how do they enhance drug development?

Alan: A generative model deals with statistical challenges. So, it's much more challenging mathematically. The easiest way to get our mind around it might be to reflect back on viscosity and whether or not we can use it.

If I had a drug, a discriminative model would predict what its behavior would be like, what its viscosity might be like. A generative model will do something even more exciting. It will say, that's the drug you've got and that's the viscosity that it's likely to have. But if you want a different viscosity, this is how you do it. So, in other words, it generates new solutions.

If you have x, you can predict y in a discriminative model. In a generative model, if you have x, you can now change the value of x in order to get a different y. That's what's really so powerful because in our world where we're looking for new therapeutics made out of proteins, nature gives us a set of proteins, which we get to engineer and change a little bit. We can change things like viscosity, how they interact with the immune system, but we're kind of stuck with what nature gave us. These generative models allow us to say, we're not limited to that anymore. We can start there, but then generate a whole host of new proteins with functions that we actually need.
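The contrast Alan draws between the two directions can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the one-number "design" (a made-up charge value), the invented viscosity formula, and the function names. The "generative" direction is imitated here by a plain search over candidates scored with the predictor; real generative models learn a distribution over designs and sample from it, which this sketch does not attempt.

```python
# Toy illustration only: the viscosity formula and the single-number
# "design" are invented and do not describe any real drug or model.

def predict_viscosity(charge: float) -> float:
    """Discriminative direction: given x (a design feature), predict y."""
    # Hypothetical relationship: viscosity is lowest when charge is 2.0.
    return 1.0 + (charge - 2.0) ** 2


def propose_designs(target_viscosity: float,
                    candidates: list[float],
                    tolerance: float = 0.5) -> list[float]:
    """Inverse direction, imitated by search: given a desired y,
    return the candidate x values predicted to land close to it."""
    return [
        x for x in candidates
        if abs(predict_viscosity(x) - target_viscosity) <= tolerance
    ]


# "That's the drug you've got and that's the viscosity it's likely to have."
current_design = 5.0
print(predict_viscosity(current_design))  # 10.0

# "If you want a different viscosity, this is how you do it."
candidates = [i * 0.5 for i in range(13)]  # 0.0, 0.5, ..., 6.0
print(propose_designs(2.0, candidates))    # [1.0, 3.0]
```

Note that the search returns more than one answer: several different designs can reach the same target, which is the sense in which the approach "generates new solutions."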

Ray: Can you talk about advances, particularly in the area of protein structure determination, that mesh together with these AI algorithms to help drive the work that you do?

Alan: When I talk about these generative models, for them to work as fast as they possibly can today, we'd like to know what the structure of the protein is that we're thinking about. So, it's incredibly useful in order to generate alternatives to have a starting point.

Some time ago, cryo-EM, cryo electron microscopy, was this huge advance. What is that? Basically, a really powerful microscope that works on stuff that's really cold, because when something gets cold, it freezes its shape, it freezes its structure. And if you have a powerful enough microscope, you can look at that structure. Now, we can develop these models that will predict function from known sequence and structure and continue to learn. Just like an iPhone, it looks at your face in order to open the iPhone. Over time, it gets to know what you look like in the morning so that it recognizes you when you first wake up. It gets better and better at recognizing you. And that's the same here.

Ray: There's this analogy to think about enzymes and proteins and how they recognize things called the lock and key model. The protein is like a lock and the thing that it's acting on is like the key. And the two of them have this very precise shape complementarity that enables them to interact in a very specific way, just like one key will only fit into the lock that it's designed for. A lot of the models are oriented towards understanding how do you get the shape of the lock that fits the particular key that you have.

With a lock and key, the whole point is not just to put the key in the lock, but then to turn it. And when you turn it, there are things in the lock that actually move, the tumblers, and that's what unlocks the door. Just like with the key in the lock analogy, for anything meaningful to happen in biology, often the protein has to move, just like the tumblers in the lock move, so that it can have the desired effect. So how do you think about that in terms of AI?

Alan: Motion is really hard to figure out and to predict. I like to think of motion from the perspective of ballet. I'm a big fan of ballet. And if you just for a moment put your mind to watching a pas de deux between a ballerina and a ballet dancer on stage, they're moving all the time. When they come together and they intertwine, change shape and create that incredible beauty together, that's when magic happens, right? That's the same here. We have to remember the proteins as they move around inside cells and inside our body. All proteins are moving just like the ballerina and the ballet dancer. But once they find themselves, they can come together.

Cryo-EM, we freeze a structure, we lock it in place. Well, that's only one structure. What about all the other structures? Think of that ballerina moving around, there's all sorts of motion. There's a sort of average place where they'd be, and then there's a bunch of other movement.

We've always been focused on what that average is. If we could understand the motion and understand when that's important and what is moving in a protein, it opens up the door to a whole host of new approaches to create drugs that intervene.

Now, this has been accomplished to a large part already, in terms of thinking about the interaction of small molecules with proteins. That's because computationally using something called molecular dynamics, you can simulate how a small molecule moves around inside a large molecule. Even in those models, Ray, we can only predict maybe a few nanoseconds of time, maybe a few microseconds of motion. This is an area where we don't just need new software, we probably need new hardware, new types of computers. It's a very active area of biology, and it will open the door to new ways to create drugs.

Ray: When you arrived at Amgen, you kicked off this large effort you named Biologics NExT that's taking a lot of these fundamental principles we've been talking about, linking protein sequence to protein structure to protein function, including things like protein movement, and using those fundamental principles and applying them to making therapeutics that would have beneficial impact on human disease. Can you tell us a little bit more about how you think about Biologics NExT and how you see the integration of all of this knowledge to really help us make drugs?

Alan: We've talked a little bit about these incredible waves of science that are breaking: computational science, automation, robotics, biology, structural biology, molecular dynamics. And if you ever watched people surfing, you have a choice, right? You can either stand and watch the people surfing the waves, or you can just walk away. You can even look at the waves. That's kind of cool, right? You see these? That's what most people do in science is watch the waves breaking and they get excited. We can't just sit there watching these waves. We have to figure out what are the surfboards that will allow us to surf that wave and to harness the power and the energy behind this new science.

What the team did is they simply said, okay, for each one of these waves, what are the surfboards? How do we build them? What do we need to put in place? How can we do our experiments in different ways? How can we create the systems that will generate the data that will power the algorithms? How do we deploy the automation? And what's the foundation upon which we can build? For the last 10 years, we've been building the automation platforms to be ready, so that once the computers were able to use that data usefully, we'd be ready to produce the data. Bringing those all together, the team is getting on that board, on those massive waves, and we're having a heck of a lot of fun doing it.

Ray: When we're designing things, we might have a target structure we're aiming for, and we might end up predicting a sequence to make that structure. We're usually using a combination of biology and the computer. We're making hundreds, if not thousands of different structures and empirically testing them to find which one really achieves what we want it to achieve. When is it going to be the case that I walk into your office and say, Alan, I really want this structure, and you literally hit a button, and then in the next hour, day, the computer spits out and says, make this sequence? And if you make this sequence, you'll have that structure. Is that something that's going to happen in the next two years? Five years? 100 years?

Alan: Already today, the computer's good enough at including the right one amongst a whole bunch of wrong ones. I think what you're getting at is, when will it just be right? Five years sounds reasonable when you've got a very manageable number of predicted molecules. It depends a lot on the complexity of the molecule, the size of the molecules, but I think five years you're going to see a different world than you see today in this kind of science.

Ray: Well, Alan, this has just been fantastic. Your optimism is exceeded only by your intelligence. It's really a pleasure to talk to you. Thank you for being with us today. I look forward to the next time we can get together, sit down, have a beer, and chat about the future of biotechnology.

Alan: Sounds good, free tomorrow. Take care.

The Protein Structure Prediction Problem with Mike Nohaile, Ph.D., Chief Scientific Officer, Generate Biomedicines

Transcript

Episode 2: Generative Biology: The Protein Structure Prediction Problem

Intro: Welcome to The Generative Biology Revolution, a special edition podcast series produced by The Scientist's Creative Services Team.

This series is brought to you by Amgen, a pioneer in the science of using living cells to make biologic medicines. They helped invent the processes and tools that built the global biotech industry, and have since reached millions of patients suffering from serious illnesses around the world with their medicines.

Ray Deshaies
To build better biologic drugs, researchers need to understand exactly how amino acid building blocks interact with one another and fold into functional proteins. This knowledge provides insights into how to engage a drug target or develop an optimal therapeutic. Determining a protein's structure is a laborious process in the wet lab, but thanks to machine learning, scientists can now use various algorithms to predict structure.

In this episode, I talk to Mike Nohaile, Chief Scientific Officer at Generate Biomedicines. Since early 2022, Amgen and Generate Biomedicines have been collaborating to discover and create protein therapeutics across several therapeutic areas and multiple modalities, including monoclonal and bispecific antibody drugs. We discuss the challenge of predicting a protein's structure from its sequence and the steps drug developers are now taking to create novel structures with therapeutic potential using generative biology.

Ray: Mike, it's really great to have you here today. We're going to talk about the relationship between protein sequence and protein structure. Proteins have functions: they either can serve as scaffolding or they can carry out processes inside of cells. You could think of proteins like tools, like a hammer, or a screwdriver, or a plier, where the shape of the tool really demarcates what it can do, it underlies the function. Given the importance of shape, of structure to proteins, what are the major ways scientists determine protein structure?

Mike: There are three methods. The initial method is x-ray crystallography, where you take your protein and you crystallize it in a very regular array, and then you scatter x-rays off it. You determine what the structure is from that.

About 30 years ago, it became possible to add heteronuclear, multidimensional NMR to that, which gave some new information. That's a tricky technique to make work for very large proteins, but it shows you more the dynamics of the protein.

And then the new kid on the block is cryo-electron microscopy. That's really exciting because it's very fast. You don't have to put them in crystals, you just put the proteins in their native state on a plate, you fast-freeze them, and you use transmission electron microscopy to take pictures of them. It's been really revolutionary in the speed and pace at which you can get new structures, but all three techniques have their place.

Ray: One of the grand challenges in all of biology has been this protein folding problem, where you try to understand how you get from the linear sequence of amino acids that comprise a protein to its three-dimensional structure. And you described what I would call wet techniques for doing that: methods like NMR, crystallography, cryo-EM. What would be awesome is if you could do this in a computer, right? The problem is, at each position in a protein you could have any of the 20 amino acids. Even in a small protein, you might have 100 positions, and each amino acid can adopt a different conformation relative to the amino acid that preceded it. So not only could you have any of the 20 amino acids at each position, but then each amino acid can have a different angle relative to the prior amino acid. Computationally, that leads to an enormous number of possible configurations for even a pretty small protein. Can you elaborate on the computational complexity of understanding how sequence translates into structure?

Mike: The problem blows up, computationally, extremely rapidly to insane places. I'll give you an example: I was working on a protein that had a partially buried tryptophan. It's W among the 20 amino acids and it's got the largest sidechain. It's very, very long, has a couple of rings. I was struggling because that first angle coming off the backbone, if you just moved it a little bit, it moved the hydrogen at the very end of the tryptophan a huge amount. You're trying to computationally move that tiny little bit, then you want to recalculate the whole protein, right? So, it blows up to orders of x to the 100, x to the 200 very rapidly. There's no computer the size of the planet which can calculate that fully. It blows up on you in some computationally intractable way.
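The "x to the 100" growth that Ray and Mike describe can be checked with a few lines of arithmetic. The 20 amino acids and the 100-position chain come from the conversation; the three conformations per residue and the evaluation rate of 10^18 conformations per second are assumptions chosen purely for illustration, not figures from the speakers.

```python
# Worked arithmetic: how quickly the search space grows with chain length.
AMINO_ACIDS = 20             # from the conversation
CHAIN_LENGTH = 100           # "even in a small protein, you might have 100 positions"
CONFORMATIONS = 3            # ASSUMPTION: conformations per residue, for illustration
RATE_PER_SECOND = 10 ** 18   # ASSUMPTION: hypothetical machine speed
SECONDS_PER_YEAR = 31_557_600  # 365.25 days


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, e.g. 1234 -> 3."""
    return len(str(n)) - 1


n_sequences = AMINO_ACIDS ** CHAIN_LENGTH        # possible sequences
n_conformations = CONFORMATIONS ** CHAIN_LENGTH  # shapes for ONE sequence
years = n_conformations // (RATE_PER_SECOND * SECONDS_PER_YEAR)

print(order_of_magnitude(n_sequences))      # 130
print(order_of_magnitude(n_conformations))  # 47
print(order_of_magnitude(years))            # 22
```

Even with only three options per position and a very generous hypothetical machine, enumerating every shape of a single 100-residue chain would take on the order of 10^22 years, which is why exhaustive calculation is not a practical route.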

Ray: You've described an atomistic, or geometric, approach to computationally predicting protein structure that is dependent on bond lengths and angles. Will you tell me about another in silico approach, an analogy-based approach, where scientists discern a protein's unknown structure by comparing it to related proteins whose structures have been previously solved?

Mike: We've made some traction with analogy-based approaches. So, you look at things and say, these are in the same family, let's thread it into that and try by analogy to get something that works in there. Obviously, it doesn't allow you to design new things particularly well, but it does sometimes allow you to pull out things off of primary sequence and at least get a sense of what they're going to look like.

There's all kinds of ranges in between fully atomistic and those analogy-based methods. There's a new kid on the block with a new computational technique that is neither the analogy-based nor the fully atomistic, which is modern machine learning, which has been revolutionary in the capability to do this.

Ray: There's been tremendous excitement in the drug discovery field over the advancements in computational protein structure prediction. How did this come about and what is your take on it?

Mike: Sometime around 2010, there was a huge change in the computer science field, as modern machine learning, convolutional neural networks, came to the fore. Those techniques allow you to take very complicated datasets and find patterns in them in a computationally tractable way. That's changed image recognition, shopping, speech to text, natural language processing. That has been now successfully applied to this protein folding problem: I have the primary sequence, I know what that sequence of amino acids is, but what I want to know is what does it fold to? So, these techniques set new standards that have never been met before, and now people are trying them on all sequences that are out there, and they're producing databases: we just took everything that's ever been sequenced, and we're folding all of it, here's what it is. It'd be really great for docking small molecules, things that have been hard to crystallize. It will be a real revolution, but it is just starting.

Ray: If I go into one of these programs, RoseTTAFold or AlphaFold, with a sequence where there's not a pre-existing structure, so analogy is not possible, or there's not a homologous protein for which there's a pre-existing structure, what percentage of the time do they get the structure right?

Mike: The overwhelming percentage of the time they get pretty close. They'll give you expectations around different pieces like we're confident of the beta barrel but this big loop on the end, we're a little less confident on. Now, the one downside is often that big loop on the end is the thing you care about the most because it's the thing that actually binds. And that's often the place that has the lowest confidence, because it's the least constrained by the dynamics, but still, you get at least a reasonable quality prediction that you can start with.
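Mike's point about per-region confidence suggests a simple first step when reading a prediction: find the stretches the model is unsure about. The sketch below assumes a prediction tool has already produced one confidence score per residue; the 0 to 100 scale, the threshold of 70, the function name, and the scores themselves are all hypothetical and chosen only to illustrate the idea.

```python
def low_confidence_segments(scores: list[float],
                            threshold: float = 70.0) -> list[tuple[int, int]]:
    """Return (start, end) residue index ranges, 0-based and inclusive,
    where every score in the range falls below the threshold."""
    segments = []
    start = None  # index where the current low-confidence run began
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # a run that extends to the end of the chain
        segments.append((start, len(scores) - 1))
    return segments


# Made-up scores: a confident core, an uncertain internal loop,
# and an uncertain loop at the end of the chain.
scores = [92, 95, 90, 88, 55, 48, 60, 91, 93, 40, 35]
print(low_confidence_segments(scores))  # [(4, 6), (9, 10)]
```

If the flagged segment is the loop that does the binding, as Mike notes is often the case, that is a signal to treat that part of the predicted structure as a starting point rather than a settled answer.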

Ray: Can you give us some insight into how people are using the folding algorithms to understand how proteins identify interacting partners and understand how two different proteins might interact with each other?

Mike: One way to think about modern machine learning is it takes an enormous quantity of data and it abstracts out a very complicated set of pattern recognition tools. The initial tools were for how to fold a protein. Now people are saying, let's build larger datasets that have these multimeric proteins in them and let's learn the rules of binding. Let's see if we can predict which ones bind others. Part of the challenge is, you're going to try a lot of partners, so then you've got to be extremely efficient in your computation. If you're trying to pick a single sequence to a single structure, you can spend a day of computation time on that computation. If you want to look at several hundred possible interactions, you cannot spend a day per interaction to see what's going on there, you have to have something that's a lot faster, obviously. If you already know they interact and you just want the details of the interaction, then that computational thing falls away. But even there, you need examples of these multimeric proteins to do that, and there's fewer structural examples out there than individual proteins. But people have been pushing hard to try to figure out how does that work. What are the rules? What can we learn? And eventually, how can we design our own?

Ray: So, is the protein folding problem solved?

Mike: It is partially solved. There's still refinement, and we will get better as more data comes forward.

The protein folding problem is actually a predictive problem. I have a sequence, what does it fold to? And that's interesting. But what we're really interested in is, I have a structure I want, give me sequences that fold to it. I spend most of my time actually on that second generative problem.

Ray: I know you're itching to talk about starting with a structure that you conceive and then trying to figure out what amino acid sequence would give this structure. Can you explain for me, what does generative biology mean to you?

Mike: So, it is back to this ability to not just say, hey, this is what's in nature. Can we explain it? Or, we find a new thing in nature, we can predict what it's going to do. Can we get enough control that we can say, here's a problem that we think would be neat to solve biologically? We want to interdict a particular target inside a cancer cell. We're going to create something brand new that's never been seen before and actually generate a new biological entity that has the functional characteristics. I think of it as going from sequence space to function space.

Ray: How are you thinking about applying these principles of generative biology? Where do you see Generate Biomedicines going? Where do you see the field going?

Mike: We want to use the new ability to generate novel biological entities, particularly proteins, to start considering new problems that have never been solved before, maybe levels of molecular engineering and complexity that have never been seen. So, we really want to pioneer the ability to generate these new things, and that capability will open up areas we haven't even considered.

We see the opportunity to pioneer the ability to work at a scale that just hasn't been possible before. If you're making antibodies, you're running it through a humanized mouse immune system or something, and it gives you what it gives you. What's nice about computers, you can say I want what I want. If you look at the SARS-CoV-2 spike protein, you'd think there are thousands of places to bind antibodies there, but you can only bind antibodies to a few places. Some of the really interesting places you want to hit are not the places the immune system binds to, and I don't think that's random. The ability to computationally say, I'm going to go to a highly conserved domain that the vertebrate immune system doesn't give to you, and I'm going to be able to optimize it exactly the way I want (the binding, the half-life, the immunogenicity) allows you to dream completely differently about what you can do for interdiction. It's not just proteins. You can imagine interactions of all kinds of molecular entities in complicated ways that allow you to do things you couldn't do before.

Ray: What do you see as the low hanging fruit right now for your business? Where do you see the greatest opportunities for applying generative biology to problems that exist today that you can solve in a short timeframe and really bring value to the development of new medicines?

Mike: There's a lot of modalities in proteins right now: monoclonal antibodies, bispecifics, other stuff. So, working in monoclonals, which are well understood, gives you a lot of the lower hanging fruit; there's a lot of data to learn from. And then there's lead generation and lead optimization. If I start with lead optimization, there's a lot of antibodies that are good, but they're not great. Maybe you want them to bind 10 times more because you want their half-life to be longer, or they're good, but they're giving you immunogenicity and so you can't make them into drugs because patients make antibodies to the antibodies and wipe them out.

We've had good success in optimizing proteins very rapidly in a small number of months around these properties and taking things that were maybe marginal or not even clinically valid and saying, let's make this into a really great antibody, let's do it relatively quickly, and make something that is now clinically useful. And then, lead generation, where you say, maybe there's an epitope that hasn't been hit. Say you have to hit this protein in a very specific way to turn it up, and I'm struggling to get that out of the mouse or the yeast display. With a computer, you can say, give me 1,000 designs only to that place. We've had some good success hitting new epitopes that way.

Ray: What do you think the field is going to look like? You've seen how it's progressed in a year. How do you think it's going to look in 10 years?

Mike: First of all, I was surprised that this technique works so well for protein folding and design. I didn't think it would actually, if you'd asked me several years ago. I think in 10 years, this is simply going to be the way that, at least on the protein side and small molecule side, things are done. It's all going to be computationally driven, you're going to need a tight wet lab/dry lab where you do a design and test it to make sure it's doing what you think it's doing. I'm not so sure you're going to see humanized mice and yeast displays used in the same way that they're used now because I think this is going to completely replace it.

What has struck me is, if I were a young scientist, I would go super hard at the computation on the data side, and I would say, I don't have to be a data scientist, I don't have to be a computational biologist, but I better understand what it can do and where it's at because I should be trying to exploit it everywhere I can. Given how powerful these techniques have been, I should be on the cutting edge of applying it to my problem.

Ray: Well, Mike, this has really been just a phenomenal discussion. I always enjoy talking to you: you bring a lot of passion, you bring a lot of excitement, you bring a lot of vision. I really look forward to our collaboration with your group at Generate and I look forward to watching what you guys are able to create as you forge ahead building up this new science. Thanks so much for joining me today.

Mike: Ray, it's been great. Thank you. I really look forward to our collaboration and let's see what we can do.

The Scientist: Thank you for listening to The Generative Biology Revolution, and thanks again to Mike Nohaile, Chief Scientific Officer at Generate Biomedicines. To dive further into this topic, please join Amgen scientists at the Generative Biology Q&A webinar discussion on July 20th, 2022. Register for the event at the link provided in the episode notes.

Machine learning and AI are ushering in a new era for predicting protein structure in drug discovery research. In the next episode of The Generative Biology Revolution, we'll talk with David Baker from the Institute for Protein Design at the University of Washington about the future of protein design. To keep up to date with this podcast, follow The Scientist on Facebook and Twitter, and subscribe to The Scientist's LabTalk wherever you get your podcasts.

Protein Design from Scratch with David Baker, Ph.D., Institute for Protein Design, University of Washington

Transcript

Episode 3: Protein Design from Scratch

Intro: Welcome to The Generative Biology Revolution, a special edition podcast series produced by The Scientist's Creative Services Team.

This series is brought to you by Amgen, a pioneer in the science of using living cells to make biologic medicines. They helped invent the processes and tools that built the global biotech industry, and have since reached millions of patients suffering from serious illnesses around the world with their medicines.

Generative biology is a revolutionary approach to drug discovery and development that leverages machine learning and AI to design novel protein therapeutics. It holds the potential to enhance the speed and efficiency of discovery. In this series, Ray Deshaies, senior vice president at Amgen, discusses how generative biology is transforming drug discovery to make it more predictable, shorten timelines, and increase success rates of bringing life-saving medicines to patients who need them most.

Ray Deshaies: Naturally-occurring proteins have evolved over millions of years to perform specific functions based on their sequences and folded structures. As our understanding of science advanced, researchers began designing proteins from scratch to solve new challenges that modern societies face.

In this episode, joining me is David Baker, director of the Institute for Protein Design at the University of Washington and one of the creators of the RoseTTAFold protein structure prediction tool. We talk about how to design proteins with sequences and structures that impart novel functions and how designed proteins will revolutionize drug development.

Ray: David, it's really fantastic to be here with you today to talk about the future of protein design. One thing that's quite different about your career arc from other people that I've met in your field is that you have diversity in your background; you majored in physics at Harvard when you were an undergrad, you came to Berkeley and you worked on cellular biochemistry, then you went on to do a postdoc where you did more biochemistry. Now, you've gone in this direction of combining different types of science to study how sequence yields structure, and then how to predict what structure a protein would have by coming up with different sequences. How do you think that background has influenced your career?

David: It's been really important. For protein design, the computation is just part of it. There are other very important aspects: the number one is critical experimental evaluation. It's very easy to design things on the computer that look like they should solve whatever problem you want them to solve. But it's entirely another thing to test them experimentally. Protein design isn't done in isolation; one has to think about the applications. And certainly, having a good background in cell biology has helped me think about the areas to pursue. I have an idea of the right questions to ask.

Ray: There's different labs in the protein design space, and people have their different algorithms and their different approaches that they take. What are the major approaches that are currently being taken in the general area of protein design?

David: Given a protein of interest and a particular site on that protein, the problem is to design a small protein which binds very, very tightly to it. It's sort of like having a particular lock and designing a key that fits in it. This is an important problem biomedically, because having proteins that bind very tightly together could be the basis of therapeutics, diagnostics, sensors, and so forth.

The first approach is a traditional physical model-based approach. The goal is to design an amino acid sequence that will fold up into a structure which fits against the target, making shape-complementary and chemically-complementary interactions. The whole problem is framed in terms of energy; these proteins fold to their lowest energy states. The designed amino acid sequence should have the designed monomeric structure as its lowest energy state, and then the system of the monomeric structure and the target should have the bound complex as its lowest energy state. In developing the Rosetta program, which we use for these calculations, we've sought to make those energy calculations as accurate as possible, describing hydrogen bonding accurately, and so forth. The new approaches instead involve deep learning. It's now pattern recognition and a deep understanding of sequence-structure relationships that's inherent in networks like AlphaFold and RoseTTAFold. These methods are proving to be very powerful.
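The energy framing David describes can be illustrated with a toy check. This is only a sketch under stated assumptions: the state names and energy values are invented for illustration, and the helper functions are hypothetical, not part of Rosetta or any other design package. It shows the two conditions from the passage: the designed fold must be the lowest energy state of the sequence, and the bound complex must be lower in energy than the unbound components.

```python
from typing import Mapping


def lowest_energy_state(energies: Mapping[str, float]) -> str:
    """Return the name of the state with the lowest energy."""
    return min(energies, key=lambda state: energies[state])


def meets_design_criteria(
    fold_energies: Mapping[str, float],
    designed_fold: str,
    unbound_energy: float,
    bound_energy: float,
) -> bool:
    """Check the two energy conditions for a physics-based binder design.

    1. The designed fold is the lowest energy state of the sequence.
    2. The bound complex is lower in energy than the unbound components.
    """
    folds_as_designed = lowest_energy_state(fold_energies) == designed_fold
    prefers_bound = bound_energy < unbound_energy
    return folds_as_designed and prefers_bound


# Hypothetical energies in arbitrary units; lower means more stable.
example_folds = {"designed": -120.0, "misfold_a": -95.5, "unfolded": 0.0}
passes = meets_design_criteria(
    example_folds, "designed", unbound_energy=-120.0, bound_energy=-135.0
)
```

In a real design calculation the energies would come from a physical model scored over a very large number of alternative conformations, which is where the difficulty lies; the comparison itself is the simple part.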

Ray: You mentioned binders—proteins that bind to a specified target. Are you able to design protein binders right now? What can you design in terms of protein functionalities?

David: It's case dependent because there are some targets which are very difficult for us to design binders to, but we've had considerable success recently designing systems which self-assemble into, for example, completely de novo drug delivery vehicles or vaccine platforms, sensors which undergo conformational changes which can be read out by luciferase activity, proteins that bind to small molecules, and proteins that catalyze chemical reactions. So, one of the exciting things about protein design is that the possibilities are really wide. And that's exemplified by the very wide range of functions that proteins carry out in biology, which really illustrates how versatile proteins can be.

Ray: What's still very hard for you to design?

David: If we have a protein surface which is highly charged, very polar, it will interact strongly with water molecules, and it's hard for us to design binders which bind to those surfaces. For enzymes, chemical reactions with multiple steps, where the enzyme has to compromise between facilitating each of those steps, are still very hard.

Ray: My understanding is when you're designing a protein, you're trying to come up with a sequence that will adopt a particular fold. An enzyme induces catalysis by stabilizing the transition state of the reaction. So, you can design your target fold to the transition state, for example. But we know in the course of catalysis that amino acid side chains and sometimes even the backbone have to move to go from a configuration that binds substrate to one that's in the transition state to one in which the product is formed and now gets released from the enzyme, and then cycle back to the state that could bind substrate. Is that something you can do in a deliberate way now, to design not just the target fold, but a target fold that will go through this molecular choreography?

David: One of the things that makes enzyme design much harder than binder design is that in binder design, you need to design a protein whose lowest energy state is perfectly complementary to the target. But an enzyme has to first bind the substrate, then has to selectively stabilize the transition state. And this transition state can often only vary in subtle ways from the substrate. And then finally, it has to release the product. The protein has to be able to move to accommodate those different states, and it has to compromise between them. Those are things which we're not very good at yet. Those are current challenges, how to model those dynamics. Another problem that we're working on now, which is related, is the design of molecular machines, where it's important to couple chemical energy to mechanical work, and that involves similar trade-offs. Motion is really important.

Ray: Let me ask you a bit more about designing protein sensors. Something like a glucose sensor could solve the problem of monitoring and insulin injection for diabetics. Could you make an artificial device or engineered cell that binds glucose in the body, senses the molecule, and releases insulin on demand?

David: Glucose is a very polar molecule that interacts very strongly with water. So, the problem is not designing the sensor, it's designing the binder. For other types of targets, like coronavirus, which are not as polar, we've been able to design molecular devices that have a closed state, in which a binding element for the target is caged, and an open state. The thermodynamics of target binding cause the system to open, at which point it emits light in the case of the luciferase-based sensor. So, once we can solve the binding problem, it's not hard for us to transition to a sensor.

Ray: We've focused a lot so far on protein function, but there are other aspects of a protein that end up being really important from the point of view of developing a therapeutic. Can you design the half-life of a protein? Can you design the biodistribution of a protein? Can you design a protein so that it's fully synthetic and non-human, but it's not immunogenic?

David: Those are all very important properties. You can incorporate all the properties you want in your design effort. For example, we've been designing a new class of compounds which are smaller and made out of unnatural amino acids to get across biological membranes. We can design classes of compounds now using computational methods whose properties we can control. One of the biggest question marks is immunogenicity. There we can incorporate properties that are likely to reduce immunogenicity. You can make proteins that are very stable, very soluble, and relatively small in size, so they're unlikely to be presented efficiently on dendritic cells. But we can't completely rule out the possibility of immune response, so designed proteins will need to go through the same sort of testing for safety and immunogenicity that any new drug candidate would.

Ray: How do you design proteins that cross biological membranes, which isn't something that they can naturally do unassisted? How does that work?

David: There are peptides, like cyclosporin, an eleven-residue cyclic peptide, that get across membranes very effectively. There are a few examples of peptides like this in nature which are thought to perhaps switch conformations to enable this traversal. And we can now robustly design the whole class of those molecules.

Ray: Do you see this driving towards a future where you could have orally bioavailable biologics? That would be a game changer because right now, with biologics, you have to inject them subcutaneously or infuse them into a vein. If you could cross membranes, you could formulate them into a pill and then just ingest it. Do you see design going in that direction?

David: There are two types of membrane permeability. There's passive permeability, where the biophysical properties of the protein make it permeable. That's going to be restricted to compounds which are quite small in size, like the cyclic peptides. And there is facilitated transport, which makes use of cellular uptake mechanisms, and there the size range is much larger. There also are things like diphtheria toxin, which have figured out how to get proteins across membranes. There are certainly opportunities for orally-available biologics, but the approach I described earlier is going to be limited to smaller compounds where you can engineer the outside in at least one state to be largely nonpolar.

Ray: One of the real big stumbling blocks for biologic therapy has been the blood-brain barrier. Biologics, in terms of the best-selling drugs in the world, have largely taken over that realm, in part because of their tremendous therapeutic efficacy coupled with their considerable safety advantages. But that has not happened in neuroscience, particularly diseases of the central nervous system. Now we're talking about getting these designed proteins across membranes or taken up into cells. Do you see a role for protein design in potentially cracking an intractable barrier for biologic medicines?

David: We're working hard on blood-brain barrier traversal, and we're focusing mainly on the receptor-mediated approach. So, we have designed small proteins which bind to the transferrin receptor, away from the transferrin binding site, that are showing promise in shuttling compounds. And we're designing now a series of small proteins to other targets at the blood-brain barrier. Those would be more like shuttles; you would attach a cargo to them. The approach with cyclic peptides, the passive diffusion approach, is very interesting. But the compound itself has to be the drug because if you start elaborating it by fusing on a large additional drug moiety, you're likely to impede passive permeability. So, this is a sweet spot for protein design to help solve the blood-brain barrier traversal problem, but there's still a lot of work to be done.

Ray: One thing that I've been really interested in is targeted protein degradation by heterobifunctional molecules that Craig Crews and I came up with. We call ours PROTACs. The way these work is they bind the ubiquitin ligase with one hand, and they bind a target with another hand and serve as a bridge to link a target to the ubiquitin ligase. And then the ubiquitin ligase puts ubiquitin on the target, and the target gets degraded. With the ones that are most effective, what you see is that when you bring the target and the ligase together, they fortuitously find some interaction surface, such that you have a cooperative formation of a ternary complex between the three molecules: the ligase, the target, and the heterobifunctional compound. Let's say I have a target and I want to figure out which ubiquitin ligase it might interact with so that I can make a PROTAC to join them together. Is that a problem that's addressable using this approach?

David: It's really a fascinating protein design problem. You have three rigid bodies that are moving at the same time. That's what makes the design of molecules which induce these ternary complexes challenging. As far as taking advantage of the new ability to predict protein-protein complexes, these methods still rely to some extent on coevolutionary information. So, they're taking advantage of how residues are co-varying at the interface, which means that they aren't going to be anywhere near as effective for complexes which don't actually assemble physiologically. We found that the methods are quite effective for designed proteins binding to targets. We need a little bit more improvement in protein-protein docking or prediction methods to get to the point of designing complexes that are stabilized by a small molecule. It's somewhat harder than predicting the structures of naturally-occurring complexes.

Ray: We discussed the major types of design and you brought out two different schools: the physics-based school that's focused on forces and the information-based approach that's driven by artificial intelligence. As you're projecting to that future, which of those do you think is going to play the biggest role?

David: There's going to be an integration between them. For challenges where it's entirely protein systems made out of 20 amino acids, I think the deep learning methods are very powerful. But as you start getting to small molecules, then physical models are going to be important. We also talked about dynamics and enzymes, and there physical models are going to be very important as well.

Ray: What's your vision of where you see the field of protein design going? In the next 10 to 20 years, what's the boldest thing that you have in mind for this field achieving in that timeframe?

David: The things that we're doing now, I don't think I could have even conceived of two or three years ago, and so projecting 10 to 15 years out is hard. I'm optimistic that 10 to 15 years from now, medicine will really be transformed by protein design for the reason that you alluded to earlier, that you can incorporate all of the properties that you want in an ideal medicine, in terms of functionality, side effects, biodistribution. And the more we understand, the more we can encode in the therapeutic or vaccine. Whereas with approaches that don't involve design, there's always going to be a lack of control. For example, if you have to use a library or an animal to make an antibody, you're going to have a fundamental lack of control over what the actual properties are of that selected binding interface. I always hope that the most exciting applications will be things that I can't currently conceive of because new paths have been opening up at an astounding rate over the last several years. It's really what makes science exciting and fun.

Ray: David, this has really just been fantastic. I just want to close by thanking you for taking the time today to come talk to us about protein design and where you see the field. Thanks so much, Dave. It's really been a pleasure.

David: Thank you, Ray.

The Scientist: Thank you for listening to The Generative Biology Revolution, and thanks again to David Baker, director of the Institute for Protein Design. To dive further into this topic, please join Amgen scientists at the Generative Biology Q&A webinar discussion on July 20th, 2022. Register for the event at the link provided in the episode notes.

Protein design and structure prediction are making big waves in the pharmaceutical industry. In the next episode of The Generative Biology Revolution, we'll talk with Suzanne Edavettal, the executive director of Biologics Optimization at Amgen, about the practical applications of protein design for drug development. To keep up to date with this podcast, follow The Scientist on Facebook and Twitter, and subscribe to The Scientist's LabTalk wherever you get your podcasts.

Accelerating Drug Discovery with Suzanne Edavettal, Ph.D., Executive Director, Protein Engineering

Transcript

The Generative Biology Revolution

Episode 4: Accelerating Drug Discovery with Protein Design

Intro: Welcome to The Generative Biology Revolution, a special edition podcast series produced by The Scientist's Creative Services Team.

This series is brought to you by Amgen, a pioneer in the science of using living cells to make biologic medicines. They helped invent the processes and tools that built the global biotech industry, and have since reached millions of patients suffering from serious illnesses around the world with their medicines.

Generative biology is a revolutionary approach to drug discovery and development that leverages machine learning and AI to design novel protein therapeutics. It holds the potential to enhance the speed and efficiency of discovery. In this series, Ray Deshaies, senior vice president at Amgen, discusses how generative biology is transforming drug discovery to make it more predictable, shorten timelines, and increase success rates of bringing life-saving medicines to patients who need them most.

Ray Deshaies: The ability to design proteins to perform desired functions will transform drug development. In particular, with AI and machine learning, scientists gain the ability to engineer antibody-based drugs, including multispecifics which engage multiple targets. By altering existing protein structures or developing proteins de novo, biologics will become more effective and specific.

In this episode, I speak with Suzanne Edavettal, the executive director of Biologics Optimization at Amgen about how protein design affects drug development and success rates in the clinic today and in the future.

Ray: Suzanne, it's really great to have you with us today. Can you start by telling me, how did you end up becoming a protein engineer?

Suzanne: I like to think I began my love affair with proteins in graduate school. This is where I found tools like x-ray crystallography, which allow us to study the structure of proteins. Proteins have this remarkable diversity from a very small number of building blocks. There are 20 amino acids that make up a protein, but they fold in vastly different shapes. And it's all of those different shapes that result in this diversity of function we see in proteins. It was at that point in my scientific training that I got deeply interested in trying to understand how those shapes, how that structure, how that foldedness imparted different characteristics to the molecule. I realized that I wanted to get into the pharmaceutical industry. My motivations were a desire to take the thing I love doing and use it to help patients. The first part of my career in industry was doing structure-based drug design on the small molecule side. And through that process, I realized that the people who seemed to have the most fun were the chemists, the people making the molecules, and I knew I wanted to be a molecule maker. Now, because I'm a protein chemist and not a synthetic organic chemist, that meant I needed to make a transition to biologics. That's how I ended up switching over to large molecule discovery, taking all of the tools and training that I knew from my time doing structural biology of proteins, but now applying it to make proteins into drugs.

Ray: Given that you've had a foot in the small molecule camp and in the biologic camp, can you tell me about different types of drugs that you could make that are based on proteins?

Suzanne: We have peptides, and these tend to be less than 50 amino acids (insulin is the best known of these drugs) and are often used to replace a peptide that exists naturally in people and that is missing or underproduced in a disease state. There's enzyme replacement therapy, used in individuals who, typically for genetic reasons, are deficient in a particular protein. We can supplement those proteins; we can make them in a test tube and then give them to patients. And then, we end up with antibodies, which are designed to modulate the activity of a protein in the human body. We can have antibodies that block functional pathways, and we can have antibodies which bind to a protein and clear it.

Ray: Antibody-based drugs are very prevalent in the industry right now. What makes antibodies so special and useful as drugs?

Suzanne: Antibodies are quite large proteins, and they have a Y-shaped structure. At the tips of the Y, we have a hypervariable region that is able to accommodate a whole variety of different sequences, which means it's able to accommodate a whole variety of different shapes. We can exploit that diversity in shape to modulate or to bind to different types of proteins in the human body. With most other proteins, if you start making mutations to them to elicit a particular function, the whole protein falls apart. But with antibodies, we can make vastly different combinations of sequences and achieve the shape, the structure, that we want.

Ray: Why are proteins and antibodies in some cases better drugs than a small molecule like aspirin or acetaminophen or something like that?

Suzanne: If you're assessing the druggability of a molecule for small molecule intervention, you need a small place on the surface of the protein that your small molecule can bind. Many proteins have these, and that's why small molecule drugs are very popular and prevalent in the field. But not all proteins do. Any time we want to modulate the activity of a protein, and the surface is very large or very shallow, we run into a limitation on what small molecules can achieve just through molecular interactions. This is where large molecules shine. They're designed by nature to achieve protein-protein interactions in these large shallow spaces on the surface of proteins.

Ray: That's the good side of protein-based drugs. What are the limitations?

Suzanne: Most small molecules can be formulated for oral delivery, so you can take them as a pill. Whereas most large molecules have to be injected either subcutaneously or by IV, which is a much more complicated delivery mechanism. And they also have to be produced by cells. There's a lot of variability introduced by the fact that we're using a biological system to produce our drug, and it increases the complexity of manufacturing the molecule. It also means that the molecules have to be stored differently. They're not stable at room temperature or at elevated temperatures. Your pharmacist or your doctor's office has to have a freezer to keep them. Small molecules can be formulated into a pill, packaged into a bottle, and shipped all over the world without incident.

Ray: So, somebody in a therapeutic area comes to you and says, we want to make an antibody against this given protein because we think it'll be a good drug. What are the series of steps that you and your team go through to turn that idea into that protein-based drug?

Suzanne: It is a fairly long process, on the order of years, for us to go from what you just described to a product, and the first step in the process is immunizing animals. We tend to use transgenic mice or rats for this, and these are animals that have been engineered to produce human antibodies. Following an immunization, tens of thousands of different antibodies will be screened for binding and functional properties, which results in hundreds to thousands of leads. Then the work really begins because we need to characterize those sequences not only for function, but also for a variety of physical and chemical properties, ensuring that a candidate has all the biology we need the drug to have, but at the same time, we're able to make it into a manufacturable product. This is where we end up spending a lot of resources trying to screen and understand the molecules' physical and chemical properties at an early stage. Protein engineering works to tailor that sequence to achieve not only the function, but also the stability that we'll need downstream for manufacturing.

Ray: You've got your workflow for going from an idea to an antibody-based drug. Where do you see protein design fitting into that, the ability to go in and use computational approaches to impinge on this process? How do you see it solving some of the problems that you identified earlier?

Suzanne: It's particularly challenging to characterize a number of the chemical and physical properties at the early stages in the discovery process when we have a large number of candidates and the antibodies are present in very low quantities, low concentration, low levels of purity. Because it takes us so long to get to that stage where we can start measuring these qualities, having the ability to better predict and design around them at the beginning would be hugely advantageous to us as an industry. Viscosity is one of those physical properties that's absolutely critical to a molecule's success in the clinic in terms of how we deliver the molecule through a syringe to patients. So, when a molecule is highly viscous, something like honey, it's much more difficult to push that through a needle than when a molecule is not viscous, like water, right? You can imagine the very different kind of patient experience that you would have if you're trying to inject yourself with honey as opposed to inject yourself with water. For us to understand a molecule's viscosity, we have to make large amounts of protein concentrated up to very high concentrations, which requires it to be very pure, and then physically measure viscosity. We often don't do this until the very end of a project, and then when there's a surprise, where a molecule is extremely viscous, it sends us all the way back to the beginning.

So, for this reason, we have devoted a lot of effort into trying to develop computational methods to predict viscosity, and within the last year, we have developed an algorithm that does this with fairly high fidelity. We deploy it right at the beginning of our engineering stage, so instead of having to wait all the way to the end, we can now predict which sequences will have high viscosity and which ones will have low viscosity. We can either deselect those sequences that we predict to have high viscosity, or proactively engineer them at the beginning of our process, so that at the end, we have no surprises when it comes to viscosity.
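The early triage step Suzanne describes can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the candidate names, the predicted viscosity values, and the 20 cP cutoff are all invented, and the `Candidate` class and `triage_by_viscosity` function are hypothetical. The transcript does not describe how Amgen's algorithm works, so the predictions are treated here as given inputs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    # Assumed output of some viscosity predictor, in centipoise (cP).
    predicted_viscosity_cp: float


def triage_by_viscosity(
    candidates: Iterable[Candidate], threshold_cp: float = 20.0
) -> Tuple[List[str], List[str]]:
    """Split candidates into those to advance and those to flag.

    Flagged candidates would be either deselected or sent for
    re-engineering early, rather than measured at the end of a project.
    The default threshold is an illustrative assumption.
    """
    advance, flagged = [], []
    for candidate in candidates:
        if candidate.predicted_viscosity_cp <= threshold_cp:
            advance.append(candidate.name)
        else:
            flagged.append(candidate.name)
    return advance, flagged


# Hypothetical antibody leads with made-up predictions.
leads = [
    Candidate("ab_001", 8.5),
    Candidate("ab_002", 42.0),
    Candidate("ab_003", 19.9),
]
advance, flagged = triage_by_viscosity(leads)
```

The point of the sketch is the placement of the step, not the cutoff: running the filter on predicted values at the start avoids producing and concentrating large amounts of pure protein just to discover a viscosity problem late.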

Ray: Do you see these computational, in silico methods increasing the success rate with which we can bring a clinical candidate forward?

Suzanne: When it comes to quality and success, I see a very strong place for these computational methods. We think about attributes like immunogenicity. This is a space that's just ripe for innovation when it comes to computational methods for prediction. Immunogenicity is one of those aspects of a molecule that's extremely difficult to characterize and extremely difficult to predict. You often don't know until you start dosing patients. Bringing to bear the full power of machine learning to take the totality of our clinical experience and use that to better predict which sequences are going to have less immunogenicity in the clinic will ultimately benefit patients, and will ultimately benefit our ability to produce drugs that are viable and have the functions we need them to.

Ray: You described earlier how drug developers typically produce antibodies by immunizing animal models. This is a long and imperfect process. What can we do to improve upon antibody drugs with our current computational methods? Will we ever be able to make fully in silico-designed antibodies that are as good as the ones that come from immunization?

Suzanne: The physical chemical properties will be tackled much sooner than de novo design because the regions of the antibody, the business end, the tips of those Ys, are loop regions, which are very difficult for us to model. Part of what makes them remarkable is that they can adopt all these different shapes. But that, of course, comes at a cost of flexibility, which is very difficult for us to predict and to model accurately using our current techniques. The whole structure of the molecule is actually playing a role in the shape that those loops take. We need a way to model those very small loops in terms of the entire structure. I have optimism we're going to get there because we see papers coming every day talking about improvements in modeling protein-protein interactions. It's that foundation we see growing at an exponential rate, and I anticipate we're going to be at this place where de novo design of antibodies is a real possibility. We can move from screening and vast experimentation to a place of designing molecules with purpose.

Ray: You are in charge of running the biologics group at Amgen. In the past year, Amgen has done a couple of really interesting deals. First, the acquisition of TeneoBio and, more recently, a partnership with Generate Biomedicines. Those are designed to launch a strategy in generative biology. How is this impacting what your team does on a day-to-day basis? How do you see this changing how we develop protein-based drugs, including antibodies?

Suzanne: The goal of generative biology at Amgen is to take our experience in high throughput automation and protein engineering and combine it with machine learning and algorithm development to deliver complex multispecific medicines against a variety of diseases. We haven't spoken much about multispecifics, but that's where the field is headed. That's where we take proteins, like antibody therapeutics, and instead of binding to a single target, we engineer them to bind to multiple targets at the same time. This is critical for modulating the biology.

The Teneo acquisition is a huge advancement in our ability to build these complex multispecifics. The core of the Teneo technology is a transgenic rat, called the UniRat. It produces heavy chain only antibodies. These do occur in nature; llamas, alpacas, and camels make these molecules. They're simpler structures than what we see in people or in rodents. Typically, the Y portion of an antibody consists of both a heavy and a light chain; it's a heterodimer. And in the UniRat, we have heavy chain only, so we have vastly simplified the molecular complexity of the molecules. Now we have a building block on which we can produce these multispecifics. You can think of multispecific engineering like you're trying to build a house. An ideal house would be built with very solid bricks, and we know that antibodies and antibody fragments are not those solid bricks that we'd like to build our house with. But heavy chain only antibodies are extremely stable. They have ideal physical chemical properties. They allow us to put them together in such a way that we can get not only the biology we desire, but also all the physical chemical properties we need for manufacturing.

Ray: How do you see that playing out in the next 10 years? What do you think your job will be? Will it be different?

Suzanne: The pace of science is tremendous, and it's a really fun time to be in this field. We are starting to learn generalizable principles about multispecifics. So, we're understanding which architectures we need to achieve specific kinds of biology. Instead of having to relearn these rules every time, for every molecule, which is a very resource intensive process, we're now understanding that specific formats are able to do things like engage T cells. We've learned the molecular geometry we need to achieve that. Because we now know the rules, we can design the molecules from the beginning intentionally to have the properties we're looking for. As we see these immense advancements being made in protein-protein prediction algorithms, we're going to get even better at being able to look at the distribution of proteins we want to target on the surface of the cell and be able to design the architecture that we will need to achieve that biology.

Ray: What would your one wish be for protein-based biologic drugs?

Suzanne: Biologics have this exquisite specificity. They largely do what we design them to do, and the next step is making them do what we want them to do, where we want them to do it. Computational methods are what we're going to need to achieve that. Something I see coming much more rapidly is our ability to design molecules for manufacturing so that we spend less of our time working on having molecules with the right physical chemical properties, and then we can spend more of our time on the biology. If we can have those computational methods in place so that this is no longer something that requires a major investment of energy, I think we'll see these huge transformations happening on the biology side.

Ray: You've sketched out for us a future where there's going to be more computation involved in drug discovery and development for biologics. What are the implications of that for those who are contemplating a career in biologics for the future? What's going to be needed? What are the skills? What are the abilities? What should they be looking towards in terms of educating themselves so they will be primed to make major contributions in the future?

Suzanne: What we really need are scientists who understand protein engineering and how we do our work today but are also fully competent in the space of data science and machine learning. Hybrid individuals who can speak both languages. I understand how to engineer a protein empirically, and I understand how to design those experiments today. But that's not what we're going to need for the future because we know that the kinds of datasets that are best suited for human learning are not the datasets best suited for machine learning. The expansion of what we're able to do today, and the explosion in our ability to use these tools, will come from scientists who understand how to make data for machines, instead of making data for people.

Ray: Suzanne, with this compelling vision that you've laid out today, I look forward to seeing what your team generates over the next decade as you are innovating and incorporating these new technologies. Thanks so much for joining us and educating us about the future of protein-based drugs.

Suzanne: Thank you, Ray. It was a lot of fun. My pleasure.

The Scientist: Thank you for listening to this final episode of The Generative Biology Revolution, and thanks again to Suzanne Edavettal, the executive director of Biologics Optimization at Amgen. To dive further into this topic, please join Amgen scientists at the Generative Biology Q&A webinar discussion on July 20th, 2022.