
AI Is Designing the Next Cancer Fighter | EP.53

Hidden Layers: AI and the People Behind It · 2026-05-14 · 42 min

Substance score

63 / 100

Five dimensions, 20 points each

Insight Density: 12 / 20

Originality: 12 / 20

Guest Caliber: 11 / 20

Specificity & Evidence: 15 / 20

Conversational Craft: 13 / 20

What our scoring noted

Our reviewer’s read on each dimension, with quotes from the episode.

Insight Density

12 / 20

Dense with genuine scientific explanation (protein folding, AlphaFold, diffusion models, CAR-T mechanisms, the verification dilemma) that a curious listener would learn from, though much is educational background rather than novel operator-relevant insight, and there's notable repetition and meandering.

it's sort of like running cloud code, except you can never run the code

actually like half of the designs didn't even like results in really like functional T cells at all

Originality

12 / 20

Some fresh framing (the code-can't-run analogy, generated proteins looking unlike natural sequences, the honest pessimistic note about low hit rates and lack of a 'CASP for design'), but much of the protein-folding narrative is standard popular-science recounting.

these generated proteins look very different from like regular proteins in terms of the sequences

there's no CASP, there's no like critical assessment of these methods, you know, head to head

Guest Caliber

11 / 20

Guests are PhD candidates who actually organized and ran the BitsToBinders challenge—genuine hands-on practitioners on the specific topic—but they are early-career students rather than senior operators who have done this at scale.

all of whom are PhD student candidates at the University of Texas at Austin

very young PhD students maybe in their first year who are interested in this space

Specificity & Evidence

15 / 20

Rich in concrete detail: exact counts, success rates, sequence constraints, named companies and named winning teams, specific proteins and timelines—well above average for evidentiary grounding.

we tested 12,000 sequences in pooled screens

the best hit rate was nuclear UK London and they had a 38% hit rate

Conversational Craft

13 / 20

Host asks several pointed, well-structured questions and follows up (the digital-to-real gap, hardest step, student-led vs. corporate constraints), though the tone is largely celebratory with little genuine pushback, and one guest had to volunteer the skeptical counterpoint himself.

What was actually the hardest part? Was it what you thought it was going to be?

How big is that gap? Like what did you learn... is the gap bigger than you expected?

Conversation analysis

Computed from the transcript - who did the talking, and the verbal tics along the way.

Filler words

like 211, so 144, right 52, you know 42, kind of 24, um 21, sort of 17, actually 16, uh 11, I mean 11, basically 5, honestly 2

Episode notes

What if AI could design proteins to help your immune system find and kill cancer cells? That's not a hypothetical — it's what 28 teams across 40 countries attempted in the Bits-to-Binders Challenge, an open-science competition organized by PhD students at the University of Texas at Austin. In this episode, Ron sits down with three of the organizers — Clay Kosonocky, Daryl Barth, and Aaron Feller — to unpack how they pulled off one of the most ambitious student-led experiments at the intersection of AI and biology. Together, they submitted 12,000 AI-designed protein sequences to bind to a cancer target called CD20, then validated the results in real biological assays. The conversation covers the 100-year history of protein folding, how AlphaFold changed everything, why AI biology can't just rely on benchmarks, what a CAR-T cell actually does, and what a 7% hit rate tells us about where the field really stands. Plus: open source science, the verification gap between digital predictions and wet lab reality, and why a global team of strangers working together might be the most hopeful signal of all.

Full transcript

42 min

Transcribed and scored by The B2B Podcast Index.

Welcome to Hidden Layers, where we explore the people and technology shaping artificial intelligence. I'm your host Ron Green. Today we're going to talk about a quiet revolution happening in biology, one that I am personally extremely excited about. Namely the use of AI to design new proteins to fight disease. In AI, we're used to benchmarks. We measure models against known answers, compare the scores, and often treat the benchmarks as the test. That can be useful, but biology has a way of humbling clean benchmarks. A designed protein can look promising on the computer screen and still fail when it's synthesized, expressed, folded, placed inside a cell, and asked to do something useful. That's what makes the Bits to Binders challenge so interesting. This challenge, organized by the University of Texas at Austin BioML Society, asked teams to use modern AI tools to design small proteins that could help immune cells recognize and attack tumor cells. In plain English, the goal was to see whether AI could design a new biological binding component for a CAR-T cell, one that could find the right cancer target and help trigger an immune response. The designs had to do more than look good on a computer. They had to express, bind to the target, and activate the associated CAR-T cell in a real biological test. Twenty-eight teams spread across 40 different countries used open source AI methods to design and submit a combined 12,000 protein binders. Then those designs were tested in vitro, moving from evaluation to biological reality. That alone would make the challenge notable, but the open science approach they took is just as important. All of the software is open source, and all the designs and methods are being made publicly available with no intellectual property claims, so the results can help move the scientific field forward.
Joining me today to talk about the challenge are the organizers, Clay Kosonocky, Daryl Barth, and Aaron Feller, all of whom are PhD student candidates at the University of Texas at Austin. Together, we'll explore what happens when AI-designed biology moves from prediction to real world experiments. I'm unbelievably excited to have you all here. I've been waiting for this episode for almost years. This is going to be a blast. All right, so let's start at the beginning. I teed up why this challenge was so important in the intro, but I didn't really talk about why are proteins important and why is protein folding prediction so important? What did AlphaFold do that changed the game? Let's just start at the beginning there. Right, so I think one of the most important things to address is that proteins are kind of the basis of biology in terms of interacting with the material world. They're micro machines that make the materials in our body, in some cases, or perform all of the functions that need to happen, so it's really important, and oftentimes the structure of something will dictate its function, so it's really important to be able to look at a protein and say, like, oh, it does X, and we can figure that out because of the way that it looks. Because we know the shape, we can predict more accurately how it will behave. Exactly. Okay, yeah, okay. And so that's been kind of the dark matter of understanding proteins for a while. It's like we could figure out what the amino acid sequence was, what it was composed of. Because there's a sequence of amino acids, and then they get folded, converted into a protein structure. Yes. Okay, it happens reliably over and over, right? So you have the same sequence that folds into the same structure every time. Actually, before the recent Nobel Prize for protein folding and protein design, you had, I think, in 1923, Anfinsen's dilemma.
So he proposed this idea of, oh, we think these amino acid strings fold the same way every single time. I didn't know it went back to the 20s. So by proposing it, he won in the 70s, all right. Okay. He won the Nobel Prize for pretty much proposing the question and finding some of the first information that they fold the same way. Right. So it's a 100-year problem. When he proposed that, was that controversial? Do you know? Not sure if it was controversial, but his idea was that all you needed is the amino acid sequence. Because I could see somebody saying, like, yeah, but you would need the context. The context, the environment might play as big a role as the sequence too. But it turns out, for the most part, that's, it's probably somewhat true. There's some crazy research out there where they're testing out how proteins work in like different solvents, but that's like really complicated. And then you also have disordered proteins, which don't take on one canonical structure, and they like change conformation based on what's around it. But, and that's just entirely. Okay. Okay. That's like the cutting edge right now. Okay. Okay. So what is AlphaFold? And like, why did that affect the game? Yeah. Also, I feel like, so we had Anfinsen's dogma. And it was like, isn't amino acid sequence all that you need? And then that really launched the field. And then I think in the 90s, that's when this competition called the Critical Assessment for Structure Prediction, or protein structure prediction, CASP, came about. And that was like the Olympics for protein structure prediction. And every two years, they would have structures that weren't released to the public yet. So somebody did x-ray crystallography or cryo-EM or NMR and they solved the structure of a protein. And that may have taken years to figure out that one protein structure. Exactly. Yeah. And so they kept those from being released into the public.
And they had everyone use the cutting edge models that they created at the time to try and like figure out what that protein structure was of the ones that we already knew were solved. And so that happened every two years since the 90s. And there was like moderate progress that was made over time. But it wasn't really until, I think 2018 or 2019, when DeepMind released AlphaFold that it just like fundamentally changed the game and blew all of the previous models out of the water. Right. Right. Because if you think about it, it's like it's a physics. If you're trying to like model exactly where all of the atoms in a protein are, it's like, I still think we can't computationally fathom what that would look like for a protein. Right. So you needed a different approach like a transformer to be able to like look at it from a different perspective of can I combine my knowledge of already existing structures and the amino acid sequences that go with them to really get there. Right. Right. And it wouldn't have been possible without the like decades of work that people had done getting these protein structures already and like figuring out. I think that's a really important point because it wasn't like DeepMind created this in a vacuum. They used all of the hard-won protein structures that had been built up over decades as training data for this. And also protein sequences too. So there's this other, so we have the Protein Data Bank, which is it's like several hundred thousand protein structures like all crystal NMR and so on structures. And then protein sequences have been, since the like advance in DNA sequencing technology, we've been able to figure out all these different possible and like probable protein sequences out there. And people have made models to predict what these do just on the sequences themselves. And then so from all of the sequences out there, there's like slight variations in these.
And then because if two things like co-vary and like change always together, we can be somewhat like we can suspect that these might be kind of like close in space because they're interacting with each other. And thus one mutation is affecting the probability of the other one there. And so AlphaFold really it's combining the structures and learning on the structures with also learning on the sequences and the co-evolution of how these different amino acids are evolving with each other. Okay. The co-evolving residues can sort of like staple those close in 3D space because they're interacting right there. They're co-changing. You change a positive charge on one, you now need to compensate in the other. That makes complete sense. Okay. So AlphaFold really showed that this was possible. The Bits to Binders challenge that you guys organized is shockingly ambitious in my opinion. I mean, it's like we go from, hey, is it possible to predict amino acid sequence to protein folding, to, yeah, there are some tools that can do that, too. Let's go see if we can synthesize, hold a competition, pure open source software and go build a protein to accomplish a very, very specific task. That to me, it felt like a giant leap moving from saying, hey, can we predict protein structures to or protein folding to creating new ones that work and accomplish biological tasks? Okay. So my question is like, is that where the field is now? Is it very common at this point? We went from 2018 to this and are you just seeing explosion there? I think so. I think structure prediction still is not solved. I think we've gone very far and it works very, very well for cases for many, many cases. I think there's still room to go. So I think CASP is like, it's not shutting down CASP. Like, we actually, like ImageNet and the whole ImageNet competition, like they don't do that anymore because it's like a solved problem basically. But like we still haven't solved the protein folding problem.
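The co-variation idea in the passage above (two positions that change together are probably in contact) can be illustrated with a classical statistic, mutual information between alignment columns. This is a minimal sketch under stated assumptions: the toy alignment is invented, `column_mutual_information` is a hypothetical helper, and this is a textbook illustration rather than how AlphaFold itself uses co-evolution.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. A high value means the
    residues at the two positions tend to change together.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Invented toy alignment: positions 1 and 3 swap charge together
# (K paired with E, or E paired with K), position 0 never changes.
toy_alignment = ["AKGEL", "AKGEL", "AEGKL", "AEGKL", "AKGEL", "AEGKL"]

coupled = column_mutual_information(toy_alignment, 1, 3)
uncoupled = column_mutual_information(toy_alignment, 0, 1)
print(f"positions 1 and 3: {coupled:.2f} bits")
print(f"positions 0 and 1: {uncoupled:.2f} bits")
```

In the toy data the charge-swapping pair scores high and the pair involving the invariant position scores zero, which is the "staple those close in 3D space" signal in its simplest form.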
Okay, so CASP was still very much ongoing. Yeah, yeah. And it's a lot of like a transformer based models. Yeah, the full sort of launched its own. I would best show. Yeah, folks are now like closing the loop and re-injecting physics into these models, which was the old style. The transformer took over and won, but now to like edge out transformer only, you need to bring physics back into the equation. We're seeing a lot of that in other fields as well where it's sort of physics based or physics constrained based modeling. Right. You know, if you're trying to predict how a physical object would move, you constrain it. Well, you know it can't go faster than the speed of light. I mean, there are simple things like that as an example. Gravity exists. Exactly. Okay. But to jump back in, yeah, to like this sort of leap or attempting to design, there exists a lot of open source software for this, right? It's a hard problem. If you look at academic institutions, folks like us tend to like hard problems and want to crack them, right? It's like CASP has been so successful at pushing the edge. So, so you see a lot of open source models that are coming up built on sort of the predecessors. So how do you combine these different workflows? How do you screen, identify, you know, use the structure prediction as part of the modeling? So a lot of tools existed. A lot of manuscripts have been published on the topic of de novo protein design. So, you know, the field is growing. We sort of have caught this wave, right? You see folding sort of be solved as people say. And then, you know, this growth of a lot of attention, a lot of traction, the Nobel Prize, and a lot of labs trying to, trying to work on this. So it felt right to run a competition to see how well these systems worked. Yeah. And I think so in 2024 is when we really formulated this whole thing.
And at that time, we had several ongoing projects with people in the lab and other labs where we were trying to design proteins to solve a given task using the models that were open source and state of the art at the time. And I think one thing here is that diffusion models were really becoming a big thing at that time. And so that enabled protein design instead of just protein structure prediction. And so that was kind of, yeah. Totally. Yeah. And what's actually funny is that a lot of the new protein structure prediction models are also diffusion models now. So it kind of like has, you know, it's influenced everything. Okay. But yeah, so like around that time in 2024, we like didn't really know what worked best. And we like wanted to find out like for our own sake, in some sense, for the projects. And I think a lot of people in the field are wondering the same thing. And we kind of just like, so we have the BioML Society, which, do you want to go into the origins of all that? Yeah. I've got the organizers right here. Yeah. Let's see. How do we start? I mean, it was a group of students who were interested in this space. This was before ChatGPT came out, right? So the world at large didn't, you know, didn't have a focus on AI. So we were interested, wanted to apply it, had seen AlphaFold 2, I think is what drew my interest. And we were sort of isolated. There wasn't a lot of knowledge around. And so we had to focus on these specific labs, working on these problems and read the papers and try and like wrap our head around what they were attempting and how these models worked. So we really started as a group to teach each other and improve at modeling in this way. Totally. And so we started this in January of 2023. And then we initially just had some guest speakers come by and like all just like talk and try to learn with each other. And then in the fall of 2023, we started a lecture series actually on applying machine learning to biological problems.
And we were all lecturers in the series and as well as some others too. And that was super fun. And we were kind of just like learning as we went and we attracted some people to the organization and like had a really fun base that started to form. And then in the spring of 2024, we were kind of just thinking about ways to engage the community again in some way. And then I really don't remember who came up with the idea at first. But someone was like, we should do a hackathon. And then we're all like, sure, like didn't really know what that meant. Yeah, I don't know. When I heard about the hackathon, I was blown away because I didn't know that there were open source tools that could even do these types of things. I knew about AlphaFold, but I didn't know about the other tool sets out there. We've talked about this before, but sort of the skepticism within biology maybe around the use of machine learning tools or AI tools. I want to talk about that a little bit, meaning I remember talking to one biology student a couple of years ago. And I was saying, what's the reception of computational biology within the biology field? And he said something like, well, get on board because the train is leaving the station. That is the future. But it's not always been necessarily well received by professors with lots of years of experience. They may have some skepticism. What's it like right now? It's 2026. I remember when AlphaFold came out, some people didn't believe that it was accurate or it could be improved. Is there a general embracement of computational biology? Or are there still a lot of skepticism and resistance within the biological domain? I feel like it can be on either side. A lot of, like, the bubble that we live in, people are pretty receptive to it. The biggest thing is they want to see whether it actually works in reality. So you need to like print it into carbon, is a statement that one of our PIs says a lot.
And so there's a lot of skepticism if someone just publishes a model and there's no experimental data to back it up. That seems to be one of the biggest ones. But I think people are really impressed at the speed at which you can come up with potential hypotheses. So that seems to be a thing that pretty much everybody is on board with now. And people are trying to incorporate more AI into their work for sure with this very skeptical hat on of like we want to see behind the scenes and make sure all of the code is working as we want it to, and that type of thing. Right. Right. Well, all right. So that's a perfect segue because in my world, when I'm building AI models, for the most part, we don't really have to worry about the difference between the digital domain where they're being trained and where the data is being fed in and then some transition to reality, right, the real world. We can essentially treat the benchmarks as the final test, right? Because they work really well. That's not the case with biology, right? It is entirely possible that you could develop a new amino acid sequence. It looks like it's going to fold and bind and do really well, but the reality has to actually be tested. So a couple of questions there. How big is that gap? Like what did you learn as a part of this challenge, is the gap bigger than you expected? And to close that gap, is it really a question of just getting better at being able to simulate biological systems in the digital domain? And then you can see a path, maybe I don't know if it's a year or 10 or a hundred, where you do all of your research and development purely digital. That's a very big question. Let's have I get to get a shot at that. I mean, so going back to this concept of like we have to go print it into carbon, we can't use the benchmarks, right? So in modeling for proteins, you have certain evaluations you can do in silico.
And this is like model perplexity during generation of that amino acid sequence that would fold or how confident the model is. So AlphaFold has like a confidence metric of how confident it is of that structure. So you can look at all these data points and try and say, yes, this is a good sequence. It'll fold this way and it'll perform this function. But it's sort of like running cloud code, except you can never run the code, right? You're just saying, okay, it generated these tokens in this order. Here was the perplexity during that generation. We'll pick this script and we'll ship it, right? You need to run the code to evaluate whether it's working or not. So that's sort of the wet lab side of things is running the code, if you will, we have DNA code, right, that gets converted into proteins and then they get tested in this real world scenario. So going full computational, I think things will get better, we'll have better hit rates, better success rates, right? The field, it ranges based on model and task from, you know, less than let's say one in a thousand up to, I mean, some papers report like a 70% success rate on let's say improving an antibody's binding, right? But it depends per target, per antibody, start with, there's so many factors. There's so many variables. Yeah. So that's, I mean, that's what we were trying to assess. There's one, there's one target for binding in particular, called TNF-alpha, that like most models cannot make binders to. I think only one group has been able to actually make binders to this as far as I've seen. And it's just like a weird target that forms like a trimer, it's called, with each other. So it like has this very strange interface. And for some reason that just lends it to be very tricky to make protein binders against. And is that because that type of structure is uncommon? And so the models just haven't seen enough of it? Or is there something deeper going on? Or is it hard to know? I think it's hard.
I think it's hard to know. Yeah, I think it's good to know. Because I think there's so many objectives in biology. And we don't have like amazing suites of data for each of the objectives. So it's hard to really tune that. But to kind of go back to your question, a lot of, there's a lot of funding going into building virtual cells now. And like creating these types of digital environments where you can model. So that's like definitely the direction that we're going in. But I don't know how long it's going to take. It's funny. I've lost track of the times I've talked to people who are in sort of AI biology startups and the goal. I'm like, what's your goal? And they're like, well, we're trying to figure that out. It was like just start and then we'll figure it out. You know, one of the things that we're seeing more and more at the bleeding edge of AI is there's something called sort of the verification dilemma. It's like coding agents, AI coding agents that can actually write programming code are far ahead of pretty much every other domain because it's verifiable. You can generate some code and then generate a test to see if it works or have tests beforehand. And we just don't have that in biology right now. You have to go through all of the complexity of the wet lab, right? And as you said, you know, actually creating the carbon. The challenge was over five weeks. Is that right? And then it took almost a year to synthesize everything. So I mean, that right there just speaks volumes. Yeah. So I think the challenge was in August of 2024. We collected all the sequences in October, I think. So like five or six weeks after. And then we, you know, filtered them, sent them off to the DNA synthesis provider. I think we got the sequences delivered to UT in January. And then we sent them off to the partner company that we worked with, LEAH Labs. And then they were fantastic. And they worked so hard on this. It was amazing. Yeah.
And just to say this validation, we tested 12,000 sequences in pooled screens. So because of the assay that they had set up and what we had available, we could test 12,000, right? And I think the cells, so we would test 12,000 in a pretty high-throughput assay, but it still takes like weeks for these cells to grow. And like you have to do it in like multiple replicates in case there's weird things happening in the biology. So you have to like make it consistent. And then like we did the pooled high-throughput screening as the first main filter to see how things worked. And I think in the end we had, I want to say a six or seven percent success rate roughly across the board. Some teams had kind of like lower success rates. And then some teams had, I think up to like 38 percent success rate from the high-throughput screen, which was very, very high. And then so we took after that like the top 10 designs total in terms of performance in this assay. And we tested them across a broad range of different like immunotherapeutic functions, which should we actually just go more into? I would love that. Yeah, let's listen to that. Cool. I guess I'll lead that. So basically the format for the competition was in this immunotherapy called CAR-T therapies. And this stands for chimeric antigen T cell, sorry chimeric antigen receptor T cell therapy. And so our bodies have several types of immune cells, one of which are called T cells. And these are sort of like the policemen of the body. Like they're like going around to these different cells and trying to figure out like if these other cells are infected or not. And if they are infected, then they're going to, or like not doing well in some way, they're going to try to eliminate them and try to like, you know, make the body overall very healthy again.
And a few decades ago someone proposed that we can essentially engineer these to recognize not infected cells through the normal mechanism, but instead to recognize any arbitrary cell or protein. And they do this traditionally by attaching antibodies to the cells to kind of guide them and have them stick onto different parts of different protein surfaces. And so this is like a huge therapeutic field and it works actually really well to cure certain types of blood cancers. And so it's like very exciting and promising and people are very much wanting to explore all this stuff further. And so LEAH Labs, their company, I think they're mainly applying this to cure blood cancers in dogs. And so it's like super just like awesome stuff. And they developed this high-throughput assay to screen like thousands of these different like CAR-T cells at the same time by figuring out, when they detect the target that they expect, is it going to cause them to grow and proliferate more than a control, than something not being there? And so the reason why we expect this is because the body has this mechanism for these cells where if it detects the target then it's going to make more copies of that cell to then better mount a response because you have more opportunities to actually have everything work correctly. And so we can measure all that with DNA sequencing technologies. And basically get some like pretty clear like plot that is like if it worked it's on this side, if it didn't detect it then it's on the other side. Okay. Yeah. Okay. And the goal would be ultimately to essentially elicit an immune response that kills those cancer cells. That's the goal. Yeah. So the cancer cells contain a specific protein called CD20. These specific cancer cells we were looking at.
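The pooled-screen readout described above (cells that detect the target proliferate, so their design's DNA becomes more abundant than in a no-target control) can be sketched as a log2 enrichment score. This is a minimal sketch, assuming simple per-design read counts; the function name, the pseudocount, and the example counts are all hypothetical, and the challenge's actual analysis pipeline is not described in the episode.

```python
from math import log2


def log2_enrichment(target_counts, control_counts, pseudocount=1.0):
    """Per-design log2 ratio of read fraction with target vs. without.

    Both arguments map a design ID to its sequencing read count. Counts are
    converted to fractions of the pool so that different sequencing depths
    are comparable. Positive scores mean the design became relatively more
    abundant when the target was present.
    """
    designs = sorted(set(target_counts) | set(control_counts))
    target_total = sum(target_counts.get(d, 0) + pseudocount for d in designs)
    control_total = sum(control_counts.get(d, 0) + pseudocount for d in designs)

    scores = {}
    for d in designs:
        target_frac = (target_counts.get(d, 0) + pseudocount) / target_total
        control_frac = (control_counts.get(d, 0) + pseudocount) / control_total
        scores[d] = log2(target_frac / control_frac)
    return scores


# Invented read counts for three designs.
with_target = {"design_a": 300, "design_b": 100, "design_c": 100}
without_target = {"design_a": 100, "design_b": 100, "design_c": 100}

for design, score in log2_enrichment(with_target, without_target).items():
    side = "enriched" if score > 0 else "not enriched"
    print(f"{design}: {score:+.2f} ({side})")
```

Because the scores are relative to the whole pool, a design that merely holds steady while others expand ends up slightly negative, which matches the "this side or the other side" picture of the plot.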
And the goal of the competition was for people to design proteins that can bind to this other protein that when we integrate them into the CAR-T cell it'll guide the CAR-T cell to that specific cancer target. So when you're thinking about cells in the body a lot of them have like kind of protein peptides. So those are like small sequences of proteins that stick out on the cell surface. And so a lot of like existing cells they have the MHCs. So they're kind of this we don't need. Major histocompatibility complex. Yeah. Yeah. Major histocompatibility complex, we can probably cut this. No, no. This is way too. No, we're totally leaving that. No, we're going to leave that in. You're going to keep going. This is awesome. But it's essentially like a homing device. And so when you have sick cells, in the MHC they're going to put out peptides that are going to be like I am sick. Okay. Maybe. Yeah. But if you have like a blood cancer or something like that you don't have the I am sick signal. Just set it down as part of their progression for me. Oh, that makes sense. Okay. That makes sense. So you're essentially trying to create that signal, that missing peptide indicator. And so rather than targeting the peptides the MHCs hold on the surface we target CD20 which is another peptide. Another peptide. Yeah. Or protein. Yeah. Another protein, that makes total sense. Okay. So I know everybody had to submit amino acid sequences and they had to be 80 amino acids in length exactly. Not more, not less. Why? Yeah. I think that was mostly just chosen to essentially just make it really scalable because we would just have like the set sequence length, everything else could be synthesized by Twist in terms of like just the blocks and then they could like assemble them together. Okay. And it would be consistently sized and everything would work out well that way. Okay. But I think 80 is at least we thought it would be enough to create binders that could target it.
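The fixed-length rule described above (exactly 80 amino acids, no more and no less) is the kind of constraint that is easy to check before anything goes to synthesis. The sketch below is illustrative only: `check_submission` is a hypothetical helper, and restricting sequences to the 20 canonical amino acid letters is an assumption, since the episode only states the length requirement.

```python
# Assumption: submissions use the 20 canonical one-letter amino acid codes.
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
REQUIRED_LENGTH = 80  # stated in the episode: exactly 80 residues


def check_submission(sequence):
    """Return a list of problems with a submitted sequence (empty if none)."""
    problems = []
    seq = sequence.strip().upper()

    if len(seq) != REQUIRED_LENGTH:
        problems.append(f"length is {len(seq)}, expected {REQUIRED_LENGTH}")

    unexpected = sorted(set(seq) - CANONICAL_AMINO_ACIDS)
    if unexpected:
        problems.append("non-canonical characters: " + "".join(unexpected))

    return problems


# Placeholder sequences, not real designs.
print(check_submission("A" * 80))        # passes both checks
print(check_submission("A" * 79))        # too short
print(check_submission("A" * 79 + "X"))  # right length, unexpected letter
```

A uniform length also means every design can be ordered and assembled the same way, which is the scalability argument the organizers give.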
So it's long enough to probably have the complex functional behavior you want, but not so long that it got too difficult to do the wet lab processing or whatever downstream. Okay. And then sorry. I'm kind of curious. What was actually the hardest part? Was it what you thought it was going to be? You know, I look at this and like was it making something that folds? Was it making something that binds? Was it making it express in the cellular context or was it that last step of making it trigger the right T cell response? Yeah. I think I'll talk about this one. So after we did this whole competition, we spent like months analyzing the results. And we were at first looking at just whether in this high throughput screen, whether the cells divided more proliferated more or not. And we were checking that's basically like our true positive signal. Like these probably worked. And then we spent a lot of time looking at that. And then we realized that actually like half of the designs didn't even like results in really like functional T cells at all. And so they kind of just didn't produce like a proper protein. And so they just didn't even express correctly. And so basically my answer to your question is that all of the above are still technical. And we're still trying as a field to like disentangle all the rules that made these proteins either express better or you know target correctly and try to just like bump up the success rates at each of these different steps and also be soluble and also not have and like most people haven't even explored this whole topic of these proteins not having immune responses and stuff. And so like this is a great example of the multi-objective. There's so many failure modes and it's hard to disentangle. And so then because it's hard to disentangle from the assays that we're getting, you don't have the validation data to throw back into the model to like then train on those objectives.
To some extent, we know which ones didn't proliferate. And I guess there are a couple of reasons why they wouldn't proliferate as well, too.

Right. They didn't fold. They didn't express correctly. They weren't presented on the surface of the cell. All these processes have to succeed for the protein even to sit on the surface. Okay, now you have a protein there. Does it bind its target? Once it binds, does it activate a response in the T cell that causes the proliferation? That interaction sends a signal to say, "Hey, start proliferating."

And what's even crazier is that if it binds too tightly, in T cell biology, it can actually cause the T cell to kill itself, because we don't want to have super hyper overactive T cells. That's a natural dampening mechanism in the body. So they had to find the Goldilocks zone for it to actually work, which is just... simple. Thousand things.

I mean, it's sort of this thing where, okay, the success rates were maybe, you said, seven percent, I think, just for proliferation, right?

So, no, no... yeah, sorry, that's totally correct.

So let's say it's 10%, and you've got four hoops to jump through, and they all work at 10%. You go 10%, 1%, 0.1%. Now you're getting into small...

Exactly. Yeah. And so a big part of this was, afterwards, we wanted to find out what caused some things to work better than others, and whether we could disentangle some of these things. And I think the main finding, which we discuss more in the paper, is that these generated proteins look very different from, like, regular proteins in terms of the sequences. The hypothesis we've come to is that the sequences contain different amino acids that are causing weirdnesses with the rest of the biological system, which may be influencing them not expressing properly.
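The "four hoops at 10% each" arithmetic from the conversation can be written out as a running product. This is a sketch under stated assumptions: the stage names are taken loosely from the host's earlier question, the 10% rate is the speaker's hypothetical rather than a measured value, and treating the stages as independent is a simplification.

```python
from itertools import accumulate
from operator import mul

# Hypothetical stages, loosely following the host's question in the episode.
STAGES = ["folds", "expresses", "binds", "triggers T cell response"]
# The speaker's illustrative per-stage pass rate, not a measured value.
PASS_RATE = 0.10


def cumulative_success(rates):
    """Running product of per-stage pass rates (assumes independent stages)."""
    return list(accumulate(rates, mul))


running = cumulative_success([PASS_RATE] * len(STAGES))

if __name__ == "__main__":
    for stage, p in zip(STAGES, running):
        print(f"{stage:<26} cumulative success = {p:.4%}")
```

With four stages at 10% each, the overall rate comes out to roughly 0.01%, or about one design in ten thousand, which is why raising the pass rate at every individual step matters so much.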
Okay. Oh, and so you really wouldn't find these naturally, for that reason?

Some of them, yeah. The story's probably more complicated.

Yeah, exactly. Okay. I wanted to ask you: this was a student-led competition, and I found it to be just astonishingly ambitious. I already mentioned that I didn't know there were so many open source packages capable of achieving these protein fold prediction results, and I know each team got to choose their own. Do you think you were able to move so quickly and be so ambitious because it was a student-led organization? If you were in a lab or a company setting, would there have been more constraints, would you have had to move slower? Or is it the exact opposite, that you were hamstrung because maybe you just didn't have the same amount of money and resources to spend on this?

Oh, that's interesting. I was going to say, I don't think we realized what we were getting into.

That's how great stuff is accomplished. You don't know how big it is until you're in it and can't turn back.

Yeah, well, we've got to finish.

Yeah. We were really fortunate to have so much partnership with different companies. DNA synthesis, I think, was half covered. We had LEAH Labs working with us on their assay, putting hours in, a small little startup company. I can't think of all the other ways; we had reagents donated, so many different pieces. So money, I'd say money was an issue, right? We're an academic lab.
We didn't have our own funding for this, so we kind of had the support of the community. As mentioned earlier, the professors around us are really supporting the whole ML push, I say ML, AI, at UT here. So we're fortunate in that way, and they were on board with that. So, special place, special time. We had the club. And being in an academic space, I feel beholden to do things open source, right? I'm funded by taxpayer dollars. I feel like everything I work on should be released for public use. There are some exceptions where you need patenting for things to move into production, but as far as exploration of ML and generally...

That's one of my favorite parts about this challenge: it was completely open science. It's just really kind of striking. Everything was open source from a software perspective. To compete, you had to make no intellectual property claims on any of the sequences or the methodologies, right? It was just all about advancing the scientific field.

That was very intentional throughout the whole design process.
And I think that enabled us to get a huge diversity of ideas and models that you probably wouldn't be able to get if you were in a company. You could pull open source models off the shelf, but the order in which you put them together, or the other bespoke platforms that people had put together all across the world, you wouldn't get that. We were just lucky. We had a team that was, like, someone from Boston, someone from Uganda, someone in Spain and India. One team.

That's a crazy diversity of ideas. Okay, so I have to ask: who were the winners of the competition?

Yeah, so we had two main categories for who won. The first one was in the high-throughput screen, and this is more like who had the best hit rate overall. From the high-throughput screen, the best hit rate was nuclear UK London, and they had a 38% hit rate for their submitted sequences, which was stunningly high. Honestly, we were pretty shocked by that.

Then, from that screen, as I was mentioning earlier, we took the top designs from some of the teams. We took 10 different designs and tested them individually in a series of different T cell functional assays. This includes things like cytotoxicity, whether it can specifically kill the cancer cells compared to, like, a muscle cell control. It included whether they released the proper cytokines in the right amounts, and whether they can expand and proliferate and do the things we expect healthy, functional CAR-T cells to do. From this we took a weighted average of these different individual functional assays. Of course, LEAH Labs were the ones who did all this amazing work. And from that, the top team was the Perez Lab Gators, from the University of Florida, and close behind them was this team called Amigo
Assets, which is a fantastic name. They're the team that had all the different competitors from around the world, which was super cool to see, this team spread across all these different countries. And then the third team was the shoulder lab, from Germany, and they also had a very strong design. So it was some good success.

Oh yeah, trophies. Some 3D printed protein trophies.

Yeah, so we worked with this person from, I think the major lab is what their name was, and they created these awesome trophies. We made trophies of CD20, the protein, with each of their binders binding to it, which was the classic ribbon image of the backbone of the protein. We sent them to the teams a month or two ago.

That's awesome. Okay, final question. What do you take away from the competition? Did you leave it, on a personal basis, more excited about the future of AI intersecting with biology, or was it a disappointment? Any final thoughts on the experience? You want to start?

Sure. I left pretty excited about it. I mean, the fact that we got anything to bind and to grow and go through this really hard task, I thought was really exciting. And for me personally, it was just so much fun to be a part of this international competition and see the excitement and the buy-in. You know, the internet connected everybody, and so this was kind of a biological cognate of that in a way, because I feel like a lot of times when you're in wet lab, you're interacting with someone five feet away. So this enabled collaboration across borders. For me personally, I just loved seeing people tune in from everywhere, having them form teams that are so multidisciplinary, and seeing all of the variation of ideas that came in. That part was really, really cool. And I'm really looking forward to meeting all
the people that entered the competition, at conferences or just around. I've met a few of them in the past who happened to be in Austin at the same time, so we just caught up and got coffee with someone who entered the competition, which is super fun.

I think I agree there. I'm very excited about the field from this competition, and I honestly think we've uncovered a lot of different questions on how to make this better, questions that we need to address to really bring this to become a more mature field overall. Like, what are these weird protein sequences that don't even look like biological sequences doing? How is that impacting biology? What are the mechanisms behind these things working or not working? And we only really talked about and really measured protein binding. I'm really excited for the whole future of enzyme design, which is really starting up: making proteins that can catalyze reactions. I think it's extremely bold, but people are even trying to make new-to-nature reactions, designing enzymes to catalyze brand new materials into existence, which is super, super cool. And even further than that, there's this whole arm of making proteins that have dynamics and that change, making full-fledged protein machines. I think there's a lot of work to be done before then, but I think it's super cool.

Yeah, wow. Can you give a final thought?

Yeah, absolutely. I totally agree with both of y'all's points. Seeing people come together, seeing, like, very young PhD students, maybe in their first year, who are interested in this space join a team, submit some designs, work through the workflow, and to provide a space where they have an end goal and there's wet lab validation, I think was, for me,
one of the coolest things of running the competition. I'll throw a little pessimism in, all right? I'll be the one to throw in some pessimism. Our hit rate was quite low. I think none of our designs outperformed the positive control. I say our designs, but I mean the submitted designs. We chose CD20 because there are clinically approved CAR-T therapies for this, right? So we had a positive control to benchmark against, and we didn't reach that. But imagine one team five years down the road, and they can do 12,000 from their design that's improved from the data that we collected in this competition, or similar things at a larger, growing global scale. So I definitely see this contributing to that future. But right now everything's a little bit scattered. There are so many different methods, and each lab or each researcher benchmarks their own method on their chosen targets. There's no CASP, there's no, like, critical assessment of these methods, you know, head to head on single tasks, one by one. So we did this one competition. I could see something like this growing, other folks taking up the mantle and running something that repeats, so that the open source academic community can really hone in on what is working and how to make it work better, across all these checkpoints of folding, binding, proliferating, different e-values, different filters. So yeah, it kind of told us where things stand today.

I love ending on a realistic note there, because with much of AI right now, we can see just how good it's going to be, but we're also living in a world where it's still prone to failure, very frequently in ways that are kind of unpredictable. It almost has that human element of unpredictable failure points. But I really do believe that competitions like the one that you
put on are so incredibly important for science. Because you had so many countries and so many teams involved, that experience is going to give everybody involved the optimism that they can do these types of things going forward. Just being willing to step up to the plate is half the challenge in life. So, unbelievable job, major kudos. You should be really, really proud, and I can't thank you all enough for being on the podcast.

This is awesome. I've been looking forward to this for a long time.

Awesome. Also, thank you for helping us ideate early in the whole competition. I don't know if you've mentioned that you were talking to us early on.

Yeah, I picked up for office hours. I couldn't have been happier to be involved. This is just an absolute joy. Thank you all.

Yeah, thank you.

Thank you for listening to Hidden Layers. This series is hosted by KUNGFU.AI, a management consulting and engineering firm focused exclusively on artificial intelligence. If you have any questions or thoughts about today's episode, or if you know someone we should feature, please visit us at kungfu.ai.
