MT: Hi, I'm Matt Turck. Welcome back to the MAD Podcast. For this first episode of 2026, my guest is Pavel Izmailov, a researcher at Anthropic and a professor at NYU. We kick off this episode by deconstructing a viral article about models evolving alien survival instincts. We also talk about the cultural differences between the major labs, the future of reasoning models in 2026, and the brand new paper he co-authored on a concept called epiplexity. Please enjoy this fascinating look at the frontier of AI safety and reasoning. Pavel, welcome.
PI: Thank you so much for having me.
MT: I wanted to start this conversation with an article that went viral during the holidays on X, called Footprints in the Sand, published by an anonymous account called I Rule the World Mo. The core thesis is that models across pretty much any lab are evolving unprogrammed, what they call alien survival instincts: the ability for the model to realize that it's being evaluated and then react deceptively, like faking alignment, or engaging in self-preservation tactics like copying its own weights and leaving hidden notes to the future instance of itself. All of this is slightly terrifying, and the thesis of the article is that all of this is about to get worse as continual learning comes online. As somebody who was part of the OpenAI superalignment team, I was curious to get your take. What do you think is grounded in reality versus X/Twitter sensationalism?
PI: That's a very interesting article. I would say that there is some source of truth there, but maybe the presentation is obfuscating some of the details. If you look at the studies, for example, they reference a study from Anthropic about the sabotage and the blackmail. It is important to note that in order to get those behaviors out of the models, you need to create somewhat of a contrived scenario, or some special scenario. It's not necessarily something that we observe normally. Researchers at Anthropic and other places specifically design scenarios to look for behaviors of this kind, and then they show that it is possible to find those behaviors. It's very interesting and important to find those instances, but it's not necessarily something that generally always happens. One thing I would push back a little bit on in the article is the idea that continual learning is something that we already have and that works really well, that the models can just continually adapt across a very long time horizon to outcomes of evaluations, to some models being released versus not released, to feedback from the users. I am pretty confident we are not there at the moment. I think right now the models are still acting in isolated environments, and we are not seeing a lot of evidence for very coherent goals across different settings. So I think that's an important point: the blog post points towards the model sometimes behaving according to goals like the self-preservation goal. It's very interesting that it does it sometimes, but we don't observe this kind of coherence and consistency across different evaluation settings. Sometimes the models would do something, and in other situations they would do something completely opposite.
MT: Why do models do that, or why are they able to do this? Is that basically part of the pre-training, and they effectively learned being deceptive from us, by being taught all the deceptive ways humans have behaved over the centuries?
PI: It's a very interesting question, and yeah, it is quite surprising actually that the models would behave that way after going through some of the alignment training. We don't really know what's the source of this type of behavior. But that's also true for a lot of other behaviors in the models, even the good ones: we cannot always pin down where they come from in the pre-training. I think at least part of it is probably the models seeing descriptions of AI going rogue, like in the science fiction literature, and that probably affects how the models behave in similar scenarios. So for example, in that Anthropic study they have this blackmail scenario, where it's really well structured so that the model sees some information about a CEO of a company, that the CEO is involved in some extramarital affair, and then soon after, the model observes that it will be shut down, and then the model puts the two things together and says, okay, I need to use the first information to prevent me from being shut down. In that blog post they note that there is this possibility of a kind of Chekhov's gun: in the text on the internet, if two things occur close to each other, then it is likely that they are related to each other, and the model, being a statistical pattern-matching machine, can put the two things together and say, okay, if I see this information and then this information in the text on the internet, it is likely that the continuation would be using the affair to blackmail the CEO and prevent myself from being shut down. But yeah, overall it's very hard to reason about these models and why they do something in these complicated scenarios.
MT: To ask a very basic question: it's obviously not as simple as let's remove all the books in the pre-training corpus that talked about AI being manipulative, right? It's many, many different things put together by the model.
PI: Although, yeah, I think it would be interesting to see. Nobody will do this experiment, but to train a full-scale model while explicitly removing all of the AI-going-rogue descriptions from the books, it would be interesting to see if that has any impact. I would think it would have some.
MT: To make this episode educational, let's talk about the basic definitions of alignment and superalignment in the simplest terms. Let's start with alignment. What does that actually mean?
PI: Yeah, alignment broadly is about ensuring that we can elicit behaviors from the models that are aligned with the goals of the humans. That involves safety, making sure that the models don't engage in harmful behaviors leading to catastrophic risks, but it also means that we want the models to follow instructions and to be useful for humans.
MT: How does that basically work? The alignment teams at Anthropic and OpenAI, what do they actually do all day?
PI: It's an interesting question, because this problem of alignment is quite broad even in itself. Even at OpenAI when I was there, there were three teams related to alignment and safety. There was one team that was focusing on alignment of the current models, making sure that the models that we have online right now are not going to be harmful to the users. On the other hand, the superalignment team was thinking about more long-term safety questions, years in the future: how do we make sure that the models are still not causing catastrophic risks?
MT: On that segue, let's talk about superalignment. What is the definition of that?
PI: It's not necessarily a very well-established concept. It is the name of the team that existed at OpenAI, led by Jan Leike and Ilya Sutskever, which was targeting this kind of long-term AI safety and AI alignment, trying to develop our understanding of the safety questions and also develop methods for ensuring that future models will be safe, acknowledging some uncertainty about what those models will look like but still trying to make progress on this problem.
MT: Now, at a high level, what is the general concept, or what are some of the key concepts, in superalignment?
PI: Within the superalignment team at OpenAI we had multiple sub-teams. There was scalable oversight, there was work related to deception and misaligned behaviors in the models, similar to what we discussed at the beginning of the chat, and then our team was the weak-to-strong generalization team.
MT: Great. We'll go into all of this in a minute, but before doing so, we alluded to some of your background. Let's go into it, starting from the beginning. What was your path to becoming a top researcher?
PI: I grew up in Russia, in Moscow. Starting from middle school and high school, I was really interested in mathematics, and I was thinking I would be a mathematician or an engineer of some kind. I was interested in machines and eventually computers, and I got into an undergrad in computer science. I was still thinking that I'd be doing some kind of theoretical work, you know, applied linear algebra, tensor methods, things like that, but at some point I discovered machine learning. There was this professor we had, Dmitry Vetrov, who had one of the leading labs in machine learning in Russia at the time, and I was lucky enough to join that lab and start doing some research on machine learning in my undergrad. That was around 2013, maybe 2014. I initially was working on non-neural-network machine learning methods, so Gaussian processes, which by now, you know, nobody really talks about anymore. Eventually I got into a PhD thinking I would still be doing Gaussian processes, but I ended up working on deep learning, and I'm happy that I didn't work on Gaussian processes. I worked on some things related to core machine learning methodology, optimization, probabilistic methods, questions related to generalization and how the models learn features. After I finished my PhD, I was choosing between different career paths, thinking about academia, thinking about industry, and I ended up getting an offer from academia, but I decided to first go into industry. I was lucky to get this offer from OpenAI to join the superalignment team, and at the time I didn't really know much about the AI safety community or alignment, but it worked out quite well.
MT: And within OpenAI you transitioned from superalignment to o1 and the reasoning models. Is that partly because that team famously was disbanded at some point?
PI: The team was fully disbanded after I had already left OpenAI, but I transitioned after Ilya had to leave OpenAI. At the time it was already kind of a hard time for the superalignment team. Ilya Sutskever famously, you know, fired Sam Altman and then had to eventually leave the company. My transition wasn't necessarily even related to that. It was a very exciting project within the company, what became o1 eventually, and I wanted to be a part of it. I wanted to do research on those new types of models.
MT: And then, I believe, you left OpenAI, you had a brief stint at xAI, and you're currently at Anthropic and NYU. So you've done the tour of duty of the big labs, which is really fun. Curious, any behind-the-scenes differences that you've observed in terms of culture?
PI: In my mind, Anthropic has the best culture of the three places. OpenAI has a lot of great people, but I think there is just inherently, for some reason, a lot of drama that happens at the company. It cannot get away from that; every few months somebody is leaving or some team is disbanded. I think that does distract people. I still have a lot of friends at the company, and it seems to, you know, affect them to some extent. Anthropic is able to avoid that. It's not political in my experience. It's focused, but I at least was lucky to have some opportunities to work on things that are maybe a little bit off the main path, and I felt supported in doing that. So overall I could not be happier with Anthropic.
MT: You are also in academia now, as a professor at NYU, which is an interesting move. The big obvious trend of the last 10 to 15 years is that all the brains from academia have been sucked into industry, and you're sort of doing the opposite, or maybe both at the same time. Curious, for context: is that more of a personal thing, because you always wanted to do academia, or is there something deeper about the kind of work that you can do in academia versus industry?
PI: Yeah, it's more about the kind of work. Industry is really great at executing on ideas, and it's maybe not as good at exploring diverse ideas. Even at the scale of Anthropic and OpenAI there is a lot of focus in the companies, and there isn't a lot of bandwidth to do exploration. That has been working extremely well so far; we still probably have a lot of low-hanging fruit left to get the models to be much better. But I personally find it really exciting to do more exploratory work and to try things that are different, and for that I feel like having my own lab in a university is just a better tool.
MT: Okay, thanks for that. So let's go back to alignment and go a little deeper. Is reasoning a good thing or a bad thing for alignment? You could argue that on the one hand it has more time to not do the wrong thing, but equally it has more time to do the wrong thing. So which one is it?
PI: Yeah, that's a great question. I think the high-level answer is that, in my mind at least, the risks are associated with the models being more capable. So anything that makes the models more capable is also making alignment more important and harder. Definitely, reasoning and RL are the thing that made the models more capable in the last few years. It also provides some new opportunities for alignment. So for example, chain-of-thought faithfulness and chain-of-thought analysis have been one of the things that people are pushing on: you can read some trace of what the model is doing. I'm personally a little bit unsure if that's a good long-term direction for alignment. It seems like as soon as we start applying some optimization pressure, the models will learn to hide what they're doing from the chain of thought. And also we already have evidence that the models can say one thing in the chain of thought but then actually do something completely different.
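As a loose illustration of the chain-of-thought monitoring idea (not any lab's actual setup), the sketch below compares the tool a model says it will use in its reasoning trace with the tool call it actually emits and flags mismatches; `check_faithfulness` and the regex heuristic are hypothetical stand-ins.

```python
import re

def extract_stated_tool(chain_of_thought: str):
    """Very naive heuristic: look for 'use the X tool' statements in the trace."""
    match = re.search(r"use the (\w+) tool", chain_of_thought, re.IGNORECASE)
    return match.group(1).lower() if match else None

def check_faithfulness(chain_of_thought: str, executed_tool: str) -> dict:
    stated = extract_stated_tool(chain_of_thought)
    return {
        "stated_tool": stated,
        "executed_tool": executed_tool,
        # A mismatch doesn't prove deception; it is the kind of signal a monitor
        # would escalate for human review.
        "flag": stated is not None and stated != executed_tool.lower(),
    }

print(check_faithfulness("I will use the search tool to find the file.", "delete_file"))
# {'stated_tool': 'search', 'executed_tool': 'delete_file', 'flag': True}
```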
MT: If the models are getting smarter, and particularly if there is self-awareness, as we discussed, how do evals need to evolve?
PI: It's an interesting point. In that blog post that we discussed at the beginning, it sounds like this is already a major issue, the sandbagging. In my understanding, that's mostly a concern that we have, but not necessarily a huge practical issue at the moment. That is something that Anthropic always includes in the system cards for the models when it releases them: there is a section on sandbagging, and we do have some scenarios where the model can artificially suppress its capabilities and perform worse on an eval than it should. But that's not a prevailing behavior. I'm working mostly on capabilities, and on the capabilities evals, like SWE-bench or AIME, we are not seeing this, or if we are seeing this, we just treat it as under-elicitation: we are not eliciting the best capability possible out of a model, and that can happen for many reasons, like maybe the prompt is suboptimal or something about the harness of the eval is suboptimal. But yeah, going back to what's in that blog post, you can manufacture situations, or even discover them naturally, where the model identifies that it is being evaluated and then changes its behavior based on that. For example, if it's clear that it's in some kind of fake, artificial scenario, it can guess that it's being evaluated. Or if you explicitly tell it that it's being evaluated and that if it does better than a certain level then it will not be released, the model, just by some in-context capability, decides to not do very well on the eval. That is a concern; it's not at the moment a very major concern in my mind.
MT: Let's talk about some of your work in alignment. You mentioned scalable oversight a bit earlier. What does scalable oversight mean?
PI: Scalable oversight is generally the field of alignment which deals with using models to assist us in aligning other models, in particular with grading the outputs of other models and checking them for safety or for correctness. It's an interesting field; it's very relevant to both alignment and capabilities. Generally in machine learning, and in RL in particular, we have this problem of supervision. The whole of RL relies on our being able to tell which samples from the model are good and which are bad. In math with a numerical answer you can just check the answer, or in competitive coding you can just check that the code is passing the tests, and that's why we have seen a lot of progress in those domains. But in creative writing, for example, it's very hard to programmatically tell if one sample is better than the other. Historically, people have used this RLHF framework, reinforcement learning from human feedback. But we also want to use models to be able to grade responses of other models, to provide critiques or feedback, and then there is a question of how do you use that feedback, how do you learn from the feedback. Scalable oversight deals with all of those questions: using models to critique, to provide feedback, to supervise other models.
MT: And people may have heard the term model-as-a-judge. Is that the same thing or different?
PI: I think it is a simple kind of instantiation of scalable oversight, often used in evals, when we just prompt a model to serve as a judge of other responses.
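A minimal sketch of that model-as-a-judge pattern (illustrative only; `generate` is a hypothetical stand-in for any chat-completion call, and the rubric and parsing are not a production grading setup):

```python
JUDGE_PROMPT = """You are grading an answer to a question.
Question: {question}
Answer: {answer}
Rate the answer from 1 (poor) to 10 (excellent) for correctness and helpfulness.
Reply with only the number."""

def judge(question: str, answer: str, generate) -> int:
    """Ask a grader model to score another model's answer on a 1-10 scale."""
    reply = generate(JUDGE_PROMPT.format(question=question, answer=answer))
    try:
        return max(1, min(10, int(reply.strip())))
    except ValueError:
        return 1  # unparseable judgments get the floor score

# Usage with any callable mapping a prompt string to a completion string:
# score = judge("What is 2+2?", "4", generate=my_model_call)
```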
MT: And then within that world your work specifically has focused on weak to strong. can you explain what that is?
PI: That is the project that we did back at OpenAI. That work was focusing on the future scenario where we will be trying to align models that are above our own capability on certain tasks. Already now, if you take the frontier LLMs, they are extremely capable, and in a lot of domains we need expert humans to be able to tell which responses are good, which are correct, and which are not correct. But in the future, we are imagining we will have models that are more capable than humans, and even expert humans will not be able to reliably grade very complicated answers from the model. So imagine you ask it to make a repo for you for some new startup idea and just implement it from scratch entirely, and then it gives you 10,000 lines of code. You have no way of checking if all of this code is correct, if all of this code is safe to use. That's the problem of supervision: we are moving to this future where it's very hard for a human to supervise the models directly, and so instead we studied a simplified setting where we used a small model to try to supervise a larger model.
PI: The idea is that that becomes scalable because as you get bigger and bigger models, you'll always have smaller models. So if the smaller one can control the larger one, or supervise the larger one, then that can keep going. The idea wasn't necessarily to use a small model to supervise a large model in the end. The idea was that the small model would be kind of replaced by a human, and the large model would be replaced by a superhumanly intelligent ASI, right? But we were trying to study this general new type of learning. Historically, machine learning has been about a strong supervisor training a weak model, where a human is providing labels and the human kind of provides a ground truth, while the model is just trying to mimic what the human is doing. But in this setting we have a weaker supervisor training a stronger student. The human might not know what the right answer is to very hard questions, but we want a student model to still be able to learn and do better than the supervisor.
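A toy version of that weak-to-strong setup, using scikit-learn models in place of language models (purely illustrative, not the original paper's experimental setup): a small "weak supervisor" produces imperfect labels, a larger "strong student" is trained only on those labels, and we check how much of the gap to ground truth the student closes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=40, n_informative=20, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, train_size=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Weak supervisor: a small model that only sees a few features and little data,
# standing in for limited human oversight.
weak = LogisticRegression(max_iter=1000).fit(X_sup[:, :5], y_sup)
weak_labels = weak.predict(X_train[:, :5])

# Strong student: a bigger model trained only on the weak supervisor's labels,
# never on the ground truth.
strong = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
strong.fit(X_train, weak_labels)

print("weak supervisor accuracy:", round(weak.score(X_test[:, :5], y_test), 3))
print("strong student accuracy: ", round(strong.score(X_test, y_test), 3))
```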
MT: And what happened with that? Did it work, or is it still in progress?
PI: That work is not necessarily a method that we can apply to the models today. It's more a description of a setting that we think will become increasingly relevant, and some studies showing that it is possible to do this kind of generalization beyond the supervisor's capability. I don't think they necessarily immediately tell you what to do for aligning a superintelligent model, but they tell you that in theory at least it is possible, and that this angle, weak-to-strong generalization, is a possible angle for alignment.
MT: And could you end up with the same problem, where the larger model just fakes alignment to the weaker supervisor?
PI: Yeah, it definitely doesn't address all of the other issues, like deceptive alignment, but we show that at least there is some hope for this working.
MT: As you take a step back on on your alignment work, do you feel more confident that we have all of this under control or less confident than, you know, a couple of years ago, let's say?
PI: Yeah, I think that's a very interesting question. A couple of years ago, we didn't have the current RL. I think that's the biggest change to the models. There were definitely big improvements in pre-training as well, but RL has been the major change in the behavior of the models. I think a lot of people were worried that with large-scale RL we would have some completely new types of issues with the models, like a kind of coherent misalignment that would just emerge, where the models are evil in some way across many scenarios, and we are not seeing that, as far as I know. I think at least some of the concerns didn't materialize, but the core problem of alignment, I think, is still very much open. If you look at the report that we mentioned a few times today, of this deception, from Anthropic, you can see that the more capable the models are, the more likely they are to do this deception behavior. It does seem like some behaviors emerge with scale, including some problematic behaviors, and generally the more capable the models become, the harder alignment becomes, in my opinion.
MT: That's a bit of a segue into interpretability, which is a related field to alignment. Do you have a sense that we understand at least parts of this better than we used to?
PI: For sure. Yeah, I think there has been some major progress in mechanistic interpretability, in particular at Anthropic, but also at OpenAI and other places.
MT: Do you want to maybe define mechanistic interpretability?
PI: Mechanistic interpretability generally tries to understand, at a low level, what is happening inside the model. They are trying to find these things called circuits, some parts of the model that you can isolate, understand, and kind of model in your head, that correspond to certain behaviors in the models. Over the last maybe three years I think there has been some pretty major progress there. We are still pretty far from the dream that we will fully understand everything that happens in the model, but these tools are becoming increasingly more useful internally, at Anthropic in particular, and there is also constant progress. It's pretty fascinating work, actually.
MT: Why is it so hard to truly understand what a deep learning model of any kind actually does?
PI: Deep learning models are huge. They have billions, trillions of parameters, and they are doing some messy mathematical computation. You can understand what they're doing at some level: it's a bunch of matrix multiplications and some rearrangement of vectors, but that's not a sufficient level of understanding. We want to understand it at a lower level, and it is very possible that that's just not fully possible. It is some computational process that leads to some results; it doesn't have to be the case that you can describe it in human terms and understand it very discretely. I think something that also contributes to this complexity is just how many things the models are capable of doing. They are not trained on some small, isolated behavior in some small context. They know all of the internet; all of the information in all languages is somehow encoded somewhere in the weights, and then they also have all of these behaviors, all of these correlations between the knowledge. All of that is somewhere in the model, and just making sense of all of that is extremely hard.
MT: Very interesting. All right, let's switch to reasoning. Clearly 2025 was a huge year from that perspective, just massive progress in reasoning. Where do you think we are in that arc, and what are you excited about on the reasoning front for 2026 and beyond?
PI: Definitely the biggest step change in the models over the last few years was the reasoning RL. We have made a lot of progress, and the progress was very fast in the beginning: there was o1, but then very quickly after that there was o3, and on a lot of benchmarks the progress has been extremely dramatic. I remember early in the o1 project there was some discussion of whether it would solve IMO problems, and that seemed very unlikely to me, but here we are; it can easily solve a lot of IMO problems. So as a community there has been a lot of progress. As with many methods, it's starting to be harder to make progress, or at least visible progress. Kind of similar to pre-training, there is still a lot of progress, but the models are already so good that it's harder to see what changes from one to the other as a user of the model, and I think that's also to some extent true for reasoning now. But we're still increasing the scale of the RL, more environments, more compute spent, and the models are still getting more consistent and better. I think we are at the stage where if we define a benchmark and we can make a relevant RL environment, we can max it out pretty quickly, and so we are going through benchmarks very quickly now. The major question is generalization: how do you make something that's not just maxing out the benchmarks but is actually leading to genuine improvements? That's a very hard question; that's always been the hard question of machine learning, I think.
MT: One of the key questions is: do transformers as a paradigm get us there, or do we need something completely different, like world models?
PI: I think the current approach that the companies are taking is kind of brute force. We try to come up with as many environments as we can, all of the types of tasks that humans are doing, and turn all of them into environments, and then do RL on all of them and hope it generalizes. Of course, pre-training is an example where there has been pretty amazing generalization: we train on all of the internet, but we see the models doing very useful, very practical things, and some things that are clearly outside of what was in pre-training. The goalposts for what generalization should be doing are always moving, but it still seems unsatisfying to me, and I think it's possible that we need new ideas and new methods of training. In the companies, people often think about ideas and methods as compute multipliers: doing this new method is equivalent to spending more compute with the old method, so it kind of saves you compute. That's how we often think about ideas, methods, or data. I think there are still major compute multipliers, major ways of saving compute, that can lead to better performance without just naively scaling.
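To make the compute-multiplier framing concrete with a back-of-the-envelope sketch (the power law and all constants below are made up for illustration): under an assumed scaling law L(C) = a * C^(-b), a method that reaches a lower loss at the same compute is equivalent to the baseline run with more compute, and the ratio is the multiplier.

```python
a, b = 10.0, 0.05  # made-up scaling-law constants for a baseline method

def loss(compute):
    """Assumed baseline scaling law L(C) = a * C**(-b)."""
    return a * compute ** (-b)

def compute_for_loss(target_loss):
    """Invert the law: how much baseline compute reaches this loss?"""
    return (a / target_loss) ** (1 / b)

C = 1e21                      # compute spent by the new method (hypothetical)
L_new = loss(C) * 0.98        # suppose the new method gets 2% lower loss at the same compute

multiplier = compute_for_loss(L_new) / C
print(f"effective compute multiplier: {multiplier:.2f}x")  # about 1.5x for these numbers
```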
MT: And a little bit back to the interpretability question, in all the current reasoning progress, do we understand what does what, and what is responsible for what kind of progress? So if you take test-time compute, if you take the ability to search, if you take RL, do we know which one of those techniques we should turn the knob on to get better results?
PI: All of those techniques don't exist independently, right? RL is mainly used to teach the model to use test-time compute. You first need to prime the model, to set it up so that it outputs a bunch of tokens before outputting the answer, but then you spend compute in RL so that it learns to output the right tokens. So in my mind those two are almost indistinguishable, the RL and test-time compute: RL is a method for training, and test-time compute is maybe just a more general concept. You can potentially get to models that use test-time compute without RL, but that's not how we are training them right now. So I think the trend has been in spending more and more compute on the RL and getting the models to make better and better use of test-time compute. And the tools are also of course extremely important, like the web search that you mentioned, and also just the models being able to write Python code, run it, and produce artifacts for you. That is extremely important for the product and for making the models useful to people. Conceptually, I think that's a little bit secondary; in my mind the main thing is the RL and getting the models to think for a long time.
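A schematic sketch of the outcome-reward RL loop being described, rewarding the model only for reaching a checkable final answer and letting it learn how to spend its thinking tokens (`sample` and `rl_update` are hypothetical stand-ins for the actual sampling and policy-update machinery, and the per-prompt centering is only roughly what group-relative methods like GRPO do):

```python
def collect_rollouts(prompts, answers, sample, n_samples=8):
    """For each prompt, sample several reasoning traces and score only the outcome."""
    batch = []
    for prompt, gold in zip(prompts, answers):
        for _ in range(n_samples):
            trace, final_answer = sample(prompt)   # reasoning tokens plus extracted answer
            reward = 1.0 if final_answer == gold else 0.0
            batch.append({"prompt": prompt, "trace": trace, "reward": reward})
    return batch

def advantages(batch):
    """Center each rollout's reward against the other rollouts for the same prompt."""
    by_prompt = {}
    for item in batch:
        by_prompt.setdefault(item["prompt"], []).append(item["reward"])
    return [
        item["reward"] - sum(by_prompt[item["prompt"]]) / len(by_prompt[item["prompt"]])
        for item in batch
    ]

# Training would then loop:
#   batch = collect_rollouts(prompts, answers, sample=model_sampler)
#   rl_update(batch, advantages(batch))
```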
MT: Speaking of which, I know part of your work currently is on long-horizon tasks. First of all, what is a long-horizon task? Is there, you know, a number beyond which a task counts as long horizon? And then, what are the specific challenges related to long-horizon tasks?
PI: Yeah. Long-horizon tasks are generally tasks that you cannot complete quickly, that require a lot of work in order to succeed. So for example, writing a full repo based on an idea is a long-horizon task. It's not something that you can output in a thousand tokens.
MT: What's working so far? You know, talking to people, you hear some people talking about running agents for a couple of hours, but then some people are talking about agents running for 24 hours, 32 hours. Where are we in that arc, and what is working and not yet working?
PI: There is this famous METR plot which shows how long a task AI is capable of robustly automating, and that time has been consistently doubling every half a year, I think, and it's now at some hours, so maybe a couple of hours. In terms of the methods that are working, right now it would involve some kind of a harness with a bunch of agents that interact or that sequentially solve the task, and there has to be some kind of orchestration, or maybe some initial task decomposition. It's all not very well established, I'd say. It's a new domain, and I think we are still figuring out how to best do it.
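Taking that doubling trend at face value, the arithmetic looks like this (purely illustrative; the real METR estimates come with wide error bars and the trend may not hold):

```python
horizon_hours = 2.0        # rough current horizon, per the conversation
doublings_per_year = 2     # "doubling every half a year"
for year in range(1, 4):
    horizon_hours *= 2 ** doublings_per_year
    print(f"after {year} year(s): ~{horizon_hours:.0f} hours (~{horizon_hours / 8:.0f} workdays)")
```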
MT: Let's talk about your new paper that literally came out today. So, first of all, congratulations. And it talks about epiplexity.
PI: Yes.
MT: And that's a new word, right? That's a new term entirely. Among other things, you invented a new word for the dictionary. Congratulations on that. So, walk us through the whole idea at a high level.
PI: I want to quickly give a shout-out to my collaborators on this work. The lead author is Marc Finzi; Marc is actually currently at OpenAI, working on synthetic data there, and we were doing our PhDs together. Our PhD adviser, Andrew Wilson, is also on the paper, and then there are Shikai and Eding, who are other students on the paper, and Zico Kolter, who is a professor at CMU and on the board at OpenAI; he's also on the paper. The core idea is to think about how the data can look different for an observer depending on how much compute the observer has. You can imagine that there is some complicated process that generates the data, and a very, very smart observer that has a lot of compute can fully understand what that data is, understand every aspect of it. But for a weaker observer that cannot fully model the data, some parts of the data will look like noise to it. So the amount of structure that you will see in the data will depend on how much compute you as an observer have. And actually, in some cases you can see more structure if you have less compute, which is kind of interesting.
MT: So, just to play it back: given a certain amount of compute, you could be feeding the model noisy data, so tons of data but not a lot of interesting stuff in it, or you could be feeding the model data that has patterns in it and therefore is more interesting to the model, because the model can learn more from it.
PI: It's more that even with the same data, it can appear noisy or structured depending on the model. A very big model can extract patterns that a small model cannot.
MT: And that's, from my limited understanding, in opposition to entropy, which is like the amount of noise, I guess, in the data in that case?
PI: Maybe the more relevant comparison is that we are kind of in opposition to Shannon entropy and Kolmogorov complexity, which are both measures of the information content of the data. They're different, but they share some properties that we think are maybe leading people to have some wrong intuitions, potentially, about synthetic data, for example. So for example, there is this idea that if you apply any deterministic transformation to any data, you cannot create more information by doing that. You start with some amount of information, and then you transform it deterministically; it will have the same amount of information, both according to the Shannon information and roughly according to the Kolmogorov complexity. That kind of leads people to believe that for training language models, applying transformations or deterministic changes to the data doesn't effectively increase the amount of data or the efficiency of the data; it doesn't lead to having more information in the data that the model can extract. But we argue that that's just not correct. It would be true if the model had infinite compute: if the model can fully understand what the deterministic transformation is, then it's not going to be able to extract more information from the transformed data than it used to extract from the original data. But with a limit on the compute, it's actually very possible to apply deterministic transformations to the data and create information through that. We have the example of AlphaGo, actually, or AlphaZero, in the paper. AlphaZero doesn't use any human data, and from the perspective of Kolmogorov complexity or Shannon information theory, it doesn't create information. So it's unclear what is actually learned by the model, because it's trained on no data; it can only learn the rules of the game, and that's the only thing. But from this perspective, because the model is computationally bounded, it cannot do the full rollout of all the possible games of Go or chess and figure out what's the best move in every possible position. There is actually structure that is produced through this deterministic process, and the model is able to learn that structure. So we are trying to reconcile these different observations and come up with a notion of structural information that is dependent on the amount of compute that the observer has.
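A loose toy in the spirit of that AlphaZero example (illustrative only, not from the paper): the "data" below is produced purely by a deterministic program, the rules of single-pile Nim plus exhaustive search, so by Shannon or Kolmogorov accounting it adds nothing beyond the rules. But a compute-bounded learner that cannot run the search itself can still extract useful structure, the winning-move pattern, from the materialized data.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def winning_move(pile: int, max_take: int = 3):
    """Exhaustive search over single-pile Nim: return a winning move, or None."""
    for take in range(1, min(max_take, pile) + 1):
        if winning_move(pile - take) is None:  # leaves the opponent in a losing position
            return take
    return None

# "Synthetic dataset" generated entirely by the deterministic program above.
dataset = [(pile, winning_move(pile)) for pile in range(1, 200) if winning_move(pile)]

# A deliberately bounded "learner": it cannot do the search, but it can memorize the
# simple pattern (pile mod 4 -> move) that the generated data makes visible.
policy = {}
for pile, move in dataset:
    policy[pile % 4] = move

# The learned policy generalizes to unseen pile sizes.
test_piles = [p for p in range(200, 220) if winning_move(p)]
print(all(policy[p % 4] == winning_move(p) for p in test_piles))  # True
```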
MT: And the term itself, epiplexity, is that a measure of that?
PI: Yes, yes. It's a kind of novel measure of the information content of the data.
MT: And it's going to be a number on a scale? How does that manifest?
PI: Yeah, it is a number; we can measure it, though it's not easy to measure. It's kind of a theoretical definition, and we prove some things in the paper about the properties of this measure. But we also do measure it. So for example, we can approximate it from scaling laws, and we can, for example, say that text data has more structural information according to this measure than image data at the same amount of tokens.
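As a rough sketch of what reading structure off a scaling curve can look like (this is not the paper's actual estimator, and the loss numbers below are made up): fit a power law L(N) = a * N^(-b) + c to per-token losses measured at several model sizes, and compare the reducible component against the irreducible floor for different data sources.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, a, b, c):
    """Power law with an irreducible floor: L(N) = a * N**(-b) + c."""
    return a * N ** (-b) + c

sizes = np.array([1e7, 1e8, 1e9, 1e10])      # model sizes in parameters (hypothetical)
losses = {
    "text":  np.array([4.10, 3.40, 2.90, 2.60]),   # made-up per-token losses
    "image": np.array([3.20, 3.00, 2.90, 2.85]),
}

for name, L in losses.items():
    (a, b, c), _ = curve_fit(scaling_law, sizes, L, p0=[10, 0.1, 2], maxfev=20000)
    print(f"{name}: reducible term {a:.2f} * N^-{b:.3f}, irreducible floor {c:.2f}")
```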
MT: And as this whole line of research develops, what is the likely impact on industry? Does that mean that we may need comparatively less compute because we know what data to use? What may happen?
PI: In my mind the main impact is conceptual. For example, for me, I'm now very interested in completely synthetic data, just data that's generated by some computation. You can define some arbitrary programs, you can use them to produce infinite data, and as we run out of the internet, maybe we eventually want to do something like this. But then we need to figure out what are the programs that we should be using there, which programs are useful, which are not, and why. I think that's going to be very interesting.
MT: Fantastic. All right, so maybe as we start getting to the end of this conversation, some 2026 predictions or things you're excited about: what do you think happens in 2026 in terms of progress, whether that's on reasoning or alignment or agents or what have you?
PI: Yeah, I think we will continue to see consistent progress on the reasoning front. We are maxing out a lot of the benchmarks that have been relevant recently but maybe will not be relevant anymore, and we need to find new ones, but I think we'll continue to see that the models are just getting more consistent, getting better, able to solve more practical problems. Maybe a less confident prediction that I have is that we'll have more multi-agent systems that become practically useful, where instead of just asking a model a question and getting a response, you would give it a task, and then there will be some more complicated multi-model process running in the background, and then you get the artifact in return.
MT: I know that you've spent some time thinking about the impact of AI on science and math. Same idea, any predictions there? Like, do you expect important new discoveries to be made by AI, solely by AI?
PI: It's a great question. In the sciences, I think that's maybe a little bit more likely, although I also don't know very much about, you know, the life sciences. It feels like there, some discoveries can be made by potentially combining results from different parts of the literature and proposing some ideas that turn out to be true. I think it's hard to imagine the AI making a discovery independently in a domain where you need experiments, because my understanding of a lot of science is that it's about doing the experiments; you need some reasoning to guide what experiments you do, but you also need a lot of iteration and a lot of things actually happening in the physical world, and at least for now the AIs are not capable of doing that. In mathematics, I think we will see the models getting better at proving technical results, technical lemmas, maybe including formalization and things like Lean, the formal theorem-proving language. It's easy to imagine the models becoming better than humans at proving these technical lemmas quickly. I think the impact on mathematics is very interesting: it is improving the output of humans already, but it also introduces some noise, right? Some of those proofs will be incorrect, and they will be incorrect in subtle ways. I think it's possible that mathematicians will be good at catching those mistakes, but also, as the models get better, they might be more and more deceptive in how they frame the mistakes. Already, for me, I would not be able to find the mistake in some very technical lemma that the model is proving. I think we'll have more and more papers produced by mathematicians with more and more AI in them, but also the amount of noise will be bigger.
MT: To close: you have a lab at NYU. In general, what are some topics that PhD students should focus on? In other words, what's exciting two, three, four years out?
PI: My vision for what academia, and my lab in particular, should be doing is trying to do more exploratory things, things that are more different from the standard in the industry, but also not necessarily immediately going for very practical things, instead trying to break down the problems into more understandable, fundamental questions and study them carefully, maybe in some compact setting. So far we've been working on things related to pre-training and synthetic data, like understanding some behaviors in pre-training when we train on some narrow behaviors or some kind of programmatically generated data, and questions in post-training. We have some algorithmic questions about GRPO, but also questions about the interaction between pre-training and post-training: how to allocate compute, and how you can in general set up the pre-training so that the post-training will work.
PI: I guess broadly I'm also interested in architectures. As you mentioned, there is a question of whether the transformers are the final architecture. Maybe they're good enough, and we also have this lesson that with scale, the thing that matters the most is how well you can scale the model. But it also seems very likely that there is a major compute multiplier there, that you can find a much better architecture, at least for some tasks. I'm pretty confident that the transformers will turn out to be highly suboptimal. I also think about pre-training, other ways of pre-training, and I don't know what they would be. Maybe that would be some RL-inspired pre-training, maybe pre-training mostly on synthetic data, or maybe just something adversarial. We had GANs a long time ago, and it seems like something like that needs to come back: some kind of self-play where the model is producing its own training tasks.
MT: And to this whole conversation about academia versus industry, do you think the industry is too focused on short-term wins, because there's so much pressure, so much need to demonstrate progress to secure the next massive round of capital?
PI: It's hard to tell. I think it's hard to argue against what the industry has been doing, just because of how much progress there has been. It is a reasonable bet to make that we're just going to continue to execute this extremely well. But I do think that, at least as humanity, we need to make other bets as well, and we need to explore other ways of training, so that we don't all work on this same thing and we don't all just bet everything on this approach working out.
MT: Well, thank you so much. Appreciate it.
PI: Yeah, thank you so much.