
Microsoft used its Build 2026 conference this week to push a clear message: agents are rapidly moving into production throughout enterprise systems, and the winning platform will be the one that gives them reliable context, governance, identity, memory — and secure access to enterprise data. The company announced Microsoft IQ as a context layer across GitHub Copilot, Microsoft Foundry and Copilot Studio; Work IQ APIs coming June 16; Fabric IQ for structured business data; Foundry IQ for retrieval across enterprise knowledge and the live web; and Web IQ as a new agent-facing web search stack. Microsoft also introduced Scout , a personal work agent, and a whopping seven new in-house AI models in its growing MAI family across modalities and use cases, including MAI-Thinking-1. Those announcements sit directly in Marco Casalaina ’s lane. Casalaina is Microsoft’s VP Products, Core AI and AI Futurist . He leads Microsoft’s AI Futures team and previously led teams across Azure AI, including Azure OpenAI, Vision, Speech, Decision, Language, Responsible AI and AI Studio. Before Microsoft , he led Salesforce’s Einstein AI team and earned a computer science degree from Cornell University. CRN reported that he joined Microsoft in early 2022 as vice president of products for Azure Cognitive Services, meaning he has now been at the company for more than four years. VentureBeat spoke with Casalaina ahead of Build about Microsoft’s agent strategy, the company’s model-choice philosophy, how Microsoft IQ fits with MCP, and why he believes enterprises need far more than just access to powerful models. The interview below has been edited for clarity and condensed from the transcript. VentureBeat (VB): To start, can you explain your role at Microsoft and what “AI Futurist” means in practice? Marco Casalaina (MC): I am VP Products of what we call Core AI. Core AI is our set of tools for AI developers, and that includes Foundry, Visual Studio, VS Code, GitHub and GitHub Copilot. That’s our overall group. My Silicon Valley title is AI Futurist, and that has a very concrete meaning here. I’ve worked with other folks who are considered futurists, like Peter Schwartz, and that can be a little bit more fuzzy. For me, what it means concretely is that I am the first person to try anything new here. I am constantly getting things from all over Microsoft, not even just Foundry, because I work with really everybody across the company. Pretty much everybody sends me the new things at all times. Even today, I got something brand new just before this call. I’m usually the first person to try anything new here, which is pretty cool. I get to see a lot of really cool stuff. A friend of mine, who is head of AI at Intuit, calls me an “adjacent possiblist.” I consider my futurist concept to be about a year out from now — the immediate future of what’s about to happen next. That’s what I focus on. VB: Where are you looking at the agentic state of things, and in particular Microsoft’s position as enterprises and individuals rush to adopt agentic AI? MC: We can look at it from bottom to top. At the very base of the stack is our commitment to model choice. All along, we’ve had the OpenAI GPT frontier models. Now we have a really solid partnership with Anthropic, where we’re offering the Claude models. We just launched Claude Opus 4.8 on Azure — on Foundry, I should say — and at Build, we are introducing our new MAI model. The MAI models are a set of frontier models that we’re building in-house. They are made for token efficiency, optimization and customization. We are specifically making them for our customers to customize on their own data sets. One level above that, we are announcing hosted agents in Foundry. That is our managed agent capability in Foundry. It automatically handles scaling, containerization and those kinds of things. It is an environment where you can manage agents. One level above that is the Foundry control plane. At least for the agents you build, you want to have control over them. This gives you observability into their cost, tokens and correctness. You can do continuous evaluations and sample interactions with those agents, run evals and make sure they are continuing to work and not drifting. The big news is going to be the GA of what we call the IQs here at Microsoft. There are currently three, and there will be four. There is Foundry IQ, which is basically for knowledge — largely unstructured knowledge. There is Fabric IQ. We have a ton of customers who have entrusted a lot of data to the Microsoft Cloud in Fabric, Power BI and related technologies. Fabric IQ is about making an agent-facing interface for this data, so agents can get to it without literally going through a Power BI report. That’s ridiculous. Work IQ is about the Microsoft ecosystem. You can look at Work IQ as the agentic face of all the Microsoft apps: Outlook, Teams, Word, SharePoint and all those kinds of things. How does an agent interact with those things? That is Work IQ. And finally, the fourth IQ is Web IQ. We are releasing our new agent-facing web search capability. It can search the web, search through videos and even do some kinds of browsing tasks automatically. It is super fast, and it kind of has no face. It’s headless. The interface is intended for agents. We will also be announcing Agent Optimizer. That includes a new type of evaluation that allows you to evaluate much more granularly whether an agent is actually working and working correctly. The optimization step can go back in and make modifications to the prompt, obviously with your consent, and modify your agent so it works more correctly going forward. Effectively, it creates a feedback loop to make agents work better. VB: Microsoft has sometimes been criticized for murky and clunky product naming. Where do these IQ products sit? Are enterprise users supposed to go to IQ first, or is IQ more for developers to connect to? MC: All of the IQs are headless. The concept of IQ is that each one provides a different type of context to an agent specifically. Largely, it will be developers interacting with the various IQs — developers and the agents they build. The IQ brand is really about agent context. End users largely won’t interact with the IQs. It is true that if you use Microsoft 365 Copilot today, you’ll notice a little thing that says it is using Work IQ. So it is a little bit visible, but the customer or end user doesn’t have to go find the IQ. Their system or developers hook that up. VB: Is the IQ family essentially Microsoft’s version of MCP? Is it using MCP, or is it something different? MC: All of the IQs are indeed exposed as MCP servers. You have correctly characterized MCP as basically an agent-facing or self-describing API. It’s not that fancy. That’s really what it is, with some authentication layers and capabilities built in, which is super useful. Something like Work IQ — really all the IQs — have to be authenticated. In order for Work IQ to see my email, Teams messages, documents and stuff like that, I have to be able to authenticate it on behalf of me. That gets us to another core differentiator that we will be announcing at Build, which is agent identity. We have this Entra system, and Entra is, I believe, the world’s largest used identity system for human users. For some time now, you have been able to declare an agent to have an identity in there. Now, agents will be able to have their own identity, their own Teams box, their own email inbox and stuff like that. These agents will use Work IQ to check their own email, check their own documents and that sort of thing. VB: Enterprises are not one-size-fits-all on models. Microsoft supports many leading models through Foundry and Azure, while also building its own. Is Microsoft a model company, an infrastructure company or a connector between models and work products? MC: The answer is yes. We are obviously the hyperscaler. We are absolutely committed to model choice, and we will continue to offer the frontier models from all of the major players: OpenAI, Anthropic, Mistral, Black Forest, xAI — you name it. They are all going to be represented in there. At the same time, we have what is now called our Microsoft AI Superintelligence Team, formed by Mustafa Suleyman, and we are building our own frontier models as well. Like I said earlier, we are really gearing these models toward optimization — token efficiency, bang for the buck and customization. These are things our customers have been asking for: the ability to more finely customize models, whether that is fine-tuning or continued pre-training. Continued pre-training is literally changing the weights of the model, whereas fine-tuning is adding a little layer on top. We have these capabilities in Foundry: fine-tuning, distillation and those kinds of things. I would note, by the way, that our MAI models are not distilled. Some model providers, especially some of the less scrupulous ones, will distill other models into theirs, and that can have unusual effects. We don’t do that. The data provenance of our models is of primary importance to us. When we come out with these models, we want our customers to know that the data provenance is clean in terms of the rights to the data, where it came from and all that kind of stuff. The choice thing also goes above the model layer. When we talk about Foundry hosted agents, we have the Microsoft Agent Framework. You talk about agent orchestration — how you make agents work together when you have multiple agents — and Microsoft Agent Framework is an excellent framework for that. However, I can make a LangGraph or LangChain Foundry hosted agent. I can make a CrewAI Foundry hosted agent. I can use any number of orchestration frameworks and put that up as a Foundry hosted agent, and it becomes a first-class Foundry agent. That means I get the observability. It shows up in the Foundry control plane. I can do evaluations on it. I can do traces on it. I can get all those things from the Foundry control plane with an agent built in really any framework I choose. VB: Some companies are interested in Chinese and open-source models. How much of Microsoft offering its own models is about giving customers an American version of that? MC: I can’t speak to that exactly. Of course, we offer DeepSeek models and Qwen models in Foundry, so we offer all of these choices today, and our customers can make that choice. The MAI models are really focused on token efficiency and customizability. That is what our customers are demanding, and that is the gap we are filling. VB: As agents take on longer tasks and more specialized work, will enterprises keep expanding the number of models they use, or will there be a winnowing? MC: I do see it expanding. We are not just focused on tokens per se. A token is not a token is not a token. One token is not necessarily equivalent across these things. It is all about what you are doing with each token and the efficiency of that. It comes back to what kind of value you are getting for the cost. That is a lot of the rationale behind why we are developing our own MAI models. Part of my job is to travel all around the world. I’ve been all over the place. For example, I’ve been working with Bayer. One of the things we are measuring is not just token usage, but number of users — monthly active users and daily active users — because we have a lot of first-party capabilities like Microsoft 365 Copilot. Over the last year, we’ve seen a 6x increase in monthly active users. We have over 20 million users of Microsoft 365 Copilot alone. That is on the agents you use. In terms of the agents you build, Bayer put up its own agent system on Foundry, and now it has 20,000 of its own employees on it. A few weeks ago, I was in Sydney, Australia, hanging out with AEMO, the Australian Energy Market Operator. They operate the electrical grid of Australia. They showed me that they had built agents to manage grid operations. This is a human-centered thing. They have grid operators sitting in centers in West Sydney, Brisbane and places like that, and they are bombarded with alerts. I wouldn’t believe it if I hadn’t seen it myself. The alerts are constant. They built a system to triage those alerts. Is this alert a super major thing, or is it just that a transformer is getting a little hot? It also says, here is when we had this problem last time, and here is how we resolved it last time. Maybe now we need to replace this component, or whatever. Ultimately, it is the grid operators making the choice. A lot of our philosophy here is human empowerment. These human-centered agents are the ones that are working best among our customers. What I saw at AEMO and Bayer is this notion of human empowerment: taking away some of the grunt work, or in the case of AEMO, taking billions of alerts and reducing them to something much more manageable and actionable for the people involved. We are moving past the era where agents are just answering questions. AI in general is moving past that. We are not just answering questions anymore. We are moving toward a place where AI can really meaningfully help you do your work. VB: How do observability, tokenomics, ROI analysis and agent governance fit into Microsoft Foundry? MC: That is what the Foundry control plane is all about. We introduced it in November of last year. If you looked at my own Foundry control plane — I’ve built a ton of these agents, and I am a developer by background — you would see all of my agents that are running and the ones that are paused. I can see how many tokens they’ve used over the last day, week or month. I can look at trends. I can look at costs, because the cost will be different depending on what underlying model I’m using. If I’m using our model router, it can route to different models depending on the complexity of the inbound prompt. We also have Azure cost management overall. Azure has had cost management for over a decade, before the AI thing even happened. This integrates with overall Azure cost management. It is not just narrowly about what your AI is doing. Your AI will be using storage resources, data resources and other compute resources around that AI. You can get a complete picture of not just the cost and token usage of the AI itself, but everything around it. When you think about governance, that also extends to evaluation. One of the things we are releasing in preview is rubric-based evaluation. Rubric-based evaluation is much more granular. Let’s say you have built a restaurant reservation agent. The things you want to test about that agent are not really groundedness. Groundedness is the opposite of hallucination, and that is very question-answering. For a restaurant reservation agent, you want to test very granular things. If you say, “Make me a table for two tomorrow,” did it come back and ask, “What time would you like the table?” Before it gave you a table for two tomorrow at 6 p.m., did it actually check that the table was available, or did it randomly give you a table without checking first? There are very granular things you want to test about that specific use case. You don’t just want to test whether the agent works. You want to test whether the agent works right. That is what we are approaching with our new rubric-based evaluation system. You will see that in Satya’s keynote. I have been using it myself lately, and I’m very happy about it. I’ve been waiting for this. VB: Microsoft is also partnering with companies like Anthropic and allowing Claude to work with Microsoft 365. How important is Copilot to this story? Why would someone turn to Copilot over other options? MC: Microsoft 365 Copilot is a huge advantage for us. As I mentioned, we crossed the 20 million user mark on Copilot relatively recently. The great thing about that is that it is the face. When you go into Foundry and make an agent, there is a button that says “publish to Copilot” — actually, it says “publish to Copilot in Teams,” because you can put it in Teams too. The idea is that you want to put these agents where your users are. A lot of people who use the Microsoft ecosystem are in Teams, or they are using Copilot. I can create a custom agent, as many of my colleagues have, and now it is in Copilot, which I use maybe 50 times a day. Since January, Copilot has become more and more capable. I now use it to draft my email. I am not just using it for question answering. I’m starting to use it to manage my calendar and draft emails. I really do this every day now. When I want to use a custom agent — for example, to file my expenses, because we have a custom agent for that now — I can access that agent not in some random standalone interface, but in Copilot or Teams, where I already am. That surface area that people are already engaging with is a major advantage. VB: As people offload more repetitive work to AI, what are they able to spend more time doing? MC: Let’s consider something I did yesterday. I got an email from a customer named Frankie, and he asked me a question about Foundry hosted agents. I knew the answer because I had talked to my colleague Jeff Holland, who is the head of our hosted agents product management. I had asked Jeff the same question two weeks ago. Where or how I asked him, I don’t remember. Was it in Teams? Was it email? Was it a meeting? I don’t really remember. But I knew the answer to the question Frankie was asking. So I went into Copilot and said, “Answer Frankie’s question about how hosted agents scale, and reference the conversation I had with Jeff a couple of weeks ago on this same topic.” And it did it. It drafted the email. Over time, I have taught Copilot my style. I don’t do the bold-print thing. I tell it: don’t use em dashes and that kind of stuff. I have a certain style in the way I write emails. It’s a little terse, to be perfectly honest, but I want it to be the way I write. It drafted this thing. It searched through my Teams messages, my emails and the transcripts of my meetings with Jeff. It used Work IQ, as a matter of fact. It found the answer, drafted the email and provided a link to the documentation that specifically covered the question Frankie was asking. I looked at the draft and thought, yep, that’s it. Yes, I could have composed this email myself. I knew the answer to the question. I could have looked up the documentation. If I dug around, I’m sure I could have found the conversation I had with Jeff in whatever medium that was. I could have done that stuff. It probably would have taken me, I don’t know, an hour to find all the information and compose it. Instead, I did it in about a minute. I had a draft, I looked at it, I was happy with it, I pressed send, and that was the end of that. It really is about giving people time back. It is not even just grunt work. It is all this time you spend looking things up and finding things. Now, I can make it take an action. It didn’t just answer the question. It fully drafted the email and copied Jeff. VB: Do you fear for your job? How has AI changed your own work? MC: I don’t fear for my job. My job has changed. For one thing, I do a lot more now, both in my business life and personal life. This weekend I was using Web IQ, the new Web IQ. I’ve been car shopping. My car’s lease is coming up, and there is a very specific car I’m trying to find, which is hard to find. It’s a Hyundai Ioniq 6, which Hyundai, for whatever reason, has stopped offering in the United States. I’m going to get one, though. I set my agent to the task, using Web IQ, of finding all the Hyundai Ioniq 6s available in the entire Bay Area — everywhere, all the way out to Sacramento, all the way as far south as Gilroy. I set it to this task, and then I went on a hike. When I got back, I had a big long list of all the Hyundai Ioniq 6s, at least the 2024 and 2025 models, available in the entire Bay Area. From that, I started calling down these dealers. Even in my personal life, I’m using it constantly. It saves me a ton of time. That would have taken me hours, to go through every single dealer’s inventory like this. But Web IQ could do that, and it was super quick. VB: Any final thought for developers around this news? MC: Foundry is really the place. This is the place where you can build your agents, scale your agents, test your agents and improve your agents. That’s what it’s all about, and it’s happening.
View original source — VentureBeat ↗
