What We Learned Building an AI Agent for 3D Creation

When we started building Meshy 3D Agent, the obvious problems were easy to name: model quality, texture fidelity, prompt control, and generation speed. Those are still core problems for AI 3D. A weak model generation system gives the user very little to build on.

Yet as we watched more people use AI-generated 3D assets in real workflows, we kept seeing the friction move somewhere else. A user might get a model that looks good in the preview, then wonder whether it can be printed. A game developer might generate one nice prop, then struggle to make the next ten assets feel like they belong in the same world. Someone might start with a sketch and only discover halfway through that they want the result to feel like a toy, a game asset, or a physical object.

The prompt was rarely the whole brief. The generated model was rarely the whole job.

That changed how we thought about the product. A 3D agent has to do more than produce geometry from text or images. It has to help the user move through the unclear space between an idea and an asset they can use.

A model can look finished before the real work begins

One of the first things we learned is that the preview can be misleading. In 3D, a model can look polished on screen while hiding problems that only appear later. A small figurine may look charming in the browser, then become difficult to print because the wings are too thin or the tail has no stable support. A game prop may look fine on its own, then feel out of place once it sits next to other assets in a scene. A character may have a good silhouette, yet still need changes before it can be edited, rigged, or reused.

This is one of the ways 3D differs from image generation. With an image, the output is often the final medium. With 3D, the output usually travels somewhere else: Blender, Unity, Unreal Engine, Roblox Studio, a slicer, a prototype review, a marketplace, or a larger creative scene. That makes “usable” a moving target.
A printable object has different constraints from a low-poly game prop. A product concept model has different constraints from a decorative object. A stylized environment asset has different constraints from a character that may need animation later.

This matters because users do not always describe those constraints upfront. Many people begin with the object in their head, then discover the practical requirements as they go. A useful agent has to recognize when the destination changes the work.

For example, “make this dragon statue printable” should affect the next step. The system should think about physical structure, fragile parts, the stability of the base, and whether the geometry is likely to create problems in a slicer. The same dragon statue for a game prototype would raise a different set of questions: style, scale, complexity, materials, and export format. The asset is shaped by where it needs to go.

Users often discover the brief while creating

A lot of 3D requests begin in a rough form. That is natural. People often know the feeling of what they want before they know the exact object.

Take a simple example: “Make me a small cat with wings.”

The first output may answer the literal request. It has a cat. It has wings. The user can see it, react to it, and then realize what they actually want.

Maybe the cat should feel more like a collectible toy. Maybe the wings should be thicker because the user wants to print it. Maybe the pose should be calmer. Maybe the first version is too realistic, and the user wants something closer to a fantasy game mascot. Maybe the body from one direction works, while the wings from another direction feel better.

This is where many AI workflows become fragile. Each new instruction can feel like a restart. The user has to restate context, rewrite the prompt, regenerate, compare, export, test, and repeat.

A better flow keeps the creative thread intact.
In the winged cat example, the agent should remember the selected style, the purpose of the object, and the user’s previous feedback. If the user later says, “make it more suitable for printing,” the system should understand that this is a practical constraint added to the existing design, rather than a new unrelated prompt.

This type of continuity is easy to undervalue until you watch where people get stuck. Creative work often breaks down through small context losses. The user does not always fail because the model is bad. They fail because the path to the next useful step becomes unclear. The agent’s job is to keep that path visible.

Context matters more in 3D than we expected

Context becomes even more important when users create sets of assets.

Imagine an indie developer building a cozy low-poly island survival game. They need a wooden hut, a fishing boat, palm trees, rocks, a campfire, tools, food items, and a few characters. Any one of those assets might look acceptable alone. The harder part is making the set feel consistent.

If the hut has a handcrafted low-poly style, the boat should probably follow the same visual logic. The rocks should match the same level of detail. The characters should feel like they belong in that world. The materials, proportions, and silhouettes all need some shared language.

Then the user asks: “Can you make a tiny spaceship in the same style?”

For a human collaborator, that request is clear. For a generation system, “same style” carries a lot of hidden information. It refers to the previous assets, the game’s tone, the level of detail, the color direction, and the intended use.

This is where a simple prompt box has limits. It can generate another object, but it may lose the visual brief that has been built over the conversation.

A stronger agent experience treats the conversation as a living creative brief. It should carry forward the decisions that matter: style, purpose, constraints, and previous user preferences.
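
As a rough illustration of what carrying a brief forward could look like, here is a minimal Python sketch. It does not describe how Meshy 3D Agent is implemented. The `CreativeBrief` class, its fields, the `add_constraint` and `compose_request` helpers, and the example values are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CreativeBrief:
    """Hypothetical running brief for one conversation (illustrative only)."""
    style: str = ""
    purpose: str = ""
    constraints: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)

    def add_constraint(self, constraint: str) -> None:
        # A later instruction such as "make it printable" extends the
        # existing brief instead of replacing it.
        if constraint not in self.constraints:
            self.constraints.append(constraint)

    def compose_request(self, subject: str) -> str:
        # Expand a short request ("a tiny spaceship") with the carried context.
        parts = [subject]
        if self.style:
            parts.append(f"style: {self.style}")
        if self.purpose:
            parts.append(f"purpose: {self.purpose}")
        if self.constraints:
            parts.append("constraints: " + ", ".join(self.constraints))
        if self.preferences:
            parts.append("preferences: " + ", ".join(self.preferences))
        return "; ".join(parts)


# Example values are assumptions loosely based on the island game scenario.
brief = CreativeBrief(style="handcrafted low-poly",
                      purpose="cozy island survival game")
brief.preferences.append("warm color direction")
request = brief.compose_request("a tiny spaceship")
print(request)
```

In this sketch, “same style” stops being an empty phrase because the short request is expanded with decisions already recorded in the brief. A real system would likely carry far richer context, such as reference assets, than a few strings.
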
When the user asks for a matching asset, the system should have enough context to make that request meaningful.

This is especially useful for small teams and solo creators. They often need coherent assets before they have the time or budget for a full art pipeline. An agent that can preserve direction across multiple outputs can help them prototype faster and make early ideas easier to evaluate.

The value here is less about a single impressive result. It is about helping the work hold together over several steps.

The agent should guide the next step

A chat interface by itself has limited value. The useful part is what the system can do with the conversation.

Sometimes the next step should be generation. Sometimes the user needs concept options before moving into 3D. Sometimes one clarification will save several failed attempts. Sometimes the model should be adjusted for printing. Sometimes the user needs help choosing between GLB, FBX, OBJ, STL, 3MF, or another format. The agent has to make these moments easier.

For a beginner, that may mean translating technical issues into plain language. If a model has parts that may be too thin for printing, the system should explain the risk and suggest a practical fix. If the user wants to edit the file later, the system should guide them toward a reasonable export choice.

For an experienced creator, the same agent should stay out of the way and preserve momentum. A game developer may simply want five prop variations in the same style. A 3D artist may use the system for early ideation before taking the asset into a manual workflow. A product designer may want a quick model that helps a team discuss shape and proportion.

These users need different levels of guidance. The agent should adapt without forcing everyone into the same process.

One of the hardest product questions is when to ask and when to act. Too many questions make the tool feel slow.
Too few questions create assets that look fine at first and cause problems later. A good agent should understand when the answer will materially change the workflow. “What is this for?” can be more useful than another prompt field, especially in 3D.

What changed in our product thinking

Building Meshy 3D Agent pushed us to think about AI 3D as a workflow category.

Generation quality remains the foundation. Geometry, texture, speed, controllability, and consistency still matter enormously. Without strong generation, the rest of the experience has nothing solid to build on.

The next layer is workflow awareness. Can the system understand the user’s goal? Can it preserve context across multiple steps? Can it help a user explore directions before committing? Can it keep a group of assets visually aligned? Can it prepare outputs for printing, games, prototyping, or further editing? Can it reduce the number of times a user has to restart or switch tools?

These questions have become central to how we think about AI 3D products. A one-step generator is useful for quick experiments. More serious creation needs memory, iteration, and a better understanding of where the asset will be used. This is the space where agents become valuable.

The long-term opportunity is making 3D easier to start and easier to finish. A student should be able to turn a project idea into a model. A game developer should be able to prototype a coherent asset set quickly. A maker should be able to move from sketch to printable object with fewer failed attempts. A creator should be able to build a small world, collection, or character set without learning every part of the traditional 3D stack first.

Professional 3D work will still require judgment, craft, and control. Expert tools are not going away. The agent’s role is to reduce the distance between a rough idea and a useful starting point, then help users move through the parts of the process where they usually get blocked.
For many people, that middle stretch is where 3D creation stops.

Closing

The biggest lesson from building Meshy 3D Agent is that users need help across the whole path from idea to usable asset. That path includes concept exploration, style decisions, generation, review, repair, export, and downstream use. Each step can introduce friction. Each context switch can slow the user down. Each technical requirement can become a blocker for someone who simply wants to make something.

Agents are useful in 3D because they can carry context through that journey. They can remember what the user is making, where the asset will go, and what has already been decided. They can help beginners understand the next step and help experienced creators move faster.

That is what we are building toward with Meshy 3D Agent: a workflow that helps more people shape an idea into a usable 3D asset with fewer breaks along the way.

For most creators, the hardest part of 3D is still getting from an idea to the next workable step. If agents can make that step easier to find, AI 3D will become useful to a much wider group of people.

View original source — Hacker Noon ↗
