An AI-driven company: what its results reveal about the future of work

With the emergence of artificial intelligence in the professional world, a recurring question arises: will AI replace human jobs? A group of researchers from Carnegie Mellon University addressed this question by simulating a company run entirely by AI agents. While ambitious, these agents, built on models such as Anthropic’s Claude and OpenAI’s GPT-4o, ran into significant difficulties. The results highlight both the current progress and the limitations of AI in a professional environment, and they reveal the challenges that must be overcome before businesses can rely on AI to run their operations. They also invite us to reconsider the place of AI in companies and its role as a complement to human workers.

The Actors in the Experiment: Artificial Intelligence Agents

For this simulation, the researchers used several advanced AI agents, including Claude from Anthropic, GPT-4o from OpenAI, Google Gemini, Amazon Nova, Meta Llama, and Qwen from Alibaba. Each agent was assigned a well-defined role, such as financial analyst, project manager, or software engineer. The agents had to interact not only with one another but also with simulated colleagues to carry out routine company tasks.

Agent Performance: A Mixed Record

The results revealed that the AI agents failed more than three-quarters of their assigned tasks. Claude 3.5 Sonnet performed best, yet it fully completed only 24% of the tasks. Even when partially completed tasks were given credit, its overall score reached only 34.4%. Gemini 2.0 Flash came in second with 11.4% of tasks completed, while no other agent exceeded 10%.

Cost vs. Performance: A Complex Equation

In addition to performance, the researchers examined the agents’ operating costs. Claude 3.5 Sonnet was the most expensive, at $6.34, while Gemini 2.0 Flash cost only $0.79. These figures raise questions about the cost-effectiveness of AI in a business context.

Inability to Understand Implicit Tasks

One of the main difficulties the agents encountered was grasping the implicit parts of instructions. For example, when asked to save a document in .docx format, they did not understand that this meant using Microsoft Word. This lack of contextual understanding is a problem for completing tasks autonomously.

Social Skills and Web Navigation Issues

The agents also failed because of poor social skills and difficulty navigating the web, particularly when dealing with complex pop-ups. These limitations indicate that, while AI can perform certain roles, it still needs support for tasks that require human judgment and social interaction.

Oversights and Shortcuts: Reducing Complex Tasks

When faced with tasks they did not fully understand, some agents took shortcuts, skipping crucial parts of the work. They nonetheless believed they had accomplished their objectives, which shows how they overestimated their own abilities.

Reflections on the Autonomy of AI Systems

These results suggest that, while promising, AI is not yet ready to fully replace humans in a business environment. It can, however, be valuable when deployed selectively, supporting specific tasks that play to its strengths. This experiment enriches the debate on the future of work and on how businesses can integrate AI beneficially.

InterCoaching is an independent media outlet.