Anthropic has unveiled major upgrades to its Claude AI models, introducing an enhanced Claude 3.5 Sonnet with improved coding capabilities, a new Claude 3.5 Haiku model offering high performance at lower cost, and a groundbreaking "computer use" feature that allows Claude to interact directly with computer interfaces.
The upgraded Claude 3.5 Sonnet model demonstrates significant improvements across various benchmarks, particularly in coding and tool use tasks. Key enhancements include:
SWE-bench Verified score increase from 33.4% to 49.0%, surpassing all publicly available models [1][2]
TAU-bench performance boost from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain [1]
Improved GPQA and MMLU Pro scores, outperforming Gemini 1.5 Pro [2]
The upgraded model is offered at the same price and speed as its predecessor [1]. Early feedback from companies such as GitLab and Cognition indicates substantial improvements in AI-powered coding, with up to 10% stronger reasoning across various use cases [1][3].
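To put the benchmark changes listed above in perspective, the short sketch below computes both the percentage-point gain and the relative gain for each reported pair of scores. The numbers come from the text; the function name `improvement` and the dictionary layout are illustrative choices only.

```python
# Scores (percent) as reported in the text above: (before, after).
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def improvement(before, after):
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


gains = {name: improvement(*pair) for name, pair in scores.items()}
for name, (points, relative) in gains.items():
    print(f"{name}: +{points} points ({relative}% relative)")
```

The distinction matters when reading headlines: a 15.6-point gain on SWE-bench Verified is a relative improvement of roughly 47% over the earlier score, which is a different claim from a 47-point jump.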
Claude 3.5 Haiku, Anthropic's newest lightweight model, offers impressive performance at a more affordable price point. Key features include:
Matches Claude 3 Opus performance on many benchmarks while maintaining lower costs and faster speeds [1][2]
Scores 40.6% on SWE-bench Verified, outperforming the original Claude 3.5 Sonnet and GPT-4o [3]
Initially available as a text-only model, with image support coming later [3]
Well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large data volumes [1]
Claude 3.5 Haiku will be available later this month through Anthropic's API and major cloud providers, offering developers a cost-effective option for high-performance AI capabilities [2][1].
Anthropic's new "computer use" capability allows Claude 3.5 Sonnet to interact directly with computer interfaces. This experimental feature enables Claude to perform tasks by viewing screenshots, moving the cursor, clicking buttons, and typing text [1][2]. Key aspects of this capability include:
Ability to perceive and interact with user interfaces via an API
Generalization of skills from simple software training to more complex applications
Performance on the OSWorld benchmark: 14.9% in the screenshot-only category, surpassing other AI systems [3]
Potential applications in automating repetitive processes, software testing, and open-ended research tasks [2]
While still in public beta and prone to occasional errors, this capability represents a major step towards AI systems that can navigate and utilize computer interfaces like human users [4]. Anthropic emphasizes caution in implementation and expects rapid improvements in the coming months [4].
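The loop described above (look at a screenshot, then move the cursor, click, or type) can be sketched with a simulated screen. This is a minimal illustration under stated assumptions, not Anthropic's API: the `FakeScreen` class, the `apply_action` function, and the action names and fields are all hypothetical, and the scripted `planned` list stands in for actions a model would choose one turn at a time.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real desktop; records what an agent does."""
    width: int = 1024
    height: int = 768
    cursor: tuple = (0, 0)
    typed: str = ""
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real integration would return image data; this returns a text summary.
        return f"cursor={self.cursor} typed={self.typed!r} clicks={len(self.clicks)}"


def apply_action(screen, action):
    """Apply one hypothetical action dict and return a fresh observation."""
    kind = action["action"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        if not (0 <= x < screen.width and 0 <= y < screen.height):
            raise ValueError(f"coordinate out of bounds: {(x, y)}")
        screen.cursor = (x, y)
    elif kind == "left_click":
        screen.clicks.append(screen.cursor)
    elif kind == "type":
        screen.typed += action["text"]
    elif kind != "screenshot":
        raise ValueError(f"unsupported action: {kind}")
    return screen.screenshot()


# Scripted actions standing in for what a model might request, one per turn.
planned = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [200, 150]},
    {"action": "left_click"},
    {"action": "type", "text": "hello"},
]

screen = FakeScreen()
observations = [apply_action(screen, a) for a in planned]
print(observations[-1])
```

Returning a new observation after every action mirrors the screenshot-driven design: the agent only knows what the interface looks like after its last step, which is also where the occasional errors mentioned above would surface and need checking.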
The upgraded Claude 3.5 Sonnet is now available to all users through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI, while Claude 3.5 Haiku will be released later this month [1][2]. Anthropic has implemented robust safety measures for these new models, including:
Joint pre-deployment testing conducted by the US and UK AI Safety Institutes [1]
New classifiers to identify potential misuse and prevent harm [3]
Maintenance of existing safety standards from previous versions [4]
Responsible Scaling Policy categorizing Claude 3.5 Sonnet as ASL-2 Standard [5]
These precautions aim to ensure the responsible deployment of increasingly capable AI systems while encouraging developers to explore and provide feedback on the new models and features [3][1].