Anthropic Upgrades Claude
Curated by elymc
Anthropic has unveiled major upgrades to its Claude AI models, introducing an enhanced Claude 3.5 Sonnet with improved coding capabilities, a new Claude 3.5 Haiku model offering high performance at lower cost, and a groundbreaking "computer use" feature that allows Claude to interact directly with computer interfaces.
Claude 3.5 Sonnet Improvements
The upgraded Claude 3.5 Sonnet model demonstrates significant improvements across various benchmarks, particularly in coding and tool use tasks. Key enhancements include:
- SWE-bench Verified score increase from 33.4% to 49.0%, surpassing all publicly available models [1][2]
- TAU-bench performance boost from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain [1]
- Improved GPQA and MMLU Pro scores, outperforming Gemini 1.5 Pro [1][2]
Early feedback from companies like GitLab and Cognition indicates substantial improvements in AI-powered coding, with up to 10% stronger reasoning across various use cases [1][3].
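To put the benchmark figures above on a common footing, the short sketch below converts each before/after pair into an absolute gain (percentage points) and a relative gain (percent of the earlier score). The scores are the ones quoted in the list; the `gains` helper and the `summary` dictionary are illustrative names, not part of any benchmark tooling.

```python
# Benchmark scores (percent) quoted in the list above, as (before, after).
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    # Round to one decimal place to match the precision of the quoted scores.
    return round(absolute, 1), round(relative, 1)


summary = {name: gains(*pair) for name, pair in scores.items()}

for name, (points, percent) in summary.items():
    print(f"{name}: +{points} points ({percent}% relative)")
```

Read this way, the SWE-bench Verified change is a gain of 15.6 points, or roughly 46.7% over the earlier score, while the TAU-bench retail change is 6.6 points, or about 10.5%.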
Claude 3.5 Haiku Features
Claude 3.5 Haiku, Anthropic's newest lightweight model, offers impressive performance at a more affordable price point. Key features include:
- Matches Claude 3 Opus performance on many benchmarks while maintaining lower costs and faster speeds [1][2]
- Scores 40.6% on SWE-bench Verified, outperforming many agents built on the original Claude 3.5 Sonnet and GPT-4o [3]
- Initially available as a text-only model, with image support coming later [3]
- Well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large data volumes [1][2]
Innovative Computer Use Capability
Anthropic's groundbreaking "computer use" capability allows Claude 3.5 Sonnet to interact directly with computer interfaces, marking a significant advancement in AI functionality. This experimental feature enables Claude to perform tasks by viewing screenshots, moving cursors, clicking buttons, and typing text [1][2]. Key aspects of this innovation include:
- Ability to perceive and interact with user interfaces via an API
- Generalization of skills from simple software training to more complex applications
- Performance on the OSWorld benchmark: 14.9% in the screenshot-only category, surpassing other AI systems [3]
- Potential applications in automating repetitive processes, software testing, and open-ended research tasks [2][4]
Anthropic emphasizes caution in implementation and expects rapid improvements in the coming months [4].
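The observe-then-act cycle described above (view a screenshot, then move the cursor, click, or type) can be illustrated with a small local simulation. This is a minimal sketch under stated assumptions, not Anthropic's API: `FakeScreen`, `execute`, `scripted_model`, `run`, and the action dictionary format are all hypothetical names invented for illustration, and a fixed script stands in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real desktop; records what happens to it."""
    cursor: tuple = (0, 0)
    typed: str = ""
    clicks: list = field(default_factory=list)

    def screenshot(self):
        # A real harness would return image bytes; this returns a text summary.
        return f"cursor={self.cursor} typed={self.typed!r} clicks={len(self.clicks)}"


def execute(screen, action):
    """Apply one action dict to the screen and return a fresh observation."""
    kind = action["type"]
    if kind == "mouse_move":
        screen.cursor = (action["x"], action["y"])
    elif kind == "click":
        screen.clicks.append(screen.cursor)
    elif kind == "type":
        screen.typed += action["text"]
    elif kind != "screenshot":
        raise ValueError(f"unknown action: {kind}")
    return screen.screenshot()


def scripted_model(observation, step):
    """Hypothetical policy: a fixed script instead of a real model call."""
    script = [
        {"type": "screenshot"},
        {"type": "mouse_move", "x": 120, "y": 40},
        {"type": "click"},
        {"type": "type", "text": "hello"},
    ]
    return script[step] if step < len(script) else None


def run(screen, model, max_steps=10):
    """Alternate between asking the model for an action and executing it."""
    observation = screen.screenshot()
    for step in range(max_steps):
        action = model(observation, step)
        if action is None:  # the model signals that the task is finished
            break
        observation = execute(screen, action)
    return observation


screen = FakeScreen()
final = run(screen, scripted_model)
print(final)
```

The `max_steps` cap is a design choice worth keeping in any real harness: it bounds how long an agent can act without a human checking in, which fits the cautious rollout the article describes.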
Availability and Safety Measures
The upgraded Claude 3.5 Sonnet is now available to all users through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI, while Claude 3.5 Haiku will be released later this month [1][2]. Anthropic has implemented robust safety measures for these new models, including:
- Joint pre-deployment testing conducted by the US and UK AI Safety Institutes [1]
- New classifiers to identify potential misuse and prevent harm [3]
- Maintenance of existing safety standards from previous versions [4]
- Evaluation under the Responsible Scaling Policy, which places Claude 3.5 Sonnet under the ASL-2 Standard [1][3][5]