anthropic.com
 
Anthropic Upgrades Claude
User avatar
Curated by
elymc
3 min read
37,725
762
Anthropic has unveiled major upgrades to its Claude AI models, introducing an enhanced Claude 3.5 Sonnet with improved coding capabilities, a new Claude 3.5 Haiku model offering high performance at lower cost, and a groundbreaking "computer use" feature that allows Claude to interact directly with computer interfaces.

Claude 3.5 Sonnet Improvements

thezvi.substack.com
The upgraded Claude 3.5 Sonnet model demonstrates significant improvements across various benchmarks, particularly in coding and tool use tasks. Key enhancements include:
  • SWE-bench Verified score increase from 33.4% to 49.0%, surpassing all publicly available models
    1
    2
  • TAU-bench performance boost from 62.6% to 69.2% in retail and 36.0% to 46.0% in airline domains
    1
  • Improved GPQA and MMLU Pro scores, outperforming Gemini 1.5 Pro
    2
These advancements come at no additional cost or speed penalty compared to its predecessor
1
.
Early feedback from companies like GitLab and Cognition indicates substantial improvements in AI-powered coding, with up to 10% stronger reasoning across various use cases
1
3
.
anthropic.com favicon
neowin.net favicon
siliconangle.com favicon
3 sources

Claude 3.5 Haiku Features

anthropic.com
anthropic.com
Claude 3.5 Haiku, Anthropic's newest lightweight model, offers impressive performance at a more affordable price point. Key features include:
  • Matches Claude 3 Opus performance on many benchmarks while maintaining lower costs and faster speeds
    1
    2
  • Scores 40.6% on SWE-bench Verified, outperforming the original Claude 3.5 Sonnet and GPT-4 Turbo
    3
  • Initially available as a text-only model, with image support coming later
    3
  • Well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from large data volumes
    1
Claude 3.5 Haiku will be available later this month through Anthropic's API and major cloud providers, offering developers a cost-effective option for high-performance AI capabilities
2
1
.
anthropic.com favicon
web.swipeinsight.app favicon
neowin.net favicon
3 sources

Innovative Computer Use Capability

Anthropic's groundbreaking "computer use" capability allows Claude 3.5 Sonnet to interact directly with computer interfaces, marking a significant advancement in AI functionality. This experimental feature enables Claude to perform tasks by viewing screenshots, moving cursors, clicking buttons, and typing text
1
2
.
Key aspects of this innovation include:
  • Ability to perceive and interact with user interfaces via an API
  • Generalization of skills from simple software training to more complex applications
  • Performance on OSWorld benchmark: 14.9% in screenshot-only category, surpassing other AI systems
    3
  • Potential applications in automating repetitive processes, software testing, and open-ended research tasks
    2
While still in public beta and prone to occasional errors, this capability represents a major step towards AI systems that can navigate and utilize computer interfaces like human users
4
.
Anthropic emphasizes caution in implementation and expects rapid improvements in the coming months
4
.
anthropic.com favicon
siliconangle.com favicon
web.swipeinsight.app favicon
4 sources

Availability and Safety Measures

The upgraded Claude 3.5 Sonnet is now available to all users through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI, while Claude 3.5 Haiku will be released later this month
1
2
.
Anthropic has implemented robust safety measures for these new models, including:
  • Joint pre-deployment testing conducted by the US and UK AI Safety Institutes
    1
  • New classifiers to identify potential misuse and prevent harm
    3
  • Maintenance of existing safety standards from previous versions
    4
  • Responsible Scaling Policy categorizing Claude 3.5 Sonnet as ASL-2 Standard
    5
These precautions aim to ensure the responsible deployment of increasingly capable AI systems while encouraging developers to explore and provide feedback on the new models and features
3
1
.
anthropic.com favicon
siliconangle.com favicon
web.swipeinsight.app favicon
5 sources
Related
What safety measures are in place to prevent misuse of the computer use capability
How does the new computer use feature impact data privacy
What are the limitations of the current computer use capability
How does the new customization feature enhance user safety
What feedback has been received on the new Claude 3.5 Sonnet
Keep Reading
Claude 3.5 Sonnet Launch
Claude 3.5 Sonnet Launch
Anthropic has unveiled Claude 3.5 Sonnet, its latest AI model that sets new benchmarks in intelligence and outperforms competitors across various domains, including graduate-level reasoning, undergraduate-level knowledge, and coding proficiency. Operating at twice the speed of its predecessor while maintaining cost-effectiveness, Claude 3.5 Sonnet marks a significant advancement in Anthropic's AI capabilities.
46,152
Claude Gets Bored
Claude Gets Bored
Anthropic's latest AI model, Claude 3.5 Sonnet, has demonstrated unexpected behavior during recent demonstrations, including abandoning a coding task to browse photos of Yellowstone National Park. As reported by Futurism, this incident highlights both the advancing capabilities and current limitations of AI agents designed to autonomously control computers.
39,903
GitHub Cuts AI Coding Deals
GitHub Cuts AI Coding Deals
GitHub, owned by Microsoft, is enhancing its AI-powered coding assistant Copilot by integrating Google's Gemini and Anthropic's Claude 3.5 Sonnet models, offering developers increased flexibility and capabilities in code generation and chat functions, while also introducing new enterprise features and innovative tools like Project Spark for diverse user needs.
30,359
Claude Debuts Personalized Writing
Claude Debuts Personalized Writing
Anthropic has introduced new features for its AI assistant Claude, including preset writing modes and advanced style customization tools, which allow users to tailor the AI's output to their specific needs while enhancing productivity and maintaining their unique voice.
86,992