Amazon Bedrock introduced cross-region inference on August 27, 2024. As announced by AWS, the feature enables developers to manage traffic bursts by routing requests across different AWS Regions, providing increased capacity and resilience during peak demand periods.
The launch marked a significant enhancement to Amazon Bedrock's capabilities, letting developers seamlessly leverage compute in other AWS Regions when local demand spikes [1]. Key benefits include:
- Up to a 2x increase in allocated in-region quotas
- Enhanced resilience during periods of peak demand
- Improved throughput for API users
The cross-region inference feature became generally available at launch, so customers could put it to use for their machine learning workloads immediately [2][3].
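In practice, opting into cross-region inference only requires passing an inference profile ID where a plain model ID would normally go. The sketch below uses boto3's Bedrock Runtime Converse API; the us-east-1 Region and the us.anthropic.claude-3-5-sonnet-20240620-v1:0 profile ID are illustrative assumptions, so substitute a profile that is actually available in your account.

```python
import boto3

# Bedrock Runtime client in the source Region; the inference profile may route
# the request to another Region in the same geography during traffic bursts.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# The "us." prefix marks a cross-region (system-defined) inference profile.
# This specific profile ID is an assumption for illustration only.
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-5-sonnet-20240620-v1:0"

response = bedrock_runtime.converse(
    modelId=INFERENCE_PROFILE_ID,
    messages=[
        {"role": "user", "content": [{"text": "Summarize cross-region inference in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

Because the profile, not the caller, decides which Region ultimately serves the request, capacity from a second Region backs the same call, which is what the up-to-2x quota increase listed above refers to.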
On September 13, 2024, Amazon Bedrock expanded its cross-region inference support to include Knowledge Bases [1]. This enhancement specifically benefits users of the RetrieveAndGenerate API, who can now draw on the higher throughput limits of cross-region inference when generating responses from a knowledge base.
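A minimal sketch of calling RetrieveAndGenerate with a cross-region inference profile, assuming boto3 and the bedrock-agent-runtime client; the knowledge base ID, account ID, and inference-profile ARN below are placeholders, not values from the announcement.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholders: substitute your own knowledge base ID and the ARN of a
# cross-region inference profile visible in your account.
KNOWLEDGE_BASE_ID = "KB1234567890"
INFERENCE_PROFILE_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
)

response = agent_runtime.retrieve_and_generate(
    input={"text": "What does our runbook say about failover?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
            # Supplying an inference profile ARN here, instead of a single-Region
            # model ARN, is what opts the generation step into cross-region inference.
            "modelArn": INFERENCE_PROFILE_ARN,
        },
    },
)

print(response["output"]["text"])
```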
As of December 2024, cross-region inference has been extended to a wider range of models within Amazon Bedrock. Notably, on November 6, 2024, cross-region inference profiles were introduced for Anthropic Claude and Meta Llama models [1], giving developers the flexibility to use these advanced language models across different AWS Regions. The ongoing development of cross-region inference reflects Amazon's continued investment in the scalability and performance of its AI services and gives users more options for their machine learning workloads.
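To discover which cross-region profiles (including the Claude and Llama ones mentioned above) exist in a given Region, the Bedrock control-plane API provides a ListInferenceProfiles operation. The sketch below assumes the boto3 bedrock client and the documented response field names; verify the exact keys against your SDK version.

```python
import boto3

# Control-plane client; inference profiles are listed per Region.
bedrock = boto3.client("bedrock", region_name="us-east-1")

next_token = None
while True:
    kwargs = {"nextToken": next_token} if next_token else {}
    page = bedrock.list_inference_profiles(**kwargs)
    for profile in page.get("inferenceProfileSummaries", []):
        # Cross-region profiles carry a geography prefix (e.g., "us.") in their ID.
        print(profile.get("inferenceProfileId"), "-", profile.get("inferenceProfileName"))
    next_token = page.get("nextToken")
    if not next_token:
        break
```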