AWS Bedrock RateLimit 해결

문제

토큰이 큰 작업 요청시 Bedrock API 에 RateLimit 이 걸리는 경우

RateLimitError: BedrockException - {"message":"Too many requests, please wait before trying again."}


해결 1

cross-region inference 모델 사용

(참고) AWS의 미국 내 여러 리전을 로드밸런싱 하는 모델 - us.anthropic.claude-3-sonnet-20240229-v1:0

https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html

https://aws.amazon.com/ko/blogs/machine-learning/getting-started-with-cross-region-inference-in-amazon-bedrock/

https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html


해결 2

LiteLLM 에서 로드밸런싱

미국, 아태, 유럽 지역의 API 호출

litellm_config.yml
model_list:
  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/apac.anthropic.claude-3-sonnet-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: "ap-northeast-2"

  - model_name: claude-3-sonnet
    litellm_params:
      model: bedrock/us.anthropic.claude-3-sonnet-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: "us-west-2"