Building a Per-User LLM Cost Monitoring Dashboard

1. Introduction

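This document describes how to build a dashboard that monitors LLM costs per user and per LLM model by combining Prometheus and Grafana.
The aim is to make each user's and each model's contribution to total cost clearly visible, cut unnecessary spending, and use LLM resources more efficiently.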

2. Problem Definition and Goals

2.1 Problem Definition

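- When many users access LLMs concurrently, it is hard to tell how much LLM capacity each user consumes and what that usage costs
- Without near-real-time cost monitoring, problems such as sudden cost spikes cannot be acted on promptly
- As a result, LLM operations become less efficient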

2.2 Goals

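- Quickly identify the cause of a cost spike for a specific model and spot inefficient resource usage to guide optimization
- Visualize each user's usage and cost per LLM model in near real time
- Improve the overall efficiency of LLM operations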

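3. Technology Stack

3.1 Prometheus

- Open-source monitoring system that collects (scrapes) and stores time-series data

3.2 Grafana

- Open-source data visualization and dashboard tool; it connects to many data sources and makes it possible to build rich, intuitive dashboards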

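4. Data Flow

4.1 Detailed Data Flow

- Each user sends requests to the LLM API Gateway through the LLM service
- The gateway handles all LLM requests centrally and logs detailed information for each one, including the user ID
- For each request, it records the LLM model name, input/output token counts, API response time, and similar details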
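As a minimal sketch of the per-request record described above, the gateway could write one JSON object per line. The class name and field names (`LLMRequestLog`, `latency_ms`, `timestamp`) are hypothetical; the document does not fix a log schema.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class LLMRequestLog:
    """One record per LLM request handled by the gateway (hypothetical schema)."""
    user_id: str
    llm_model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    timestamp: float


def to_log_line(record: LLMRequestLog) -> str:
    """Serialize a record as a single JSON line for an append-only log file."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = LLMRequestLog(
        user_id="alice",
        llm_model="gpt-4",
        input_tokens=1200,
        output_tokens=350,
        latency_ms=842.5,
        timestamp=time.time(),
    )
    print(to_log_line(record))
```

Keeping one self-describing JSON object per line makes the log easy to tail and parse incrementally.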

4.2 Collecting LLM Usage Data

- Either parse the LLM API Gateway logs in real time, or measure LLM usage (tokens) directly in each user application, and expose the result in the Prometheus exposition format
- Attach a user_id label to every metric
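The log-parsing option can be sketched with the standard library alone. This assumes, hypothetically, that the gateway writes one JSON object per line with `user_id`, `llm_model`, `input_tokens` and `output_tokens` fields; it processes a batch of lines rather than tailing a live file. The aggregation key mirrors the labels of `llm_user_usage_tokens_total`.

```python
import json
from collections import Counter
from typing import Iterable


def aggregate_tokens(log_lines: Iterable[str]) -> Counter:
    """Accumulate token counts keyed by (user_id, llm_model, token_type).

    Malformed or incomplete lines are skipped so that one bad record
    does not abort the whole batch.
    """
    totals: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            user = entry["user_id"]
            model = entry["llm_model"]
            n_in = int(entry["input_tokens"])
            n_out = int(entry["output_tokens"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue
        # Only update after every field parsed, so a record is counted fully or not at all.
        totals[(user, model, "input")] += n_in
        totals[(user, model, "output")] += n_out
    return totals
```

In an exporter, each parsed delta would typically be passed to a counter's `inc()` instead of being kept in a dict.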

4.3 Collecting LLM Cost Data

- Periodically query the LLM provider's API for the actual cost incurred by usage of each LLM model
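Provider billing APIs differ and are not specified here, so the sketch below covers only the estimation side: multiplying token counts by per-million-token prices, which can serve as a cross-check against billed cost. The prices restate the illustrative per-token values from the pricing table in 5.2 (for example 0.00003 USD/token = 30 USD per 1M tokens); `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# (input, output) price in USD per 1M tokens. Illustrative figures only;
# check the provider's current pricing before relying on them.
PRICE_PER_MILLION = {
    "gpt-4": (Decimal("30"), Decimal("60")),
    "claude-3-opus": (Decimal("15"), Decimal("75")),
    "gpt-3.5-turbo": (Decimal("0.5"), Decimal("1.5")),
}

MILLION = Decimal(1_000_000)


def estimate_cost_usd(llm_model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """cost = (input_tokens * input_price + output_tokens * output_price) / 1M.

    Decimal avoids float rounding drift when many small costs are summed.
    Raises KeyError for a model without a price instead of treating it as free.
    """
    input_price, output_price = PRICE_PER_MILLION[llm_model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION
```

For example, a gpt-4 request with 1,200 input and 350 output tokens costs (1,200 × 30 + 350 × 60) / 1,000,000 = 0.057 USD. Raising on unpriced models is a deliberate choice so that a missing price does not silently appear as zero cost.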

4.4 Exposing Metrics

- Expose the collected per-user usage and cost data on an HTTP endpoint that Prometheus can scrape
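The exporter in 5.2 relies on prometheus_client's `start_http_server` for this. To show what Prometheus actually scrapes, here is a stdlib-only sketch that renders one counter family in the text exposition format and serves it at /metrics. The in-memory `TOKENS` store and the function and handler names are hypothetical; in practice prometheus_client handles formatting and escaping.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render a counter family; samples maps ((label, value), ...) tuples to numbers."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical in-memory store, filled elsewhere by log parsing.
TOKENS = {
    (("llm_model", "gpt-4"), ("token_type", "input"), ("user_id", "alice")): 150,
}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter(
            "llm_user_usage_tokens_total",
            "Total number of tokens used by LLM models per user.",
            TOKENS,
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9100) -> None:
    """Block and serve /metrics (9100 matches the prometheus.yml example target)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting http://localhost:9100/metrics returns the same kind of sample lines listed in 4.5.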

4.5 Example Metrics

- llm_user_usage_tokens_total{user_id="alice", llm_model="gpt-4", token_type="input"}
- llm_user_api_calls_total{user_id="charlie", llm_model="gpt-3.5-turbo"}

4.6 prometheus.yml

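- Register the exporter as a scrape target in prometheus.yml so that Prometheus periodically scrapes the /metrics endpoint exposed by the Cost/Usage Exporter and stores the samples in its time-series database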

5. Detailed Implementation

5.1 prometheus.yml (example)

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'llm-user-cost-exporter'
    metrics_path: /metrics  # default: /metrics
    static_configs:
      - targets: ['llm-user-cost-exporter:9100']

5.2 Cost Exporter

Python

from prometheus_client import start_http_server, Gauge, Counter
import time
import random
import os

# LLM pricing in USD per token (hardcoded for illustration; a real system would manage this in a DB or config file)
LLM_PRICING = {
    "gpt-4-input": 0.00003,  # USD per token
    "gpt-4-output": 0.00006, # USD per token
    "claude-3-opus-input": 0.000015,
    "claude-3-opus-output": 0.000075,
    "gpt-3.5-turbo-input": 0.0000005,
    "gpt-3.5-turbo-output": 0.0000015,
}

# Prometheus metric definitions
LLM_USER_USAGE_TOKENS_TOTAL = Counter(
    'llm_user_usage_tokens_total',
    'Total number of tokens used by LLM models per user.',
    ['user_id', 'llm_model', 'token_type']
)

LLM_USER_API_CALLS_TOTAL = Counter(
    'llm_user_api_calls_total',
    'Total number of API calls made to LLM models per user.',
    ['user_id', 'llm_model']
)

# Cumulative cost only ever increases, so it is modeled as a Counter.
# prometheus_client strips the _total suffix internally and re-adds it on exposition,
# so the exported series is still llm_user_cost_usd_total.
LLM_USER_COST_USD_TOTAL = Counter(
    'llm_user_cost_usd_total',
    'Total estimated cost in USD for LLM usage per user.',
    ['user_id', 'llm_model']
)

# Simulates LLM usage and cost (in production: parse LLM Gateway logs or call the provider API)
def simulate_llm_usage():
    users = ["alice", "bob", "charlie", "diana", "eve"]
    llm_models = ["gpt-4", "claude-3-opus", "gpt-3.5-turbo"]

    for user in users:
        for llm_model in llm_models:
            # Random usage per user and model for this interval
            input_tokens = random.randint(50, 3000)
            output_tokens = random.randint(20, 1500)
            api_calls = random.randint(1, 8)

            LLM_USER_USAGE_TOKENS_TOTAL.labels(user, llm_model, 'input').inc(input_tokens)
            LLM_USER_USAGE_TOKENS_TOTAL.labels(user, llm_model, 'output').inc(output_tokens)
            LLM_USER_API_CALLS_TOTAL.labels(user, llm_model).inc(api_calls)

            # Estimated cost for this interval; models missing from LLM_PRICING cost 0
            input_cost = input_tokens * LLM_PRICING.get(f"{llm_model}-input", 0)
            output_cost = output_tokens * LLM_PRICING.get(f"{llm_model}-output", 0)

            # Counter.inc() adds to the running total through the public API
            LLM_USER_COST_USD_TOTAL.labels(user, llm_model).inc(input_cost + output_cost)

if __name__ == '__main__':
    exporter_port = int(os.environ.get("EXPORTER_PORT", 9100))
    start_http_server(exporter_port)
    print(f"Prometheus exporter listening on port {exporter_port}")

    while True:
        simulate_llm_usage()
        time.sleep(10)  # generate new data every 10 seconds
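On the Grafana side, the dashboard panels come down to PromQL queries over these metrics. The sketch below builds a minimal Grafana dashboard JSON model with one time-series panel per query. It assumes the metric names and labels from the exporter above, that `llm_user_cost_usd_total` only ever increases, and that Prometheus is Grafana's default data source; fields such as `schemaVersion` or an explicit `datasource` are omitted and may need adding for your Grafana version.

```python
import json

# (panel title, PromQL expression, legend format). Plain strings, so the
# {{label}} placeholders reach Grafana unchanged.
PANELS = [
    ("Cost per user (USD, last 24h)",
     "sum by (user_id) (increase(llm_user_cost_usd_total[24h]))",
     "{{user_id}}"),
    ("Cost rate per user and model (USD/hour)",
     "sum by (user_id, llm_model) (rate(llm_user_cost_usd_total[5m])) * 3600",
     "{{user_id}} / {{llm_model}}"),
    ("Token throughput per model (tokens/s)",
     "sum by (llm_model, token_type) (rate(llm_user_usage_tokens_total[5m]))",
     "{{llm_model}} {{token_type}}"),
]


def build_dashboard(title: str = "LLM Cost per User") -> dict:
    """Return a minimal dashboard model with one time-series panel per query."""
    panels = []
    for i, (panel_title, expr, legend) in enumerate(PANELS):
        panels.append({
            "id": i + 1,
            "type": "timeseries",
            "title": panel_title,
            # Two panels per row on Grafana's 24-column grid, each 8 units tall.
            "gridPos": {"h": 8, "w": 12, "x": (i % 2) * 12, "y": (i // 2) * 8},
            "targets": [{"refId": "A", "expr": expr, "legendFormat": legend}],
        })
    return {"title": title, "panels": panels, "time": {"from": "now-24h", "to": "now"}}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Multiplying the 5-minute `rate()` by 3600 converts USD per second into USD per hour, which makes a single model's cost spike easy to spot per user; `increase(...[24h])` shows the cost accrued over the last day.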