LLM API for Developers: 2026 Integration Guide | APIMaster.ai

A developer's guide to LLM APIs: authentication, streaming, function calling, embeddings, RAG, async patterns, and cost management. Works with Claude, GPT, and DeepSeek through APIMaster.

LLM API for Developers: A Complete Integration Guide

This guide covers what a developer needs to integrate an LLM API into production applications: authentication, streaming, tool use, embeddings, RAG patterns, and cost management. All examples use the OpenAI-compatible format and work with APIMaster.ai.

Setup

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_APIMASTER_KEY",
    base_url="https://apimaster.ai/v1",
)
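
The production checklist later in this guide recommends keeping API keys in environment variables rather than in source code. A minimal sketch of that for the setup above, assuming the key lives in a variable named APIMASTER_API_KEY (a hypothetical name chosen for this example; use whatever your deployment defines):

```python
import os

def client_settings(env_var: str = "APIMASTER_API_KEY") -> dict:
    """Build keyword arguments for OpenAI(...) from the environment.

    The variable name is an assumption of this sketch, not an APIMaster convention.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"api_key": api_key, "base_url": "https://apimaster.ai/v1"}

# Usage: client = OpenAI(**client_settings())
```

Failing fast with a clear message at startup tends to be easier to debug than an authentication error on the first request.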

Core Patterns

1. Basic Chat Completion

def ask(prompt: str, model: str = "claude-sonnet-4-6") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
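
Responses in the OpenAI Chat Completions format also carry a usage object with prompt_tokens and completion_tokens, which is what cost tracking usually builds on. The sketch below keeps a running tally. The class name is illustrative, and the per-million-token prices are values you supply yourself; the numbers in the usage comment are placeholders, not real APIMaster prices.

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    """Accumulates token counts and an estimated cost across calls."""
    input_price_per_m: float   # assumed price per 1M prompt tokens
    output_price_per_m: float  # assumed price per 1M completion tokens
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, usage) -> None:
        # `usage` is response.usage; skip it if the server omitted it
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def estimated_cost(self) -> float:
        return (
            self.prompt_tokens * self.input_price_per_m
            + self.completion_tokens * self.output_price_per_m
        ) / 1_000_000

# Usage with placeholder prices:
#   tracker = UsageTracker(input_price_per_m=1.0, output_price_per_m=2.0)
#   tracker.record(response.usage)
```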

2. System Prompt + Conversation

class Conversation:
    def __init__(self, system: str, model: str = "claude-sonnet-4-6"):
        self.model = model
        self.messages = [{"role": "system", "content": system}]
    
    def send(self, user_msg: str) -> str:
        self.messages.append({"role": "user", "content": user_msg})
        resp = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        reply = resp.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

bot = Conversation("You are an expert Python developer.")
print(bot.send("What is the GIL?"))
print(bot.send("How do I work around it?"))

3. Streaming

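def stream(prompt: str, model: str = "gpt-5.4"):
    # stream=True returns an iterator of chunks instead of one full response
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in response:
        # Some chunks carry no choices or an empty delta; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

for chunk in stream("Explain async/await in Python"):
    print(chunk, end="", flush=True)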

4. Structured Output

import json

from pydantic import BaseModel

class ExtractedData(BaseModel):
    entities: list[str]
    sentiment: str
    summary: str

# Pydantic v2 API; on v1 use ExtractedData.schema() and ExtractedData.parse_raw()
schema = json.dumps(ExtractedData.model_json_schema())

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": f"Extract the data and return JSON matching this schema: {schema}"},
        {"role": "user", "content": "Apple reported record revenue. CEO Tim Cook called it an outstanding achievement."},
    ],
    response_format={"type": "json_object"},
)

# Validate the reply against the model; raises ValidationError on a mismatch
data = ExtractedData.model_validate_json(response.choices[0].message.content)
print(data.entities)    # e.g. ["Apple", "Tim Cook"]
print(data.sentiment)   # e.g. "positive"

5. Tool Use / Function Calling

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_sql",
            "description": "Run a read-only SQL query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "database": {"type": "string", "enum": ["users", "orders", "products"]},
                },
                "required": ["query", "database"],
            },
        },
    }
]

def handle_tool_call(tool_name: str, args: dict) -> str:
    # Your implementation goes here
    return json.dumps({"result": "mock data"})

def agent_loop(user_msg: str, max_steps: int = 10):
    messages = [{"role": "user", "content": user_msg}]

    # Bound the loop so a model that keeps requesting tools cannot run forever
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-5.4",
            messages=messages,
            tools=tools,
        )
        message = resp.choices[0].message

        # No tool calls requested: this is the final answer
        if not message.tool_calls:
            return message.content

        # Handle the tool calls and feed each result back to the model
        messages.append(message)
        for tc in message.tool_calls:
            result = handle_tool_call(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    raise RuntimeError(f"agent_loop exceeded {max_steps} steps")
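
handle_tool_call above is left as a stub. One way to fill it in is a registry that maps tool names to Python functions and always returns a JSON string, including for errors, so the model can see what went wrong. In the sketch below, the in-memory SQLite database is a stand-in for a real data store, the database argument is accepted but ignored, and the SELECT-only check is a deliberately naive assumption rather than a substitute for database-level read-only permissions.

```python
import json
import sqlite3

# Stand-in database for this sketch; a real system would connect elsewhere.
_conn = sqlite3.connect(":memory:")
_conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
_conn.execute("INSERT INTO users VALUES (1, 'Ada')")

def execute_sql(query: str, database: str) -> list:
    # Naive guard: only allow statements that start with SELECT.
    # `database` is ignored here because the sketch has a single connection.
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return [list(row) for row in _conn.execute(query).fetchall()]

# Hypothetical registry: tool name -> Python callable
TOOL_REGISTRY = {"execute_sql": execute_sql}

def handle_tool_call(tool_name: str, args: dict) -> str:
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    try:
        return json.dumps({"result": func(**args)})
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop
        return json.dumps({"error": str(exc)})
```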

6. Embeddings

def embed(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]

# Semantic similarity
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vecs = embed(["Python is great", "I love Python", "Java is verbose"])
print(cosine_similarity(vecs[0], vecs[1]))  # Expected to be the higher score
print(cosine_similarity(vecs[0], vecs[2]))  # Expected to be lower; exact values depend on the model

7. RAG (Retrieval-Augmented Generation)

from typing import List

def rag_query(user_question: str, knowledge_base: List[str]) -> str:
    # Step 1: Embed the question and the documents
    q_embedding = embed([user_question])[0]
    doc_embeddings = embed(knowledge_base)
    
    # Step 2: Find the three most relevant documents
    similarities = [cosine_similarity(q_embedding, d) for d in doc_embeddings]
    top_indices = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)[:3]
    context = "\n\n".join(knowledge_base[i] for i in top_indices)
    
    # Step 3: Generate the answer with the retrieved context
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n\n{context}"},
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content
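
rag_query re-embeds the whole knowledge base on every call, which gets expensive as the corpus grows. One alternative is to embed the documents once and reuse the vectors for each query. In the sketch below, the embedding function is passed in as a parameter so the embed helper from section 6 can be plugged in. The class name and interface are illustrative assumptions, and the pure-Python cosine is only reasonable for small corpora.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VectorIndex:
    """Embeds documents once at construction and answers top-k queries."""

    def __init__(self, docs: list[str], embed_fn: Callable[[list[str]], list[Vector]]):
        self.docs = docs
        self.embed_fn = embed_fn
        self.vectors = embed_fn(docs)  # one embedding call for the whole corpus

    def search(self, question: str, k: int = 3) -> list[str]:
        q = self.embed_fn([question])[0]
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: _cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]

# Usage: index = VectorIndex(knowledge_base, embed)
#        context = "\n\n".join(index.search(user_question))
```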

8. Async for High Throughput

import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI(
    api_key="YOUR_APIMASTER_KEY",
    base_url="https://apimaster.ai/v1",
)

async def process_batch(prompts: list[str]) -> list[str]:
    tasks = [
        async_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": p}],
            max_tokens=100,
        )
        for p in prompts
    ]
    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Process 50 prompts concurrently (my_prompts is placeholder data)
my_prompts = [f"Summarize item {i}" for i in range(50)]
results = asyncio.run(process_batch(my_prompts))
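
asyncio.gather as used above starts every request at once, which can trip provider rate limits on large batches. A sketch of capping the number of in-flight requests with asyncio.Semaphore follows. The worker is passed in as a parameter so the pattern stays independent of the client; the function name and the default limit of 10 are arbitrary assumptions.

```python
import asyncio
from typing import Awaitable, Callable

async def gather_limited(
    items: list[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> list[str]:
    """Run worker(item) for every item, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(item: str) -> str:
        async with sem:  # waits here while max_concurrency calls are in flight
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(i) for i in items))
```

The worker here could be a small coroutine that wraps async_client.chat.completions.create and returns the message content.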

Production Checklist

API keys in environment variables, not in source code
Retry logic with exponential backoff for 429/500 errors
max_tokens set to prevent runaway costs
Streaming for user-facing responses that take longer than 2 seconds
Request logging with token counts for cost tracking
A rate limiter to stay within provider limits
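
The retry item in the checklist can be sketched as a small wrapper. This version is generic: it retries any callable on the exception types you pass in, doubling the delay on each attempt and adding a little jitter. With the openai library you would likely pass openai.RateLimitError and openai.InternalServerError as the retryable types. The function name, the defaults, and the injectable sleep parameter are assumptions of this sketch.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(
    call: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on `retry_on` with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x, ... the base delay, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")

# Usage: reply = with_backoff(lambda: ask("Hello"), retry_on=(TimeoutError,))
```

Errors outside retry_on propagate immediately, which keeps non-retryable failures such as a malformed request from being repeated.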

Choosing the Right Model

Use Case	Model	Cost Tier
Prototyping	deepseek-v4-flash or gpt-4o-mini	Very low
Production chatbot	claude-haiku-4-5	Low
Coding assistant	deepseek-v4-flash or claude-sonnet-4-6	Low–Medium
Complex analysis	claude-sonnet-4-6	Medium
Research/reasoning	claude-opus-4-8 or o3	High

Frequently Asked Questions

What is an LLM API? An LLM API is an HTTP interface that lets your code send text prompts and receive AI-generated responses. You send a messages array; the API returns a completion. Most use the OpenAI Chat Completions format.

How do I choose between LLM API providers? Consider model capability (benchmarks), cost per token, latency, and reliability. For most use cases, DeepSeek V4 Flash (low-cost coding), Claude Sonnet (writing/analysis), or GPT-4o (multimodal) covers the common needs. APIMaster lets you switch providers by changing a single line.

What is an OpenAI-compatible API? An endpoint that implements the same /v1/chat/completions format as OpenAI, so you can use the openai Python library or any OpenAI-compatible tool with non-OpenAI models.

How do I handle LLM API errors in production? Catch RateLimitError (retry with backoff), APIConnectionError (retry), and BadRequestError (fix the request or prompt). Use timeouts and circuit breakers for resilience in production.

Can I use one API key for multiple LLM providers? Yes. APIMaster provides a single key and endpoint for GPT, Claude, DeepSeek, and Gemini. Switch models by changing the model parameter. No separate key or SDK is needed for each provider.
