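# Document Search API Usage Guide

## Overview

This document describes the Document Search API, which provides document retrieval, knowledge base listing, and document download.

**Base URL**: `https://ai.ctc-zc.com:8100/api`

**Authentication**: Bearer Token (all endpoints except the health check)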
```
Authorization: Bearer <your-api-key>
```
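
A minimal sketch of building this header on the client side. The helper name `auth_headers` is an assumption made for illustration and is not part of the API.

```python
def auth_headers(api_key: str) -> dict:
    """Return the Authorization header in the Bearer format shown above."""
    if not api_key:
        # Reject an empty key locally instead of sending a request that would be refused.
        raise ValueError("api_key must not be empty")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be merged into the headers of any authenticated request.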

## API Endpoints

### 1. Search documents

**Endpoint**: `POST /document_search/search`

**Authentication required**: Yes

**Request headers**:
```
Content-Type: application/json
Authorization: Bearer <api-key>
```

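**Request body parameters**:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| query | string | Yes | - | Search query |
| classification_ids | array[int] | No | [] | List of classification IDs |
| kb_names | array[string] | No | ["mu_34_1740625285897"] | List of knowledge base names; multiple knowledge bases are supported |
| max_documents | integer | No | 6 | Maximum number of documents to return (1-20) |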
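
The parameters above could be assembled into a request roughly as follows. This is a sketch using only the Python standard library; the function name `build_search_request` is hypothetical, and the local range check on `max_documents` is an assumption about how a client might mirror the documented 1-20 limit, not a description of server behavior.

```python
import json
import urllib.request

BASE_URL = "https://ai.ctc-zc.com:8100/api"  # base URL given in this document


def build_search_request(api_key, query, kb_names=None, classification_ids=None,
                         max_documents=6):
    """Build (but do not send) a POST request for /document_search/search."""
    if not 1 <= max_documents <= 20:
        raise ValueError("max_documents must be between 1 and 20")
    payload = {"query": query, "max_documents": max_documents}
    # Optional fields are left out so the server applies its documented defaults.
    if kb_names is not None:
        payload["kb_names"] = list(kb_names)
    if classification_ids is not None:
        payload["classification_ids"] = list(classification_ids)
    return urllib.request.Request(
        f"{BASE_URL}/document_search/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; keeping construction separate from sending makes the payload easy to inspect beforehand.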

**Response format**:

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "documents": [
      {
        "id": 371633,
        "kb_name": "mu_34_1740625285897",
        "file_name": "example.pdf",
        "file_ext": ".pdf",
        "file_version": 1,
        "document_loader": "PyPDFLoader",
        "text_splitter": "MarkdownHeaderTextSplitter",
        "create_time": "2025-01-01T00:00:00",
        "file_mtime": "2025-01-01T00:00:00",
        "file_size": 1024000,
        "custom_docs": false,
        "docs_count": 50,
        "character_count": 100000,
        "md_filename": "example.md",
        "md_content": "Markdown content of the document...",
        "classification_ids": [1, 2],
        "url": "https://ai.ctc-zc.com:8100/api/document_search/download_doc?knowledge_base_name=mu_34_1740625285897&file_name=example.pdf"
      }
    ],
    "total_count": 15,
    "query": "混凝土材料性能",
    "classification_ids": [1, 2, 3],
    "kb_names": ["mu_34_1740625285897", "mu_34_1740625318986"]
  }
}
```

**Response fields**:

| Field | Type | Description |
|-------|------|-------------|
| code | integer | Response status code |
| msg | string | Response message |
| data | object | Response data |
| data.documents | array | List of documents (kb_file objects returned as-is) |
| data.total_count | integer | Total number of documents |
| data.query | string | Original search query |
| data.classification_ids | array[int] | Classification IDs used for the search |
| data.kb_names | array[string] | Knowledge base names used for the search |

**Document (kb_file) fields**:

| Field | Type | Description |
|-------|------|-------------|
| id | integer | Document ID |
| kb_name | string | Knowledge base name |
| file_name | string | Original file name |
| file_ext | string | File extension |
| file_version | integer | File version |
| document_loader | string | Document loader name |
| text_splitter | string | Text splitter name |
| create_time | datetime | Creation time |
| file_mtime | datetime | File modification time |
| file_size | integer | File size (bytes) |
| custom_docs | boolean | Whether the document is a custom document |
| docs_count | integer | Number of document chunks |
| character_count | integer | Total number of characters |
| md_filename | string | Markdown file name |
| md_content | string | Markdown content |
| classification_ids | array[int] | List of classification IDs |
| url | string | Download URL (assembled automatically) |
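
As an illustration of reading these fields, the sketch below pulls the file name and download URL out of a search response body. The function name `extract_documents` is hypothetical, and treating any `code` other than 200 as a failure is an assumption, since this document does not specify the body of a failed search.

```python
import json


def extract_documents(response_text):
    """Return (file_name, url) pairs from a search response body."""
    body = json.loads(response_text)
    # Assumption: a body-level code other than 200 indicates a failed search.
    if body.get("code") != 200:
        raise RuntimeError(f"search failed: {body.get('msg')}")
    return [(doc["file_name"], doc["url"]) for doc in body["data"]["documents"]]
```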

---

### 2. Download a knowledge base document

**Endpoint**: `GET /document_search/download_doc`

**Authentication required**: Yes

**Request headers**:
```
Authorization: Bearer <api-key>
```

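**Query parameters**:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| knowledge_base_name | string | Yes | - | Knowledge base name |
| file_name | string | Yes | - | File name (either the original file name or the Markdown file name) |
| preview | boolean | No | false | Whether to preview online (true: preview in the browser; false: download) |

**Notes**:
- Files can be downloaded by original file name (e.g. `example.pdf`) or by Markdown file name (e.g. `example.md`)
- If the direct lookup fails, the system queries the database for the file details and retries with the original file name
- File paths are checked for safety to prevent directory traversal attacks

**Request example**: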
```
GET /document_search/download_doc?knowledge_base_name=mu_34_1740625285897&file_name=example.pdf&preview=false
```
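
File names may contain spaces or non-ASCII characters, so the query string is best built with proper URL encoding rather than by hand. The following is a sketch under that assumption; `build_download_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://ai.ctc-zc.com:8100/api"  # base URL given in this document


def build_download_url(knowledge_base_name, file_name, preview=False):
    """Return the download_doc URL with percent-encoded query parameters."""
    params = {
        "knowledge_base_name": knowledge_base_name,
        "file_name": file_name,
        # Lowercase literals match the form used in the request example.
        "preview": "true" if preview else "false",
    }
    return f"{BASE_URL}/document_search/download_doc?{urlencode(params)}"
```

The request still needs the `Authorization` header when it is sent.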

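**Response**:
- Success: returns a file stream (FileResponse)
- Failure: returns a JSON error message

**Error response example** (the message means "file does not exist"):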
```json
{
  "detail": "文件不存在: example.pdf"
}
```

---

### 3. List knowledge bases

**Endpoint**: `GET /document_search/list_knowledge_bases`

**Authentication required**: Yes

**Request headers**:
```
Authorization: Bearer <api-key>
```

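**Notes**:
- Returns only the knowledge bases present in the ID mapping (those corresponding to classification IDs 1-7)
- Filters out all other, invalid knowledge bases
- Returns the list of valid kb_name values for reference

**Response format**: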

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "knowledge_bases": [
      {
        "id": 1,
        "kb_name": "mu_34_1740625285897",
        "ch_name": "材料分类1",
        "kb_info": "Knowledge base description",
        "customer_id": 34,
        "classification_type": 0,
        "create_time": "2025-01-01T00:00:00",
        "file_count": 100
      }
    ],
    "total_count": 7,
    "valid_kb_names": [
      "mu_34_1740625285897",
      "mu_34_1740625318986",
      "mu_34_1740625346474",
      "mu_34_1740625303475",
      "mu_34_1740625365079",
      "mu_34_1740625355308",
      "mu_34_1740625376621"
    ]
  }
}
```

**Response fields**:

| Field | Type | Description |
|-------|------|-------------|
| code | integer | Response status code |
| msg | string | Response message |
| data | object | Response data |
| data.knowledge_bases | array | List of knowledge bases |
| data.total_count | integer | Total number of knowledge bases |
| data.valid_kb_names | array[string] | List of valid knowledge base names |

**KnowledgeBase fields**:

| Field | Type | Description |
|-------|------|-------------|
| id | integer | Knowledge base ID |
| kb_name | string | Knowledge base name |
| ch_name | string | Chinese name |
| kb_info | string | Knowledge base description |
| customer_id | integer | Customer ID |
| classification_type | integer | Classification type |
| create_time | string | Creation time (ISO format) |
| file_count | integer | Number of files |
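
One possible use of this response is a lookup from `kb_name` to its display name, which can then feed the `kb_names` parameter of the search endpoint. The sketch below assumes the response shape shown above; `kb_name_map` is a hypothetical helper name.

```python
import json


def kb_name_map(response_text):
    """Map each kb_name to its ch_name, restricted to data.valid_kb_names."""
    data = json.loads(response_text)["data"]
    valid = set(data["valid_kb_names"])
    return {
        kb["kb_name"]: kb["ch_name"]
        for kb in data["knowledge_bases"]
        if kb["kb_name"] in valid
    }
```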

---

### 4. Health check

**Endpoint**: `GET /document_search/health`

**Authentication required**: No (public access)

**Response format**:

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "status": "healthy",
    "service": "document_search_api"
  }
}
```
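
Because this endpoint needs no authentication, it suits a simple liveness probe. The sketch below is one way to write such a probe with the standard library; the names `is_healthy` and `check_health` and the 5-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Base URL plus the endpoint path given in this document.
HEALTH_URL = "https://ai.ctc-zc.com:8100/api/document_search/health"


def is_healthy(body):
    """Return True if a parsed health response reports a healthy service."""
    data = body.get("data") or {}
    return body.get("code") == 200 and data.get("status") == "healthy"


def check_health(timeout=5.0):
    """Call the health endpoint and interpret the JSON body."""
    with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```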

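## Error Handling

The API may return the following errors:

### HTTP status codes

| Status code | Description |
|-------------|-------------|
| 200 | Request succeeded |
| 400 | Malformed request parameters |
| 401 | Missing authentication or invalid authentication format |
| 403 | Invalid API key or unauthorized access |
| 404 | Resource not found (knowledge base or file not found) |
| 500 | Internal server error |

### Error response format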

```json
{
  "detail": "error description"
}
```

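### Common error messages

| Error message | Description |
|---------------|-------------|
| Missing authorization header | The Authorization header is missing |
| Invalid authorization header format. Expected 'Bearer <api_key>' | The Authorization header is malformed |
| API key is required | The API key is empty |
| Invalid API key for document search service | The API key is invalid |
| Don't attack me | Illegal knowledge base name |
| 未找到知识库 {name} | The knowledge base does not exist |
| 文件不存在: {filename} | The file does not exist |
| 文档搜索失败: {error} | An error occurred during the search |
| 文件下载失败: {error} | An error occurred during the download |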
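
The status codes and the `detail` field can be combined into a single client-side exception. The following is a sketch of one such approach; `DocumentSearchError`, `error_from_response`, and the fallback texts are hypothetical and not part of the API.

```python
import json

# Fallback descriptions taken from the HTTP status code table above.
STATUS_MEANINGS = {
    400: "Malformed request parameters",
    401: "Missing authentication or invalid authentication format",
    403: "Invalid API key or unauthorized access",
    404: "Resource not found",
    500: "Internal server error",
}


class DocumentSearchError(Exception):
    """Carries the HTTP status and the API's `detail` message."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def error_from_response(status, body):
    """Build an exception from an error status and raw response body."""
    try:
        detail = json.loads(body)["detail"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON shape; fall back to the status table.
        detail = STATUS_MEANINGS.get(status, "Unknown error")
    return DocumentSearchError(status, detail)
```

With `urllib`, a caller could catch `urllib.error.HTTPError` as `exc` and raise `error_from_response(exc.code, exc.read())`.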