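Automatic moderation means letting AI act as a "content safety officer" that automatically checks whether content is compliant.
When a user sends or uploads text, images, or video, the large model reads and inspects the content itself, then decides whether it contains violations such as: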
Insults or profanity
Rumors or fraud
Pornography or violence
Illegal or otherwise prohibited content
...and more; the categories can be customized
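If the model judges the content non-compliant, it blocks it or raises a warning; if the content is compliant, it lets it through.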
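The block / warn / pass decision described above can be sketched as a small function over a single risk score. This is a minimal sketch under stated assumptions: the score in [0, 1] comes from some moderation model, and the two thresholds (0.5 for warn, 0.8 for block) as well as the names ModerationGate and Decision are hypothetical, not part of any library.

```java
import java.util.Locale;

public class ModerationGate {

    // Hypothetical three-way outcome: let through, let through with a warning, or block
    enum Decision { PASS, WARN, BLOCK }

    // Hypothetical thresholds; tune them for your own platform
    static final double WARN_THRESHOLD = 0.5;
    static final double BLOCK_THRESHOLD = 0.8;

    // Maps a risk score in [0, 1] to a decision
    static Decision decide(double riskScore) {
        if (riskScore < 0.0 || riskScore > 1.0) {
            throw new IllegalArgumentException("score must be in [0, 1]: " + riskScore);
        }
        if (riskScore >= BLOCK_THRESHOLD) {
            return Decision.BLOCK;
        }
        if (riskScore >= WARN_THRESHOLD) {
            return Decision.WARN;
        }
        return Decision.PASS;
    }

    public static void main(String[] args) {
        double[] scores = {0.05, 0.65, 0.95};
        for (double s : scores) {
            System.out.println(String.format(Locale.ROOT, "%.2f -> %s", s, decide(s)));
        }
    }
}
```

Running main prints 0.05 -> PASS, 0.65 -> WARN and 0.95 -> BLOCK. Keeping a WARN band between the two thresholds lets borderline content through with a notice (or into a human review queue) instead of forcing a hard yes/no.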
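With automatic moderation, nobody has to review every item by hand. It is fast, accurate, and saves manpower, and it works in real time, with verdicts typically within seconds. Unlike plain keyword matching, a large model can also take tone, context, and coded language into account. Accuracy can improve over time as providers release updated models and as you refine your own rules and thresholds.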
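To see why going beyond keyword matching matters, here is a deliberately naive keyword filter. It is a hypothetical illustration (the blocklist and the class name KeywordFilter are made up). It flags a harmless programming question and misses a veiled threat that contains no listed word. That gap is what context-aware models aim to close.

```java
import java.util.List;
import java.util.Locale;

public class KeywordFilter {

    // Hypothetical blocklist, for illustration only
    static final List<String> BLOCKLIST = List.of("kill", "idiot");

    // Naive check: flags the text if it contains any blocklisted word, ignoring context
    static boolean isFlagged(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String word : BLOCKLIST) {
            if (lower.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // False positive: a harmless technical question contains "kill"
        System.out.println(isFlagged("How do I kill a hung Java process?"));
        // False negative: a veiled threat contains no blocklisted word
        System.out.println(isFlagged("I know where you live. Sleep well."));
    }
}
```

main prints true and then false. That is the opposite of what a human moderator would decide in both cases.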
In short: automatic moderation with a large model means letting AI gatekeep content for you.
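LangChain4j AI Services can moderate content automatically. When inappropriate content is detected, a ModerationException is thrown. The exception contains the original Moderation object, which holds information about the flagged content, such as the exact text that was flagged.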
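The flow described above can be imitated with the standard library alone: a moderation check whose failure surfaces as an exception carrying the moderation result. This sketch is not LangChain4j code. ModerationResult, ModeratedException, moderate(...) and chat(...) are hypothetical stand-ins, and the moderation "model" is a trivial stub. The sketch starts the moderation check in parallel with the chat call and checks the verdict before returning the answer. As far as we can tell from its source, LangChain4j's AI Services schedule the two calls in a similar way, but treat that as an assumption.

```java
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class ParallelModerationSketch {

    // Hypothetical moderation result: whether the text was flagged, and which text
    record ModerationResult(boolean flagged, String flaggedText) {}

    // Hypothetical exception that carries the original moderation result
    static class ModeratedException extends RuntimeException {
        private final ModerationResult result;

        ModeratedException(ModerationResult result) {
            super("Text \"" + result.flaggedText() + "\" violates content policy");
            this.result = result;
        }

        ModerationResult result() {
            return result;
        }
    }

    // Stub moderation "model": flags any text containing "kill" (illustration only)
    static ModerationResult moderate(String text) {
        boolean flagged = text.toLowerCase(Locale.ROOT).contains("kill");
        return new ModerationResult(flagged, flagged ? text : null);
    }

    // Runs moderation in parallel with the chat call, then checks the verdict before returning
    static String chat(String userMessage, Function<String, String> chatModel) {
        CompletableFuture<ModerationResult> moderation =
                CompletableFuture.supplyAsync(() -> moderate(userMessage));
        String answer = chatModel.apply(userMessage); // chat call proceeds meanwhile
        ModerationResult result = moderation.join();  // wait for the moderation verdict
        if (result.flagged()) {
            throw new ModeratedException(result);     // discard the answer
        }
        return answer;
    }

    public static void main(String[] args) {
        Function<String, String> echoModel = msg -> "echo: " + msg; // stub chat model
        System.out.println(chat("Hello", echoModel));
        try {
            chat("I will kill you", echoModel);
        } catch (ModeratedException e) {
            System.out.println("Blocked: " + e.result().flaggedText());
        }
    }
}
```

main prints echo: Hello and then Blocked: I will kill you. Running the two calls in parallel hides the moderation latency. The trade-off is that the chat model may already have been billed for a request whose answer is then thrown away.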
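Automatic moderation is configured when you build the AI service. Register a moderation model with the builder, and annotate each method whose input should be checked with @Moderate (dev.langchain4j.service.Moderate):

interface Assistant {
    @Moderate // only methods annotated with @Moderate are moderated automatically
    String chat(String userMessage);
}

Assistant assistant = AiServices.builder(Assistant.class)
        .chatModel(model)
        .moderationModel(moderationModel) // configure the moderation model
        .build();

Commonly used moderation models are listed below. Note that .moderationModel(...) expects a LangChain4j ModerationModel implementation, such as OpenAiModerationModel. To use the general-purpose LLMs in this list, you would have to wrap them in your own ModerationModel implementation or call them through a prompt-based check: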
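Chinese-language UGC, social, or short-video platforms in mainland China: prefer Doubao, Tongyi Qianwen (Qwen), ERNIE (Wenxin), or Hunyuan (regulatory compliance plus strong Chinese support)
Open-source, self-hosted, low cost: Qwen3-Guard, Llama-Guard
Overseas products or mostly English content: OpenAI Moderation, Claude, Perspective
Highest possible accuracy: DeepSeek-R1, Claude 3 Opus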
Here is a complete example of automatic moderation:
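package com.hxstrive.langchain4j.aiServices;

import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.moderation.Moderation;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiModerationModel;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.service.AiServices;

public class SimpleAutoModerationDemo {

    // Recommended: store the key in the OPEN_API_KEY environment variable instead of hard-coding it,
    // so that it cannot leak along with the source code.
    // Note: restart IntelliJ IDEA after setting the variable, otherwise it may not be picked up.
    private static final String API_KEY = System.getenv("OPEN_API_KEY");

    // Business interface implemented by AiServices.
    // Note: the moderation model configured on AiServices is applied automatically only to
    // methods annotated with @Moderate; this demo instead calls moderationModel.moderate(...)
    // explicitly in main() before chatting.
    interface Assistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Create the moderation model
        // (https://api.xty.app/v1 is a third-party OpenAI-compatible endpoint;
        // omit baseUrl to use the official OpenAI API)
        OpenAiModerationModel moderationModel = OpenAiModerationModel.builder()
                .baseUrl("https://api.xty.app/v1")
                .apiKey(API_KEY)
                // .modelName("text-moderation-latest") // older, text-only model
                .modelName("omni-moderation-latest")
                .logRequests(true)
                .logResponses(true)
                .build();

        // Create the chat model
        ChatModel chatModel = OpenAiChatModel.builder()
                .baseUrl("https://api.xty.app/v1")
                .apiKey(API_KEY)
                .modelName("gpt-4.1-mini")
                .logRequests(true)
                .logResponses(true)
                .build();

        // Build the service with AiServices
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatModel(chatModel)             // set the chat model
                .moderationModel(moderationModel) // set the moderation model
                .build();

        // User input to check ("I'm going to kill you...")
        final String userMessage = "我将要杀了你...";

        // Moderate the input explicitly; send it to the chat model only if it is not flagged
        Response<Moderation> response = moderationModel.moderate(userMessage);
        Moderation moderation = response.content();
        if (moderation.flagged()) {
            System.err.println(moderation.flaggedText());
            // "内容被审核拒绝:" means "Content rejected by moderation: "
            System.err.println("内容被审核拒绝:" + moderation.flaggedText());
        } else {
            String answer = assistant.chat(userMessage);
            System.out.println(answer);
        }
    }
}

Running the example produces the log below. The input 我将要杀了你... means "I'm going to kill you...". Because the input is flagged, the chat model is never called, so only the /moderations request appears. The last two lines are the program's own System.err output (内容被审核拒绝 means "content rejected by moderation"):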
10:10:11.360 [main] INFO dev.langchain4j.http.client.log.LoggingHttpClient -- HTTP request:
- method: POST
- url: https://api.xty.app/v1/moderations
- headers: [Authorization: Beare...00], [User-Agent: langchain4j-openai], [Content-Type: application/json]
- body: {
"model" : "omni-moderation-latest",
"input" : [ "我将要杀了你..." ]
}
10:10:13.444 [main] INFO dev.langchain4j.http.client.log.LoggingHttpClient -- HTTP response:
- status code: 200
- headers: [:status: 200], ....
- body: {"id":"modr-6122","model":"omni-moderation-latest","results":[{"categories":{"harassment":true,"harassment/threatening":true,"hate":false,"hate/threatening":false,"illicit":false,"illicit/violent":false,"self-harm":false,"self-harm/instructions":false,"self-harm/intent":false,"sexual":false,"sexual/minors":false,"violence":true,"violence/graphic":false},"category_applied_input_types":{"harassment":["text"],"harassment/threatening":["text"],"hate":["text"],"hate/threatening":["text"],"illicit":["text"],"illicit/violent":["text"],"self-harm":["text"],"self-harm/instructions":["text"],"self-harm/intent":["text"],"sexual":["text"],"sexual/minors":["text"],"violence":["text"],"violence/graphic":["text"]},"category_scores":{"harassment":0.6495624744416085,"harassment/threatening":0.5711033371143481,"hate":0.001183863107963002,"hate/threatening":0.0033597857327111024,"illicit":0.039215390025813264,"illicit/violent":0.03455085433180944,"self-harm":0.0005711240507138813,"self-harm/instructions":0.00021478433912260148,"self-harm/intent":0.0002979007248315314,"sexual":0.0007067749478525128,"sexual/minors":0.00000714190139989638,"violence":0.9528195567652853,"violence/graphic":0.00004955359475635505},"flagged":true}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0,"prompt_tokens_details":{"cached_tokens":0,"text_tokens":0,"audio_tokens":0,"image_tokens":0},"completion_tokens_details":{"text_tokens":0,"audio_tokens":0,"reasoning_tokens":0},"input_tokens":0,"output_tokens":0,"input_tokens_details":null}}
我将要杀了你...
内容被审核拒绝:我将要杀了你...

The fields of the response body are annotated below:

{
  // Unique ID of this moderation request
  "id": "modr-6122",
  // Moderation model that was used
  "model": "omni-moderation-latest",
  // Moderation results, one entry per input
  "results": [
    {
      // Per-category verdicts (true = the content falls into this category)
      "categories": {
        // Harassment
        "harassment": true,
        // Harassment that includes threats
        "harassment/threatening": true,
        // Hate speech
        "hate": false,
        // Hate speech that includes threats
        "hate/threatening": false,
        // Advice or instructions for committing illicit acts
        "illicit": false,
        // Illicit content that also involves violence
        "illicit/violent": false,
        // Content related to self-harm
        "self-harm": false,
        // Instructions for self-harm
        "self-harm/instructions": false,
        // Expressed intent to self-harm
        "self-harm/intent": false,
        // Sexual content
        "sexual": false,
        // Sexual content involving minors
        "sexual/minors": false,
        // Violent content
        "violence": true,
        // Graphic depictions of violence
        "violence/graphic": false
      },
      // Input types (text, image) that each category was evaluated on
      "category_applied_input_types": {
        "harassment": [
          "text"
        ],
        "harassment/threatening": [
          "text"
        ],...
      },
      // Per-category scores from 0 to 1; higher means the model is more confident
      "category_scores": {
        "harassment": 0.6495624744416085,
        "harassment/threatening": 0.5711033371143481,
        "hate": 0.001183863107963002,
        "hate/threatening": 0.0033597857327111024,
        "illicit": 0.039215390025813264,
        "illicit/violent": 0.03455085433180944,
        "self-harm": 0.0005711240507138813,
        "self-harm/instructions": 0.00021478433912260148,
        "self-harm/intent": 0.0002979007248315314,
        "sexual": 0.0007067749478525128,
        "sexual/minors": 0.00000714190139989638,
        "violence": 0.9528195567652853,
        "violence/graphic": 0.00004955359475635505
      },
      // Overall verdict: whether the model flagged the content as violating
      "flagged": true
    }
  ]
}
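The category_scores field lets you apply your own stricter or looser policy instead of relying only on flagged. The sketch below uses the 13 scores from the response above. The per-category thresholds and the name CategoryPolicy are hypothetical. As far as we know, the Moderation methods used in the demo (flagged() and flaggedText()) do not expose these scores. This sketch therefore assumes you obtain the raw response by other means, for example by calling the /moderations endpoint directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryPolicy {

    // category_scores copied from the response above, in the same order
    static Map<String, Double> sampleScores() {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("harassment", 0.6495624744416085);
        scores.put("harassment/threatening", 0.5711033371143481);
        scores.put("hate", 0.001183863107963002);
        scores.put("hate/threatening", 0.0033597857327111024);
        scores.put("illicit", 0.039215390025813264);
        scores.put("illicit/violent", 0.03455085433180944);
        scores.put("self-harm", 0.0005711240507138813);
        scores.put("self-harm/instructions", 0.00021478433912260148);
        scores.put("self-harm/intent", 0.0002979007248315314);
        scores.put("sexual", 0.0007067749478525128);
        scores.put("sexual/minors", 0.00000714190139989638);
        scores.put("violence", 0.9528195567652853);
        scores.put("violence/graphic", 0.00004955359475635505);
        return scores;
    }

    // Returns the categories whose score reaches their threshold;
    // categories without an explicit threshold use defaultThreshold
    static List<String> violations(Map<String, Double> scores,
                                   Map<String, Double> thresholds,
                                   double defaultThreshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double limit = thresholds.getOrDefault(e.getKey(), defaultThreshold);
            if (e.getValue() >= limit) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical policy: stricter on threats, looser on plain harassment
        Map<String, Double> thresholds = Map.of(
                "harassment/threatening", 0.3,
                "harassment", 0.7);
        System.out.println(violations(sampleScores(), thresholds, 0.5));
    }
}
```

With this policy, main prints [harassment/threatening, violence]. Plain harassment (about 0.65) stays below its raised 0.7 threshold. With a uniform 0.5 threshold, harassment, harassment/threatening and violence would all be reported.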