Video Moderation

Video moderation detects malicious content in video files and returns a suggested action for each moderation result. It can check the audio and video streams of a file at the same time.

1. Introduction

The caller submits one or more video files for moderation and specifies the detection types. The server returns the detection results through an asynchronous callback.

Supported image detection types and their labels are listed below:

Detection Type	Description	Action	Primary Label	Secondary Label
Porn recognition	Recognize porn and sexy contents in pictures	v-porn
			normal: normal	normal: normal
			sexy: sexy	female_underwear: female underwear female_sexy_chest_l12: female sexy chest level 12 female_sexy_chest_l3: female sexy chest level 3 female_sexy_chest_l4: female sexy chest level 4 female_backless: female backless female_sexy_leg: female sexy leg female_focus_leg: female focus leg bathing_suit: bathing suit male_topless: male topless male_normal_topless: male normal topless other_sexy: other sexy
			porn: porn	sex_product: sex aids naked_private_part: exposed sensitive parts extensive_naked: extensive naked sex_behavior: sex behavior naked_female_back: naked female back naked_hip: naked hip sex_bulge: sex bulge focus_female_crotch: focus female crotch focus_male_crotch: focus male crotch hand_on_sexy: hand on sexy lick: lick kiss: kiss sm: SM sperm: sperm naked_child: naked child other_dirty: other dirty tongue_out: tongue out female_focus_hip: female focus hip male_underwear: male underwear porn_pip: porn pip
Terrorism recognition	Recognize bloody and terrorism contents in pictures	v-terrorism	normal: normal fire_explosion: fire explosion gun: gun knife: knife crowd: crowd flag_of_terrorism: flag of terrorism special_dress: special dress disgusted: disgusted with_weapon: with weapon bloody: bloody uniform: uniform	nil
Sensitive information recognition	Recognize sensitive contents in pictures	v-antispam	normal: normal special_building: special building rmb: RMB map_of_China: map of China cartoons_of_leaders: cartoons of leaders flags_of_China: flags of China Tibetan_buddhism: Tibetan buddhism other_antispam: Other sensitive information tank: tank fighter: fighter cannon: cannon battleship: battleship	nil
Sensitive figure recognition	Recognize domestic and overseas politicians, and public figures in pictures	v-sface	normal: normal sface: sensitive figure involved	nil
Illegal recognition	Identify whether the picture contains illegal scene information	v-illegal	normal: normal minor: minor drug: drug drive: drive gamble: gamble smoke: smoke id_infomation: ID information tattoo: tattoo inbed: lie on the bed	nil
AD recognition	Identify whether the image contains advertising information	v-ad	normal: normal QR_code: QR code bar_code: bar code applet_code: applet code	nil
OCR recognition	Identify whether the picture contains suspected violation text information	v-ocr	normal: normal ocr_politics: politics ocr_terrorism: terrorism ocr_porn: porn ocr_illegal: illegal ocr_abuse: abuse ocr_ad: ad	nil

Supported audio detection types and their labels are listed below:

Detection Type	Description	Action	Label
Moan recognition in audio	Detect voiceprint features in audio files, and recognize illegal features, such as moan	a-porn	normal: normal moan: Moan
Sensitive word recognition in audio	Transcribe audio files to text, and recognize sensitive and illegal contents	a-antispam	normal: normal terrorism: terrorism porn: porn illegal: illegal politics: politically sensitive contents abuse: abuse ad: advertisement cheating feudalism: feudalism religion: religiously sensitive contents affairs: Affairs contraband: contrabands minors: minors banned-website: banned websites
Automatic speech recognition	Transcribe audio files to text	a-asr	normal: normal

2. Restrictions

Restriction Category	Description
Video format	Support .avi, .mp4, .asf, .wmv, and .mov. For other formats, contact customer support.
Limit on file size	A single video does not exceed 200 MB. For larger videos, contact customer support.
Screenshot interval	Screenshot every 2 seconds
Concurrency restriction	You can submit up to 20 videos for moderation per second, and the system processes up to 200 videos concurrently. For a higher concurrency capacity, contact customer support.
Save duration	The system automatically saves suspected illegal screenshots and audio clips, and returns the file URLs and detection results to the user. These files are kept for 3 hours, after which their URLs may become invalid. Download the files before they expire.
Video resolution	At least 128 x 128. Very low resolutions may reduce recognition accuracy.
Area restriction	Only in Chinese mainland. For support in other countries and regions, contact customer support.
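
The format, size, and resolution limits above can be checked locally before a video is submitted. The following is a minimal sketch, not part of the service: the helper name `check_video` is hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the caller is assumed to already know the file size and resolution.

```python
import os
from urllib.parse import urlparse

# Values copied from the Restrictions table above.
SUPPORTED_EXTENSIONS = {".avi", ".mp4", ".asf", ".wmv", ".mov"}
MAX_SIZE_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_SIDE_PIXELS = 128


def check_video(location, size_bytes, width, height):
    """Return a list of problems; an empty list means no limit is violated.

    `location` may be a file name or a URL; any query string is ignored.
    """
    problems = []
    ext = os.path.splitext(urlparse(location).path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format: %s" % (ext or "(none)"))
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 200 MB")
    if width < MIN_SIDE_PIXELS or height < MIN_SIDE_PIXELS:
        problems.append("resolution below 128 x 128")
    return problems


if __name__ == "__main__":
    print(check_video("http://example.com/clip.mp4?sig=abc", 10 * 1024 * 1024, 640, 480))
    print(check_video("clip.mkv", 300 * 1024 * 1024, 64, 64))
```

A non-empty list does not necessarily mean the video cannot be moderated, since customer support may lift some of these limits.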

3. API

3.1 Start the Task

Request APIs

Item	Description
Request method	POST
Request protocol	HTTPS
Request domain name	ai.jocloud.com
Request path	/app/{appid}/v1/video/async?traceId=uuid-xxxx-xxxx-xxxx-xxxx
Request parameters	traceId is a UUID string used to locate problems during troubleshooting. Use a different value for each request.
Request header	Content-Type: application/json;charset=UTF-8 token: authentication token; see its generation method in Identity Authentication
Request body	json string, defined as follows

Request Parameters

The request parameter, as a json object, is stored in the request body. The specific field is described below:

Name	Type	Required	Description
actions	String array	Yes	Detection type, options including: v-porn: porn recognition v-terrorism: terrorism recognition v-antispam: sensitive information recognition v-sface: sensitive figure recognition v-illegal: illegal recognition v-ad: ad information recognition v-ocr: ocr information recognition Audio detection type, options including: a-porn: moan recognition a-antispam: sensitive word recognition a-asr: automatic speech recognition
data[]	JSON array	Yes	Specify the detection object information list. Each element in the JSON array is a detection task structure (see Table "Request Data" below). A single callback can process up to 5 videos each time.
callback	String	No	Result callback address, supporting HTTP and HTTPS. May be null. When the callback address is null, obtain the detection result through the search API (receiving the moderation result through callback is recommended).
sequence	String	No	Value used to sign the callback notification request. Mandatory when callback is set. For usage, see the description of the callback of detection results.

Table: Request Data

Name	Type	Required	Description
dataId	String	Yes	Unique data ID, for example: uuid-xxxx-xxxx-xxxx-xxxx
dataType	String	Yes	Data type. Must be URL.
content	String	Yes	HTTP address of a video file to be detected
extra	JSON	No	Extended parameters of audio; see the following table
context	JSON	No	Customized context data, automatically provided when a result is returned

Table: Extended parameters

Name	Type	Required	Description
lang	String	No	Language for a-antispam detection. Defaults to Chinese. Options: chinese: Chinese; bahasa: Bahasa
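
The request tables above can be turned into a small builder that assembles the JSON body and rejects obviously invalid input before it is sent. This is a sketch under stated assumptions: the helper name `build_request_body` is hypothetical, the limit of 5 videos is read as a per-request limit, and `sequence` is treated as required whenever `callback` is set.

```python
import uuid

# Action names copied from the "actions" row above.
VIDEO_ACTIONS = {"v-porn", "v-terrorism", "v-antispam", "v-sface",
                 "v-illegal", "v-ad", "v-ocr"}
AUDIO_ACTIONS = {"a-porn", "a-antispam", "a-asr"}
MAX_ITEMS = 5  # assumption: the limit of 5 videos applies to one request


def build_request_body(video_urls, actions, callback=None, sequence=None, lang=None):
    """Assemble the request body as a dict ready for JSON serialization."""
    unknown = set(actions) - VIDEO_ACTIONS - AUDIO_ACTIONS
    if unknown:
        raise ValueError("unknown actions: %s" % sorted(unknown))
    if not video_urls or len(video_urls) > MAX_ITEMS:
        raise ValueError("expected between 1 and %d videos" % MAX_ITEMS)
    if callback and not sequence:
        raise ValueError("sequence is required when callback is set")

    body = {"actions": list(actions), "data": []}
    for url in video_urls:
        item = {
            "dataId": str(uuid.uuid4()),  # unique ID per detection object
            "dataType": "URL",
            "content": url,
        }
        if lang:
            item["extra"] = {"lang": lang}  # only used by a-antispam
        body["data"].append(item)
    if callback:
        body["callback"] = callback
        body["sequence"] = sequence
    return body


if __name__ == "__main__":
    import json
    print(json.dumps(build_request_body(
        ["http://example.com/clip.mp4"], ["v-porn", "a-antispam"],
        callback="http://example.com/callback", sequence="test", lang="bahasa"),
        indent=2))
```

Generating `dataId` inside the builder keeps IDs unique per detection object; pass your own IDs instead if you need to correlate results with existing records.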

Return the Result

Name	Type	Required	Description
code	Integer	Yes	Error code, consistent with the HTTP status code and subject to extension; see Error code in request
message	String	Yes	Error message description
traceId	String	Yes	Map to traceId in the request parameter
requestId	String	Yes	The unique request ID generated by the system for this request, used for subsequent result callback and status query.
timestamp	Integer	Yes	Current unix timestamp (s)

3.2 Callback of Detection Results

Callback Method

Upon completion of detection, the system sends an HTTP POST request to the callback address provided by the user and returns the detection result in it.
To prevent content tampering, the callback HTTP request carries a checksum item in its header, which the receiver can use to verify the content.

The checksum string is generated by the following method:

The checksum value is the SHA256 hash of the sequence value passed when starting the task, concatenated with the body string (sequence + body).
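
On the receiving side, the check described above amounts to recomputing the hash and comparing it with the header value. The sketch below is illustrative only and makes several assumptions that the text does not confirm: the strings are UTF-8 encoded, the checksum is sent as a lowercase hex string, and the function names are hypothetical.

```python
import hashlib
import hmac


def expected_checksum(sequence, raw_body):
    """SHA256 over sequence + body.

    `raw_body` is the callback body as received, in bytes.
    Assumptions: UTF-8 encoding and hex output.
    """
    return hashlib.sha256(sequence.encode("utf-8") + raw_body).hexdigest()


def is_callback_authentic(sequence, raw_body, received_checksum):
    """Compare the recomputed checksum with the one from the request header."""
    expected = expected_checksum(sequence, raw_body)
    received = received_checksum.strip().lower()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


if __name__ == "__main__":
    body = b'{"code": 200, "message": "OK"}'
    good = expected_checksum("test", body)
    print(is_callback_authentic("test", body, good))
    print(is_callback_authentic("test", body + b" ", good))
```

Hash the raw bytes of the body exactly as received; parsing the JSON and serializing it again can change whitespace or key order and make the checksum fail.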

Callback Contents

The detection result is saved in JSON structure in body, and the specific field is described as follows:

Name	Type	Required	Description
code	Integer	Yes	Error code, consistent with the HTTP status code and subject to extension; see Error code in request
message	String	Yes	Error message description
traceId	String	Yes	traceId content in the pass-through request parameter
requestId	String	Yes	The system generates a unique task ID specific to this detection request
timestamp	Integer	Yes	Current unix timestamp (s)
data[]	JSON array	No	Detection result data list (for specific structure, see the table 'Returned Data' below). Each item in the array represents a processing result of one data, and this field may be empty in case of errors.

Table: returned data

Name	Type	Required	Description
code	Integer	Yes	Error code; see Error code in data and action
message	String	Yes	Error description
dataId	String	Yes	Map to dataId in the request
taskId	String	Yes	A unique task identifier generated for multiple detection types of this detection object
context	JSON	No	Map to context in the request
results[]	JSON array	No	Result data, present on success. The array contains one element for each action in the request. Each element is a structure holding the processing result of the corresponding action; the structure is described in the result table below.

Table: result

Name	Type	Required	Description
code	Integer	Yes	Error code; see Error code in data and action
message	String	Yes	Error description
action	String	Yes	Detection type, mapping to parameters of request actions
label	String	Yes	Detection result label. See the detection types and their labels above.
rate	Floating-point number	Yes	Probability of detection result label, with the value ranging between [0.00 – 1.00]. The larger the value, the higher the credibility.
suggestion	String	Yes	Operation recommended, with the value options: - pass: normal, no operation needed; - review: suspected, detection result uncertain, requiring further manual moderation - block: illegal, suggested to give punishment
duration	Floating-point number	No	Return the length of detected audio as per the audio detection type
text	String	No	Text content of audio clips, provided only when action is "a-antispam" or "a-asr".
segment[]	JSON array	No	List of identification results for video frames or audio segments whose suggestion is 'review' or 'block'. The segment parameters differ by action; see the definition for each action below.

The structure of result->segment depends on the action, as follows:

(1) When action is v-porn, v-terrorism, v-antispam, v-sface, v-illegal, v-ad, or v-ocr, the structure of result->segment is as follows:

Name	Type	Required	Description
label	String	Yes	Detection label, the one with the largest rate in all face labels
rate	Floating-point number	Yes	Probability of detection result label, with the value ranging between [0.00 – 1.00]. The larger the value, the higher the credibility.
suggestion	String	Yes	Suggested operation
url	String	Yes	Screenshot address
timeOffset	Floating-point number	Yes	Time from screenshot capturing to video start
extraData[]	JSON array	No	Present only when action is 'v-porn' or 'v-sface'. For 'v-porn', the array holds the identified secondary labels; see the porn table below for the element structure. For 'v-sface', the array holds the face information of every person recognized in the screenshot; see the face table below for the element structure.

Table: face

Name	Type	Required	Description
label	String	Yes	Detected face label
rate	Floating-point number	Yes	Probability of detection result label, with the value ranging between [0.00 – 1.00]. The larger the value, the higher the credibility.
name	String	Yes	Detected name of sensitive figure
x	Integer	Yes	X-coordinate of the upper left corner of the detected face in the picture
y	Integer	Yes	Y-coordinate of the upper left corner of the detected face in the picture
w	Integer	Yes	Width of detected face
h	Integer	Yes	Height of detected face

Table: porn

Name	Type	Required	Description
label	String	Yes	Secondary label
rate	Floating-point number	Yes	Probability of detection result label, with the value ranging between [0.00 – 1.00]. The larger the value, the higher the credibility.

(2) When action is a-porn, the structure of result->segment is as follows:

Name	Type	Required	Description
begin	Floating-point number	No	Start time of audio clips (s)
end	Floating-point number	No	End time of audio clips (s)
score	Floating-point number	No	Matching degree of moan, value ranging between:[0–100]. The higher the score, the higher the matching degree.

(3) When action is a-antispam, the structure of result->segment is as follows:

Name	Type	Required	Description
begin	Floating-point number	No	Start time of audio clips (s)
end	Floating-point number	No	End time of audio clips (s)
extraData[]	JSON array	No	List of sensitive word recognition results for this audio segment; see the antispam-extraData table below for the element structure

Table: antispam-extraData

Name	Type	Required	Description
hint	JSON array	No	Hit keywords
label	String	No	Type of the hit keywords
rate	Floating-point number	No	Not meaningful; always "1.0"
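
Because the segment structure differs by action, a callback handler usually needs one branch per family of actions. The sketch below walks a parsed callback body and flattens every reported segment into a tuple; the function name `summarize_segments` and the sample values are hypothetical, and `hint` is assumed to be an array of strings.

```python
def summarize_segments(callback_body):
    """Return (dataId, action, suggestion, where, what) for every segment."""
    rows = []
    for item in callback_body.get("data") or []:
        for result in item.get("results") or []:
            action = result.get("action", "")
            for seg in result.get("segment") or []:
                if action.startswith("v-"):
                    # Video actions: one screenshot per segment.
                    where = "frame at offset %s" % seg.get("timeOffset")
                    what = seg.get("label")
                elif action == "a-antispam":
                    # Sensitive words: collect the hit keywords.
                    where = "audio %s-%s" % (seg.get("begin"), seg.get("end"))
                    hits = []
                    for extra in seg.get("extraData") or []:
                        hits.extend(extra.get("hint") or [])
                    what = ", ".join(hits)
                else:
                    # a-porn: a time range with a matching score.
                    where = "audio %s-%s" % (seg.get("begin"), seg.get("end"))
                    what = "score %s" % seg.get("score")
                rows.append((item.get("dataId"), action,
                             result.get("suggestion"), where, what))
    return rows


if __name__ == "__main__":
    sample = {  # hypothetical callback body, for illustration only
        "code": 200,
        "data": [{
            "dataId": "d1",
            "results": [
                {"action": "v-porn", "suggestion": "block",
                 "segment": [{"label": "porn", "rate": 0.98, "timeOffset": 4.0}]},
                {"action": "a-antispam", "suggestion": "review",
                 "segment": [{"begin": 1.5, "end": 3.0,
                              "extraData": [{"hint": ["word1", "word2"], "label": "abuse"}]}]},
            ],
        }],
    }
    for row in summarize_segments(sample):
        print(row)
```

Missing optional fields are treated as empty rather than as errors, since `data[]`, `results[]`, and `segment[]` are all optional in the tables above.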

3.3 Synchronous Search of Results

Because processing is asynchronous, receiving the result through the callback described above is recommended. If necessary, the result can also be obtained by polling the synchronous API described below:

Request Method

Item	Description
Request method	GET
Request protocol	HTTPS
Request domain name	ai.jocloud.com
Request path	/app/{appid}/v1/video/async/results?traceId=uuid-xxxx-xxxx-xxxx-xxxx&requestId=yyyy
Request parameters	traceId is a UUID string used to locate problems during troubleshooting. Use a different value for each request. requestId is the ID of the request to be searched, i.e. the requestId returned when the task was submitted for detection.
Request header	Content-Type: application/json;charset=UTF-8 token: Authentication token. See its generation method in Identity Authentication

Return Parameters

Data in body is JSON, and the specific field is described as follows:

Name	Type	Required	Description
code	Integer	Yes	Error code; see Error code in request
message	String	Yes	Error message description
traceId	String	Yes	traceId content in the pass-through request parameter
requestId	String	Yes	requestId for the current search, consistent with the request parameter
status	String	Yes	Task status: received (pending), processing (in progress), completed (done)
timestamp	Integer	Yes	Current unix timestamp (s)
data[]	Array	No	Result data, returned when the call succeeds (code==200). See the table "returned data" above for the element definition.
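
Polling this API can be sketched as a loop that repeats the request until the status reports completion. The code below is an illustration under assumptions, not a reference client: the function names are hypothetical, the status strings are assumed to be exactly `received`, `processing`, and `completed`, and the interval and attempt limit are arbitrary. The HTTP call is passed in as `fetch_json` so that the loop stays independent of any HTTP library.

```python
import time
from urllib.parse import urlencode


def results_url(host, appid, trace_id, request_id):
    """Build the search URL from the request path documented above."""
    query = urlencode({"traceId": trace_id, "requestId": request_id})
    return "%s/app/%s/v1/video/async/results?%s" % (host, appid, query)


def poll_results(fetch_json, url, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_json(url) until the task is completed, then return data[].

    fetch_json must perform the GET request and return the parsed JSON body.
    """
    for _ in range(max_attempts):
        reply = fetch_json(url)
        if reply.get("code") != 200:
            raise RuntimeError("search failed: %s %s"
                               % (reply.get("code"), reply.get("message")))
        if reply.get("status") == "completed":
            return reply.get("data") or []
        sleep(interval_s)  # still received or processing; wait and retry
    raise TimeoutError("task not completed after %d attempts" % max_attempts)


if __name__ == "__main__":
    # Simulated replies stand in for real HTTP responses.
    fake = iter([
        {"code": 200, "status": "processing"},
        {"code": 200, "status": "completed", "data": [{"dataId": "d1"}]},
    ])
    url = results_url("https://ai.jocloud.com", 123456789, "trace-1", "req-1")
    print(poll_results(lambda _url: next(fake), url, sleep=lambda s: None))
```

With the requests library, `fetch_json` could be `lambda u: requests.get(u, headers=headers).json()`, where `headers` carries the same token as in the submit request.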

4. Sample Code

The following sample code shows how to call the API with Python:

# -*- coding: utf-8 -*-
#! python3.5

import requests
import uuid
import base64

host = "https://ai.jocloud.com"
appid = 123456789  # Your Service id
restful_id = '********************'  # Your certificate ID
restful_secret = '********************'  # Your certificate key
traceid = str(uuid.uuid4())
dataid = str(uuid.uuid4())

# url
url = host + '/app/%s/v1/video/async/submit?traceId=%s' % (appid, traceid)

# headers
headers = {
    "content-type": "application/json"
}

auth = base64.b64encode(("%s:%s" % (restful_id, restful_secret)).encode('utf-8'))
headers['token'] = 'Base ' + auth.decode()

# The URL of the video file to be identified
file_url = 'http://newcntv.qcloudcdn.com/asp/hls/1200/0303000a/3/default/d67f1b655f0b49be87fdcc84f7f06029/7.ts'

# Context information to be used by the service to assist the subsequent treatment in the callback message, for example: 
context = {
    'myid': 123,
    'myname': 'test'
}

# Callback address; the identification result and status notifications are delivered to it via HTTP POST
callback_addr = 'http://mydomain.com/callback'

# content
values = {
    'actions': ['v-sface', 'v-porn'],
    'data': [
        {
            'dataId': dataid,
            'dataType': 'URL',
            'content': file_url,
            'extra': {},
            'context': context
        }
    ],
    'callback': callback_addr,
    'sequence': 'test'
}

# request
res = requests.post(url, json=values, headers=headers)
print('url=%s\nbody=%s\ncode=%s\ndata=%s\n' % (url, values, res.status_code, res.text))

5. Update History

Version	Time	Description
v2.2.3	2020-10-15	Add new label in OCR recognition
v2.2.2	2020-10-14	Add 'a-asr' action support
v2.2.1	2020-10-13	Add 'inbed' label in Illegal recognition
v2.2.0	2020-08-31	Add ad and ocr recognition; Update labels of 'v-terrorism' and 'v-antispam' recognition
v2.1.0	2020-08-24	Add illegal recognition
v2.0.0	2020-08-17	Add secondary labels in the porn recognition
v1.1.0	2020-06-30	Remove 'interval' and 'maxframes' from the 'extra' params of the start request; the screenshot interval is fixed at 2 seconds
v1.0.0	2020-03-13	Initial version