Audio moderation identifies malicious content in audio and returns a moderation result together with a handling suggestion.
The caller submits audio clips for moderation and specifies the detection types. The server returns the result synchronously. The currently available detection types are moan recognition, sensitive word recognition, and automatic speech recognition.
Detection types and their labels are listed below:
Detection Type | Description | Action | Label |
---|---|---|---|
Moan recognition | Detect voiceprint features in audio files and recognize illegal features, such as moaning | porn | normal: normal; moan: moan |
Sensitive word recognition | Transcribe the speech in audio files to text and recognize sensitive and illegal content | antispam | normal: normal; terrorism: terrorism; porn: porn; illegal: illegal; politics: politically sensitive content; abuse: abuse; ad: advertisement cheating; feudalism: feudalism; religion: religiously sensitive content; affairs: affairs; contraband: contraband; minors: minors; banned-website: banned websites |
Automatic speech recognition | Transcribe the speech in audio files to text | asr | normal: normal |
Usage restrictions are listed below:
Restriction Category | Description |
---|---|
File source | 1) Audio URL starting with HTTP/HTTPS; 2) Base64-encoded string of the audio binary stream |
File duration | In synchronous detection, a single audio clip must be no longer than 60 s |
File size | A single audio clip must be smaller than 10 MB, and a single request must be smaller than 50 MB |
File format | aac, amr, mp3, and wav are supported. For more audio formats, contact customer support. |
Language support | Sensitive word recognition supports Chinese and Bahasa Indonesia. For more languages, contact customer support. |
Timeout limit | File download times out after 10 seconds, so make sure the file storage service is stable and reliable. A client-side call timeout of 40 s is recommended. |
Concurrency restriction | Up to 20 audio clips are processed per second (20 QPS). For higher QPS, contact customer support. |
Area restriction | Available only in the Chinese mainland. For support in other countries and regions, contact customer support. |
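The limits in the table above can be checked on the client before a request is sent, so that clips the server would reject are caught early. The following is a minimal sketch, not part of any official SDK: the helper name `check_clip` and its parameters are hypothetical, the format, size, and duration are assumed to be known to the caller, and 10 MB is assumed to mean 10 × 1024 × 1024 bytes.

```python
from urllib.parse import urlparse

# Limits copied from the restrictions table above.
MAX_DURATION_S = 60
MAX_CLIP_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_FORMATS = {"aac", "amr", "mp3", "wav"}


def check_clip(data_type, content, fmt, size_bytes, duration_s):
    """Return a list of violated restrictions (empty if none were found).

    Hypothetical helper: fmt, size_bytes and duration_s are supplied by the
    caller; the API itself does not take them as parameters.
    """
    problems = []
    if data_type == "URL":
        if urlparse(content).scheme.lower() not in ("http", "https"):
            problems.append("URL must start with HTTP/HTTPS")
    elif data_type != "BASE64":
        problems.append("dataType must be URL or BASE64")
    if fmt.lower() not in ALLOWED_FORMATS:
        problems.append("unsupported format: %s" % fmt)
    if size_bytes >= MAX_CLIP_BYTES:
        problems.append("clip must be smaller than 10 MB")
    if duration_s > MAX_DURATION_S:
        problems.append("clip must be no longer than 60 s")
    return problems
```

A caller could skip or re-encode any clip for which `check_clip` returns a non-empty list instead of submitting it.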
The request is defined as follows:
Item | Description |
---|---|
Request method | POST |
Request protocol | HTTPS |
Request domain name | ai.jocloud.com |
Request path | /app/{appid}/v1/audio/sync?traceId=uuid-xxxx-xxxx-xxxx-xxxx |
Request parameters | traceId is a UUID string used to locate problems during troubleshooting. Use a different value for each request. |
Request header | Content-Type: application/json;charset=UTF-8; token: authentication token (see Identity Authentication for how to generate it) |
Request body | JSON string, defined as follows |
Table: request body
Name | Type | Required | Description |
---|---|---|---|
actions | String array | Yes | Detection types. Options: porn (moan recognition); antispam (sensitive word recognition); asr (automatic speech recognition) |
data[] | JSON array | Yes | List of detection objects. Each element is an audio detection object (see Table: request data below). A single request can process up to 5 audio clips. |
Table: request data
Name | Type | Required | Description |
---|---|---|---|
dataId | String | Yes | Object unique identifier, for example: uuid-xxxx-xxxx-xxxx-xxxx |
dataType | String | Yes | Data type. URL: URL starting with HTTP/HTTPS; BASE64: Base64-encoded string of the binary stream |
content | String | Yes | Audio content to be detected. For example, when dataType is URL, content is the audio URL. |
context | JSON | No | Custom context data, returned unchanged with the result. |
extra | JSON | No | Extra configuration. See Table: extra below. |
Table: extra
Name | Type | Description |
---|---|---|
lang | String | Language of the audio clip. chinese: Chinese; bahasa: Bahasa Indonesia |
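As an illustration of the request body described in the tables above, the sketch below builds request bodies for audio held in memory, using the BASE64 data type and the optional `extra.lang` field. The helper name `build_payloads` is hypothetical; splitting the clips into groups of five follows the per-request limit stated for `data[]`.

```python
import base64
import uuid

MAX_CLIPS_PER_REQUEST = 5  # limit stated in the description of data[]


def build_payloads(clips, actions, lang=None):
    """Build request bodies for raw audio bytes (hypothetical helper).

    clips   -- list of bytes objects, one per audio clip
    actions -- list of detection types, e.g. ["antispam"]
    lang    -- optional "chinese" or "bahasa", sent as extra.lang
    """
    items = []
    for audio in clips:
        item = {
            "dataId": str(uuid.uuid4()),
            "dataType": "BASE64",
            "content": base64.b64encode(audio).decode("ascii"),
        }
        if lang is not None:
            item["extra"] = {"lang": lang}
        items.append(item)
    # One request body per group of at most five clips.
    return [
        {"actions": list(actions), "data": items[i:i + MAX_CLIPS_PER_REQUEST]}
        for i in range(0, len(items), MAX_CLIPS_PER_REQUEST)
    ]
```

Each returned dictionary can be sent as the JSON body of one POST request. Note that Base64 encoding enlarges the data by about one third, which counts toward the 50 MB request limit.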
The response body is a JSON object, defined as follows:
Name | Type | Required | Description |
---|---|---|---|
code | Integer | Yes | Error code. See "Error code in request" in the description of error codes |
message | String | Yes | Error message description |
traceId | String | Yes | The traceId passed in the request parameter, returned unchanged |
requestId | String | Yes | Unique identifier generated by the system for this detection request |
timestamp | Integer | Yes | Current Unix timestamp (s) |
data[] | JSON array | No | Detection result list (see Table: returned data below). Each item in the array is the processing result of one audio clip. This field may be null when an error occurs. |
Table: returned data
Name | Type | Required | Description |
---|---|---|---|
code | Integer | Yes | Error code. See "Error code in data and action" in the description of error codes |
message | String | Yes | Description of errors |
dataId | String | Yes | Matches dataId in the request |
taskId | String | Yes | Unique task identifier generated for this detection object, covering all of its detection types |
context | JSON | No | Matches context in the request |
results[] | JSON array | No | Result data. When the call succeeds (code==200), the result contains one or more elements. Each element is the processing result of one action; its structure is shown in Table: result below. |
Table: result
Name | Type | Required | Description |
---|---|---|---|
code | Integer | Yes | Error code. See "Error code in data and action" in the description of error codes |
message | String | Yes | Description of errors |
action | String | Yes | Detection type, matching one of the detection types (actions) in the request |
label | String | Yes | Detection result label. Its value depends on action; see the table of detection types and labels above |
rate | Floating-point number | Yes | Probability of the detection result label, ranging from 0.00 to 1.00. The larger the value, the higher the confidence. |
suggestion | String | Yes | Recommended operation. pass: normal, no operation required; block: illegal, action against the content is suggested; review: suspected, the detection result is uncertain and requires manual moderation |
duration | Floating-point number | Yes | Playback duration of the audio data |
text | String | No | Transcribed text. This field exists only when action is 'antispam' or 'asr'. |
extraData[] | JSON array | No | Extended detection results, which vary by action. See Table: porn-extraData and Table: antispam-extraData below. |
Table: porn-extraData
Name | Type | Required | Description |
---|---|---|---|
begin | Floating-point number | No | Start time of audio clip (s) |
end | Floating-point number | No | End time of audio clip (s) |
score | Floating-point number | No | Degree to which the segment matches a moan, ranging from 0 to 100. The higher the score, the closer the match. |
Table: antispam-extraData
Name | Type | Required | Description |
---|---|---|---|
hint | JSON array | No | Keywords that were hit |
label | String | No | Type of the hit keyword |
rate | Floating-point number | No | Not meaningful; always 1.0 |
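Because `code` appears at three levels (request, detection object, and action), a caller has to check each level before trusting a label. The sketch below shows one way to flatten a parsed response into per-action rows. The function name `flatten_results` is hypothetical, and treating every code other than 200 as a failure is an assumption based on the `code==200` success condition mentioned above.

```python
def flatten_results(response):
    """Return (ok_rows, errors) from a parsed response dict (hypothetical helper).

    ok_rows -- list of dicts with dataId, action, label, rate, suggestion
    errors  -- list of (level, identifier, code, message) tuples
    """
    ok_rows, errors = [], []
    if response.get("code") != 200:
        errors.append(("request", response.get("traceId"),
                       response.get("code"), response.get("message")))
        return ok_rows, errors
    # data[] may be null when errors occur, so fall back to an empty list.
    for item in response.get("data") or []:
        if item.get("code") != 200:
            errors.append(("data", item.get("dataId"),
                           item.get("code"), item.get("message")))
            continue
        for result in item.get("results") or []:
            if result.get("code") != 200:
                errors.append(("action", result.get("action"),
                               result.get("code"), result.get("message")))
                continue
            ok_rows.append({
                "dataId": item["dataId"],
                "action": result["action"],
                "label": result["label"],
                "rate": result["rate"],
                "suggestion": result["suggestion"],
            })
    return ok_rows, errors
```

Collecting errors instead of raising on the first one lets a batch of clips be processed even when a single clip or action fails.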
The following shows sample Python code for calling the interface:
# -*- coding: utf-8 -*-
#! python3.5
import base64
import uuid
import requests
host = "https://ai.jocloud.com"
appid = 123456789  # Your Service ID
restful_id = '********************'  # Your certificate ID
restful_secret = '********************'  # Your certificate key
traceid = str(uuid.uuid4())  # use a new traceId for each request
dataid = str(uuid.uuid4())  # unique identifier of the detection object
# url
url = host + '/app/%s/v1/audio/sync?traceId=%s' % (appid, traceid)
# headers
headers = {
    "Content-Type": "application/json;charset=UTF-8"
}
auth = base64.b64encode(("%s:%s" % (restful_id, restful_secret)).encode('utf-8'))
headers['token'] = 'Base ' + auth.decode()
# content
values = {
    'actions': ['porn'],
    'data': [
        {
            'dataType': 'URL',
            'content': 'http://127.0.0.1/some-url.aac',
            'dataId': dataid,
            'context': {'uid': 12345}
        }
    ]
}
# request; 40 s is the client-side call timeout recommended in the restrictions
res = requests.post(url, json=values, headers=headers, timeout=40)
print('code=%s, data=%s\n' % (res.status_code, res.text))
Response content
{
    "code": 200,
    "message": "OK",
    "traceId": "36b5eaa2-6e56-41ac-9c1a-08a73d8143b4",
    "requestId": "a72a64a8-2e15-47be-b900-3e64ca731445",
    "timestamp": 1584082795,
    "data": [
        {
            "code": 200,
            "message": "OK",
            "dataId": "3d9c927a-a0f4-4a73-b712-db5aded30677",
            "taskId": "17428f1a-a86a-42c0-9c90-aa579aba436e",
            "context": {
                "uid": 12345
            },
            "results": [
                {
                    "action": "porn",
                    "code": 200,
                    "duration": 10,
                    "extraData": [
                        {
                            "begin": 0,
                            "end": 15,
                            "score": 76
                        }
                    ],
                    "label": "moan",
                    "message": "OK",
                    "rate": 0.7599999904632568,
                    "suggestion": "review"
                }
            ]
        }
    ]
}
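To show how the fields in this sample fit together, the sketch below picks the most severe suggestion for one detection object and lists the flagged segments reported in `extraData`. The severity ordering block > review > pass and the names `SEVERITY`, `overall_suggestion`, and `flagged_segments` are assumptions made for illustration; they are not part of the API.

```python
SEVERITY = {"pass": 0, "review": 1, "block": 2}  # assumed ordering


def overall_suggestion(item):
    """Most severe suggestion across all action results, or None if there are none."""
    suggestions = [r["suggestion"] for r in item.get("results") or []]
    return max(suggestions, key=SEVERITY.__getitem__, default=None)


def flagged_segments(item, action="porn"):
    """(begin, end, score) tuples taken from extraData of the given action."""
    return [
        (seg.get("begin"), seg.get("end"), seg.get("score"))
        for r in item.get("results") or []
        if r.get("action") == action
        for seg in r.get("extraData") or []
    ]


# Detection object trimmed from the sample response above.
item = {
    "dataId": "3d9c927a-a0f4-4a73-b712-db5aded30677",
    "results": [{
        "action": "porn", "label": "moan", "suggestion": "review",
        "extraData": [{"begin": 0, "end": 15, "score": 76}],
    }],
}
print(overall_suggestion(item))
print(flagged_segments(item))
```

When several actions are requested for the same clip, taking the most severe suggestion is a conservative choice; an application may prefer to weigh actions differently.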
The version history is listed below:
Version | Date | Description |
---|---|---|
V1.1.0 | 2020-10-14 | Add 'asr' action support |
V1.0.2 | 2020-07-31 | Change the limit of a single request to up to 5 audio clips |
V1.0.1 | 2020-07-24 | Add Bahasa Indonesia language support |
V1.0.0 | 2020-03-13 | Initial version |