Synchronous Audio Moderation

Audio moderation identifies malicious information in audio content and returns a moderation result together with a handling suggestion.

1. Introduction

The caller submits an audio clip for moderation and specifies the detection type. The server returns the result synchronously. The currently available detection types are moan recognition, sensitive word recognition, and automatic speech recognition.

Detection types and their labels are listed below:

| Detection Type | Description | Action | Label |
| --- | --- | --- | --- |
| Moan recognition | Detects voiceprint features in audio files and recognizes illegal features, such as moaning | porn | normal: normal; moan: moan |
| Sensitive word recognition | Transcribes audio files to text and recognizes sensitive and illegal content | antispam | normal: normal; terrorism: terrorism; porn: porn; illegal: illegal; politics: politically sensitive content; abuse: abuse; ad: advertisement cheating; feudalism: feudalism; religion: religiously sensitive content; affairs: affairs; contraband: contraband; minors: minors; banned-website: banned websites |
| Automatic speech recognition | Transcribes audio files to text | asr | normal: normal |
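
A client may want to confirm that a returned label is one this document lists for the requested action. The following is a minimal sketch of such a lookup. The label sets are copied from the detection type table, while the names `LABELS_BY_ACTION` and `is_known_label` are hypothetical and not part of the API.

```python
# Labels per action, transcribed from the detection type table.
LABELS_BY_ACTION = {
    "porn": {"normal", "moan"},
    "antispam": {
        "normal", "terrorism", "porn", "illegal", "politics", "abuse", "ad",
        "feudalism", "religion", "affairs", "contraband", "minors",
        "banned-website",
    },
    "asr": {"normal"},
}


def is_known_label(action, label):
    """Return True if `label` is listed for `action` (hypothetical helper)."""
    return label in LABELS_BY_ACTION.get(action, set())
```

An unknown label could indicate a newer API version than this page describes, so logging it is probably safer than rejecting the response.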

2. Restrictions

| Restriction Category | Description |
| --- | --- |
| File source | 1) Audio URL starting with HTTP/HTTPS; 2) Base64-encoded string of the audio binary stream |
| File duration | In the synchronous detection scenario, a single audio clip must be no longer than 60 s |
| File size | A single audio clip must be smaller than 10 MB, and a single request must be smaller than 50 MB |
| File format | aac, amr, mp3, and wav are supported. For more audio formats, contact customer support. |
| Language support | Sensitive word recognition supports Chinese and Bahasa Indonesia. For more language support, contact customer support. |
| Timeout limit | The file download times out after 10 seconds, so make sure the file storage service is stable and reliable. A client-side call timeout of 40 s is recommended. |
| Concurrency restriction | Up to 20 audio clips are processed per second (20 QPS). For higher QPS, contact customer support. |
| Area restriction | Available only in the Chinese mainland. For support in other countries and regions, contact customer support. |
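
Checking these limits on the client before sending a request avoids round trips that are bound to fail. The following is a minimal sketch, not an official client. It assumes that 1 MB means 1024 * 1024 bytes and that the format can be judged from the file extension, neither of which this document states. The function names are hypothetical.

```python
import os

ALLOWED_FORMATS = {"aac", "amr", "mp3", "wav"}
MAX_DURATION_S = 60
MAX_CLIP_BYTES = 10 * 1024 * 1024      # assumption: 1 MB = 1024 * 1024 bytes
MAX_REQUEST_BYTES = 50 * 1024 * 1024   # same assumption


def check_clip(filename, size_bytes, duration_s):
    """Return a list of problems for one clip (empty if none were found)."""
    problems = []
    # Assumption: the file extension reflects the real audio format.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in ALLOWED_FORMATS:
        problems.append("unsupported format: %r" % ext)
    if duration_s > MAX_DURATION_S:
        problems.append("duration exceeds 60 s")
    if size_bytes >= MAX_CLIP_BYTES:
        problems.append("clip is not smaller than 10 MB")
    return problems


def request_size_ok(clip_sizes):
    """Return True if the summed clip sizes stay below the request limit."""
    return sum(clip_sizes) < MAX_REQUEST_BYTES
```

The summed clip size is only an approximation of the request size: Base64 encoding inflates the payload by roughly one third, so a stricter check on the encoded body may be needed.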

3. API

3.1 Initiate a Request

Request method: POST
Request protocol: HTTPS
Request domain:
Request path: app/{appid}/v1/audio/sync?traceId=uuid-xxxx-xxxx-xxxx-xxxx
Request parameters: traceId is a UUID string used to locate problems during troubleshooting. Use a different value for each request.
Request header:
- Content-Type: application/json;charset=UTF-8
- token: Authentication token; see Identity Authentication for how to generate it
Request body: JSON string, defined as follows

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| actions | String array | Yes | Detection type. Options: porn (moan recognition), antispam (sensitive word recognition), asr (automatic speech recognition) |
| data[] | JSON array | Yes | List of detection objects. Each element of the array is an audio detection object (see Table: request data below). A single request can process up to 5 audio clips. |
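
Because a distinct traceId is suggested for every request, it can help to generate it at the moment the URL is built. The following is a minimal sketch. `build_sync_url` is a hypothetical helper, and the host passed to it is whatever request domain applies to your account, which this page does not give.

```python
import uuid
from urllib.parse import urlencode


def build_sync_url(host, appid):
    """Return (url, trace_id) for one synchronous audio moderation call."""
    trace_id = str(uuid.uuid4())  # fresh value for every request
    query = urlencode({"traceId": trace_id})
    url = "%s/app/%s/v1/audio/sync?%s" % (host.rstrip("/"), appid, query)
    return url, trace_id
```

Returning the traceId alongside the URL lets the caller log it and quote it later when troubleshooting.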

Table: request data

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataId | String | Yes | Unique identifier of the object, for example: uuid-xxxx-xxxx-xxxx-xxxx |
| dataType | String | Yes | Data type. URL: URL starting with HTTP/HTTPS; BASE64: Base64-encoded string of the binary stream |
| content | String | Yes | Audio content to be detected (for example, when dataType is URL, content is the audio URL) |
| context | JSON | No | Custom context data, passed back unchanged with the result |
| extra | JSON | No | Extra configuration. See Table: extra below. |

Table: extra

| Parameter | Type | Description |
| --- | --- | --- |
| lang | String | Language of the audio clip. chinese: Chinese; bahasa: Bahasa Indonesia |
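
The request body described by the tables above can be assembled as in the following sketch, which sends clips as BASE64 data. It assumes that `content` is plain Base64 of the raw file bytes with no prefix and that at least one clip is required. `build_request_body` is a hypothetical helper name.

```python
import base64
import uuid

VALID_ACTIONS = {"porn", "antispam", "asr"}
MAX_CLIPS = 5  # a single request can process up to 5 audio clips


def build_request_body(clips, actions, lang=None):
    """Build the JSON-serializable body from a list of raw audio bytes."""
    unknown = set(actions) - VALID_ACTIONS
    if unknown:
        raise ValueError("unknown actions: %s" % sorted(unknown))
    if not 1 <= len(clips) <= MAX_CLIPS:
        raise ValueError("a single request carries 1 to 5 audio clips")
    data = []
    for raw in clips:
        item = {
            "dataId": str(uuid.uuid4()),
            "dataType": "BASE64",
            # Assumption: plain Base64 of the file bytes, no data-URI prefix.
            "content": base64.b64encode(raw).decode("ascii"),
        }
        if lang is not None:
            item["extra"] = {"lang": lang}
        data.append(item)
    return {"actions": list(actions), "data": data}
```

For example, `build_request_body([audio_bytes], ["antispam"], lang="bahasa")` yields a body with one data element whose `extra.lang` is `bahasa`.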

3.2 Response

The response content is a JSON object, as defined below:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| code | Integer | Yes | Error code. See "Error code in request" in the description of error codes. |
| message | String | Yes | Error message description |
| traceId | String | Yes | The traceId passed in the request parameters, returned unchanged |
| requestId | String | Yes | Unique identifier generated by the system for this detection request |
| timestamp | Integer | Yes | Current Unix timestamp (s) |
| data[] | JSON array | No | List of detection results (see Table: returned data below). Each item in the array is the processing result of one audio clip. This field may be null in case of errors. |

Table: returned data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| code | Integer | Yes | Error code. See "Error code in data and action" in the description of error codes. |
| message | String | Yes | Error description |
| dataId | String | Yes | The dataId from the request |
| taskId | String | Yes | Unique task identifier generated for all detection types of this detection object |
| context | JSON | No | The context from the request |
| results[] | JSON array | No | Result data. When the call succeeds (code == 200), the array contains one or more elements. Each element is the processing result of one action; see Table: result below. |

Table: result

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| code | Integer | Yes | Error code. See "Error code in data and action" in the description of error codes. |
| message | String | Yes | Error description |
| action | String | Yes | Detection type, matching one of the actions in the request |
| label | String | Yes | Detection result label. Its value depends on the action; see the table of detection types and labels in the Introduction. |
| rate | Floating-point number | Yes | Probability of the detection result label, in the range [0.00 – 1.00]. The larger the value, the higher the credibility. |
| suggestion | String | Yes | Recommended operation. pass: normal, no operation required; block: illegal, penalizing the content is suggested; review: suspected, the detection result is uncertain and requires manual moderation. |
| duration | Floating-point number | Yes | Play duration of the audio data |
| text | String | No | Transcribed text. This field exists only when action is 'antispam' or 'asr'. |
| extraData[] | JSON array | No | Extended detection results, which vary by action. See Table: porn-extraData and Table: antispam-extraData below. |
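
A caller that submits several clips or actions often needs a single decision for the whole request. The following sketch reduces all suggestions in a response to the strictest one. The ordering block > review > pass is an assumption made for illustration and is not defined by the API. `overall_suggestion` is a hypothetical helper, and it skips items whose code is not 200, so errors still need separate handling.

```python
# Assumption: block is stricter than review, which is stricter than pass.
SEVERITY = {"pass": 0, "review": 1, "block": 2}


def overall_suggestion(response):
    """Return the strictest suggestion in a parsed response, or None."""
    worst = None
    for item in response.get("data") or []:      # data may be null on errors
        if item.get("code") != 200:
            continue
        for result in item.get("results") or []:
            if result.get("code") != 200:
                continue
            s = result.get("suggestion")
            if s in SEVERITY and (worst is None or SEVERITY[s] > SEVERITY[worst]):
                worst = s
    return worst
```

A return value of None means no successful result was found, which a caller would probably want to treat differently from "pass".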

Table: porn-extraData

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| begin | Floating-point number | No | Start time of the audio segment (s) |
| end | Floating-point number | No | End time of the audio segment (s) |
| score | Floating-point number | No | Matching degree of the moan, in the range [0–100]. The higher the score, the higher the matching degree. |

Table: antispam-extraData

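| Field | Type | Required | Description |
| --- | --- | --- | --- |
| hint | JSON array | No | Hit keywords |
| label | String | No | Type of the hit keyword |
| rate | Floating-point number | No | Carries no meaning; always 1.0 |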
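
The two extraData layouts can be read with small helpers such as the following sketch. It assumes that `hint` is an array of keyword strings, which the table implies but does not state outright. Both function names are hypothetical.

```python
def flagged_segments(result, min_score=0):
    """Return (begin, end, score) tuples from a 'porn' result's extraData."""
    segments = []
    for entry in result.get("extraData") or []:
        score = entry.get("score")
        if score is not None and score >= min_score:
            segments.append((entry.get("begin"), entry.get("end"), score))
    return segments


def hit_keywords(result):
    """Return {label: [keywords]} from an 'antispam' result's extraData."""
    hits = {}
    for entry in result.get("extraData") or []:
        # Assumption: hint is a list of keyword strings.
        hits.setdefault(entry.get("label"), []).extend(entry.get("hint") or [])
    return hits
```

Both helpers treat a missing or null extraData as empty, since every field in these tables is optional.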

4. Sample Code

The following sample code calls the API with Python:

# -*- coding: utf-8 -*-
#! python3.5

import requests
import uuid
import base64

host = ""

appid = 123456789                       # Your Service ID
restful_id = '********************'     # Your certificate ID
restful_secret = '********************' # Your certificate key
traceid = str(uuid.uuid4())             # Use a different traceId for each request
dataid = str(uuid.uuid4())

# url
url = host + '/app/%s/v1/audio/sync?traceId=%s' % (appid, traceid)

# headers
headers = {
    "content-type": "application/json"
}

# token: Base64 of "<certificate ID>:<certificate key>"
auth = base64.b64encode(("%s:%s" % (restful_id, restful_secret)).encode('utf-8'))
headers['token'] = 'Base ' + auth.decode()

# content
values = {
    'actions': ['porn'],
    'data': [
        {
            'dataType': 'URL',
            'content': '',
            'dataId': dataid,
            'context': {'uid': 12345}
        }
    ]
}

# request
res = requests.post(url, json=values, headers=headers)
print('code=%s, data=%s\n' % (res.status_code, res.text))

Response content

{
  "code": 200,
  "message": "OK",
  "traceId": "36b5eaa2-6e56-41ac-9c1a-08a73d8143b4",
  "requestId": "a72a64a8-2e15-47be-b900-3e64ca731445",
  "timestamp": 1584082795,
  "data": [
    {
      "code": 200,
      "message": "OK",
      "dataId": "3d9c927a-a0f4-4a73-b712-db5aded30677",
      "taskId": "17428f1a-a86a-42c0-9c90-aa579aba436e",
      "context": {
        "uid": 12345
      },
      "results": [
        {
          "action": "porn",
          "code": 200,
          "duration": 10,
          "extraData": [
            {
              "begin": 0,
              "end": 15,
              "score": 76
            }
          ],
          "label": "moan",
          "message": "OK",
          "rate": 0.7599999904632568,
          "suggestion": "review"
        }
      ]
    }
  ]
}

5. Update History

| Version | Date | Description |
| --- | --- | --- |
| V1.1.0 | 2020-10-14 | Add 'asr' action support |
| V1.0.2 | 2020-07-31 | Allow up to 5 audio clips in a single request |
| V1.0.1 | 2020-07-24 | Add Bahasa Indonesia language support |
| V1.0.0 | 2020-03-13 | Initial version |
