This standard is developed in accordance with the rules given in GB/T 1.1-2009.
This standard replaces GB/T 25724-2010 Technical specification of surveillance video and audio coding, and has the following main technical changes with respect to GB/T 25724-2010:
——Some terms are added (see 3.1.93 to 3.1.95);
——The coded unit structure is modified (see 5.1.3; 5.1.3 of 2010 Edition);
——The syntax and semantics of code stream are modified (see 5.2.3 and 5.2.4; 5.2.3 and 5.2.4 of 2010 Edition);
——The syntax and semantics of security parameter set are modified (see 5.2.3.2.5 and 5.2.4.4.4; 5.2.3.2.3 and 5.2.4.4.3 of 2010 Edition);
——The selection method of reference picture is modified (see 5.3.3.4; 5.3.3.4 of 2010 Edition);
——The content of intra prediction process is modified (see 5.3.4; 5.3.4 of 2010 Edition);
——The content of inter prediction process is modified (see 5.3.5; 5.3.5 of 2010 Edition);
——The content of transform quantization and reconstruction is modified (see 5.3.6; 5.3.6 of 2010 Edition);
——The content of deblocking filtering process is modified (see 5.3.7; 5.3.7 of 2010 Edition);
——The sample adaptive offset (SAO) is added (see 5.3.8);
——The adaptive loop filter (ALF) is added (see 5.3.9);
——The content of parsing process is modified (see 5.4; 5.4 of 2010 Edition);
——Annex F is modified, namely, the variable length code table is deleted, and the description of intelligent analysis data is added (see Annex F; Annex F of 2010 Edition).
This standard was proposed by the Ministry of Public Security of the People's Republic of China.
This standard is under the jurisdiction of the National Technical Committee on Security & Protection Systems of Standardization Administration of China (SAC/TC 100).
Drafting organizations of this standard: The First Research Institute of the Ministry of Public Security of P.R.C., Beijing Vimicro, Beijing Zhongdun Security Technology Development Co., Vimicro Electronics Corporation, HISOME Digital Equipment Co., Ltd., Testing Center for Quality of Security & Police Electronic Products under the Ministry of Public Security of P.R.C., Shanxi Zhongtianxin Science & Technology Co., Ltd., Qianmu Juyun Digital Technology (Shanghai) Co., Ltd., Beijing Symboltek Science & Technology Co., Ltd., Hangzhou Hikvision Digital Technology Co., Ltd., Hunan Goke Microelectronics Co., Ltd., Zhejiang Dahua Technology Co., Ltd., Suzhou KEDACOM Science & Technology Co., Ltd., Zhejiang Uniview Technologies Co., Ltd., Tianjin Tiandy Technologies Co., Ltd., Beijing Univision Shendun Security Technology Co., Ltd., Beijing ICETech Science & Technology Co., Ltd., Shanghai Sailing Information Technology Co., Ltd.
Chief drafters of this standard: Chen Chaowu, Deng Zhonghan, Zhi Chen, Qiu Song, Yu Zilong, Zhang Yundong, Dong Qian, Zan Jinwen, Ouyang Dian, Lu Jinghui, Yan Xue, Lin Dong, Shi Juling, Zha Minzhong, Wang Renrui, Liang Minxue, Huang Qilin, Liao Shuanglong, Zhou Wenbo, Ma Li, Xia Changsheng, Zeng Juanjuan, Li Weili, Lu Yuhua, Hu Jianhua, Wang Lei, Sun Darui, Yu Hai, Duan Zhengzhi, Liu Wenyao, Lv Zhuoyi, Jiang Li, Lu Hong, Ni Xin, Ma Wei, Wang Qinjing, Zhang Yong, Xing Peiyin, Wang Dazhi, Wu Shenyi.
The previous edition of this standard is as follows:
——GB/T 25724-2010.
Introduction
Before the release of GB/T 25724-2010 Technical specification of surveillance video and audio coding (hereinafter referred to as the SVAC standard), there were no video and audio coding standards for security protection and surveillance, either in China or internationally. All existing video and audio coding standards were designed for radio and television and public entertainment applications and were not suitable for direct use in the field of security protection.
The SVAC standard (2010 Edition) was issued on December 23, 2010 and implemented on May 1, 2011. It is a technical standard for digital video and audio coding with independent intellectual property rights in China, designed specifically for security protection and video surveillance. After the standard was issued and implemented, the Standardization Administration of China, the Ministry of Public Security, the Ministry of Industry and Information Technology and other departments attached great importance to its promotion and application, and supported the establishment of the Beijing Surveillance Video and Audio Coding Technology Industry Alliance (hereinafter referred to as the SVAC Alliance). The scientific research institutes concerned and most enterprises are actively engaged in technology research and development and product application around the SVAC industry chain.
During implementation, it was found that the SVAC standard still needed to be supplemented and improved in terms of data security protection, compression performance and coding efficiency, support for intelligent analysis and big data, etc. To this end, the National Technical Committee on Security & Protection Systems of Standardization Administration of China (SAC/TC 100) organized the First Research Institute of the Ministry of Public Security, Beijing Vimicro and other units to revise the SVAC standard to make it more advanced and operable.
In recent years, the construction and application of video surveillance systems have expanded from the security protection industry to various industries and fields of public security, becoming an important means of maintaining national security and social stability under the new situation, and playing an active role in fighting crime, public security prevention, social management, serving the people’s livelihood, etc. This revision fully considers the needs of public security video surveillance networking and application construction, and the content of the standard is generally applicable to all public security industries and fields. Therefore, the standard is renamed Technical specifications for surveillance video and audio coding.
The main technical features of this standard are as follows:
a) supporting high-precision video data coding with 8-bit to 12-bit video data, adapting to wide dynamic range, retaining more picture details, and meeting the requirement of faithful reproduction of the scene;
b) supporting diverse intra and inter prediction, transform quantization, binary arithmetic coding and other technologies to achieve better picture quality and higher coding efficiency;
c) supporting variable quality coding in region of interest (ROI), giving priority to ROI picture quality under limited transmission network bandwidth or data storage space, saving non-ROI overhead, providing high-quality video coding that is more suitable for surveillance needs, and improving the overall performance of surveillance system;
d) supporting scalable video coding (SVC) and video data coding by layers, and meeting the needs of different transmission network bandwidth and data storage environment;
e) supporting dual-core audio coding that switches between algebraic code excited linear prediction (ACELP) and transform audio coding (TAC), which ensures a better coding effect for both speech signals and environment (background) sound;
f) supporting coding of voice recognition feature parameters to avoid the influence of coding distortion on speech recognition and voiceprint recognition;
g) supporting surveillance specific information such as absolute time reference information and intelligent analysis information, which are transmitted and stored together with video and audio compression coding data through special syntax, specifying the mode of carrying the commonly used intelligent analysis information, for the convenience of fast retrieval, classified query, video and audio synchronization, and comprehensive application of surveillance data;
h) supporting data security protection, strengthening the support for national cryptographic algorithms, improving the security parameter set, adding content such as digest and signature algorithm identification, defining the requirements for carrying keys and digital certificate related information, and supporting video data encryption and authentication.
Description of related patents
The issuing body of this document draws attention to the fact that claims of compliance with this document may involve the use of a patent concerning the related contents in 5.2.3.1, 5.2.3.2, 5.2.4.2, 5.2.4.4, 5.2.4.7, 6.1.2, 6.1.4, 6.2.6.1.3, 6.2.6.1.4.10 and 6.5.2.2.
The issuing body of this document takes no position concerning the evidence, validity and scope of this patent right.
The holder of this patent right has assured the issuing body of this document that he/she is willing to negotiate licenses under reasonable and non-discriminatory terms and conditions with any applicant. The statement of the holder of this patent right is registered with the issuing body of this document. Information may be obtained from:
Name of holder of patent right | Address
Beijing Vimicro | Shining Tower, No.35, Xueyuan Road, Haidian District, Beijing, 100191, China
Beijing Zhongdun Security Technology Development Co. | No.1, Shouti South Road, Haidian District, Beijing, 100048, China
Vimicro Electronics Corporation | 2F, Building A1, Tianjin University Science Park, No.80, Fourth Avenue, Tianjin Economic-Technological Development Area, 300457, China
Wuhan University | Wuhan University, Wuhan, Hubei, 430079, China
Contact: Zeng Juanjuan
Postal address: 16F, Shining Tower, No.35, Xueyuan Road, Haidian District, Beijing, China
Postal code: 100191
E-mail: zengjuanjuan@vimicro.com
Tel.: 0086-(0)10-68948888-8950
Fax: 0086-(0)10-68944075
Contact: Li Weili
Postal address: No.1, Shouti South Road, Haidian District, Beijing, China
Postal code: 100048
E-mail: lwl@zhongdun.com.cn
Tel.: 0086-(0)10-68773553-6387
Fax: 0086-(0)10-68773553-6215
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights other than those identified above. The issuing body of this document shall not be held responsible for identifying any or all such patent rights.
Technical specifications for surveillance video and audio coding
1 Scope
This standard specifies the decoding process of digital video and audio compression coding for public security video surveillance applications.
This standard is applicable to real-time compression, transmission, playback, storage, etc. of video and audio in the field of public security, and it may also be referred to for other fields that require video and audio coding.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
RFC 3548 The Base16, Base32, and Base64 Data Encodings
3 Terms, definitions and abbreviations
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1
NAL unit
a syntax structure that contains an indication of the type of the data that follows and the number of bytes it contains, with the data in the form of an RBSP, interspersed with emulation prevention bytes as necessary
3.1.2
NAL unit stream
a sequence of NAL units
3.1.3
reserved
specific values for certain syntax elements
Note: The values are provided for future use by the SVAC working group of China. Bitstreams conforming to this standard shall not use these values, but they may be used in the future extensions of this standard.
3.1.4
closed-loop pitch search
an estimation of pitch delay from weighted input signal and LTP filter status, also known as adaptive codebook search
3.1.5
bitstream
a sequence of bits consisting of coded video and audio and associated data, which can be used to represent both a NAL unit stream and a byte stream
3.1.6
transform coefficient
a scalar in frequency domain, which is associated with a particular one- or two-dimensional frequency index in the inverse transform part of the decoding process
3.1.7
transform coefficient level
an integer value associated with a particular two-dimensional frequency index in the decoding process for calculation of a transform coefficient
3.1.8
encoding process
a process that produces a bitstream conforming to this standard; video coding process is not stipulated in this standard
3.1.9
encoder
an embodiment of an encoding process, including software and hardware
3.1.10
tile
an integral number of coding tree units arranged inside a rectangular area in raster scan order
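Note: The raster scan order inside a rectangular tile can be sketched in Python as follows. This is an illustration only; the function name, its parameters and the picture-level addressing rule (row times picture width plus column) are assumptions and are not defined by this standard.

```python
def tile_ctu_addresses(pic_width_in_ctus, tile_x, tile_y, tile_w, tile_h):
    """Return the picture-level CTU addresses covered by a rectangular tile.

    CTUs are visited in raster scan order inside the tile: left to right
    within a row, rows from top to bottom. Addressing CTUs as
    row * pic_width_in_ctus + col is an assumption made for this sketch.
    """
    addresses = []
    for row in range(tile_y, tile_y + tile_h):
        for col in range(tile_x, tile_x + tile_w):
            addresses.append(row * pic_width_in_ctus + col)
    return addresses


# A 2x2 tile whose top-left CTU is at column 1, row 0 of a picture 4 CTUs wide.
print(tile_ctu_addresses(4, 1, 0, 2, 2))  # [1, 2, 5, 6]
```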
3.1.11
coded video sequence
a sequence of pictures that consists, in decoding order, of an IDR picture followed by zero or more non-IDR pictures
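Note: The grouping implied by this definition can be sketched in Python as follows. The string labels "IDR" and "P" and the function name are hypothetical and serve only to illustrate that each coded video sequence starts at an IDR picture and extends to the picture before the next IDR picture.

```python
def split_coded_video_sequences(picture_types):
    """Group pictures, given in decoding order, into coded video sequences.

    Each sequence starts at an IDR picture and runs until the next IDR.
    Pictures before the first IDR belong to no sequence and are skipped.
    """
    sequences = []
    for pic_type in picture_types:
        if pic_type == "IDR":
            sequences.append([pic_type])      # an IDR opens a new sequence
        elif sequences:
            sequences[-1].append(pic_type)    # non-IDR joins the open sequence
    return sequences


print(split_coded_video_sequences(["IDR", "P", "P", "IDR", "IDR", "P"]))
# [['IDR', 'P', 'P'], ['IDR'], ['IDR', 'P']]
```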
3.1.12
coded picture
a coded representation of a picture
Note: A coded picture conforming to this standard is a coded frame.
3.1.13
coded picture buffer
a first-in first-out buffer for storage in decoding order
3.1.14
coded frame
a coded representation of a frame
3.1.15
residual
difference between a predictor of a sample or data element and its decoded value
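Note: Read the other way round, the definition says that a decoded value is the predictor plus the residual. A minimal Python sketch follows; the function name is hypothetical, and clipping the sum to the sample range of the bit depth is an assumption of this sketch rather than something stated in this definition.

```python
def reconstruct(predicted, residual, bit_depth=8):
    """Add residuals to predicted samples and clip to the valid sample range.

    The valid range [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    max_value = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_value) for p, r in zip(predicted, residual)]


print(reconstruct([100, 250, 3], [5, 10, -7]))  # [105, 255, 0]
```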
3.1.16
reference index
an index of reference picture
3.1.17
reference picture
a sample picture for inter prediction in a decoding process of subsequent pictures in decoding order
3.1.18
reference frame
a frame marked as a reference picture for inter prediction in a decoding process
3.1.19
parameter
a syntax element in a sequence parameter set, a picture parameter set, or a security parameter set, also used in the term quantization parameter
3.1.20
layer
a set of syntactic structures in a non-branching hierarchical relationship; higher layers contain lower layers; coding layer refers to a coded picture sequence layer, a picture layer, a tile layer, and a coding unit layer; SVC pictures of different layers have different scalability (e.g., different spatial resolution)
3.1.21
algebraic codebook
a set consisting of pulse amplitude and position; through codeword index k, pulse amplitude and position of the kth excitation code vector can be obtained according to certain rules
3.1.22
profile
a specific subset of syntax in this standard
3.1.23
immittance spectral pair
transform of LP coefficients by decomposing transmission function A(z) of an inverse filter into an even symmetric and an odd symmetric polynomial function to indicate roots of the function on unit circle
3.1.24
bin
1 bit in a bin string
3.1.25
bin string
a string of bins, a binary representation of binarized syntax elements
3.1.26
binarization
a unique mapping of all possible values of a syntax element to a set of bin strings
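Note: As an illustration of a unique mapping from values to bin strings, the Python sketch below uses a unary code. The unary code is chosen only because it is simple; it is an assumption of this example and is not claimed to be a binarization scheme used by this standard.

```python
def unary_binarize(value):
    """Map a non-negative integer to a bin string: `value` ones, then a zero."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return "1" * value + "0"


def unary_debinarize(bins):
    """Recover the value by counting the ones before the terminating zero."""
    return bins.index("0")


bin_strings = [unary_binarize(v) for v in range(4)]
print(bin_strings)  # ['0', '10', '110', '1110']
```

Because every value yields a different bin string, the value can be recovered unambiguously from the bins, which is the property the definition requires.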
3.1.27
inverse transform
a process of converting a matrix of transform coefficients into a matrix of spatial domain samples
3.1.28
emulation prevention byte
a byte equal to 0x03, which may be present in a NAL unit, so as to ensure that no start code prefix is contained in the subsequent byte-aligned byte stream of a NAL unit
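Note: The mechanism can be sketched in Python as follows. The sketch assumes the scheme familiar from H.264, in which a byte 0x03 is inserted after two consecutive 0x00 bytes whenever the next byte is 0x00 to 0x03; that rule, and both function names, are assumptions of this example and are not taken from this definition.

```python
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # count of consecutive 0x00 bytes already written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # break up a would-be start code prefix
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop each 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


print(insert_emulation_prevention(bytes([0x00, 0x00, 0x01])).hex())  # 00000301
```

A decoder applies the removal step before parsing the RBSP, so the round trip returns the original bytes.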
3.1.29
non-reference picture
a picture that is not used for inter coding of any other pictures
3.1.30
component
one of the three sample matrices (one luma matrix and two chroma matrices) of a picture, or a single sample in such a matrix
In the audio part, it also refers to an element of a vector or to a particular frequency component of a signal.
3.1.31
perceptual weighting filter
a filter that reduces subjectively perceived noise by exploiting the noise masking characteristics at the formants, allocating larger distortion to the formant regions
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and abbreviations
3.1 Terms and definitions
3.2 Abbreviations
4 Conventions
4.1 Arithmetic operators
4.2 Logical operators
4.3 Relational operators
4.4 Bitwise operators
4.5 Assignment operators
4.6 Mathematical functions
4.7 Syntax elements, variables and tables
4.8 Textual description of logical operators
4.9 Process
5 Video
5.1 Format of coded bitstream and output data
5.2 Syntax and semantics
5.3 Decoding process
5.4 Parsing process
6 Audio
6.1 Overview
6.2 Description of encoder function
6.3 Decoder function description
6.4 Bit allocation description
6.5 Storage and transmission interface formats
Annex A (Normative) Hypothetical Reference Decoder (HRD)
Annex B (Normative) Byte stream format
Annex C (Normative) Profile and level of video
Annex D (Normative) Video usability information (VUI)
Annex E (Normative) Supplemental Enhancement Information (SEI)
Annex F (Normative) Description of intelligent analysis data
Annex G (Normative) Profile and level of audio
Annex H (Normative) Definition of abnormal sound and event types
Annex I (Informative) Voice Activity Detection (VAD)
Annex J (Informative) Noise reduction
Bibliography