Common Image Format

Introduction

With the development of artificial intelligence, deep neural networks have blossomed in the field of computer vision. To meet the needs of different scenarios, we encounter a variety of image data formats. This section provides a detailed introduction to several image data formats commonly used in deep learning: RGB, BGR, YUV, NV12, and Gray.

RGB

RGB is a common color image format. Each pixel of the image stores an intensity value (0 ~ 255, UINT8) for each of the red (R), green (G), and blue (B) color channels.

Based on this, if a pixel is recorded as (R, G, B), then (255, 0, 0), (0, 255, 0), and (0, 0, 255) represent the purest red, green, and blue, respectively. If all three channel values are 0, the pixel is black; if all three are at the maximum of 255, the pixel is white.

RGB can represent up to 256 × 256 × 256 ≈ 16.77 million colors, far more than the human eye can perceive (about 10 million), so RGB is used in a wide variety of display applications and is closely tied to daily life.

However, RGB requires every pixel to store the R, G, and B channel values together, i.e., each pixel needs 3 bytes of storage. This is very unfriendly to the storage and transmission of video, taking up a great deal of space and bandwidth.
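To make the cost concrete, here is a minimal sketch of the storage required by one uncompressed RGB frame; the 1920×1080 resolution and 30 fps rate are assumed examples, not values from the text:

```python
# Storage cost of one uncompressed RGB frame (3 bytes per pixel).
# 1920x1080 at 30 fps is an assumed example, chosen for illustration.
width, height = 1920, 1080
bytes_per_frame = width * height * 3
print(bytes_per_frame)                    # 6220800 bytes, about 5.93 MiB
# Raw bandwidth of an uncompressed 30 fps stream, in MiB/s:
print(bytes_per_frame * 30 / 2**20)       # roughly 178 MiB/s
```

Numbers of this magnitude are why raw RGB is rarely used for video storage or transmission.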

BGR

The BGR image format is similar to RGB, except that the red, green, and blue channels are arranged in a different order: in RGB the channel order of a pixel is red, green, blue, while in BGR it is blue, green, red.

The BGR format is commonly used by computer vision libraries such as OpenCV and is the default image format for some software and hardware, with which it therefore has better compatibility.
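Since the two formats differ only in channel order, converting between them is a pure channel reorder. A minimal numpy sketch (the image content here is a made-up example):

```python
import numpy as np

# BGR <-> RGB is just a channel reorder: reversing the last axis swaps
# channel 0 and channel 2, the same result cv2.cvtColor produces.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255           # a pure-blue image in BGR layout
rgb = bgr[..., ::-1]        # channel 2 (blue) now holds 255 -> RGB layout
print(rgb[0, 0])            # [  0   0 255]
```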

Like RGB, BGR carries a large amount of data and is not suitable for the storage and transmission of video. We therefore need other image formats to replace RGB/BGR for video.

YUV

Introduction

YUV is a color image format in which Y denotes Luminance, specifying the brightness of a pixel (which can be interpreted as its degree of black and white), while U and V denote Chrominance (or Chroma), specifying the color of a pixel. Each of these values is expressed as UINT8.

The YUV format separates luminance from chrominance: only U and V participate in the color representation, which is different from RGB.

Even without the U and V components, we can "recognize" the basic content of an image from the Y component alone, except that it is presented as a black-and-white image. The U and V components then add color to this skeleton, turning the black-and-white image into a color one. This means we can retain the full Y-component information while sampling the U and V components as sparsely as possible to minimize the amount of data, which is a great benefit to the storage and transmission of video. This is why YUV suits video processing better than RGB.
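As a sketch of "the Y component alone is a black-and-white image": the snippet below extracts a Y plane from an RGB array using the BT.601 luma weights (the same coefficients that appear in the RGB-to-NV12 sample later in this section); the random input data is made up for illustration:

```python
import numpy as np

# BT.601 luma: Y = 0.299*R + 0.587*G + 0.114*B.
# Dropping U/V and keeping only Y yields a grayscale version of the image.
rgb = np.random.randint(0, 256, size=(4, 4, 3)).astype(np.float32)
y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
y = y.astype(np.uint8)      # single-channel, one byte per pixel
print(y.shape)              # (4, 4)
```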

YUV Common Format

Research shows that the human eye is more sensitive to luminance than to color. YUV downsampling exploits this characteristic: the color information, to which the eye is relatively insensitive, is compressed by subsampling, yielding smaller files for playback and transmission. According to the ratio of Y to UV, the three commonly used YUV formats are YUV444, YUV422, and YUV420.

The three diagrams below visualize the ratio of Y to UV under the different sampling methods.

yuv_common_format

  • YUV444: Each Y component corresponds to a pair of UV components, occupying 3 bytes per pixel (Y + U + V = 8 + 8 + 8 = 24bits).
  • YUV422: Every two Y components share a pair of UV components, occupying 2 bytes per pixel (Y + 0.5U + 0.5V = 8 + 4 + 4 = 16bits).
  • YUV420: Every four Y components share a pair of UV components, occupying 1.5 bytes per pixel (Y + 0.25U + 0.25V = 8 + 2 + 2 = 12bits).

Now the 4 in YUV4xx can be understood: it expresses the maximum sharing unit, i.e., at most 4 Y components share one pair of UV components.
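The three bit counts above all follow from one rule; a small sketch (the helper name `bits_per_pixel` is ours, not from any library):

```python
# Each pixel always carries 8 bits of Y; the 8-bit U and V values are
# each shared among `share` pixels.
def bits_per_pixel(share):
    """share = how many Y components share one pair of UV components."""
    return 8 + 8 / share + 8 / share

print(bits_per_pixel(1))    # YUV444 -> 24.0
print(bits_per_pixel(2))    # YUV422 -> 16.0
print(bits_per_pixel(4))    # YUV420 -> 12.0
```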

YUV420 in Detail

In YUV420, each pixel has its own Y, while each 2×2 block of pixels shares one U and one V, so each pixel occupies 1.5 bytes. Based on different UV component arrangements, YUV420 can be further categorized into two formats: YUV420P and YUV420SP.

YUV420P stores all of the U components first and then all of the V components, arranged as shown below:

YUV420P

YUV420SP stores the U and V components alternately (UVUV...), arranged as shown below:

YUV420SP

At this point you can see why the length of YUV420 data in memory is width * height * 3 / 2: the Y plane takes width * height bytes, and the subsampled UV data takes another width * height / 2 bytes.
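The size formula can be checked with a small numpy sketch (the 8×8 resolution is an assumed example):

```python
import numpy as np

# YUV420 buffer size: the Y plane holds width*height bytes, and the
# subsampled UV data holds width*height/2 bytes (one U and one V per
# 2x2 block of pixels), so the total is width*height*3/2 bytes.
w, h = 8, 8
y_plane = np.zeros((h, w), dtype=np.uint8)
uv_plane = np.zeros((h // 2, w // 2, 2), dtype=np.uint8)
total = y_plane.size + uv_plane.size
print(total)                      # 96
print(total == w * h * 3 // 2)    # True
```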

NV12

NV12 is a widely used image format, especially in video codecs and autonomous driving. NV12 preserves the image's luminance information while its data volume is half that of formats such as RGB/BGR, which reduces the time needed to load input data for a model. The embedded side therefore usually chooses NV12 images as the image data input during deployment.

The NV12 image format belongs to the YUV420SP family of the YUV color space. It adopts YUV 4:2:0 sampling, where every four Y components share one U and one V component; the Y plane is stored contiguously, followed by the U and V values stored in an interleaved manner.

The following sections introduce Python sample code for converting an RGB image to NV12 and a C++ reference implementation of data preparation for NV12 input. In addition, we provide Python examples that use two common image processing libraries to convert images to NV12 format.

RGB to NV12(Python)

```python
import numpy as np


def rgb_to_nv12(image_data: np.ndarray):
    r = image_data[:, :, 0]
    g = image_data[:, :, 1]
    b = image_data[:, :, 2]
    # BT.601 RGB -> YUV, with U/V subsampled by taking every second row/column
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = (-0.169 * r - 0.331 * g + 0.5 * b + 128)[::2, ::2]
    v = (0.5 * r - 0.419 * g - 0.081 * b + 128)[::2, ::2]
    # Interleave U and V as UVUVUV... (the NV12 semi-planar layout)
    uv = np.zeros(shape=(u.shape[0], u.shape[1] * 2))
    for i in range(0, u.shape[0]):
        for j in range(0, u.shape[1]):
            uv[i, 2 * j] = u[i, j]
            uv[i, 2 * j + 1] = v[i, j]
    y = y.astype(np.uint8)
    uv = uv.astype(np.uint8)
    # Return the Y plane and the interleaved UV plane separately
    return y, uv


if __name__ == '__main__':
    # pil_image_data is assumed to be an RGB image already opened with PIL,
    # e.g. pil_image_data = Image.open('example.jpg').convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    y, uv = rgb_to_nv12(image_data)
```

Data Preparation for NV12 Input(C++)

```cpp
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

#include <opencv2/opencv.hpp>

#include "hobot/dnn/hb_dnn.h"
#include "hobot/hb_ucp.h"
#include "hobot/hb_ucp_sys.h"

int32_t read_image_2_tensor_as_nv12(std::string &image_file,
                                    hbDNNTensor *input_tensor) {
  // the struct of input shape is NHWC
  int input_h = input_tensor[0].properties.validShape.dimensionSize[1];
  int input_w = input_tensor[0].properties.validShape.dimensionSize[2];
  cv::Mat bgr_mat = cv::imread(image_file, cv::IMREAD_COLOR);
  if (bgr_mat.empty()) {
    std::cout << "image file not exist!" << std::endl;
    return -1;
  }
  // resize to the model input size
  cv::Mat mat;
  mat.create(input_h, input_w, bgr_mat.type());
  cv::resize(bgr_mat, mat, mat.size(), 0, 0);
  // convert to YUV420 (planar I420); NV12 requires even height and width
  if (input_h % 2 || input_w % 2) {
    std::cout << "input img height and width must be aligned by 2!" << std::endl;
    return -1;
  }
  cv::Mat yuv_mat;
  cv::cvtColor(mat, yuv_mat, cv::COLOR_BGR2YUV_I420);
  uint8_t *yuv_data = yuv_mat.ptr<uint8_t>();
  uint8_t *y_data_src = yuv_data;
  // copy y data row by row, honoring the tensor's stride (padding)
  uint8_t *y_data_dst =
      reinterpret_cast<uint8_t *>(input_tensor[0].sysMem.virAddr);
  for (int32_t h = 0; h < input_h; ++h) {
    memcpy(y_data_dst, y_data_src, input_w);
    y_data_src += input_w;
    // add padding
    y_data_dst += input_tensor[0].properties.stride[1];
  }
  // copy uv data, interleaving the planar U and V planes into UVUV...
  int32_t uv_height = input_tensor[1].properties.validShape.dimensionSize[1];
  int32_t uv_width = input_tensor[1].properties.validShape.dimensionSize[2];
  uint8_t *uv_data_dst =
      reinterpret_cast<uint8_t *>(input_tensor[1].sysMem.virAddr);
  uint8_t *u_data_src = yuv_data + input_h * input_w;
  uint8_t *v_data_src = u_data_src + uv_height * uv_width;
  for (int32_t h = 0; h < uv_height; ++h) {
    auto *cur_data = uv_data_dst;
    for (int32_t w = 0; w < uv_width; ++w) {
      *cur_data++ = *u_data_src++;
      *cur_data++ = *v_data_src++;
    }
    // add padding
    uv_data_dst += input_tensor[1].properties.stride[1];
  }
  // make sure memory data is flushed to DDR before inference
  hbUCPMemFlush(&input_tensor[0].sysMem, HB_SYS_MEM_CACHE_CLEAN);
  hbUCPMemFlush(&input_tensor[1].sysMem, HB_SYS_MEM_CACHE_CLEAN);
  return 0;
}
```

Additional Introduction

Convert image to NV12 by PIL

```python
import sys

import numpy as np
from PIL import Image


def generate_nv12(input_path, output_path='./'):
    img = Image.open(input_path)
    w, h = img.size
    # Convert the image to YUV (YCbCr) format
    yuv_img = img.convert('YCbCr')
    y_data, u_data, v_data = yuv_img.split()
    # Downsample U and V to half resolution and convert the planes to bytes
    y_data_bytes = y_data.tobytes()
    u_data_bytes = u_data.resize((u_data.width // 2, u_data.height // 2)).tobytes()
    v_data_bytes = v_data.resize((v_data.width // 2, v_data.height // 2)).tobytes()
    # Arrange the UV data in the form UVUVUVUV...
    uvuvuv_data = bytearray()
    for u_byte, v_byte in zip(u_data_bytes, v_data_bytes):
        uvuvuv_data.extend([u_byte, v_byte])
    # y data
    y_path = output_path + "_y.bin"
    with open(y_path, 'wb') as f:
        f.write(y_data_bytes)
    # uv data
    uv_path = output_path + "_uv.bin"
    with open(uv_path, 'wb') as f:
        f.write(uvuvuv_data)
    # Save as an NV12 format file (Y plane followed by interleaved UV)
    nv12_data = y_data_bytes + uvuvuv_data
    nv12_path = output_path + "_nv12.bin"
    with open(nv12_path, 'wb') as f:
        f.write(nv12_data)
    # Input for the hbir model
    y = np.frombuffer(y_data_bytes, dtype=np.uint8).reshape(1, h, w, 1)
    uv = np.frombuffer(bytes(uvuvuv_data), dtype=np.uint8).reshape(1, h // 2, w // 2, 2)
    return y, uv


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python generate_nv12.py <input_path> <output_path>")
        sys.exit(1)
    input_path = sys.argv[1]
    output_path = sys.argv[2]
    y, uv = generate_nv12(input_path, output_path)
```

Convert image to NV12 by OpenCV

```python
import cv2
import numpy as np


def image2nv12(image):
    image = image.astype(np.uint8)
    height, width = image.shape[0], image.shape[1]
    # BGR -> planar I420: Y plane, then U plane, then V plane
    yuv420p = cv2.cvtColor(image, cv2.COLOR_BGR2YUV_I420).reshape((height * width * 3 // 2,))
    # y component
    y = yuv420p[:height * width]
    # uv component, interleaved as UVUV...
    uv_planar = yuv420p[height * width:].reshape((2, height * width // 4))
    uv = uv_planar.transpose((1, 0)).reshape((height * width // 2,))
    # nv12 holds the full contiguous buffer: Y plane followed by UVUV...
    nv12 = np.zeros_like(yuv420p)
    nv12[:height * width] = y
    nv12[height * width:] = uv
    # Return the Y plane and the interleaved UV plane separately
    return y, uv


image = cv2.imread("./image.jpg")
y, uv = image2nv12(image)
```

Gray

The Gray image format, also known as grayscale, is a single-channel image format. In a Gray image, each pixel contains only one luminance value, represented using the UINT8 type, i.e., an integer between 0 and 255. This value indicates how bright or dark each pixel in the image is, with larger values indicating brighter pixels and smaller values indicating darker pixels.

The Gray format is also a common target when converting other color image formats (such as RGB or YUV) to a single-channel image. It contains only the luminance information of the image, so the image data is relatively small. For scenes that are not sensitive to color information, it therefore retains important application value.
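A quick size comparison illustrates the saving, with an assumed 640×480 resolution:

```python
import numpy as np

# A Gray image stores one byte per pixel, a third of RGB's three bytes,
# which is why grayscale is attractive when color carries no information.
h, w = 480, 640                           # assumed example resolution
rgb = np.zeros((h, w, 3), dtype=np.uint8)
gray = np.zeros((h, w), dtype=np.uint8)
print(rgb.nbytes)     # 921600
print(gray.nbytes)    # 307200
```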

Convert between image formats

RGB is mainly used for image acquisition and display, while YUV is chosen for image storage, processing, and transmission. A complete application scenario may therefore need to use several different image formats.

How is conversion between image formats realized? Simply put, there are standards, and based on these standards certain mathematical operations convert one format into another. Taking the functions encapsulated by the computer vision library OpenCV as an example, let us see how image format conversion is done:

```python
import cv2

# Read an image; OpenCV reads images in BGR format by default
bgr_img = cv2.imread('example.jpg')
cv2.imwrite('bgr_image.jpg', bgr_img)
# Convert BGR format to RGB format
rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
cv2.imwrite('rgb_image.jpg', rgb_img)
# Convert BGR format to YUV444 format
yuv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YUV)
cv2.imwrite('yuv_image.jpg', yuv_img)
# Convert BGR format to GRAY format
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
cv2.imwrite('gray_image.jpg', gray_img)
```

The OE package provides source code for conversions between common image formats (e.g., RGB2NV12, BGR2RGB). For documentation of the transformers commonly used in image processing, please refer to the user manual section Image Processing Transformer; the corresponding source code is located under the samples/ai_toolchain/horizon_model_convert_sample/01_common/python/data path of the OE development kit.

Different image formats have different characteristics, advantages, and disadvantages; in practice, you can select the image format that best fits your needs.