6. Model Deployment Advanced Guide

6.1. Overview

This section provides code examples for running the joint models generated by Pulsar compilation; all examples come from the ax-samples project. ax-samples provides sample deployment code for the industry's leading open-source algorithm models, helping the community quickly evaluate and adapt AXera's chips.

6.1.1. Access

Hint

The offline version is a snapshot of the GitHub repository taken when this document was released and may lag behind it; use the GitHub version if you want the latest features.

6.1.2. ax-samples Introduction

ax-samples has currently validated the following open-source models, among others:

  • Classification Models

    • SqueezeNetv1.1

    • MobileNetv1

    • MobileNetv2

    • ResNet18

    • ResNet50

    • VGG16

    • Others……

  • Detection Models

    • PP-YOLOv3

    • YOLOv3

    • YOLOv3-Tiny

    • YOLOv4

    • YOLOv4-Tiny

    • YOLOv5m

    • YOLOv5s

    • YOLOv7-Tiny

    • YOLOX-S

    • YOLO-Fastest-XL

  • Human Detection

    • YOLO-Fastest-Body

  • Face Detection

    • scrfd

  • Obstacle detection (sweeper scene)

    • Robot-Obstacle-Detect

  • 3D Monocular Vehicle Detection

    • Monodlex

  • Human Body Key Points

    • HRNet

  • Human Segmentation

    • PP-HumanSeg

  • Semantic Segmentation

    • PP-Seg

  • Pose Model

    • HRNet

Validated hardware platforms

  • AX630A

  • AX620A/U

ax-samples directory description

$ tree -L 2
.
├── CMakeLists.txt
├── LICENSE
├── README.md
├── README_EN.md
├── benchmark
│   └── README.md
├── cmake
│   ├── check.cmake
│   └── summary.cmake
├── docs
│   ├── AX620A.md
│   ├── AX620U.md
│   ├── body_seg_bg_res.jpg
│   ├── compile.md
│   ├── seg_res.jpg
│   └── yolov3_paddle.jpg
├── examples
│   ├── CMakeLists.txt
│   ├── README.md
│   ├── ax_classification_accuracy.cc
│   ├── ax_classification_nv12_resize_steps.cc
│   ├── ax_classification_steps.cc
│   ├── ax_crop_resize_nv12.cc
│   ├── ax_hrnet_steps.cc
│   ├── ax_ld_model_mmap.cc
│   ├── ax_models_load_inspect.cc
│   ├── ax_monodlex_steps.cc
│   ├── ax_nanodet_steps.cc
│   ├── ax_paddle_mobilehumseg_steps.cc
│   ├── ax_paddle_mobileseg.cc
│   ├── ax_paddle_yolov3_steps.cc
│   ├── ax_robot_obstacle_detect_steps.cc
│   ├── ax_scrfd_steps.cc
│   ├── ax_yolo_fastest_body_steps.cc
│   ├── ax_yolo_fastest_steps.cc
│   ├── ax_yolov3_accuracy.cc
│   ├── ax_yolov3_steps.cc
│   ├── ax_yolov3_tiny_steps.cc
│   ├── ax_yolov4_steps.cc
│   ├── ax_yolov4_tiny_3l_steps.cc
│   ├── ax_yolov4_tiny_steps.cc
│   ├── ax_yolov5s_620u_steps.cc
│   ├── ax_yolov5s_steps.cc
│   ├── ax_yolov7_steps.cc
│   ├── ax_yoloxs_steps.cc
│   ├── base
│   ├── cv
│   ├── middleware
│   └── utilities
└── toolchains
    ├── aarch64-linux-gnu.toolchain.cmake
    └── arm-linux-gnueabihf.toolchain.cmake

The directory above contains console demos for demonstration purposes; on Linux systems they are run from the console.

6.2. Compilation examples

ax-samples source code compilation currently has two implementation paths.

  • Native compilation on the AX-Pi, which integrates a complete software development environment and is the simplest option.

  • Embedded Linux cross-compilation.

6.2.1. Environment preparation

  • cmake version greater than or equal to 3.13

  • The AX620A cross-compilation toolchain arm-linux-gnueabihf-gxx, added to the PATH environment variable

6.2.1.1. Install cmake

There are several ways to install cmake. In an Anaconda virtual environment, you can install it with the following command:

pip install cmake

In a non-virtual environment on an Ubuntu system, you can install it with

sudo apt-get install cmake

If the installed version is too old, you can also build cmake from source, as follows:

  • step 1: Download cmake from the official website and unpack it

  • step 2: Go to the installation folder, and execute

    ./configure
    make -j4  # 4 is the number of parallel jobs; it can be omitted
    sudo make install
    
  • step 3: After installation, check the version information with the following command

    cmake --version
    

6.2.1.2. Install the cross-compilation tool arm-linux-gnueabihf-gxx

There are various cross-compilers available; we recommend the Linaro toolchain. Download gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf.tar.xz, the version for a 64-bit x86 host, from the Linaro releases page.

# Create a new folder and move the archive into it
mkdir -p ~/usr/local/lib
mv gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf.tar.xz ~/usr/local/lib

# Unpack
cd ~/usr/local/lib
xz -d gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf.tar.xz
tar -xvf gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf.tar

# Configure the environment variable: add the following line to ~/.bashrc
vim ~/.bashrc
export PATH=$PATH:~/usr/local/lib/gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf/bin

# Make the change take effect
source ~/.bashrc

6.2.2. Cross-compiling

Download source code

git clone https://github.com/AXERA-TECH/ax-samples.git

3rdparty directory preparation

  • Download the pre-compiled OpenCV library file

  • Create a 3rdparty folder in the ax-samples root directory and extract the downloaded OpenCV library archive into that folder.

Dependent Library Preparation

After obtaining the AX620 BSP development package, do the following

  • Download the dependency archive for ax-samples cross-compilation and extract it to a path of your choice (referred to as ax_bsp below); download address:

$ wget https://github.com/AXERA-TECH/ax-samples/releases/download/v0.3/arm_axpi_r1.22.2801.zip
$ unzip arm_axpi_r1.22.2801.zip -d ax_bsp

Source compilation

Go to the ax-samples root directory and create the cmake compilation task

$ mkdir build
$ cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/arm-linux-gnueabihf.toolchain.cmake -DBSP_MSP_DIR=${ax_bsp}/ ..
$ make install

After compilation, the generated executable examples are stored under the ax-samples/build/install/bin/ path.

ax-samples/build$ tree install
install
└── bin
    ├── ax_classification
    ├── ax_classification_accuracy
    ├── ax_classification_nv12
    ├── ax_cv_test
    ├── ax_hrnet
    ├── ax_models_load_inspect
    ├── ax_monodlex
    ├── ax_nanodet
    ├── ax_paddle_mobilehumseg
    ├── ax_paddle_mobileseg
    ├── ax_paddle_yolov3
    ├── ax_robot_obstacle
    ├── ax_scrfd
    ├── ax_yolo_fastest
    ├── ax_yolo_fastest_body
    ├── ax_yolov3
    ├── ax_yolov3_accuracy
    ├── ax_yolov3_tiny
    ├── ax_yolov4
    ├── ax_yolov4_tiny
    ├── ax_yolov4_tiny_3l
    ├── ax_yolov5s
    ├── ax_yolov5s_620u
    ├── ax_yolov7
    └── ax_yoloxs

6.2.3. Local compilation

6.2.3.1. Hardware requirements

  • AX-Pi (based on AX620A, a cost-effective development board for community developers)

6.2.3.2. Compilation process

Download the source code with git clone, go to the ax-samples root directory, and create the cmake build task.

$ git clone https://github.com/AXERA-TECH/ax-samples.git
$ cd ax-samples
$ mkdir build
$ cd build
$ cmake ..
$ make install

After compilation, the resulting executable examples are stored under the ax-samples/build/install/bin/ path.

ax-samples/build$ tree install
install
└── bin
    ├── ax_classification
    ├── ax_classification_accuracy
    ├── ax_classification_nv12
    ├── ax_cv_test
    ├── ax_hrnet
    ├── ax_models_load_inspect
    ├── ax_monodlex
    ├── ax_nanodet
    ├── ax_paddle_mobilehumseg
    ├── ax_paddle_mobileseg
    ├── ax_paddle_yolov3
    ├── ax_robot_obstacle
    ├── ax_scrfd
    ├── ax_yolo_fastest
    ├── ax_yolo_fastest_body
    ├── ax_yolov3
    ├── ax_yolov3_accuracy
    ├── ax_yolov3_tiny
    ├── ax_yolov4
    ├── ax_yolov4_tiny
    ├── ax_yolov4_tiny_3l
    ├── ax_yolov5s
    ├── ax_yolov5s_620u
    ├── ax_yolov7
    └── ax_yoloxs

6.3. Run example

Run preparation

Warning

The examples in this section come from ax-samples only; the mobilenetv2 and yolov5s models are not provided, and the following logs are for reference only.

Log in to the AX620A development board and create the ax-samples folder under /root.

  • Copy the compiled executable examples from build/install/bin/ to the /root/ax-samples/ path;

  • Copy the mobilenetv2.joint or yolov5s.joint model generated by Pulsar to the /root/ax-samples/ path;

  • Copy the test images to the /root/ax-samples/ path.

Attention

The sample code does not provide models such as mobilenetv2.joint; you need to convert them from the open-source ONNX models yourself.

/root/ax-samples # ls -l
total 40644
-rwx--x--x    1 root     root       3805332 Mar 22 14:01 ax_classification
-rwx--x--x    1 root     root       3979652 Mar 22 14:01 ax_yolov5s
-rw-------    1 root     root        140391 Mar 22 10:39 cat.jpg
-rw-------    1 root     root        163759 Mar 22 14:01 dog.jpg
-rw-------    1 root     root       4299243 Mar 22 14:00 mobilenetv2.joint
-rw-------    1 root     root      29217004 Mar 22 14:04 yolov5s.joint

If the board runs short of space, this can be solved by mounting a shared folder on the board.

MacOS mount ARM development board example

Hint

Due to the limited storage on the board, it is often necessary to share a folder between the host machine and the ARM development board during testing. Here is a macOS example.

The development machine needs an NFS service so the ARM development board can mount it. macOS ships with an NFS service: simply create the /etc/exports file, and nfsd will start automatically and use it for exports.

/etc/exports can be configured as follows:

/path/your/sharing/directory -alldirs -maproot=root:wheel -rw -network xxx.xxx.xxx.xxx -mask 255.255.255.0

Parameter definitions:

  • alldirs: share all directories under the exported path; omit it if you want to share only a single folder

  • network: IP address of the ARM development board allowed to mount; can be a network-segment address

  • mask: subnet mask, usually 255.255.255.0

  • maproot: mapping rule; maproot=root:wheel maps the root user on the ARM board to the root user on the development machine, and the root group on the ARM board to the wheel (gid=0) group on macOS. If left at the default, you may get an nfsroot link failure error.

  • rw: read and write access, enabled by default

Modifying /etc/exports requires restarting the nfsd service

sudo nfsd restart

If the configuration is successful, you can use the

sudo showmount -e

command to see the export information, e.g. /Users/skylake/board_nfs 10.168.21.xx. After the development machine is configured, execute the mount command on the ARM side:

mount -t nfs -o nolock,tcp macos_ip:/your/shared/directory /mnt/directory

If you have permission problems, you need to check if the maproot parameter is correct.

Hint

The network parameter can be configured as a network segment, e.g. 10.168.21.0, if Permission denied occurs when mounting a single ip, you can try mounting within the network segment.

Classification Model

For the classification model, you can run it on the board by executing the ax_classification program.

/root/ax-samples # ./ax_classification -m mobilenetv2.joint -i cat.jpg -r 100
--------------------------------------
model file : mobilenetv2.joint
image file : cat.jpg
img_h, img_w : 224 224
Run-Joint Runtime version: 0.5.10
--------------------------------------
[INFO]: Virtual npu mode is 1_1

Tools version: 0.6.1.14
59588c54
10.8712, 283
10.6592, 285
9.3338, 281
8.8770, 282
8.1893, 356
--------------------------------------
Create handle took 255.04 ms (neu 7.66 ms, axe 0.00 ms, overhead 247.37 ms)
--------------------------------------
Repeat 100 times, avg time 4.17 ms, max_time 4.83 ms, min_time 4.14 ms

Detection models

For a detection model, the executable that implements the matching post-processing (e.g. ax_yolov5s for yolov5s.joint) must be run to get correct results on the board.

/root/ax-samples # ./ax_yolov5s -m yolov5s.joint -i dog.jpg -r 100
--------------------------------------
model file : yolov5s.joint
image file : dog.jpg
img_h, img_w : 640 640
Run-Joint Runtime version: 0.5.10
--------------------------------------
[INFO]: Virtual npu mode is 1_1

Tools version: 0.6.1.14
59588c54
run over: output len 3
--------------------------------------
Create handle took 490.73 ms (neu 22.06 ms, axe 0.00 ms, overhead 468.66 ms)
--------------------------------------
Repeat 100 times, avg time 26.06 ms, max_time 26.83 ms, min_time 26.02 ms
--------------------------------------
detection num: 3
16:  93%, [ 135,  219,  310,  541], dog
2:  80%, [ 466,   77,  692,  172], car
1:  61%, [ 169,  116,  566,  419], bicycle

More information about ax-samples is available on the official GitHub (https://github.com/AXERA-TECH/ax-samples), and the accompanying ax-samples ModelZoo provides more extensive content:

  • Pre-compiled executable programs (e.g. ax_classification, ax_yolov5s)

  • The joint models the sample programs depend on (e.g. mobilenetv2.joint, yolov5s.joint)

  • Test images (e.g. cat.jpg, dog.jpg)