# BERT Deployment Guide

This document describes how to deploy BERT models on Moffett AI accelerator cards using SOLA.

## Introduction

BERT (Bidirectional Encoder Representations from Transformers) is described in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805). The BERT used in this program is an optimized version based on [Google's official implementation](https://github.com/google-research/bert).

## Model Architecture

| **Model** | **Hidden layers** | **Hidden unit size** | **Attention heads** | **Feed-forward filter size** | **Max sequence length** | **Parameters** |
|:---------:|:----------:|:----:|:---:|:--------:|:---:|:----:|
|BERT-Base |12 encoder| 768| 12|4 x  768|512|110M|
|BERT-Large|24 encoder|1024| 16|4 x 1024|512|330M|
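
The Parameters column can be roughly reproduced from the other columns. The sketch below is a back-of-the-envelope count, not the tool used to build the table. It assumes the 30522-token uncased English WordPiece vocabulary, two segment types, and a pooler layer, none of which are stated in this document; `bert_param_count` is a hypothetical helper name.

```python
def bert_param_count(layers, hidden, vocab_size=30522, max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT encoder plus pooler.

    vocab_size and type_vocab are assumptions (uncased English release);
    max_positions matches the "Max sequence length" column.
    """
    ffn = 4 * hidden          # feed-forward filter size, as in the table
    layer_norm = 2 * hidden   # scale and bias vectors

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Q, K, V and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> ffn and ffn -> hidden, each with a bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm

    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


for name, layers, hidden in [("BERT-Base", 12, 768), ("BERT-Large", 24, 1024)]:
    print(f"{name}: {bert_param_count(layers, hidden) / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to about 109.5M for BERT-Base and 335.1M for BERT-Large, in line with the rounded figures in the table. The number of attention heads does not change the total, because the heads split the same projection matrices.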

BERT model configuration used in this program:
- model: BERT-Large & BERT-Base
- batch size: 32
- data type: MixInt8Bf16

## Workflow

Deploying a model takes four steps: download, build, run, and verify. A default script is provided for each step.

Download: `prepare.sh`

Build: `build.sh`

Run: `run.sh`

Verify: `verify.sh`
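
For unattended runs, the four scripts can be chained so that a failure in one step stops the rest. The following is a minimal sketch, assuming it is launched from the directory that contains the scripts; `run_pipeline` and `STEPS` are hypothetical names, not part of the demo.

```python
import subprocess

# Script names come from the list above; running them from the demo
# directory is an assumption of this sketch.
STEPS = ["prepare.sh", "build.sh", "run.sh", "verify.sh"]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    completed = []
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError
        # when the script exits with a non-zero status.
        runner([f"./{script}"], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` returns the list of scripts that finished. The `runner` parameter exists only so the sequencing can be exercised without executing the real scripts.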

You can also deploy the model manually by following the steps below.

### Deployment Steps

1. Download the model and dataset

   Run the following script to download the model dependencies. Preprocessing runs automatically once the download finishes:
    ```bash
    ./prepare.sh
    ```
   If the script fails, you can instead download the archive manually from the link below and extract it:
    ```text
    https://moffett-oss-bucket01.oss-cn-shenzhen.aliyuncs.com/sola-demo/bert/bert_demo.tar.gz
    ```
   Then run the preprocessing commands by hand (this requires a Python environment; see README.md in the root directory for how to activate it):
    ```bash
    mkdir -p data/tmp/
    python -s scripts/pre_process.py --input_path=data/bert_demo/ --output_path=data/tmp/
    ```

2. Build the deployment code

    ```bash
    ./build.sh
    ```

3. Run

   Run inference with the `run.sh` script. Pass `bert_base` or `bert_large` to select the model; BERT-Base runs by default:
    ```bash
    # Run BERT-Base (the default)
    ./run.sh
    # Run BERT-Base explicitly
    ./run.sh bert_base
    # Run BERT-Large explicitly
    ./run.sh bert_large
    ```
   Alternatively, invoke the binary directly. Its usage is:
   ```text
   usage: ./build/bert --module=string --inputs=string --outputs=string [options] ...
   options:
     -m, --module         module file path (string)
     -i, --inputs         input dir path, with input_x inside (string)
     -o, --outputs        result saving dir (string)
     -d, --devices        select devices (string [=all])
     -f, --format         running mode: broadcast / split (string [=broadcast])
     -c, --count          count of duplication (int [=1])
     -b, --batch          run batch size (int [=32])
     -v, --verify_path    save verification result json path (string [=])
     -?, --help           print this message
   ```
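   Here `-d` selects the devices to run on (comma-separated), `-f` selects the running mode (`broadcast` or `split`), `-c` sets the number of times to run, and `-b` sets the batch size, which must be a multiple of 32. For example: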
   ```bash
   # Run BERT-Base inference on device 0 with batch size 32
   ./build/bert -m data/bert_demo/bert_base.bin -i data/tmp/ -o data/tmp/ -d 0 -f broadcast -c 1 -b 32 -v data/verification.json
   # Run BERT-Large inference on device 0 with batch size 64
   ./build/bert -m data/bert_demo/bert_large.bin -i data/tmp/ -o data/tmp/ -d 0 -f broadcast -c 1 -b 64 -v data/verification.json
   # Run BERT-Large inference on devices 0, 1 and 2 with batch size 32
   ./build/bert -m data/bert_demo/bert_large.bin -i data/tmp/ -o data/tmp/ -d 0,1,2 -f broadcast -c 1 -b 32 -v data/verification.json
   ```
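   Inference output is saved as binary files in the directory given by `-o`, one file per device, named `output_<id>` after the device ID. To check accuracy afterwards, run the verification script, using the commands in `run.sh` as a reference.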

4. Verify the results

    ```bash
    ./verify.sh
    ```
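
When the manual invocation from step 3 is driven from a script, it helps to validate the arguments before launching the binary. The sketch below only assembles an argument list from the flags shown in the usage text and rejects a batch size that is not a multiple of 32. `build_bert_command` and `expected_outputs` are hypothetical helpers, and the `output_<id>` naming follows the description in step 3.

```python
import os


def build_bert_command(module, inputs, outputs, devices=(0,), mode="broadcast",
                       count=1, batch=32, verify_path=None, binary="./build/bert"):
    """Assemble an argument list for the bert binary from its documented flags."""
    if batch <= 0 or batch % 32 != 0:
        raise ValueError(f"batch size must be a positive multiple of 32, got {batch}")
    if mode not in ("broadcast", "split"):
        raise ValueError(f"mode must be 'broadcast' or 'split', got {mode!r}")

    cmd = [
        binary,
        "-m", module,
        "-i", inputs,
        "-o", outputs,
        "-d", ",".join(str(d) for d in devices),  # devices are comma-separated
        "-f", mode,
        "-c", str(count),
        "-b", str(batch),
    ]
    if verify_path:
        cmd += ["-v", verify_path]
    return cmd


def expected_outputs(outputs, devices):
    """Paths the run is described as writing: one output_<id> file per device."""
    return [os.path.join(outputs, f"output_{d}") for d in devices]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.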

### Reference Test Results

| model      | data type   | batch size | accuracy                                                   | performance |
|------------|-------------|------------|------------------------------------------------------------|-------------|
| BERT-Base  | MixInt8Bf16 | 32         | {"exact_match": 83.66130558183538, "f1": 90.8575190748761} | 2101 FPS    |
| BERT-Base  | MixInt8Bf16 | 64         | {"exact_match": 83.66130558183538, "f1": 90.8575190748761} | 2101 FPS    |
| BERT-Base  | MixInt8Bf16 | 128        | {"exact_match": 83.66130558183538, "f1": 90.8575190748761} | 2107 FPS    |
| BERT-Base  | MixInt8Bf16 | 256        | {"exact_match": 83.66130558183538, "f1": 90.8575190748761} | 2107 FPS    |
| BERT-Large | MixInt8Bf16 | 32         | {"exact_match": 83.74645222327341, "f1": 90.9330076613153} | 1009 FPS    |
| BERT-Large | MixInt8Bf16 | 64         | {"exact_match": 83.74645222327341, "f1": 90.9330076613153} | 1008 FPS    |
| BERT-Large | MixInt8Bf16 | 128        | {"exact_match": 83.74645222327341, "f1": 90.9330076613153} | 1009 FPS    |
| BERT-Large | MixInt8Bf16 | 256        | {"exact_match": 83.74645222327341, "f1": 90.9330076613153} | 1009 FPS    |
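
The `exact_match` and `f1` keys in the accuracy column are the metric names used in SQuAD-style question-answering evaluation. This document does not say how `verify.sh` computes them, so the sketch below is only an illustration of how such scores are commonly defined for a single prediction, not the demo's own verification code.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score in this style is the mean of the per-question values, expressed as a percentage, which is the scale the table uses.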