Goal: Create a script that estimates the latency of each layer of a TensorRT model.

Approach

There are multiple ways to convert a PyTorch model to a TensorRT model. They fall into the following two categories:

  1. PyTorch → TensorRT
  2. PyTorch → ONNX → TensorRT

PyTorch → TensorRT

There are two approaches for this path.

  1. Torch-TensorRT (BSD-3-Clause) This library is created and maintained by the PyTorch team.

    Jetson Nano support: Dusty provides a Jetson container for the torch_tensorrt library. It is compatible with L4T >= 32.6.

    Object Detection: Not confirmed. I am speculating that this use case is supported, given the breadth of use cases outlined in the Model Zoo section.

  2. Torch2trt (MIT) This library was created by the NVIDIA team. Activity on the repo has decreased over the past couple of months. It uses the TensorRT Python API to convert PyTorch models to TensorRT.

    Jetson Nano support: This library can be built from the repo or via NGC for almost all JetPack versions.

    Object Detection: A relevant GitHub issue discusses this. The recommendation there is to use the 4GB Jetson Nano for object detection.

Recommendation: I would suggest experimenting only with the Torch-TensorRT library, as Torch2trt has limited support.
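As a starting point for that experiment, here is a minimal sketch of compiling a model with Torch-TensorRT and timing it. It assumes `torch` and `torch_tensorrt` are installed on a device with a CUDA GPU. The helper names `mean_latency_ms` and `compile_with_torch_tensorrt`, the 1x3x224x224 input shape and the FP32 precision are illustrative choices, not taken from this page. Note that it measures whole-model latency only, not per-layer latency.

```python
import time


def mean_latency_ms(fn, warmup=10, iters=100, sync=None):
    """Average wall-clock latency of fn() in milliseconds.

    sync is an optional callable (for example torch.cuda.synchronize) that
    blocks until queued GPU work has finished, so the timer sees the real cost.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


def compile_with_torch_tensorrt(model, input_shape=(1, 3, 224, 224)):
    """Compile a PyTorch module with Torch-TensorRT (needs a CUDA device)."""
    # Imported lazily so the timing helper above also works without a GPU stack.
    import torch
    import torch_tensorrt

    model = model.eval().cuda()
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input(input_shape)],
        enabled_precisions={torch.float},  # FP32 is an assumption here
    )
    example = torch.randn(*input_shape, device="cuda")
    return trt_model, example
```

On the device, this could be used as `trt_model, x = compile_with_torch_tensorrt(model)` followed by `mean_latency_ms(lambda: trt_model(x), sync=torch.cuda.synchronize)`.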

PyTorch → ONNX → TensorRT

This path is quite popular, judging by the number of tools that support it. The first step, converting the PyTorch model to an ONNX model, is common to and required by all of the options below.

Approach for converting PyTorch → ONNX

  1. PyTorch (Custom License) The PyTorch library provides an out-of-the-box, one-line solution for converting a PyTorch model to ONNX. The conversion can be run on any platform; it does not have to be a Jetson device.

    Object Detection: The Torch library supports converting object detection models to ONNX format.
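The export step can be sketched as follows. This is a minimal example around `torch.onnx.export`, assuming `torch` is installed. The helper names, the tensor names "input" and "output", opset 11 and the optional dynamic batch axis are illustrative assumptions; object detection models usually have several outputs and would need a longer `output_names` list.

```python
def onnx_export_kwargs(opset_version=11, dynamic_batch=False):
    """Keyword arguments passed to torch.onnx.export in this sketch."""
    kwargs = {
        "input_names": ["input"],    # assumed single input
        "output_names": ["output"],  # assumed single output
        "opset_version": opset_version,
        "do_constant_folding": True,
    }
    if dynamic_batch:
        # Mark dimension 0 of both tensors as a variable batch size.
        kwargs["dynamic_axes"] = {"input": {0: "batch"}, "output": {0: "batch"}}
    return kwargs


def export_to_onnx(model, example_input, onnx_path, **overrides):
    """Export model to onnx_path; can run on any machine, not only a Jetson."""
    import torch  # imported lazily so the helper above has no dependencies

    kwargs = onnx_export_kwargs()
    kwargs.update(overrides)
    model.eval()  # export in inference mode
    torch.onnx.export(model, example_input, onnx_path, **kwargs)
    return onnx_path
```

For example, `export_to_onnx(model, torch.randn(1, 3, 224, 224), "model.onnx")` writes the ONNX file that the ONNX → TensorRT tools below take as input.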

Approach for converting ONNX → TensorRT

  1. TensorRT (Apache 2.0) This is the most common approach: the ONNX model is converted to a TensorRT engine with the TensorRT Python API, using its OnnxParser. On top of this, we write a custom inference script that loads the TensorRT engine and returns the output for a given input.

    Jetson Nano support: This library comes pre-built as part of JetPack. We should generally use the JetPack release with the latest supported TensorRT, as newer versions support more operators.

    Object Detection: The TensorRT library supports inference for object detection models.

  2. trtexec (Apache 2.0) NVIDIA provides a CLI tool for benchmarking and for creating a TensorRT engine from an ONNX model. It reports the latency of each layer as part of its profiler (Code). We still have to write a custom inference script for the TensorRT engine, using the TensorRT Python API mentioned above.

    Jetson Nano support: This tool comes pre-built as part of JetPack.

    Object Detection: The trtexec tool works with object detection models.

  3. onnx-tensorrt (Apache 2.0) This library is maintained by the ONNX team. It provides an alternative to the TensorRT Python API for running inference with a TensorRT model.

    Jetson Nano support: It can be installed on Jetson Nano. (Reference and here; looks more involved)

    Object Detection: The Torch library supports converting object detection models to ONNX format.
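For the TensorRT Python API route (option 1), per-layer timings can be collected with a profiler object. The sketch below subclasses `trt.IProfiler` and overrides `report_layer_time`, both of which are part of the TensorRT Python API. The class name `LayerTimer` and the averaging logic are hypothetical, and the fallback to a plain `object` base class is only there so the accumulation logic can be tried on a machine without TensorRT.

```python
from collections import defaultdict

try:
    import tensorrt as trt
    _ProfilerBase = trt.IProfiler
except ImportError:  # lets the accumulation logic run without TensorRT
    _ProfilerBase = object


class LayerTimer(_ProfilerBase):
    """Accumulates the per-layer times that TensorRT reports after each run."""

    def __init__(self):
        super().__init__()
        self.total_ms = defaultdict(float)  # layer name -> summed time in ms
        self.calls = defaultdict(int)       # layer name -> number of reports

    def report_layer_time(self, layer_name, ms):
        # TensorRT calls this once per layer for every profiled inference.
        self.total_ms[layer_name] += ms
        self.calls[layer_name] += 1

    def average_ms(self):
        """Average time per layer in milliseconds over all profiled runs."""
        return {name: self.total_ms[name] / self.calls[name]
                for name in self.total_ms}
```

In the custom inference script, the profiler would be attached to the execution context with `context.profiler = LayerTimer()` before running inference, and `average_ms()` read afterwards.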

Recommendation: I would suggest experimenting with the Torch library for converting PyTorch to ONNX. For ONNX to TensorRT, I would suggest experimenting with both the TensorRT library and the trtexec CLI tool. The CLI tool is meant for validation rather than production: we can use it to check whether the latency it reports roughly matches what we measure with the TensorRT library.
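The validation step could look like the sketch below. It builds a trtexec command that exports per-layer timings, parses the exported file, and lists the layers whose timings disagree with a second measurement. The flags `--onnx`, `--saveEngine`, `--dumpProfile`, `--separateProfileRun` and `--exportProfile` are existing trtexec options. The JSON layout (a list whose first entry is a run count, followed by one entry per layer with `name` and `averageMs` keys) is an assumption and should be checked against the installed TensorRT version. The function names and the 20% tolerance are hypothetical.

```python
import json


def trtexec_command(onnx_path, engine_path, profile_path):
    """Argument list for a trtexec run that exports per-layer timings."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--dumpProfile",         # print the per-layer profile
        "--separateProfileRun",  # keep profiling out of the benchmark run
        f"--exportProfile={profile_path}",
    ]


def parse_profile(text):
    """Map layer name -> average ms from an exported profile.

    Assumes each per-layer entry has "name" and "averageMs" keys; entries
    without a name (such as the leading run count) are skipped.
    """
    entries = json.loads(text)
    return {e["name"]: e["averageMs"] for e in entries if "name" in e}


def layers_that_disagree(a, b, rel_tol=0.2):
    """Layers present in both dicts whose times differ by more than rel_tol."""
    out = {}
    for name in a.keys() & b.keys():
        ref = max(a[name], b[name])
        if ref > 0 and abs(a[name] - b[name]) / ref > rel_tol:
            out[name] = (a[name], b[name])
    return out
```

The command list can be passed to `subprocess.run` on the Jetson. The dictionary from `parse_profile` can then be compared with per-layer averages gathered through the TensorRT Python API, provided both measurements come from the same engine so that the layer names match.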