On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite

PyTorch and TensorFlow are the two leading AI/ML frameworks. In this article, we examine their on-device counterparts, PyTorch Mobile and TensorFlow Lite, from the perspective of someone who wishes to develop and deploy models for use on mobile platforms.



By Dhruv Matani, Meta (Facebook) and Gaurav Menghani, Google AI.

We’ll examine both PyTorch Mobile and TensorFlow Lite (TFLite) from the perspective of a user of the frameworks and look at the features and capabilities that each provides along a set of key dimensions such as developer productivity, extensibility, ease of use, hardware support, etc.

Both are constantly evolving AI frameworks, so any information presented here is current only as of this writing.

At a high level, TFLite and TensorFlow are two different frameworks that share a name, whereas PyTorch Mobile and PyTorch are the same framework sharing a single codebase.

 

Model Conversion

 

Since PyTorch Mobile is the same codebase as PyTorch, once you have a model trained on the server using PyTorch, you simply need to save it for consumption by the PyTorch Mobile Lite Interpreter, and you’re done.

Figure: PyTorch Development Workflow (credit: PyTorch Mobile home page).
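As a concrete sketch of this saving step (the torchvision model and the output file name below are placeholders, not part of the official workflow):

import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Placeholder model: any trained PyTorch model works here.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()

# Script the model, apply mobile-specific optimizations, and save it in the
# lite-interpreter format that PyTorch Mobile consumes.
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)
optimized._save_for_lite_interpreter("mobilenet_v2.ptl")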

Since TensorFlow and TFLite are separate frameworks, one needs to be more careful to use only the supported set of operators and features when training a model in TensorFlow, since not all TensorFlow models can be converted for use with TFLite.

Figure: TFLite Model Conversion workflow (source: TFLite page).
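A minimal sketch of the corresponding TFLite conversion step (the SavedModel directory and output file name are placeholders):

import tensorflow as tf

# Convert a TensorFlow SavedModel into a TFLite flatbuffer; the conversion
# fails if the model uses operators or features that TFLite does not support.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)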

 

Supported Operators

 

Every CPU operator that is available in PyTorch is available for use on PyTorch Mobile.

TFLite provides a list of TensorFlow core operations that are supported by the TFLite runtime via the Select TensorFlow Ops feature.
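For illustration, a sketch of enabling Select TensorFlow Ops on a converter (reusing a converter created as in the conversion snippet above; this is optional and increases binary size):

# Allow the converted model to fall back to a supported subset of core
# TensorFlow ops for operators that have no TFLite builtin implementation.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # prefer TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to selected TensorFlow ops
]
tflite_model = converter.convert()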

 

Custom Operators

 

Documentation: Both PyTorch and TFLite support user-defined custom operators for use on mobile platforms. Both frameworks provide comprehensive documentation (PyTorch Mobile and TFLite) regarding the authoring and use of custom operators.

 

Code Reuse: Since PyTorch Mobile is PyTorch, one can write a custom operator once in PyTorch and use it on PyTorch Mobile, whereas for TFLite one needs to define the operator twice: once for TensorFlow and once for TFLite. See the documentation for custom operators in TensorFlow.

Static Typing: TFLite custom operators are untyped, since they rely on a TfLiteContext to fetch inputs and provide outputs. PyTorch custom operators are statically typed using C++.

 

TFLite Code Snippet

 

The code below shows the interface that a custom operator must implement in TFLite.

typedef struct {
  // Allocates any per-operator state from the op's serialized custom data.
  void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
  // Releases the state allocated by init.
  void (*free)(TfLiteContext* context, void* buffer);
  // Validates inputs and resizes output tensors before execution.
  TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
  // Executes the operator on the node's tensors.
  TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
} TfLiteRegistration;

 

PyTorch Mobile Code Snippet

 

The code below shows the interface that a custom operator must implement in PyTorch Mobile.

torch::Tensor warp_perspective(torch::Tensor image, torch::Tensor warp);

 

Model Visualization

 

PyTorch Mobile models can be visualized with TensorBoard or summarized with torchinfo, and TFLite provides a Model Analyzer tool to analyze TFLite models.

Netron supports TFLite models and has experimental support for PyTorch models.
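As a concrete example, the Model Analyzer mentioned above can be invoked directly from Python (a sketch; the flatbuffer path is a placeholder for a model produced by an earlier conversion):

import tensorflow as tf

# Load the flatbuffer produced by an earlier conversion (placeholder path).
with open("model.tflite", "rb") as f:
    tflite_model = f.read()

# Print a per-operator and per-subgraph breakdown of the converted model.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)

# Optionally also check whether the graph is compatible with the GPU delegate.
tf.lite.experimental.Analyzer.analyze(
    model_content=tflite_model, gpu_compatibility=True
)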

 

Binary Size

 

PyTorch Mobile doesn’t publish any numbers regarding build size. There are pages on the PyTorch wiki discussing how the lite interpreter is more size efficient than the JIT interpreter.

The TFLite binary is ~1MB when all 125+ supported operators are linked (for 32-bit ARM builds), and less than 300KB when using only the operators needed for supporting the common image classification models InceptionV3 and MobileNet.

Custom/Size Optimized Build: Both PyTorch Mobile and TFLite support the notion of a size-optimized build which is capable of running only a fixed set of models.
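As a hedged sketch of how such a build is typically seeded on the PyTorch Mobile side, the root operators a model actually calls can be dumped to a YAML list and fed to the selective-build tooling described in the PyTorch docs (file names are placeholders; PyYAML is assumed to be available):

import torch
import yaml  # assumption: PyYAML is installed

# Load a TorchScript model (placeholder path) and list the root operators it
# calls; a selective build can then link only the kernels those ops require.
scripted = torch.jit.load("mobilenet_v2_scripted.pt")
ops = torch.jit.export_opnames(scripted)

with open("model_ops.yaml", "w") as f:
    yaml.dump(ops, f)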

 

Supported Backends/Accelerators

 

PyTorch Mobile supports the following backends:

  1. CPU
  2. NNAPI (Android)
  3. CoreML (iOS)
  4. Metal GPU (iOS)
  5. Vulkan (Android)

TFLite supports the following backends via its delegates abstraction:

  1. CPU
  2. Mobile GPU (iOS and Android)
  3. Android NNAPI (Android)
  4. Android Hexagon DSP (Android)
  5. CoreML (iOS)

Figure: TFLite Delegate Abstraction (source: TFLite delegates page).

For each framework, not all models can be optimized for the specialized/accelerated backends, and more details are available on the framework-specific pages.
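As an illustration of the delegate abstraction, TFLite's Python interpreter can be handed an externally loaded delegate at construction time (a sketch; the shared-library name is a placeholder and depends on the target accelerator and platform):

import tensorflow as tf

# Load a hardware delegate; the library name here is only a placeholder.
delegate = tf.lite.experimental.load_delegate("libexample_delegate.so")

# Attach the delegate to the interpreter so that supported portions of the
# graph run on the accelerator and the rest falls back to CPU.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()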

 

Model Optimization/Quantization

 

Both PyTorch Mobile and TFLite support the following types of quantization; a minimal post-training example for each framework follows the list.

  1. Post Training Quantization
    1. Weight Quantization (reduces the model’s size)
    2. Activation Quantization (improves running time)
  2. Quantization Aware Training, which includes both weights as well as activations
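The snippets below are minimal sketches of post-training quantization in each framework (the toy PyTorch model and the SavedModel path are placeholders):

import torch
import tensorflow as tf

# PyTorch: post-training dynamic quantization of the weights of Linear layers.
model = torch.nn.Sequential(torch.nn.Linear(16, 4))  # placeholder model
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TFLite: request the converter's default post-training optimizations
# (weight quantization) during conversion.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()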

 

Operator Compatibility

 

This concern applies only to TFLite, since TensorFlow and TFLite support different sets of operators. Some operators might be available in TensorFlow but not in TFLite. It is possible to detect this at model-authoring time, avoiding the need to re-construct the model later to make it TFLite compliant.
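One way to surface such incompatibilities at authoring time is TFLite's experimental authoring API; a minimal sketch (tf.cosh is used only as an example of an op without a TFLite builtin):

import tensorflow as tf

# The authoring decorator flags TFLite-incompatible ops when the function is
# traced, before any conversion is attempted.
@tf.lite.experimental.authoring.compatible
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def f(x):
    return tf.cosh(x)  # no TFLite builtin, so this gets reported

result = f(tf.constant([0.0]))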

In case users need a new custom operator, they need to implement it for both TensorFlow and TFLite.

 

Operator Versioning

 

PyTorch Mobile does not appear to have a notion of operator versioning, so one needs to ensure that a model is run only on the PyTorch Mobile runtime it was built for. TFLite has experimental support for operator versioning, which distinguishes three types of compatibility semantics.

 

Metadata Support

 

Both PyTorch Mobile and TFLite offer some level of support for storing additional metadata along with the model files.

TFLite supports adding structured metadata to the model. This includes:

  1. Model information - Overall description of the model as well as items such as license terms. See ModelMetadata.
  2. Input information - Description of the inputs and pre-processing required such as normalization.
  3. Output information - Description of the output and post-processing required such as mapping to labels.

PyTorch Mobile has an API that allows you to add an arbitrary number of “extra files” when saving the model. A code snippet of the specific API defined on the Module class is shown below.

def _save_for_lite_interpreter(self, *args, **kwargs):
    r""" _save_for_lite_interpreter(f)

    Add (or update) the bytecode session to the script model.
    The updated model is used in lite interpreter for mobile
    applications.

    Args:
        f: a string containing a file name.
        _extra_files: Map from filename to contents which
                      will be stored as part of 'f'.
    """
    return self._c._save_for_mobile(*args, **kwargs)
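A hedged usage sketch of this API, attaching a small JSON file when saving (the model, file names, and contents are placeholders):

import json
import torch
import torchvision

# Placeholder model; any ScriptModule works here.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
scripted = torch.jit.script(model)

# Store arbitrary metadata as "extra files" inside the saved archive.
extra_files = {"metadata.json": json.dumps({"license": "example", "version": 1})}
scripted._save_for_lite_interpreter("mobilenet_v2.ptl", _extra_files=extra_files)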

 

Benchmarking

 

Both TFLite and PyTorch Mobile provide easy ways to benchmark model execution on a real device. TFLite models can be benchmarked through the benchmark_model tool, which provides a detailed breakdown of latency and RAM consumed by different operations in the model graph on CPU, Android, and iOS. PyTorch also provides a way to benchmark its models for different platforms.
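Those on-device tools are not reproduced here; as a quick host-side sanity check before running them, one option is PyTorch's built-in benchmarking utility (a sketch; the model and input shape are placeholders, and host numbers are no substitute for on-device measurements):

import torch
import torchvision
from torch.utils import benchmark

# Placeholder model and input.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# Time repeated forward passes on the host CPU.
timer = benchmark.Timer(
    stmt="model(x)",
    globals={"model": model, "x": example},
)
print(timer.timeit(50))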

 

CPU Acceleration

 

Both PyTorch Mobile and TFLite support CPU acceleration via specialized libraries and routines that are optimized for specific Mobile CPU architectures.

PyTorch Mobile and TFLite use XNNPACK to speed up floating-point operations. Additionally, PyTorch Mobile uses QNNPACK to speed up quantized (integer) operations.

 

Sparse Tensors and Sparsity

 

TFLite supports sparse inference via XNNPACK. There are some limitations regarding what kind of subgraphs can be supported, though. It is not clear what level of sparse operator support PyTorch Mobile offers, even though PyTorch itself supports sparse tensors.

 

Pretrained Models

 

Both PyTorch Mobile and TFLite provide numerous pre-trained models for use out of the box. The pre-trained models provided by PyTorch need to be re-saved in the lite-interpreter format before use on mobile platforms, which is a trivial operation (see the Model Conversion section above).

 

User Support

 

Both PyTorch Mobile and TFLite have a dedicated support forum for users to ask questions and get help.

 

Conclusion

 

We examined both PyTorch Mobile and TFLite along numerous key axes that matter for an on-device AI/ML framework. Both provide high-quality implementations of the features needed to quickly and efficiently run ML models on mobile platforms.

 

References

  1. PyTorch Mobile Home Page
  2. TFLite Home Page

 

Bios: Dhruv Matani is a software engineer at Meta (Facebook), where he leads projects related to PyTorch (Open Source AI Framework). He is an expert on PyTorch internals, PyTorch Mobile, and is a significant contributor to PyTorch. His work is impacting billions of users across the world. He has extensive experience building and scaling infrastructure for the Facebook Data Platform. Of note are contributions to Scuba, a real-time data analytics platform at Facebook, used for rapid product and system insights. He has an M.S. in Computer Science from Stony Brook University.

Gaurav Menghani (@GauravML on Twitter, Gaurav Menghani on LinkedIn) is a Staff Software Engineer at Google Research where he leads research projects geared towards optimizing large machine learning models for efficient training and inference on devices ranging from tiny microcontrollers to Tensor Processing Unit (TPU)-based servers.
