Embedded deep learning resources for computer vision.



  1. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv ’17, Google]
  2. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv ’17, Megvii]
  3. NASNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv ’17, Google]
  4. CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv ’17]
  5. MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv ’18, Google]
  6. DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI’18, Samsung]


  1. DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys ’17]
  2. DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys ’17]
  3. MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL ’17]
  4. DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys ’16]
  5. DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN ’16]
  6. EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA ’16]
  7. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys ’16]
  8. DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE ’16]
  9. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys ’16]
  10. An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App ’15]
  11. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM ’16]
  12. fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS ’17]

Quantization Techniques

  1. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML’17]
  2. Compressing Deep Convolutional Networks using Vector Quantization [arXiv’14]
  3. Quantized Convolutional Neural Networks for Mobile Devices [CVPR ’16]
  4. Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP’16]
  5. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv’16]
  6. Loss-aware Binarization of Deep Networks [ICLR’17]
  7. Towards the Limit of Network Quantization [ICLR’17]
  8. Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR’17]
  9. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv’17]
  10. Training and Inference with Integers in Deep Neural Networks [ICLR’18]

Pruning Techniques

  1. Learning both Weights and Connections for Efficient Neural Networks [NIPS’15]
  2. Pruning Filters for Efficient ConvNets [ICLR’17]
  3. Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR’17]
  4. Soft Weight-Sharing for Neural Network Compression [ICLR’17]
  5. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR’16]
  6. Dynamic Network Surgery for Efficient DNNs [NIPS’16]
  7. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR’17]
  8. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV’17]
  9. To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR’18]


  1. Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR’15]
  2. Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
  3. Convolutional neural networks with low-rank regularization [arXiv’15]
  4. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS’14]
  5. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR’16]
  6. High performance ultra-low-precision convolutions on mobile devices [NIPS’17]

Design Space Exploration

  1. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [arXiv’18, Google]
  2. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys’18]


  1. fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS ’17]


  1. A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv ’17]



  1. XiaoMi/mace: MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
  2. Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform.
  3. baidu/mobile-deep-learning: This research aims at simply deploying CNNs (convolutional neural networks) on mobile devices, with low complexity and high speed.
  4. ARM-software/ComputeLibrary: The ARM Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies.
  5. Apple CoreML
  6. xmartlabs/Bender: Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
  7. Snapdragon Neural Processing Engine
  8. Microsoft Embedded Learning Library
  9. MXNet Amalgamation
  10. TensorFlow on Android
  11. TensorFlow Lite
  12. OAID/Tengine: Tengine is a lightweight, high-performance, modular inference engine for embedded devices.
  13. RSTensorFlow: GPU Accelerated TensorFlow for Commodity Android Devices


  1. mil-tokyo/webdnn: Fastest DNN Execution Framework on Web Browser



  1. Squeezing Deep Learning Into Mobile Phones
  2. Deep Learning – Tutorial and Recent Trends
  3. Tutorial on Hardware Architectures for Deep Neural Networks
  4. Efficient Convolutional Neural Network Inference on Mobile GPUs