EMDL
Embedded deep learning resources for computer vision.
Papers
Models
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv’17, Google]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv’17, Megvii]
- NASNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv’17, Google]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv’17]
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv’18, Google]
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI’18, Samsung]
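Several of the models above (MobileNets in particular) rely on depthwise separable convolutions to cut computation. As a rough illustration, the sketch below compares multiply-accumulate counts for a standard convolution and its depthwise separable counterpart. The function names and the example layer shape are hypothetical and chosen only for this illustration; bias terms and stride effects are ignored.

```python
def standard_conv_macs(k, c_in, c_out, out_hw):
    """Multiply-accumulates for a k x k standard convolution."""
    return k * k * c_in * c_out * out_hw * out_hw


def depthwise_separable_macs(k, c_in, c_out, out_hw):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * out_hw * out_hw
    pointwise = c_in * c_out * out_hw * out_hw
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output map.
std = standard_conv_macs(3, 256, 256, 14)
sep = depthwise_separable_macs(3, 256, 256, 14)

# The cost ratio simplifies to 1/c_out + 1/k**2.
ratio = sep / std
print(f"standard: {std} MACs, separable: {sep} MACs, ratio: {ratio:.4f}")
```

For 3x3 kernels the ratio is dominated by the 1/9 term, so the separable form needs roughly 8 to 9 times fewer multiply-accumulates when the channel count is large.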
Frameworks
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys’17]
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys’17]
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL’17]
- DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys’16]
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN’16]
- EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA’16]
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys’16]
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE’16]
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys’16]
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App’15]
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM’16]
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS’17]
Quantization Techniques
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML’17]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv’14]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR’16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP’16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv’16]
- Loss-aware Binarization of Deep Networks [ICLR’17]
- Towards the Limit of Network Quantization [ICLR’17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR’17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv’17]
- Training and Inference with Integers in Deep Neural Networks [ICLR’18]
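For readers new to the topic, the following is a minimal sketch of uniform affine (asymmetric) quantization to 8-bit integers, which is the basic idea that many of the papers above refine. It is a generic textbook scheme, not the method of any specific paper in this list; the function names and sample weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range so that 0.0 is always exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all zeros
    zero_point = int(round(qmin - lo / scale))
    q = [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point))
        for v in values
    ]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [scale * (x - zero_point) for x in q]


# Hypothetical weights, for illustration only.
weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

The reconstruction error per value is bounded by roughly half the scale, which is why narrower value ranges (or per-channel scales) tend to quantize more accurately.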
Pruning Techniques
- Learning both Weights and Connections for Efficient Neural Networks [NIPS’15]
- Pruning Filters for Efficient ConvNets [ICLR’17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR’17]
- Soft Weight-Sharing for Neural Network Compression [ICLR’17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR’16]
- Dynamic Network Surgery for Efficient DNNs [NIPS’16]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR’17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV’17]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR’18]
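As a rough illustration of the simplest idea in this area, the sketch below performs one-shot magnitude pruning: the weights with the smallest absolute values are set to zero. This is a simplified assumption-laden example; the papers above typically add retraining, structured (filter-level) criteria, or gradual schedules on top of this. The function name and sample values are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Hypothetical weights, for illustration only.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Zeroed weights only save memory or time if the storage format or the runtime exploits sparsity, which is one reason several of the listed papers prune whole filters instead of individual weights.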
Approximation
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR’15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
- Convolutional neural networks with low-rank regularization [arXiv’15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS’14]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR’16]
- High performance ultra-low-precision convolutions on mobile devices [NIPS’17]
Design Space Exploration
- NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [arXiv’18, Google]
- Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys’18]
CNN-to-FPGAs
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS’17]
Survey
Libraries
General
- XiaoMi/mace: MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
- Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
- baidu/mobile-deep-learning: This research aims to make it simple to deploy CNNs (convolutional neural networks) on mobile devices, with low complexity and high speed.
- ARM-software/ComputeLibrary: The ARM Computer Vision and Machine Learning library is a set of functions optimised for both ARM CPUs and GPUs using SIMD technologies.
- Apple CoreML
- xmartlabs/Bender: Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
- Snapdragon Neural Processing Engine
- Microsoft Embedded Learning Library
- MXNet Amalgamation
- TensorFlow on Android
- TensorFlow Lite
- OAID/Tengine: Tengine is a lightweight, high-performance, modular inference engine for embedded devices
- RSTensorFlow: GPU Accelerated TensorFlow for Commodity Android Devices
Web
Tutorials
General
- Squeezing Deep Learning Into Mobile Phones
- Deep Learning – Tutorial and Recent Trends
- Tutorial on Hardware Architectures for Deep Neural Networks
- Efficient Convolutional Neural Network Inference on Mobile GPUs