Definition

Research Question

How does model quantization affect system resource efficiency and correctness when deploying DL systems?

Search String

(( "machine learning" OR "ML" OR "deep learning" OR "DL" OR "large language model" OR "LLM?" OR "neural network" OR "?NN" OR "foundational model" OR "agent" ) AND ( "quantization" OR "quantize" OR "quantized" ) AND ( "energy consumption" OR "energy efficien*" OR "sustain*" OR "carbon footprint" OR "carbon emission" ) AND NOT ( "FL" OR "federated learning" ) )
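The wildcard operators in the search string ('*' for any suffix, '?' for at most one extra character) can also be applied locally when screening retrieved titles and abstracts. A minimal sketch, assuming Python's `re` module and the common digital-library wildcard semantics (actual syntax varies per database, so adapt before a real search):

```python
import re

# Minimal sketch: expand a search-string term's wildcards into a regex for
# local title/abstract screening. Assumes '*' means any word suffix and
# '?' means at most one extra word character; these are illustrative
# semantics, not a specific database's syntax.
def wildcard_to_regex(term: str) -> re.Pattern:
    escaped = re.escape(term)  # escape regex metacharacters, then re-enable wildcards
    pattern = escaped.replace(r"\*", r"\w*").replace(r"\?", r"\w?")
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)

# "energy efficien*" should match both "efficient" and "efficiency"
print(bool(wildcard_to_regex("energy efficien*").search(
    "On the Energy Efficiency of Quantized DNNs")))  # → True
```

The same helper covers terms like "LLM?", which matches both "LLM" and "LLMs" without matching unrelated words.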

Inclusion Criteria

  • The study regards the application of model quantization to optimize a DL model.
  • The study regards the environmental sustainability and/or energy efficiency of applying model quantization.
  • The study analyzes the application of model quantization for model inference.
  • The study regards the application of model quantization at the software level.
  • The study controls the experimental factors in each trial, avoiding uncontrolled variation across runs.

Exclusion Criteria

  • The study combines model quantization with other optimization techniques.
  • The study does not report a non-quantized baseline.
  • The study is a secondary or tertiary study.
  • The study is not written in English.
  • The study is in the form of editorials, tutorials, books, extended abstracts, and so on.

Papers

  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_4_8 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w8a8 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations int8 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.4 evidence)
  • Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy (8bit evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (16-bit evidence)
  • Activation Density Based Mixed-Precision Quantization for Energy Efficient Neural Networks
  • Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms (w8a8 QAT evidence)
  • Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators (4-bit evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w2a2 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (full-int4 evidence)
  • Impact of ML Optimization Tactics on Greener Pre-Trained ML Models
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations int8 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp32 - activations int8 evidence)
  • Implementing Deep Neural Networks on ARM-Based Microcontrollers: Application for Ventricular Fibrillation Detection (full model int8 evidence)
  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_4_32 evidence)
  • Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators (1-bit evidence)
  • Verifiable and Energy Efficient Medical Image Analysis with Quantised Self-attentive Deep Neural Networks
  • Efficient Expiration Date Recognition in Food Packages for Mobile Applications (w8a8 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int4, a-int8 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int2, a-int4 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (4-bit evidence)
  • Implementing Deep Neural Networks on ARM-Based Microcontrollers: Application for Ventricular Fibrillation Detection (w8a8 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations fp32 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (2-bit evidence)
  • Edge AI-Powered System Architecture for Aloe Vera Plant Disease Detection
  • Energy-Efficient Deep Learning for Cloud Detection Onboard Nanosatellite
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp32 - activations fp16 evidence)
  • Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy (4bit evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.32 evidence)
  • Quantized Object Detection for Real-Time Inference on Embedded GPU Architectures (int8 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.16 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (32-bit evidence)
  • Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms (w8a8 PTQ evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (int1 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w4a4 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations fp16 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations fp32 evidence)
  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_8_16 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (int6 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations fp16 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int2, a-int8 evidence)
  • Efficient Expiration Date Recognition in Food Packages for Mobile Applications (w16a16 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (full-int8 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.8 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (8-bit evidence)
  • Quantized Object Detection for Real-Time Inference on Embedded GPU Architectures (fp16 evidence)

Evidence

  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification
  • QUANOS: Adversarial Noise Sensitivity Driven Hybrid Quantization of Neural Networks (5bit evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (int1 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (8-bit evidence)
  • Impact of ML Optimization Tactics on Greener Pre-Trained ML Models
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations int8 evidence)
  • Verifiable and Energy Efficient Medical Image Analysis with Quantised Self-attentive Deep Neural Networks
  • Green My LLM: Studying the Key Factors Affecting the Energy Consumption of Code Assistants
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.8 evidence)
  • UAV-deployed Deep Learning Network for Real-Time Multi-Class Damage Detection Using Model Quantization Techniques (INT8 PTQ evidence)
  • Activation Density Based Mixed-Precision Quantization for Energy Efficient Neural Networks
  • Optimizing Convolutional Neural Networks for IoT Devices: Performance and Energy Efficiency of Quantization Techniques (fp16 PTQ evidence)
  • Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms (w8a8 PTQ evidence)
  • Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators
  • QUANOS: Adversarial Noise Sensitivity Driven Hybrid Quantization of Neural Networks (8bit evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int4, a-int8 evidence)
  • Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (full-int4 evidence)
  • Q_YOLOv5m: A Quantization-based Approach for Accelerating Object Detection on Embedded Platforms (w8a8 QAT evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations fp32 evidence)
  • Efficient Expiration Date Recognition in Food Packages for Mobile Applications (w8a8 evidence)
  • Efficient Expiration Date Recognition in Food Packages for Mobile Applications (w16a16 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp32 - activations int8 evidence)
  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_4_32 evidence)
  • UAV-deployed Deep Learning Network for Real-Time Multi-Class Damage Detection Using Model Quantization Techniques (INT8 partial QAT evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations fp16 evidence)
  • Optimizing Convolutional Neural Networks for IoT Devices: Performance and Energy Efficiency of Quantization Techniques (int8 QAT evidence)
  • Optimizing Convolutional Neural Networks for IoT Devices: Performance and Energy Efficiency of Quantization Techniques (int8 PTQ evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants
  • QUANOS: Adversarial Noise Sensitivity Driven Hybrid Quantization of Neural Networks (QUANOS evidence)
  • Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators (4-bit evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (4-bit evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (32-bit evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations fp32 evidence)
  • Implementing Deep Neural Networks on ARM-Based Microcontrollers: Application for Ventricular Fibrillation Detection (full model int8 evidence)
  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_4_8 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp16 - activations fp16 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights fp32 - activations fp16 evidence)
  • Experimental Energy Consumption Analysis of Neural Network Model Compression Methods on Microcontrollers with Applications in Bird Call Classification (weights int8 - activations int8 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (2-bit evidence)
  • UAV-deployed Deep Learning Network for Real-Time Multi-Class Damage Detection Using Model Quantization Techniques (half-precision training evidence)
  • Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy (8bit evidence)
  • Quantized Object Detection for Real-Time Inference on Embedded GPU Architectures (int8 evidence)
  • Energy-Efficient Respiratory Anomaly Detection in Premature Newborn Infants (16-bit evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.4 evidence)
  • Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy (4bit evidence)
  • Energy Efficiency of Deep Learning Compression Techniques in Wearable Human Activity Recognition
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (full-int8 evidence)
  • Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models (INT8 evidence)
  • Energy-Efficient Deep Learning for Cloud Detection Onboard Nanosatellite
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.16 evidence)
  • Green My LLM: Studying the Key Factors Affecting the Energy Consumption of Code Assistants (BitsAndBytes FP4 evidence)
  • UAV-deployed Deep Learning Network for Real-Time Multi-Class Damage Detection Using Model Quantization Techniques (INT8 QAT evidence)
  • Implementing Deep Neural Networks on ARM-Based Microcontrollers: Application for Ventricular Fibrillation Detection (w8a8 evidence)
  • Edge AI-Powered System Architecture for Aloe Vera Plant Disease Detection
  • Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators (1-bit evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w4a4 evidence)
  • Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy
  • A Methodological Framework for Optimizing the Energy Consumption of Deep Neural Networks: A Case Study of a Cyber Threat Detector
  • Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices (fxp_8_16 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int2, a-int4 evidence)
  • Green My LLM: Studying the Key Factors Affecting the Energy Consumption of Code Assistants (BitsAndBytes NF4 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (q0.32 evidence)
  • Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks (int6 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w2a2 evidence)
  • QUANOS: Adversarial Noise Sensitivity Driven Hybrid Quantization of Neural Networks (4bit evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w-int2, a-int8 evidence)
  • POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks? (w8a8 evidence)
  • Green My LLM: Studying the Key Factors Affecting the Energy Consumption of Code Assistants (EETQ INT8 evidence)
  • Optimization Strategies for Enhancing Resource Efficiency in Transformers & Large Language Models (FP4 evidence)
  • Quantized Object Detection for Real-Time Inference on Embedded GPU Architectures (fp16 evidence)

Aggregated Evidence

Conclusion

Research Question

Proposed theory: Model quantization causes positive effects on DL systems’ resource efficiency. Strongly positive effects are observed for storage size and GPU energy consumption. Inference power draw is weakly positively affected, while indifferent-to-weakly-positive effects are observed for GPU power draw and inference latency. Model quantization also causes weakly negative effects on accuracy.
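The storage-size and accuracy directions of the theory can be illustrated with a toy sketch of symmetric per-tensor post-training int8 quantization (an illustrative example, not any individual study's setup):

```python
import numpy as np

# Toy illustration of the theory's storage and accuracy directions:
# symmetric per-tensor post-training quantization of fp32 weights to int8.
def quantize_int8(w: np.ndarray):
    scale = float(np.max(np.abs(w))) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in for trained weights
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)  # → 4  (fp32 -> int8 shrinks storage 4x)
round_trip_error = float(np.abs(q.astype(np.float32) * scale - w).max())
print(round_trip_error <= scale / 2 + 1e-6)  # → True (error bounded by half a step)
```

Real studies replace the toy weights with a trained model and measure energy alongside task accuracy; the latency and power-draw effects depend on hardware support for low-precision kernels, which this sketch does not model.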

Full Aggregation