(Credit to work done by David Cornett and Clay Leach)
Description
For most research applications of Neural Networks (NNs), the go-to compute device for tasks such as training and inference has been the Graphics Processing Unit (GPU). GPUs offer high throughput and memory bandwidth, and they perform the matrix and linear-algebra operations at the heart of NNs much faster than a traditional CPU. Together, these factors drastically shorten NN training time and allow inference to run at real-time speeds.
The drawback of traditional “discrete” GPUs is their size, weight, and power (SWaP). Discrete GPUs are typically large boards (relative to an ATX case), their extensive cooling systems make them heavy, and the compute performance described above comes at the cost of a large power draw. Because of this, they are poorly suited to edge-based applications.
Field Programmable Gate Arrays (FPGAs) provide an alternative hardware platform for NN inference at real-time speeds. This work compared the runtime speed and power usage of popular object detection and recognition networks on the two platforms (Table 1). Because of how FPGAs handle computation and memory allocation, networks deployed on the FPGA need to be sparse (quantized and sometimes pruned). This work also analyzed the effect such sparsification and optimization techniques have on the metric performance of the respective networks (Table 2).
Table 1: SWaP and throughput comparison

| System | Physical Dimensions | Physical Weight | Maximum Power | Throughput (batch size = 1) |
| --- | --- | --- | --- | --- |
| NVIDIA RTX 2070 Super | 267 mm x 116 mm x 35 mm | 2,336 g | 215 W | 29 FPS |
| Xilinx ZCU104 UltraScale+ | 179 mm x 149 mm x 1.5 mm | 1,113 g | 60 W | 27 FPS |
Table 2: Detection accuracy

| Implementation | mean Average Precision (mAP) |
| --- | --- |
| NVIDIA RTX 2070 Super | 78.46% |
| Xilinx ZCU104 UltraScale+ | 77.4% |
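To illustrate the kind of sparsification required before FPGA deployment, below is a minimal sketch of post-training INT8 quantization in PyTorch. This is only an example of the general technique, not the toolchain used to produce the ZCU104 results above; the ResNet-18 backbone, the `fbgemm` backend, and the random calibration tensors are placeholder assumptions standing in for the actual detection/recognition networks and calibration data.

```python
import torch
from torchvision.models.quantization import resnet18

# Placeholder backbone; the detection/recognition networks evaluated in this
# work are not reproduced here.
model = resnet18(weights=None, quantize=False)
model.eval()

# Post-training static INT8 quantization: fuse conv/bn/relu, attach observers,
# calibrate on representative inputs, then convert to int8 modules.
model.fuse_model()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

with torch.no_grad():
    for _ in range(8):
        # Random tensors stand in for a real calibration set of images.
        model(torch.randn(1, 3, 224, 224))

torch.quantization.convert(model, inplace=True)
print(model.conv1)  # now a quantized (int8) fused Conv+BN+ReLU module
```

On the FPGA side, an equivalent quantization step is typically performed by the vendor's quantizer before the network is compiled for the programmable logic, which is why Table 2 reports a small mAP drop for the ZCU104 implementation.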