Food recognition can be used in a wide range of applications, such as home appliances (smart fridges, microwave ovens), restaurants, hospitals, and the food industry. Based on an FD-MobileNet model, the application can recognize 18 different types of food and beverages, including pizza, beer, and fries.
Step-by-step approach
– We used a camera module (B-CAMS-OMV) to capture the scene
– We selected a pre-trained FD-MobileNet neural network model to perform food recognition
– This model is already integrated in the FP-AI-VISION1 function pack (made for the STM32H747 discovery kit)
– The model was then optimized using STM32Cube.AI
Sensor
Vision: camera module bundle (reference: B-CAMS-OMV)
Data
Data format:
– 18 classes: “Apple Pie”, “Beer”, “Caesar Salad”, “Cappuccino”, “Cheesecake”, “Chicken Wings”, “Chocolate Cake”, “Coke”, “Cup Cakes”, “Donuts”, “French Fries”, “Hamburger”, “Hot Dog”, “Lasagna”, “Pizza”, “Risotto”, “Spaghetti Bolognese”, “Steak”
– RGB color image
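As an illustration of how the 18 classes relate to the classifier output, the following minimal Python sketch maps a vector of 18 scores to a class name. The index order (taken from the list above) and the helper name `top_prediction` are assumptions made for this example; the actual output order of the FP-AI-VISION1 model may differ and should be checked against the function pack.

```python
# Class names as listed above. ASSUMPTION: the model's output indices
# follow this same order; verify against the function pack before use.
FOOD_CLASSES = [
    "Apple Pie", "Beer", "Caesar Salad", "Cappuccino", "Cheesecake",
    "Chicken Wings", "Chocolate Cake", "Coke", "Cup Cakes", "Donuts",
    "French Fries", "Hamburger", "Hot Dog", "Lasagna", "Pizza",
    "Risotto", "Spaghetti Bolognese", "Steak",
]


def top_prediction(scores):
    """Return (class name, score) for the highest-scoring class.

    `scores` is a sequence of 18 numbers, one per class (hypothetical
    stand-in for the network's output vector).
    """
    if len(scores) != len(FOOD_CLASSES):
        raise ValueError(f"expected {len(FOOD_CLASSES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return FOOD_CLASSES[best], scores[best]


# Example with made-up scores: index 14 ("Pizza") has the highest value.
example_scores = [0.0] * len(FOOD_CLASSES)
example_scores[14] = 0.9
label, confidence = top_prediction(example_scores)
print(label, confidence)
```

The scores in the example are invented for illustration and are not output from the real model.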
Results
We provide two different networks, each offering a different trade-off between inference time and accuracy.
Model: “Standard” Convolutional Neural Network quantized
Input size: 224x224x3
Memory footprint:
132 KB Flash for weights
148 KB RAM for activations
Accuracy: 72.8%
Performance on STM32H747 (High-Perf) @ 400 MHz
Inference time: 79 ms
Frame rate: 11.8 fps
Model: “Optimized” Convolutional Neural Network quantized
Input size: 224x224x3
Memory footprint:
148 KB Flash for weights
199 KB RAM for activations
Accuracy: 77.5%
Performance on STM32H747 (High-Perf) @ 400 MHz
Inference time: 145 ms
Frame rate: 6.6 fps
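The reported frame rates are slightly lower than the inference times alone would allow: 79 ms per inference corresponds to about 12.7 fps, against 11.8 fps reported. The short Python sketch below works out the per-frame time implied by each frame rate and the part not spent in inference. Attributing that remainder to capture, preprocessing, and display is an assumption; the figures above do not break it down. The names `MODELS` and `frame_budget` are introduced only for this example.

```python
# Figures copied from the results above.
MODELS = {
    "Standard": {"inference_ms": 79.0, "fps": 11.8, "accuracy_pct": 72.8},
    "Optimized": {"inference_ms": 145.0, "fps": 6.6, "accuracy_pct": 77.5},
}


def frame_budget(inference_ms, fps):
    """Return (total ms per frame, ms per frame spent outside inference)."""
    frame_ms = 1000.0 / fps
    return frame_ms, frame_ms - inference_ms


for name, m in MODELS.items():
    frame_ms, other_ms = frame_budget(m["inference_ms"], m["fps"])
    print(f"{name}: {frame_ms:.1f} ms per frame, "
          f"{other_ms:.1f} ms outside inference, "
          f"{m['accuracy_pct']}% accuracy")
```

Under these figures, both networks leave roughly 6 ms per frame outside inference, so the choice between them comes down to the accuracy gain (72.8% to 77.5%) against the longer inference time (79 ms to 145 ms).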
