YuNet
A Tiny Millisecond-level Face Detector
[pdf:]
- Focused on use in edge devices
- Objective is to reduce inference time and memory footprint
- Reducing parameters
  - reduces the memory footprint
  - reduces the FLOPs required
Different layers of a deep CNN contribute differently to memory, FLOPs, and accuracy. For a conventional face-detection network, the per-layer contribution looks roughly as follows:
Layer | Parameters (Memory) | FLOPs | Contribution (to accuracy) |
---|---|---|---|
Layer 0 | 0.04% (because few channels) | 3% (because the feature map is large) | |
Layer 2 | 5% | 25% | 50% |
Layer 4 | 63% (because many channels) | 20% (because the feature map is small) | 10% |
Observations:
- Computation cost is not correlated only with the number of parameters but also with the feature-map size at that layer (see the sketch after this list).
- Layer 4 holds most of the parameters yet contributes relatively little to accuracy.
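To make the observation concrete, here is a minimal back-of-the-envelope sketch in Python (the layer shapes are illustrative assumptions, not taken from the paper): the parameter count of a standard convolution depends only on channel counts and kernel size, while its FLOPs also scale with the output feature-map size, so an early layer can hold a tiny share of parameters yet a large share of FLOPs.

```python
# Why early conv layers dominate FLOPs while late ones dominate parameters.
# Layer shapes below are illustrative assumptions, not YuNet's actual configuration.

def conv_params(c_in, c_out, k=3):
    # Weight count of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def conv_flops(c_in, c_out, h_out, w_out, k=3):
    # Multiply-accumulates: one k*k*c_in dot product per output pixel, per output channel.
    return conv_params(c_in, c_out, k) * h_out * w_out

layers = {
    "early (few channels, large map) ": (3, 16, 320, 320),   # c_in, c_out, H, W
    "late  (many channels, small map)": (128, 256, 20, 20),
}

total_p = sum(conv_params(c_in, c_out) for c_in, c_out, _, _ in layers.values())
total_f = sum(conv_flops(*shape) for shape in layers.values())

for name, (c_in, c_out, h, w) in layers.items():
    p, f = conv_params(c_in, c_out), conv_flops(c_in, c_out, h, w)
    print(f"{name}: {p / total_p:6.2%} of params, {f / total_f:6.2%} of FLOPs")
    # early: ~0.15% of params but ~27% of FLOPs
    # late : ~99.85% of params but ~73% of FLOPs
```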
Conclusion:
- Focus on small faces (so put more parameters in layer 2); larger faces are easier to detect (so fewer parameters in layer 4)
- Use depthwise and pointwise (depthwise-separable) convolutions to reduce the parameter count (see the sketch after this list)
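A minimal sketch of that last point, assuming PyTorch and illustrative channel sizes (not YuNet's actual layers): a depthwise 3x3 convolution followed by a 1x1 pointwise convolution maps the same input shape to the same output shape as a standard 3x3 convolution, at a fraction of the parameter count.

```python
import torch
import torch.nn as nn

c_in, c_out = 64, 128  # illustrative channel sizes, not taken from the paper

# Standard 3x3 convolution: c_in * c_out * 3 * 3 weights.
standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)

# Depthwise separable: a per-channel 3x3 conv (groups=c_in) followed by a
# 1x1 pointwise conv that mixes channels.
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in, bias=False),  # depthwise
    nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),                          # pointwise
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(f"standard : {count(standard):,} params")   # 73,728
print(f"separable: {count(separable):,} params")  # 8,768 (~8.4x fewer)

# Both produce the same output shape for the same input.
x = torch.randn(1, c_in, 56, 56)
assert standard(x).shape == separable(x).shape
```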