TY - GEN
T1 - ConvMTL
T2 - 8th International Conference on Computer Vision and Image Processing, CVIP 2023
AU - Iyer, Vijayasri
AU - Thangavel, Senthil Kumar
AU - Nalluri, Madhusudana Rao
AU - Chang, Maiga
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - Perception systems in autonomous vehicles are required to perform multiple scene-understanding tasks under tight constraints of latency and power. Single-task neural networks can become unscalable when the number of tasks in the perception stack increases. Multi-task learning has been shown to improve parameter efficiency and to enable models to learn more generalizable task representations than single-task neural networks. This work explores a novel convolutional multi-task neural network architecture that simultaneously performs two dense prediction tasks: semantic segmentation and depth estimation. A self-supervised ResNet-50 backbone forms the basis of the proposed network, together with a multi-scale feature fusion module and a dense decoder. The model uses a simple weighted loss function, with an informed search algorithm used to identify the optimal parameters. The segmentation task is assessed using mean Intersection over Union (mIoU) and pixel accuracy, while the depth estimation task is assessed using absolute and relative errors. The model achieves an mIoU of 73.81%, a pixel accuracy of 93.52%, an absolute error of 0.130, and a relative error of 29.05. Its performance is comparable to existing multi-task algorithms on the Cityscapes dataset, using only 2975 training samples.
KW - Autonomous Driving
KW - Computer Vision
KW - Deep Learning
KW - Multi-task Learning
KW - Transfer Learning
UR - http://www.scopus.com/inward/record.url?scp=85200353914&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-58181-6_38
DO - 10.1007/978-3-031-58181-6_38
M3 - Conference contribution
AN - SCOPUS:85200353914
SN - 9783031581809
T3 - Communications in Computer and Information Science
SP - 455
EP - 466
BT - Computer Vision and Image Processing - 8th International Conference, CVIP 2023, Revised Selected Papers
A2 - Kaur, Harkeerat
A2 - Jakhetiya, Vinit
A2 - Goyal, Puneet
A2 - Khanna, Pritee
A2 - Raman, Balasubramanian
A2 - Kumar, Sanjeev
Y2 - 3 November 2023 through 5 November 2023
ER -