- I am currently in Long Beach attending CVPR 2019, where we are going to present our paper on fast neural architecture search for semantic segmentation – transferable to other dense per-pixel tasks such as depth estimation and pose estimation. The paper is available here and the models have been released here. Notably, we used only 8 (!) GPU-days to find compact architectures that outperform DeepLab-v3+. If you are attending CVPR and interested in our work, please come over to our poster #18 on Thursday, June 20, 2019, from 10:00 am until 12:45 pm (poster stand 3.1). In the next few weeks, I will publish a more detailed overview of the paper.
In this tutorial, I will cover one possible way of converting a PyTorch model into TensorFlow.js. This conversion will allow us to embed our model into a web page. One might ask why bother with TensorFlow.js at all when onnx.js or even torch.js already exist. To be completely honest, when I tried to run my model in onnx.js, the segmentation part did not work at all, even though the depth predictions were decent. Furthermore, onnx.js does not yet support many operators, such as upsampling, which forced me to emulate upsampling via concatenation and led to subpar results.
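For context, one way to emulate nearest-neighbour upsampling without a native upsample operator is to duplicate each pixel by concatenating channel copies and then rearranging them with a pixel shuffle. The sketch below illustrates the general trick (it is an assumed reconstruction of this kind of workaround, not the exact code from my model):

```python
import torch
import torch.nn.functional as F

def upsample_by_concat(x, scale=2):
    """Nearest-neighbour upsampling built from concatenation + pixel_shuffle.

    Avoids the dedicated Upsample/interpolate operator, which some
    export runtimes (e.g. onnx.js at the time) do not support.
    """
    n, c, h, w = x.shape
    # Duplicate every channel scale**2 times, keeping copies adjacent,
    # so that pixel_shuffle's channel grouping stays per-channel.
    x = torch.cat([x.unsqueeze(2)] * scale ** 2, dim=2)  # (N, C, s*s, H, W)
    x = x.view(n, c * scale ** 2, h, w)                  # (N, C*s*s, H, W)
    # pixel_shuffle spreads each group of s*s channels over an s-by-s
    # spatial block, reproducing nearest-neighbour interpolation.
    return F.pixel_shuffle(x, scale)                     # (N, C, s*H, s*W)
```

On a small tensor this matches `F.interpolate(..., mode="nearest")` exactly, since both simply replicate each pixel over the upscaled block.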
I have just released a PyTorch wrapper that aims to facilitate a typical training workflow for dense per-pixel tasks. The project code is available here. Currently, two training examples are provided: one for single-task training of semantic segmentation using DeepLab-v3+ with the Xception65 backbone, and one for multi-task training of joint semantic segmentation and depth estimation using Multi-Task RefineNet with the MobileNet-v2 backbone.
Our paper, titled “Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations”, has recently been accepted at the International Conference on Robotics and Automation (ICRA 2019), which will take place in Montreal, Canada in May. This was a joint work between the University of Adelaide and Monash University, and it was a great experience for me to learn from my collaborators about two dense per-pixel tasks that I had only been vaguely familiar with before: depth estimation – i.e., predicting how far each pixel is from the observer, and surface normal estimation – i.e., predicting the vector perpendicular (normal) to each pixel’s surface. Both tasks are extremely valuable in the robotics community, and hence we were motivated to explore the limits of performing three tasks (the two above plus semantic segmentation) in real time using a single network.
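To give a rough idea of how asymmetric annotations can be handled during training, one common approach is to mask out the loss of any task for which a given image has no labels. Below is a minimal two-task sketch in PyTorch (the mask conventions, loss choices, and weight `lambda_depth` are illustrative assumptions, not our exact implementation; a normals term would be added analogously):

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, depth_pred, seg_gt, depth_gt,
                   lambda_depth=0.5, ignore_index=255, invalid_depth=0.0):
    """Joint loss that tolerates images annotated for only one task.

    Assumed conventions for this sketch: segmentation pixels equal to
    `ignore_index` and depth pixels equal to `invalid_depth` mark
    missing annotations.
    """
    # Segmentation: cross-entropy already skips `ignore_index` pixels.
    loss_seg = F.cross_entropy(seg_logits, seg_gt, ignore_index=ignore_index)

    # Depth: L1 loss computed only over pixels with valid ground truth.
    valid = depth_gt != invalid_depth
    if valid.any():
        loss_depth = F.l1_loss(depth_pred[valid], depth_gt[valid])
    else:
        # No depth labels in this batch: contribute zero while keeping
        # the computation graph connected.
        loss_depth = depth_pred.sum() * 0.0

    return loss_seg + lambda_depth * loss_depth
```

The masking means each image contributes gradients only for the tasks it is actually annotated with, which is what lets datasets with disjoint label sets be mixed in one training run.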
|Segmenting a turtle - one of the slowest animals on Earth: left - original photo, right - segmentation result (highlighted in a different colour)|
Nearly two years after my first publication (at the same venue as now), which included a year of academic break, I have finally submitted my first paper of my new PhD journey to the BMVC conference, which will take place from September 3 to September 6 in Newcastle upon Tyne. This time the submission took me a month longer, although all the main results were already in place in March.