Are we there yet? Part I. Education

As of 2019, if you were to build a text generator (akin to this one), chances are that after the word Artificial it would most likely predict the word Intelligence; that combination of words is currently everywhere: driverless cars, medical diagnosis, deep fakes, and on and on. As someone fortunate enough to be doing research in related areas during this period, I would not call the underlying algorithms “AI”; I would instead prefer terms such as machine learning, pattern recognition, data analysis and data science, or, more generally, not yet AI.

Read More

A Couple of Updates

  1. I am currently in Long Beach attending CVPR 2019, where we are going to present our paper on fast neural architecture search for semantic segmentation, transferable to other dense per-pixel tasks such as depth estimation and pose estimation. The paper is available here and the models have been released here. Notably, we used only 8 (!) GPU-days to find compact architectures that outperform DeepLab-v3+. If you are attending CVPR and are interested in our work, please come over to our poster #18 on Thursday, June 20, 2019, from 10am until 12:45pm (poster stand 3.1). In the next few weeks, I will publish a more detailed overview of the paper.
Read More

Tutorial - Converting a PyTorch model to TensorFlow.js

In this tutorial, I will cover one possible way of converting a PyTorch model into TensorFlow.js. This conversion will allow us to embed our model into a web page. Someone might ask why bother with TensorFlow.js at all when onnx.js or even torch.js already exist? To be completely honest, when I tried to use my model in onnx.js, the segmentation part did not work at all, even though the depth predictions were decent. Furthermore, onnx.js does not yet support many operators, such as upsampling, which forced me to upsample via concatenation and led to subpar results.
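To illustrate the workaround mentioned above, here is a minimal NumPy sketch of 2x nearest-neighbour upsampling expressed using only concatenation and reshape, i.e., the kind of basic operators that were supported at the time. The function name and array layout are my own; the actual tutorial works at the level of model graph operators rather than NumPy arrays.

```python
import numpy as np

def upsample2x_by_concat(x):
    """2x nearest-neighbour upsampling of an (N, C, H, W) array,
    built only from concatenation and reshape operations."""
    n, c, h, w = x.shape
    # Duplicate every column: (N, C, H, W, 2) -> (N, C, H, 2W)
    x = np.concatenate([x[..., None], x[..., None]], axis=-1).reshape(n, c, h, 2 * w)
    # Duplicate every row: (N, C, H, 2, 2W) -> (N, C, 2H, 2W)
    x = np.concatenate([x[..., None, :], x[..., None, :]], axis=-2).reshape(n, c, 2 * h, 2 * w)
    return x
```

Each pixel is simply repeated twice along both spatial axes, which matches what a nearest-neighbour upsampling operator would produce.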

Read More

Release of DenseTorch

I have just released a PyTorch wrapper that aims to facilitate a typical training workflow of dense per-pixel tasks. The project code is available here. Currently, two training examples are provided: one for single-task training of semantic segmentation using DeepLab-v3+ with the Xception65 backbone, and one for multi-task training of joint semantic segmentation and depth estimation using Multi-Task RefineNet with the MobileNet-v2 backbone.
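The following is not the DenseTorch API, but a minimal plain-PyTorch sketch of what multi-task training of joint segmentation and depth looks like in general: a shared backbone feeding two task-specific heads, with a weighted sum of per-task losses. All layer sizes, class counts, and the 0.5 loss weight are purely illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Toy shared-backbone network with two dense per-pixel heads."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(16, num_classes, 1)  # per-pixel class logits
        self.depth_head = nn.Conv2d(16, 1, 1)          # per-pixel depth

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.depth_head(feats)

net = MultiTaskNet()
img = torch.randn(2, 3, 32, 32)
seg_gt = torch.randint(0, 21, (2, 32, 32))
depth_gt = torch.rand(2, 1, 32, 32)

seg_out, depth_out = net(img)
# Weighted sum of per-task losses; a single backward pass trains both heads
loss = nn.CrossEntropyLoss()(seg_out, seg_gt) + 0.5 * nn.L1Loss()(depth_out, depth_gt)
loss.backward()
```

The key point is that the backbone gradients accumulate contributions from both tasks, which is what makes the shared representation multi-task.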

Read More

Real-Time Joint Segmentation, Depth and Surface Normals Estimation

Our paper, titled “Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations”, has recently been accepted at the International Conference on Robotics and Automation (ICRA 2019), which will take place in Montreal, Canada, in May. This was joint work between the University of Adelaide and Monash University, and it was a great experience for me to learn from my collaborators about two dense per-pixel tasks that I had only been vaguely familiar with before: depth estimation, i.e., predicting how far each pixel is from the observer, and surface normal estimation, i.e., predicting the vector perpendicular to the surface at each pixel (the normal vector). Both tasks are extremely valuable in the robotics community, and hence we were motivated to explore the limits of performing all three tasks (the two above plus semantic segmentation) in real time using a single network.
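To make the connection between the two tasks concrete, here is a small NumPy sketch of how surface normals can be approximated from a depth map via finite differences. This is not the method from the paper (which predicts normals with a network); it is a simplified geometric illustration that ignores camera intrinsics.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate per-pixel surface normals from an (H, W) depth map.
    For a surface z = f(x, y), a (non-normalized) normal is
    (-dz/dx, -dz/dy, 1); we estimate the gradients with finite
    differences and normalize to unit length."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n  # (H, W, 3), unit normals
```

For a flat, fronto-parallel surface (constant depth) every normal points straight at the observer, (0, 0, 1), which matches the intuition that the normal is perpendicular to the pixel’s surface.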

Read More

Light-Weight RefineNet

Nearly two years after my first publication (at the same venue as now), a period which included a year-long academic break, I have finally submitted the first paper of my new PhD journey to the BMVC conference, which will take place from September 3 to September 6 in Newcastle-upon-Tyne. This time the submission took me a month longer, even though all the main results were already in place in March.

Read More