Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps [pdf]

The winner of SpaceNet Challenge II says this paper inspired him. OpenStreetMap (OSM) data was included to help.

I expected the difficulty to be how to combine the OSM data with the satellite images, for example how to weight or normalize the two inputs. In practice, the problem was that we do not have much high-quality OSM data. I tried to find OSM data for water areas and forests, but most areas of the maps were blank. The data link provided by the winner is no longer available.

Winning solution of SpaceNet Challenge Round 2. XD_XD got the highest overall IoU score in the challenge. The final submission was an averaging ensemble of three individually trained U-Net models, and he used OpenStreetMap data. The link to the OSM data is no longer available.
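To make the "averaging ensemble" concrete, here is a minimal sketch of how per-pixel probabilities from several models could be averaged and then thresholded. It assumes each model outputs a building-probability map of the same shape; the function name `average_ensemble`, the 0.5 threshold, and the toy values are my own assumptions, not details taken from his solution.

```python
import numpy as np

def average_ensemble(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then threshold."""
    stacked = np.stack(prob_maps, axis=0)   # shape: (n_models, H, W)
    mean_prob = stacked.mean(axis=0)        # shape: (H, W)
    return mean_prob, mean_prob >= threshold

# Toy 2x2 probability maps standing in for three U-Net outputs (made-up values).
p1 = np.array([[0.9, 0.2], [0.6, 0.1]])
p2 = np.array([[0.8, 0.4], [0.3, 0.2]])
p3 = np.array([[0.7, 0.6], [0.3, 0.0]])

mean_prob, mask = average_ensemble([p1, p2, p3])
print(mean_prob)
print(mask)
```

Averaging before thresholding lets a confident model outvote two uncertain ones, which is different from a majority vote on the binary masks.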

He mentioned that 8-band multispectral data improved the performance of his model. However, in my experiments the results were the same as with RGB images. I suppose this is because we are using DeepLab, which is the current state-of-the-art segmentation model. Both RGB and 8-band data record light in different wavelength bands, and the DeepLab model seems able to extract enough information from the RGB data alone.

His model is unable to recognize multiple buildings that are close in distance as one building footprint. This is also a reason for our low score on Shanghai and Khartoum. He also mentioned the ambiguous annotation rules in Khartoum.
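A small sketch of why closely spaced buildings are hard to score. It assumes footprints are extracted from the predicted mask by connected-component labelling, which is my assumption (his post-processing is not described in these notes), and `label_footprints` is a hypothetical helper. Under that assumption, a single bridging pixel between two buildings changes the number of footprints.

```python
from collections import deque

def label_footprints(mask):
    """Label 4-connected regions of a binary mask; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                      # start a new footprint
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Two buildings separated by a one-pixel gap (toy masks).
separate = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# Same buildings, but the prediction fills one pixel of the gap.
touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]

_, n_separate = label_footprints(separate)
_, n_touching = label_footprints(touching)
print(n_separate, n_touching)  # 2 1
```

Whether one footprint or two is "correct" then depends on how the ground truth was annotated, which is where ambiguous annotation rules hurt.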

From the sample images provided on his page, it seems that he did not perform orthorectification. I think this is acceptable for a challenge, where competitors only want to achieve higher scores, but I suppose we need to process the data if we want to use it as a benchmark. This property of the satellite images also makes the edges quite confusing. If we could find a way to orthorectify the images before segmentation, I think the performance could be improved, but I have not yet found an effective orthorectification method.

Learning to Adapt Structured Output Space for Semantic Segmentation [pdf] [code]

This is a paper from CVPR 2018. They use several adversarial networks on the output spaces at different feature levels, so domain adaptation is performed at multiple layers. It got the highest mIoU score on the benchmark. However, it did not perform as well on the IoU of some individual classes.
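The difference between mIoU and the IoU of a single class can be sketched as follows. This is a minimal NumPy example with made-up labels; `per_class_iou` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def per_class_iou(y_true, y_pred, num_classes):
    """Return the IoU of each class from flattened integer label arrays."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.bincount(
        y_true * num_classes + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        return intersection / union  # NaN if a class appears in neither array

# Toy labels for 8 pixels and 3 classes (made-up values).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 2]

iou = per_class_iou(y_true, y_pred, num_classes=3)
miou = np.nanmean(iou)  # mean over classes that actually occur
print(iou, miou)
```

Here class 0 is perfect while class 2 only reaches 0.5, so a good mean can hide a weak class. For our task only the building class matters, so the per-class number is the one to watch.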

Considering we only want to find the building footprints, adding adversarial networks may not help so much.

Context-Reinforced Semantic Segmentation [pdf] [code]

This is a paper from CVPR 2019. They use a Markov decision process (MDP) for context learning. The model is divided into a segment net and a context net. The context map helps the segment net to predict, and the prediction in turn helps adjust the context net.

I noticed several papers in 2019 tried to improve the performance with context. But the training seems a little tricky and I’m still trying to understand some parts of the model.

Object-Contextual Representations for Semantic Segmentation [pdf] [code]

This is also a paper using context. It achieved high scores on many datasets. But the model is also a little confusing.

Unsupervised Domain Adaptation to Improve Segmentation Quality Both in Source and Target Domain [pdf]

This paper focuses on improving performance on both the target domain and the source domain. During training, they add a discriminator on the outputs $Y_s, Y_t$. In this way they hope to minimize the distance between the outputs for the source domain and the target domain. They measure the distance with the average over the different channels.
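As a rough reference, a discriminator $D$ on segmentation outputs is commonly trained with the standard adversarial losses below. This is the generic formulation, written down as an assumption, and not necessarily the exact objective used in the paper. $\mathcal{L}_D$ trains the discriminator to tell source outputs from target outputs, and $\mathcal{L}_{adv}$ pushes the segmentation network to make target outputs look like source outputs.

$$\mathcal{L}_D = -\mathbb{E}\left[\log D(Y_s)\right] - \mathbb{E}\left[\log\left(1 - D(Y_t)\right)\right], \qquad \mathcal{L}_{adv} = -\mathbb{E}\left[\log D(Y_t)\right]$$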

Their results show that, although they do not have labels for the target domain, the discriminator helps reduce the distance and improves the performance. One problem is that they did not compare their performance with other methods. Another problem is that each time new data arrive, the model needs to be trained again.

Bidirectional Learning for Domain Adaptation of Semantic Segmentation [pdf] [code]

The idea is very closely related to CycleGAN. They propose a bidirectional learning framework, which means they use both ‘translation-to-segmentation’ and ‘segmentation-to-translation’ to improve performance. It reads like a variant of CycleGAN.