Aerial scenes captured by UAVs have immense potential in IoT applications such as urban surveillance,
road and building segmentation, and land cover classification, all of which are necessary for the evolution of smart
cities. Advances in deep learning have greatly enhanced visual understanding, but the domain of
aerial vision remains largely unexplored. Aerial images pose many unique challenges for
scene parsing, such as high-resolution data, small-scale objects, a large number of objects in the camera view,
dense clustering of objects, and background clutter, which greatly hinder the performance of existing
deep learning methods. In this work, we propose ISDNet (Instance Segmentation and Detection Network), a
novel network to perform instance segmentation and object detection on visual data captured by UAVs. This
work enables aerial image analytics for various needs in a smart city. In particular, we use dilated convolutions
to generate improved spatial context, leading to better discrimination between foreground and background
features. The proposed network efficiently reuses the segment-mask features by propagating them from
early stages using residual connections. Furthermore, ISDNet makes use of effective anchors to accommodate
varying object scales and sizes. The proposed method obtains state-of-the-art results in the aerial context.
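
To illustrate why dilated convolutions can provide a wider spatial context, the following is a minimal sketch and not the ISDNet implementation. The abstract does not state kernel sizes or dilation rates, so the values used here (3-tap kernels, dilations 1, 2, 4) and all function names are assumptions chosen only for illustration. The sketch shows that a dilated kernel covers a larger span of the input with the same number of weights, and that stacking dilated layers grows the receptive field faster than stacking ordinary ones.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input samples, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation=1):
    """Valid-mode 1-D dilated cross-correlation in pure Python.

    Each kernel tap i reads the input at offset i * dilation, so the number
    of weights stays the same while the covered span grows.
    """
    span = effective_kernel_size(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(weights))
        )
    return out


def stacked_receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field of stride-1 layers stacked with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    # Hypothetical settings: three 3-tap layers, with and without dilation.
    print(stacked_receptive_field(3, [1, 1, 1]))
    print(stacked_receptive_field(3, [1, 2, 4]))
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=2))
```

Under these assumed settings, three ordinary 3-tap layers see 7 input samples while the dilated stack sees 15, at the same parameter count. A larger context of this kind is what would help separate small, densely clustered foreground objects from background clutter.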