Depth-aware salient object segmentation
Abstract
Object segmentation is an important task that is widely used in computer vision applications such as object detection, tracking, recognition, and retrieval. It can be seen as a two-phase process: object detection and segmentation. Object segmentation becomes more challenging when there is no prior knowledge about the object in the scene. In such conditions, visual attention analysis via saliency mapping may offer a means to predict the object location by using visual contrast, local or global, to identify regions that draw strong attention in the image. However, in situations such as cluttered backgrounds, highly varied object surfaces, or shadows, regular and salient object segmentation approaches based on a single image feature such as color or brightness have been shown to be insufficient for the task. This work proposes a new salient object segmentation method that uses a depth map obtained from the input image to enhance the accuracy of saliency mapping. A deep learning-based method is employed for depth map estimation. Our experiments show that the proposed method outperforms other state-of-the-art object segmentation algorithms in terms of recall and precision.
Keywords
Saliency map, depth map, deep learning, object segmentation
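As an illustration only, the idea summarized in the abstract (using a depth map to refine a contrast-based saliency map, then scoring the segmentation by precision and recall) can be sketched in Python with NumPy. The abstract does not say how depth and saliency are combined, so this sketch assumes a simple multiplicative nearness prior in which closer pixels are treated as more likely to belong to the object. The function names, the `alpha` weight, the 0.75 threshold, and the hand-built toy scene are all hypothetical and are not taken from the paper; in the proposed method the depth map would come from a learned estimator rather than being constructed by hand.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Rescale an array to [0, 1]; a constant array maps to all zeros."""
    x = x.astype(np.float64)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def contrast_saliency(gray: np.ndarray) -> np.ndarray:
    """Global contrast: distance of each pixel from the mean intensity."""
    return normalize(np.abs(gray.astype(np.float64) - gray.mean()))


def depth_weighted_saliency(saliency: np.ndarray, depth: np.ndarray,
                            alpha: float = 0.5) -> np.ndarray:
    """Down-weight salient pixels that are far away (assumed fusion rule)."""
    nearness = 1.0 - normalize(depth)  # 1 = closest, 0 = farthest
    return normalize(saliency * ((1.0 - alpha) + alpha * nearness))


def segment(saliency: np.ndarray, threshold: float = 0.75) -> np.ndarray:
    """Binary object mask from a saliency map (fixed threshold, assumed)."""
    return saliency > threshold


def precision_recall(mask: np.ndarray, truth: np.ndarray) -> tuple:
    """Pixel-wise precision and recall of a predicted mask."""
    tp = np.logical_and(mask, truth).sum()
    precision = float(tp / mask.sum()) if mask.sum() else 0.0
    recall = float(tp / truth.sum()) if truth.sum() else 0.0
    return precision, recall


def toy_scene():
    """6x6 scene: a near bright object plus a far bright distractor pixel."""
    gray = np.full((6, 6), 0.2)
    depth = np.full((6, 6), 5.0)       # background is far away
    truth = np.zeros((6, 6), dtype=bool)
    gray[2:4, 2:4] = 0.9               # the object: bright and close
    depth[2:4, 2:4] = 1.0
    truth[2:4, 2:4] = True
    gray[0, 5] = 0.9                   # distractor: bright but far
    return gray, depth, truth


if __name__ == "__main__":
    gray, depth, truth = toy_scene()
    sal = contrast_saliency(gray)
    print("intensity only:", precision_recall(segment(sal), truth))
    fused = depth_weighted_saliency(sal, depth)
    print("depth weighted:", precision_recall(segment(fused), truth))
```

On this toy scene the intensity-only mask includes the far distractor pixel, giving a precision of 0.8 at full recall, while the depth-weighted mask excludes it and reaches a precision of 1.0. A real pipeline would replace the mean-intensity contrast with a stronger saliency model and the fixed threshold with an adaptive one.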