TY - JOUR
T1 - Joint Video Object Discovery and Segmentation by Coupled Dynamic Markov Networks
AU - Liu, Ziyi
AU - Wang, Le
AU - Hua, Gang
AU - Zhang, Qilin
AU - Niu, Zhenxing
AU - Wu, Ying
AU - Zheng, Nanning
N1 - Funding Information:
Manuscript received February 10, 2018; revised June 18, 2018; accepted July 16, 2018. Date of publication July 31, 2018; date of current version September 4, 2018. This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700800, in part by the National Natural Science Foundation of China under Grants 61629301, 61773312, 91748208, and 61503296, in part by the China Postdoctoral Science Foundation under Grants 2017T100752 and 2015M572563, in part by the National Science Foundation under Grants IIS-1217302 and IIS-1619078, and in part by the Army Research Office under Grant ARO W911NF-16-1-0138. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Tolga Tasdizen. (Corresponding author: Le Wang.) Z. Liu, L. Wang, and N. Zheng are with the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China (e-mail: liuziyi@stu.xjtu.edu.cn; lewang@xjtu.edu.cn; nnzheng@mail.xjtu.edu.cn).
Publisher Copyright:
© 1992-2012 IEEE.
PY - 2018/12
Y1 - 2018/12
N2 - It is a challenging task to extract segmentation mask of a target from a single noisy video, which involves object discovery coupled with segmentation. To solve this challenge, we present a method to jointly discover and segment an object from a noisy video, where the target disappears intermittently throughout the video. Previous methods either only fulfill video object discovery, or video object segmentation presuming the existence of the object in each frame. We argue that jointly conducting the two tasks in a unified way will be beneficial. In other words, video object discovery and video object segmentation tasks can facilitate each other. To validate this hypothesis, we propose a principled probabilistic model, where two dynamic Markov networks are coupled-one for discovery and the other for segmentation. When conducting the Bayesian inference on this model using belief propagation, the bi-directional message passing reveals a clear collaboration between these two inference tasks. We validated our proposed method in five data sets. The first three video data sets, i.e., the SegTrack data set, the YouTube-objects data set, and the Davis data set, are not noisy, where all video frames contain the objects. The two noisy data sets, i.e., the XJTU-Stevens data set, and the Noisy-ViDiSeg data set, newly introduced in this paper, both have many frames that do not contain the objects. When compared with state of the art, it is shown that although our method produces inferior results on video data sets without noisy frames, we are able to obtain better results on video data sets with noisy frames.
AB - It is a challenging task to extract segmentation mask of a target from a single noisy video, which involves object discovery coupled with segmentation. To solve this challenge, we present a method to jointly discover and segment an object from a noisy video, where the target disappears intermittently throughout the video. Previous methods either only fulfill video object discovery, or video object segmentation presuming the existence of the object in each frame. We argue that jointly conducting the two tasks in a unified way will be beneficial. In other words, video object discovery and video object segmentation tasks can facilitate each other. To validate this hypothesis, we propose a principled probabilistic model, where two dynamic Markov networks are coupled-one for discovery and the other for segmentation. When conducting the Bayesian inference on this model using belief propagation, the bi-directional message passing reveals a clear collaboration between these two inference tasks. We validated our proposed method in five data sets. The first three video data sets, i.e., the SegTrack data set, the YouTube-objects data set, and the Davis data set, are not noisy, where all video frames contain the objects. The two noisy data sets, i.e., the XJTU-Stevens data set, and the Noisy-ViDiSeg data set, newly introduced in this paper, both have many frames that do not contain the objects. When compared with state of the art, it is shown that although our method produces inferior results on video data sets without noisy frames, we are able to obtain better results on video data sets with noisy frames.
KW - Object segmentation
KW - dynamic Markov networks
KW - object discovery
KW - probabilistic graphical model
UR - http://www.scopus.com/inward/record.url?scp=85050734305&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050734305&partnerID=8YFLogxK
U2 - 10.1109/TIP.2018.2859622
DO - 10.1109/TIP.2018.2859622
M3 - Article
C2 - 30059300
AN - SCOPUS:85050734305
SN - 1057-7149
VL - 27
SP - 5840
EP - 5853
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 12
M1 - 8423204
ER -