This paper presents a new approach to segmenting monocular videos, captured by static or hand-held cameras, of large moving non-rigid foreground objects. The foreground and background are each modeled with a spatial-color Gaussian mixture model (SCGMM), and segmentation is performed with the graph cut algorithm, which minimizes a Markov random field energy function containing the SCGMMs. Because a modeling gap exists between the available SCGMMs and the segmentation task on a new frame, one major contribution of this paper is a novel foreground/background SCGMM joint tracking algorithm that bridges this gap and greatly improves segmentation performance under complex or rapid motion. Specifically, we propose to combine the two SCGMMs into a generative model of the whole image and to maximize the joint data likelihood using a constrained Expectation-Maximization (EM) algorithm. The effectiveness of the proposed algorithm is demonstrated on a variety of sequences.
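As an illustration of how two SCGMMs can be combined into a single generative model of the whole image, the following minimal sketch evaluates such a mixture at one pixel. It rests on assumptions that are not taken from the paper: each pixel is a 5-D feature (x, y, r, g, b), each Gaussian has a diagonal covariance, and the function names and the `fg_prior` weight are hypothetical. It shows only the likelihood evaluation, not the constrained EM update or the graph cut step.

```python
import math


def gaussian_diag(z, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector z (assumed form)."""
    log_p = 0.0
    for zi, mi, vi in zip(z, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (zi - mi) ** 2 / vi)
    return math.exp(log_p)


def scgmm_likelihood(z, components):
    """Likelihood of z under one SCGMM given as a list of (weight, mean, var)."""
    return sum(w * gaussian_diag(z, m, v) for w, m, v in components)


def joint_likelihood(z, fg, bg, fg_prior):
    """Whole-image model: weighted sum of the foreground and background SCGMMs."""
    return (fg_prior * scgmm_likelihood(z, fg)
            + (1.0 - fg_prior) * scgmm_likelihood(z, bg))


def foreground_posterior(z, fg, bg, fg_prior):
    """Posterior probability that z was generated by the foreground SCGMM."""
    p_fg = fg_prior * scgmm_likelihood(z, fg)
    p_bg = (1.0 - fg_prior) * scgmm_likelihood(z, bg)
    total = p_fg + p_bg
    # Fall back to an uninformative 0.5 if both densities underflow to zero.
    return p_fg / total if total > 0.0 else 0.5


# Toy single-component models (illustrative values only, not from the paper):
# a reddish foreground near (50, 50) and a bluish background near (10, 10).
FG = [(1.0, (50.0, 50.0, 200.0, 30.0, 30.0), (100.0, 100.0, 400.0, 400.0, 400.0))]
BG = [(1.0, (10.0, 10.0, 30.0, 30.0, 200.0), (100.0, 100.0, 400.0, 400.0, 400.0))]

if __name__ == "__main__":
    pixel = (48.0, 52.0, 190.0, 35.0, 40.0)  # (x, y, r, g, b)
    print(foreground_posterior(pixel, FG, BG, fg_prior=0.5))
```

In a full pipeline, per-pixel likelihoods of this kind would typically feed the data term of the Markov random field energy, while the EM step would re-estimate the component parameters on the new frame.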