This paper describes a robust tracking algorithm for localizing the human hand in video sequences. The localization system relies mainly on an automatic color-based segmentation scheme combined with a motion cue. An automatic self-organizing clustering algorithm is proposed that learns the color clusters in the HSI space in an unsupervised manner, without specifying the number of clusters in advance. Growing, pruning, and merging schemes for a 1-D self-organizing map (SOM) are used to find an appropriate number of clusters during the formation stage of the SOM. Training and segmentation in our approach are fast enough to make real-time applications possible. The segmentation scheme is capable of tracking multiple objects of different colors simultaneously. The motion cue is employed to focus the attention of the tracking algorithm. The approach is also applied to other tasks, such as human face tracking and color indexing. Our localization system, implemented on an SGI O2 R10000 workstation, runs reliably and efficiently at 20-30 Hz.
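
Since the color clusters are learned in the HSI space, each RGB pixel must first be mapped to hue, saturation, and intensity. The sketch below illustrates one common textbook formulation of that conversion. It is an assumption for illustration only: the exact HSI variant used by the system is not stated here, and the names `rgb_to_hsi` and `hue_distance` are hypothetical helpers, not part of the described implementation.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB components in [0, 1] to (hue in degrees, saturation, intensity).

    Assumed formulation (common textbook HSI model), not necessarily
    the variant used by the tracking system.
    """
    intensity = (r + g + b) / 3.0
    if intensity == 0.0:
        # Black: hue and saturation are undefined; return 0 by convention.
        return 0.0, 0.0, 0.0

    saturation = 1.0 - min(r, g, b) / intensity

    num = 0.5 * ((r - g) + (r - b))
    # The radicand is non-negative in exact arithmetic; clamp against rounding.
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0.0:
        # Gray pixel: hue is undefined; return 0 by convention.
        return 0.0, saturation, intensity

    # Clamp the ratio so rounding error cannot push acos out of its domain.
    ratio = max(-1.0, min(1.0, num / den))
    hue = math.degrees(math.acos(ratio))
    if b > g:
        hue = 360.0 - hue
    return hue, saturation, intensity


def hue_distance(h1, h2):
    """Angular distance between two hues in degrees, respecting wrap-around."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)
```

Because hue is an angle, any clustering in this space should compare hues with a wrap-around distance such as `hue_distance`, so that hues of 350 and 10 degrees are treated as close rather than far apart.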