Most existing methods for semi-supervised clustering introduce supervision from outside, e.g., by manually labeling some data samples or by imposing constraints on the clustering results. This paper studies an interesting question: can the supervision come from inside, i.e., from the unsupervised training data themselves? If the data samples are not independent, we can capture the contextual information that reflects the dependency among them and use it as supervision to improve the clustering. We call this context-aware clustering. The investigation is instantiated in two scenarios: (1) clustering primitive visual features (e.g., SIFT features) with the help of spatial contexts, and (2) clustering handwritten digits '0'-'9' with the help of contextual patterns among different types of features. Our context-aware clustering admits a clean closed-form formulation in which the contextual information serves as a regularization term that balances the data fidelity in the original feature space against the influence of contextual patterns. A nested-EM algorithm is proposed to obtain an efficient solution, and it is proven to converge. Because it only exploits the dependency structure of the data samples themselves, this method is completely unsupervised: no outside supervision is introduced.
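To make the idea of contextual regularization concrete, the following is a minimal toy sketch, not the paper's actual nested-EM formulation: a k-means-style objective augmented with a penalty for label disagreement between contextually dependent samples (here given as a hypothetical neighborhood graph `edges`), minimized by alternating an ICM-style assignment sweep with a centroid update. The parameter `lam` plays the role of the regularization weight that balances data fidelity against contextual agreement.

```python
import numpy as np

def context_aware_kmeans(X, edges, k, lam=1.0, iters=20):
    """Toy illustration (an assumption, not the authors' algorithm):
    minimize sum_i ||x_i - mu_{z_i}||^2
           + lam * sum_{(i,j) in edges} [z_i != z_j]."""
    n = len(X)
    # Build a neighbor list from the dependency edges (the "context").
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    mu = X[:k].astype(float).copy()  # deterministic centroid init
    # Initial labels from data fidelity alone.
    z = np.argmin(((X[:, None, :] - mu[None]) ** 2).sum(-1), axis=1)
    for _ in range(iters):
        # Assignment sweep: data-fidelity cost plus lam per
        # contextual neighbor that would disagree with label c.
        for i in range(n):
            cost = ((X[i] - mu) ** 2).sum(axis=1)
            for c in range(k):
                cost[c] += lam * sum(z[j] != c for j in nbrs[i])
            z[i] = int(np.argmin(cost))
        # Centroid update from the current labels (skip empty clusters).
        for c in range(k):
            if np.any(z == c):
                mu[c] = X[z == c].mean(axis=0)
    return z, mu
```

With `lam = 0` this reduces to plain k-means; raising `lam` increasingly forces samples linked in the dependency graph into the same cluster, which is the kind of trade-off the regularization term in the paper controls.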