Visual focus of attention in multi-person discussions is a crucial nonverbal indicator in tasks such as inter-personal relation inference, speech transcription, and deception detection. However, predicting the focus of attention remains a challenge because the focus changes rapidly, the discussions are highly dynamic, and the people's behaviors are inter-dependent. Here we propose ICAF (Iterative Collective Attention Focus), a collective classification model to jointly learn the visual focus of attention of all people. Every person is modeled using a separate classifier. ICAF models the people collectively-the predictions of all other people's classifiers are used as inputs to each person's classifier. This explicitly incorporates inter-dependencies between all people's behaviors. We evaluate ICAF on a novel dataset of 5 videos (35 people, 109 minutes, 7604 labels in all) of the popular Resistance game and a widely-studied meeting dataset with supervised prediction. ICAF outperforms the strongest baseline by 1%-5% accuracy in predicting the people's visual focus of attention. Further, we propose a lightly supervised technique to train models in the absence of training labels. We show that light-supervised ICAF performs similar to the supervised ICAF, thus showing its effectiveness and generality to previously unseen videos.