Video analysis is a staple of the education research community. For many contemporary education researchers, participation in the video coding process serves as a rite of passage. However, recent developments in multimodal learning analytics May help to accelerate and enhance this process by providing researchers with a more nuanced glimpse into a set of learning experiences. As an example of how to use multimodal learning analytics towards these ends, this paper includes a preliminary analysis from 54 college students, who completed two engineering design tasks in pairs. Gesture, speech and electro-dermal activation data were collected as students completed these tasks. The gesture data was used to learn a set of canonical clusters (N=4). A decision tree was trained based on individual students’ cluster frequencies, and pre-post learning gains. The nodes in the decision tree were then used to identify a subset of video segments that were human coded based on prior work in learning analytics and engineering design. The combination of machine learning and human inference helps elucidate the practices that seem to correlate with student learning. In particular, both engagement and disengagement seem to correlate with student learning, albeit in a somewhat nuanced fashion.