Vision plays a fundamental role in the human experience of musical performance. In conducting, visual information influences both the expressive and coordinative aspects of musical activity. Ensemble conductors present a special case of musical gesture, as their activities are coordinative rather than directly sound-producing. While the influence of vision on evaluations of musical expressivity has been well studied, less attention has been paid to the temporal aspect of conductors’ gestures. Given anecdotal observations of a flexibly congruent relationship between conductor gesture and ensemble response, and given the ability of entrainment to promote preference, we theorize that alterations to natural action-sound congruence in conductor-to-ensemble settings may influence evaluations of conductor quality. Naturalistic performance video of five conductors was either left intact or offset to create an audio-lead or video-lead condition, with the offset set as a percentage of each excerpt’s tempo (intact, ±15%, ±30%), and the conditions were fully crossed into stimulus orders. Participants were asked to rate the quality of the conductor, the ensemble, and the performance overall using a Likert-type scale anchored by “poor” and “excellent.” Our results indicate that any offset, whether audio- or video-led, resulted in lower ratings of conductor quality than the intact, unaltered performance. Although the effect size was small (partial η² = .02), participant ratings reinforce the role of action-sound congruence in observers’ perceptions and overall evaluations of conductors’ activities.