The performance of a wireless channel with multiple antennas benefits from channel knowledge at the receiver, which is typically not available a priori. We study the capacity of a block fading multiple-input/multiple-output (MIMO) channel with a linear receiver that is estimated from a training sequence via a Least Squares (LS) algorithm. Given a fixed block size, the amount of training overhead determines the tradeoff between the quality of the receiver estimate and the time left for data transmission. We study the optimal training length, which maximizes the large system MIMO capacity, i.e., the capacity in the limit as the numbers of transmit and receive antennas go to infinity with fixed ratio. In order to obtain a meaningful limit, the training length and packet length also increase in fixed proportion to the number of antennas. We show that the optimal amount of training grows as the square root of the block size as the block size becomes large. Furthermore, optimizing the allocation of power across training and data symbols yields only a slight benefit. Numerical results show that, for a fixed block length, the capacity can be increased somewhat by adding a properly chosen diagonal loading factor to the LS algorithm.
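As a rough illustration of the estimation step described above, the following minimal sketch computes an LS linear receiver for one data stream from a training block, with an optional diagonal loading term. It is not taken from the paper: the function name `ls_receiver`, the dimensions, the BPSK training symbols, the noise level, and the loading value `delta` are all assumptions chosen for the example.

```python
import numpy as np

def ls_receiver(Y, b, delta=0.0):
    """LS estimate of a linear receive filter c for one stream.

    Y     : (N, T) received training block (N receive antennas, T symbols)
    b     : (T,) known training symbols of the stream of interest
    delta : diagonal loading factor (assumed parameter; 0 gives plain LS)

    Minimizes sum_t |b_t - c^H y_t|^2, plus delta * T * ||c||^2 when delta > 0.
    """
    N, T = Y.shape
    R_hat = Y @ Y.conj().T / T + delta * np.eye(N)  # loaded sample covariance
    s_hat = Y @ b.conj() / T                        # sample steering vector
    return np.linalg.solve(R_hat, s_hat)

# Hypothetical toy setup: 4x4 Rayleigh channel, 16 BPSK training symbols.
rng = np.random.default_rng(0)
N, K, T = 4, 4, 16
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
B = rng.choice([-1.0, 1.0], size=(K, T))            # training symbols, one row per stream
noise = 0.1 * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T)))
Y = H @ B + noise

c_ls = ls_receiver(Y, B[0])              # plain LS receiver for stream 0
c_dl = ls_receiver(Y, B[0], delta=0.1)   # LS receiver with diagonal loading
soft = c_ls.conj() @ Y                   # soft estimates c^H y_t of the training symbols
```

A longer training block (larger `T`) improves the sample averages `R_hat` and `s_hat` but leaves fewer symbols in the block for data, which is the tradeoff the optimal training length resolves. The loading term shrinks the filter norm, which can help when `T` is small relative to `N`.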