We consider multicast communication from a single source to multiple destinations over a network of erasure channels. Linear network coding maximizes the achievable (min-cut) rate, and a distributed code assignment can be realized by choosing codes randomly at the intermediate nodes. It is typically assumed that the coding information (the combining coefficients) at each node is included in the packet overhead and forwarded to the destinations. Instead, we assume that the network coding matrix is communicated to the destinations by appending training bits to the data bits at the source. End-to-end channel coding can then be applied to the training and data bits either separately or jointly, the latter by coding across both training and data bits. Ideally, the training overhead should balance the reliability of communicating the network matrix against the reliability of data detection. We maximize data throughput with respect to the training overhead, and show how the resulting throughput depends on the network size, erasure probability, number of independent messages, and field size. A combination network is used to illustrate our results and to show under what conditions throughput is limited by the training overhead.
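
To make the role of training concrete, the following is a minimal sketch of how a destination could learn a network coding matrix from symbols appended at the source and then recover the data. It is an illustration under simplifying assumptions, not the scheme analyzed here: there are no erasures and no end-to-end channel code, the network is collapsed into a single random end-to-end matrix over a prime field, and the training symbols form an identity matrix. The names `P`, `mat_mul`, `mat_inv`, and `simulate` are hypothetical.

```python
import random

P = 257  # prime field size, assumed for illustration only


def mat_mul(a, b, p=P):
    """Multiply matrices a and b with entries reduced modulo the prime p."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]


def mat_inv(a, p=P):
    """Invert a square matrix modulo p by Gauss-Jordan elimination.

    Returns None if the matrix is singular over GF(p).
    """
    n = len(a)
    # Augment a with the identity matrix.
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % p), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], p - 2, p)  # Fermat inverse, valid for prime p
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def simulate(m=3, data_len=5, seed=0, p=P):
    """Send m packets of [training | data]; decode at one destination."""
    rng = random.Random(seed)
    data = [[rng.randrange(p) for _ in range(data_len)] for _ in range(m)]
    # Training part: row i of the m x m identity matrix, followed by the data.
    sent = [[int(i == j) for j in range(m)] + data[i] for i in range(m)]
    # Assumed end-to-end effect of random coding: an invertible random matrix.
    while True:
        g = [[rng.randrange(p) for _ in range(m)] for _ in range(m)]
        if mat_inv(g, p) is not None:
            break
    received = mat_mul(g, sent, p)
    # The training columns of the received packets reveal the matrix g.
    g_est = [row[:m] for row in received]
    payload = [row[m:] for row in received]
    decoded = mat_mul(mat_inv(g_est, p), payload, p)
    overhead = m / (m + data_len)  # fraction of each packet spent on training
    return data, decoded, overhead
```

In this toy setting the training fraction is `m / (m + data_len)`, so for a fixed packet length, more independent messages leave less room for data. With erasures, some training symbols would be lost and the matrix estimate could be incomplete, which is the source of the reliability tradeoff described above.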