It is now possible to allow VMs hosting HPC applications to seamlessly bridge distributed cloud resources and tightly-coupled supercomputing and cluster resources. However, to achieve the application performance that the tightly-coupled resources are capable of, the overlay network must not introduce significant overhead relative to the native hardware. Current user-level tools, including our own existing VNET/U system, do not meet this requirement. In response, we describe the design, implementation, and evaluation of a layer 2 virtual networking system that has negligible latency and bandwidth overheads in 1-10 Gbps networks. Our system, VNET/P, is directly embedded into our publicly available Palacios virtual machine monitor (VMM). VNET/P achieves native performance on 1 Gbps Ethernet networks and very high performance on 10 Gbps Ethernet networks and InfiniBand. The NAS benchmarks generally achieve over 95% of their native performance on both 1 and 10 Gbps networks. These results suggest it is feasible to extend a software-based overlay network designed for computing at wide-area scales into tightly-coupled environments.