High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads execute in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.