In this paper, a novel Greybox design methodology is proposed to establish a design and co-optimization flow that crosses the boundary between conventional software and hardware design. The dynamic timing of each software instruction is simulated and associated with the processor hardware design, which provides the basis for ultra-dynamic clock management. The proposed scheme implements instruction-based clock management and achieves a 21.71% frequency speedup. In addition, a novel program-driven hardware optimization flow is proposed, in which software operations are mapped to the hardware gate netlist and sorted by usage frequency. Experiments on an ARM-based pipeline design in a commercial 65nm CMOS process show that an additional 10% frequency speedup is obtained with high optimization efficiency. Overall, the proposed Greybox design method achieves a 31.56% frequency speedup compared with the conventional design method.
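The idea behind instruction-based clock management can be illustrated with a minimal Python sketch. It assumes a hypothetical table of per-instruction worst-case path delays (`DELAY_NS`) and a simplified in-order pipeline in which each cycle's clock period only has to cover the slowest instruction currently in flight, whereas a conventional design clocks every cycle at the global worst case. The function names, pipeline depth, and delay values are illustrative assumptions, not the paper's timing model or measured data.

```python
# Hypothetical per-instruction worst-case path delays in ns (illustrative
# values only, not results from the paper).
DELAY_NS = {"ADD": 1.6, "SUB": 1.6, "LDR": 2.0, "MUL": 2.5, "B": 1.4}


def static_period(delays):
    """Conventional design: every cycle uses the global worst-case delay."""
    return max(delays.values())


def dynamic_periods(trace, delays, depth=3):
    """Per-cycle clock periods for a simplified in-order pipeline.

    Assumption: in each cycle the period must cover the slowest instruction
    occupying any of the `depth` pipeline stages.
    """
    periods = []
    # The last instruction leaves the pipeline after len(trace) + depth - 1 cycles.
    for cycle in range(len(trace) + depth - 1):
        # Instructions issued in the last `depth` cycles are still in flight.
        in_flight = trace[max(0, cycle - depth + 1): cycle + 1]
        periods.append(max(delays[op] for op in in_flight))
    return periods


def speedup(trace, delays, depth=3):
    """Relative frequency gain of per-cycle clocking over a fixed clock."""
    periods = dynamic_periods(trace, delays, depth)
    baseline = static_period(delays) * len(periods)
    return baseline / sum(periods) - 1.0


if __name__ == "__main__":
    trace = ["ADD", "ADD", "B", "ADD", "SUB"]
    print(f"speedup: {speedup(trace, DELAY_NS):.2%}")
```

In this model the gain comes entirely from cycles in which no slow instruction (here `MUL` or `LDR`) is in flight; a trace made only of worst-case instructions yields no speedup.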