In this paper, we propose a cross-layer integrated microprocessor design methodology in which the instructions in software programs drive the design down to the gate-level netlist. Based on an in-depth exploration of the dynamic timing behavior of each instruction in the program, we propose a fully integrated design approach that combines ultra-dynamic clock and power management circuits with software-driven design optimization. We also introduce a cross-layer simulation environment that enables collaborative co-design across the compiler, architecture, and circuits. The proposed methodology helps bridge the gap between software and hardware development in a conventional development cycle. A test vehicle built on an ARM-based processor demonstrates substantial improvements in speed and power efficiency under the proposed methodology.