We propose and evaluate a data filtering method to reduce the power consumption of high-end processors with multiple execution cores. Although the proposed method can be applied to a wide variety of multi-processor systems, including MPPs, SMPs, and any type of single-chip multiprocessor, we concentrate on Network Processors. The proposed method uses an execution unit called a Data Filtering Engine that processes data with low temporal locality before it is placed on the system bus. The execution cores use locality to identify load instructions with low temporal locality and to decide which portion of the surrounding code should be off-loaded to the data filtering engine. Our technique reduces power consumption because (a) data with low temporal locality is processed on the data filtering engine before it is placed onto the high-capacitance system bus, and (b) the conflict misses caused by such data are reduced, resulting in fewer accesses to the L2 cache. Specifically, we show that our technique reduces bus accesses in representative applications by as much as 46.8% (26.5% on average) and reduces overall power consumption by as much as 15.6% (8.6% on average) on a single-core processor. It also improves performance by as much as 76.7% (29.7% on average) for a processor with 16 execution cores.