The performance of a NUMA architecture depends on the efficient use of local memory. Software-level techniques that improve memory locality (in addition to parallelism) are therefore essential for extracting the best performance from these architectures. Solutions proposed so far include OS-based automatic data migration and compiler-based static and dynamic data distribution. This paper proposes and evaluates a hybrid strategy for optimizing memory locality in NUMA architectures that employs both compiler-directed data distribution and OS-directed dynamic page migration. More specifically, a given program is first divided into segments, and each segment is then optimized either with compiler-based data distribution (at compile time) or with dynamic page migration (at runtime). The optimization strategy for a program segment is selected using a criterion based on the number of compile-time analyzable references in loops. To test the effectiveness of our strategy in optimizing the memory locality of applications, we implemented it and compared its performance with that of several other techniques, such as compiler-directed data distribution and OS-directed dynamic page migration. Our experimental results, obtained through simulation, indicate that the hybrid strategy outperforms the other strategies on a set of codes with regular, irregular, and mixed (regular + irregular) access patterns.
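The per-segment selection step can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything concrete here is an assumption made for illustration: the `Segment` record, the `choose_strategy` and `plan` helpers, the use of a ratio of analyzable to total references, and the 0.5 threshold are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    COMPILER_DISTRIBUTION = "compiler-directed data distribution"
    OS_PAGE_MIGRATION = "OS-directed dynamic page migration"


@dataclass
class Segment:
    """Hypothetical summary of one program segment."""
    name: str
    analyzable_refs: int  # loop references the compiler can analyze statically
    total_refs: int       # all loop references in the segment


def choose_strategy(seg: Segment, threshold: float = 0.5) -> Strategy:
    """Pick an optimization strategy for one segment.

    Assumed rule (not from the paper): use compile-time data distribution
    when at least `threshold` of the loop references are analyzable,
    otherwise fall back to runtime page migration.
    """
    if seg.total_refs == 0:
        # Nothing for the compiler to analyze; leave the segment to the OS.
        return Strategy.OS_PAGE_MIGRATION
    ratio = seg.analyzable_refs / seg.total_refs
    if ratio >= threshold:
        return Strategy.COMPILER_DISTRIBUTION
    return Strategy.OS_PAGE_MIGRATION


def plan(segments, threshold: float = 0.5):
    """Map each segment name to its selected strategy."""
    return {seg.name: choose_strategy(seg, threshold) for seg in segments}
```

Under these assumptions, a segment with mostly affine array subscripts would be assigned to the compiler, while a segment dominated by indirect (index-array) accesses would be left to page migration at runtime. A mixed code would receive a different strategy for each of its segments.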