Many large-scale applications have significant I/O requirements in addition to their computational and memory requirements. Unfortunately, the limited number of I/O nodes provided by contemporary message-passing distributed-memory architectures, such as the Intel Paragon and IBM SP-2, severely limits the I/O performance of these applications. In this paper, we examine several software optimization techniques together with architectural scalability, and evaluate their effect on five I/O-intensive applications from both small and large application domains. Our goals in this study are twofold. First, we want to understand the behavior of large-scale data-intensive applications and the impact of the I/O subsystem on their performance, and vice versa. Second, and more importantly, we strive to identify ways to improve the applications' performance through a mix of architectural and software solutions. Our results reveal that different applications benefit from different optimizations. For example, we found that some applications benefit from file layout optimizations, whereas others benefit from collective I/O. A combination of architectural and software solutions is normally needed to obtain good I/O performance. For example, we show that with a limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that beyond a certain level, imbalance in the architecture results in performance degradation even with optimized software, which indicates that I/O resources must then be increased.