The standard placental examination helps identify adverse pregnancy outcomes but is not scalable, since it requires hospital-level equipment and expert knowledge. Although current supervised learning approaches to automated placenta analysis improve scalability, they fall short in robustness and generalizability owing to the scarcity of labeled training images. In this paper, we propose a vision-language contrastive learning (VLC) approach that addresses the data-scarcity problem by incorporating abundant pathology reports into the training data. Moreover, we address the feature suppression problem in current VLC approaches to improve generalizability and robustness. These improvements enable a shared image encoder across tasks, boosting efficiency. Overall, our approach outperforms strong baselines on fetal/maternal inflammatory response (FIR/MIR), chorioamnionitis, and sepsis risk classification using images from a professional photography instrument at Northwestern Memorial Hospital; it also achieves the highest inference robustness on iPad images for the MIR and chorioamnionitis risk classification tasks. To our knowledge, it is the first approach to demonstrate robustness to placenta images captured on a mobile platform accessible to low-resource communities.
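To make the VLC idea concrete, the following is a minimal sketch of a CLIP-style symmetric image-text contrastive loss, the standard formulation underlying vision-language contrastive learning. This is an illustrative assumption, not the paper's actual implementation; all function names, shapes, and the temperature value are hypothetical.

```python
import numpy as np

def vlc_loss(image_emb: np.ndarray, text_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) arrays where row i of each is an
    image/pathology-report pair. Matched pairs are pulled together and
    mismatched pairs pushed apart in the shared embedding space.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Because the text encoder is only needed during training, a single image encoder trained this way can be shared across downstream classification tasks at inference time.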