Perceptual image quality metrics have explicitly accounted for human visual system (HVS) sensitivity to subband noise by estimating thresholds above which distortion is just-noticeable. A recently proposed class of quality metrics, known as structural similarity (SSIM), models perception implicitly by taking into account the fact that the HVS is adapted for extracting structural information (relative spatial covariance) from images. We compare specific SSIM implementations both in the image space and the wavelet domain. We also evaluate the effectiveness of the complex wavelet SSIM (CWSSIM), a translation-insensitive SSIM implementation, in the context of realistic distortions that arise from compression and error concealment in video transmission applications. In order to better explore the space of distortions, we propose models for typical distortions encountered in video compression/transmission applications. We also derive a multi-scale weighted variant of the complex wavelet SSIM (WCWSSIM), with weights based on the human contrast sensitivity function to handle local mean shift distortions.