In this course, we will learn the notions of locality, parallelism, and hierarchy, and how these concepts are used in designing the processor, interconnection network, and memory/storage subsystem of modern high-performance parallel computer architectures such as CPUs and GP-GPUs. Building on this understanding of parallel computer architectures, we will explore how these systems can be programmed using parallel programming languages and models, followed by a discussion of their use cases in designing domain-specific architectures for target application domains.