Engineering Labs
Multi-Core Architectures and Programming
Practical project ideas and implementations for hands-on learning
Chapter 1: Hardware, Processes, and Threads
Ensuring the Correct Order of Memory Operations
Examining the Insides of a Computer
Hardware, Processes, and Threads
How Latency and Bandwidth Impact Performance
Increasing Instruction Issue Rate with Pipelined Processor Cores
Supporting Multiple Threads on a Single Chip
The Characteristics of Multiprocessor Systems
The Differences Between Processes and Threads
The Motivation for Multicore Processors
The Performance of 32-Bit versus 64-Bit Code
The Translation of Source Code to Assembly Language
Translating from Virtual Addresses to Physical Addresses
Using Caches to Hold Recently Used Data
Using Virtual Memory to Store Data
Chapter 2: Coding for Performance
Coding for Performance
Commonly Available Profiling Tools
Defining Performance
How Cross-File Optimization Improves Performance
How Not to Optimize
How Structure Impacts Performance
Identifying Where Time Is Spent Using Profiling
Performance and Convenience Trade-Offs in Source Code and Build Structures
Performance by Design
Pointer Aliasing and Compiler Optimization Issues
Selecting Appropriate Compiler Options
The Impact of Data Structures on Performance
The Role of the Compiler
The Two Types of Compiler Optimization
Understanding Algorithmic Complexity
Using Algorithmic Complexity with Care
Using Libraries to Structure Applications
Using Profile Feedback
Why Algorithmic Complexity Is Important
Chapter 3: Identifying Opportunities for Parallelism
Amdahl’s Law
Anti-dependencies and Output Dependencies
Client–Server Division of Work
Combining Parallelization Strategies
Critical Paths
Data Parallelism Using SIMD Instructions
Determining Maximum Practical Threads
Hosting Multiple Operating Systems Using Hypervisors
How Dependencies Affect Parallel Execution
How Parallelism Changes Algorithm Choice
How Synchronization Costs Reduce Scaling
Identifying Opportunities for Parallelism
Identifying Parallelization Opportunities
Improving Machine Efficiency Through Consolidation
Multiple Copies of the Same Task
Multiple Independent Tasks
Multiple Loosely Coupled Tasks
Multiple Users Utilizing a Single System
Parallelization Patterns
Parallelization Using Processes or Threads
Pipeline of Tasks for One Item
Producer–Consumer Splitting
Single Task Split Over Multiple Threads
Using Containers to Isolate Applications
Using Multiple Processes to Improve System Productivity
Using Parallelism to Improve Performance of a Single Task
Using Speculation to Break Dependencies
Visualizing Parallel Applications
Chapter 4: Synchronization and Data Sharing
Atomic Operations and Lock-Free Code
Avoiding Data Races
Barriers
Communication Between Threads and Processes
Data Races
Deadlocks and Livelocks
Mutexes and Critical Regions
Readers–Writer Locks
Semaphores
Spin Locks
Storing Thread-Private Data
Synchronization and Data Sharing
Synchronization Primitives
Tools for Detecting Data Races
Chapter 5: Using POSIX Threads
Compiling Multithreaded Code
Creating Threads
Multiprocess Programming
Process Termination
Reentrant Code and Compiler Flags
Sharing Data Between Threads
Sockets
Using POSIX Threads
Variables and Memory
Chapter 6: Windows Threading
Allocating Thread-Local Storage
Atomic Updates of Variables
Communicating Using Sockets
Communicating with Pipes
Creating and Resuming Suspended Threads
Creating Native Windows Threads
Creating Processes
Example: Requiring Synchronization Between Threads
Inheriting Handles in Child Processes
Naming Mutexes and Sharing Them
Protecting Code with Critical Sections
Protecting Regions with Mutexes
Setting Thread Priority
Sharing Memory Between Processes
Signaling Event Completion
Slim Reader/Writer Locks
Synchronization and Resource Sharing Methods
Terminating Threads
Using Handles to Kernel Resources
Wide String Handling
Windows Threading
Chapter 7: Using Automatic Parallelization and OpenMP
Accessing Private Data Outside Parallel Regions
Assisting the Compiler with Automatic Parallelization
Collapsing Loops to Improve Balance
Controlling the OpenMP Runtime Environment
Dynamic Parallel Tasks in OpenMP
Enforcing Memory Consistency
Ensuring In-Order Execution in Parallel Regions
Identifying and Parallelizing Reductions
Improving Scheduling and Work Distribution
Keeping Data Private to Threads
Nested Parallelism
Parallel Sections for Independent Work
Parallelization Example
Parallelizing Code Containing Calls
Parallelizing Reductions Using OpenMP
Producing a Parallel Application Automatically
Restricting Threads That Execute a Region
Runtime Behavior of OpenMP Applications
Using Automatic Parallelization and OpenMP
Using OpenMP to Create Parallel Applications
Using OpenMP to Parallelize Loops
Variable Scoping in OpenMP Regions
Waiting for Work Completion
Chapter 8: Hand-Coded Synchronization and Sharing
Atomic Operations
Compare-and-Swap for Complex Atomics
Compiler Memory-Ordering Directives
Dekker’s Algorithm
Enforcing Memory Ordering
Hand-Coded Synchronization and Sharing
Lockless Algorithms
Modifying Code to Use Atomics
OS-Provided Atomics
Producer–Consumer with Circular Buffer
Reordering by the Compiler
Scaling Consumers or Producers
Scaling Producer–Consumer to Multiple Threads
The ABA Problem
Volatile Variables
Chapter 9: Scaling with Multicore Processors
Bandwidth Sharing Between Cores
Cache Conflict and Capacity Issues
Constraints to Application Scaling
False Sharing
Hardware Constraints to Scaling
Multicore Processors and Scaling
OS Constraints to Scaling
Pipeline Resource Starvation
Scaling with Multicore Processors
Chapter 10: Other Parallelization Technologies
Alternative Languages
Clustering Technologies
GPU-Based Computing
Language Extensions
Transactional Memory
Vectorization