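Sometimes explaining your ideas with diagrams is a good choice; personally, I find it works better than an all-text PowerPoint deck.
Sharing this for everyone to use: https://cacoo.com/diagrams/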
| Blaise Barney, Lawrence Livermore National Laboratory | UCRL-MI-133316 |
| Abstract |
In shared memory multiprocessor architectures, such as SMPs, threads can be used to implement parallelism. Historically, hardware vendors have implemented their own proprietary versions of threads, making portability a concern for software developers. For UNIX systems, a standardized C language threads programming interface has been specified by the IEEE POSIX 1003.1c standard. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.
The tutorial begins with an introduction to concepts, motivations, and design considerations for using Pthreads. Each of the three major classes of routines in the Pthreads API is then covered: Thread Management, Mutex Variables, and Condition Variables. Example codes are used throughout to demonstrate how to use most of the Pthreads routines needed by a new Pthreads programmer. The tutorial concludes with a discussion of LLNL specifics and how to mix MPI with Pthreads. A lab exercise with numerous example codes (C language) is also included.
Level/Prerequisites: Ideal for those who are new to parallel programming with threads. A basic understanding of parallel programming in C is assumed. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction To Parallel Computing would be helpful.
| Pthreads Overview |
| UNIX PROCESS | THREADS WITHIN A UNIX PROCESS |
| Pthreads Overview |
For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were measured with the time utility, and are reported in seconds. No optimization flags were used.
Note: don't expect the system and user times to add up to real time, because these are SMP systems with multiple CPUs working on the problem at the same time. At best, these are approximations run on local machines, past and present.
| Platform | fork() real | fork() user | fork() sys | pthread_create() real | pthread_create() user | pthread_create() sys |
|---|---|---|---|---|---|---|
| AMD 2.3 GHz Opteron (16cpus/node) | 12.5 | 1.0 | 12.5 | 1.2 | 0.2 | 1.3 |
| AMD 2.4 GHz Opteron (8cpus/node) | 17.6 | 2.2 | 15.7 | 1.4 | 0.3 | 1.3 |
| IBM 4.0 GHz POWER6 (8cpus/node) | 9.5 | 0.6 | 8.8 | 1.6 | 0.1 | 0.4 |
| IBM 1.9 GHz POWER5 p5-575 (8cpus/node) | 64.2 | 30.7 | 27.6 | 1.7 | 0.6 | 1.1 |
| IBM 1.5 GHz POWER4 (8cpus/node) | 104.5 | 48.6 | 47.2 | 2.1 | 1.0 | 1.5 |
| INTEL 2.4 GHz Xeon (2 cpus/node) | 54.9 | 1.5 | 20.8 | 1.6 | 0.7 | 0.9 |
| INTEL 1.4 GHz Itanium2 (4 cpus/node) | 54.5 | 1.1 | 22.2 | 2.0 | 1.2 | 0.6 |
| Platform | MPI Shared Memory Bandwidth (GB/sec) | Pthreads Worst Case Memory-to-CPU Bandwidth (GB/sec) |
|---|---|---|
| AMD 2.3 GHz Opteron | 1.8 | 5.3 |
| AMD 2.4 GHz Opteron | 1.2 | 5.3 |
| IBM 1.9 GHz POWER5 p5-575 | 4.1 | 16 |
| IBM 1.5 GHz POWER4 | 2.1 | 4 |
| Intel 2.4 GHz Xeon | 0.3 | 4.3 |
| Intel 1.4 GHz Itanium 2 | 1.8 | 6.4 |
| Pthreads Overview |

Shared Memory Model:

Thread-safeness:
| The Pthreads API |
| Routine Prefix | Functional Group |
|---|---|
| pthread_ | Threads themselves and miscellaneous subroutines |
| pthread_attr_ | Thread attributes objects |
| pthread_mutex_ | Mutexes |
| pthread_mutexattr_ | Mutex attributes objects |
| pthread_cond_ | Condition variables |
| pthread_condattr_ | Condition attributes objects |
| pthread_key_ | Thread-specific data keys |
| pthread_rwlock_ | Read/write locks |
| pthread_barrier_ | Synchronization barriers |
| Compiling Threaded Programs |
| Compiler / Platform | Compiler Command | Description |
|---|---|---|
| IBM AIX | xlc_r / cc_r | C (ANSI / non-ANSI) |
| | xlC_r | C++ |
| | xlf_r -qnosave, xlf90_r -qnosave | Fortran - using IBM's Pthreads API (non-portable) |
| INTEL Linux | icc -pthread | C |
| | icpc -pthread | C++ |
| PathScale Linux | pathcc -pthread | C |
| | pathCC -pthread | C++ |
| PGI Linux | pgcc -lpthread | C |
| | pgCC -lpthread | C++ |
| GNU Linux, AIX | gcc -pthread | GNU C |
| | g++ -pthread | GNU C++ |
| Thread Management |
| pthread_create (thread,attr,start_routine,arg) pthread_exit (status) pthread_attr_init (attr) pthread_attr_destroy (attr) |
Creating Threads:

| Question: After a thread has been created, how do you know when it will be scheduled to run by the operating system? |
Thread Attributes:
Terminating Threads:
| Thread Management |
| Question: How can you safely pass data to newly created threads, given their non-deterministic start-up and scheduling? |
| Thread Management |
| pthread_join (threadid,status) pthread_detach (threadid) pthread_attr_setdetachstate (attr,detachstate) pthread_attr_getdetachstate (attr,detachstate) |
Joining:

Joinable or Not?
Recommendations:
| Thread Management |
| pthread_attr_getstacksize (attr, stacksize) pthread_attr_setstacksize (attr, stacksize) pthread_attr_getstackaddr (attr, stackaddr) pthread_attr_setstackaddr (attr, stackaddr) |
Preventing Stack Problems:
Some Practical Examples at LC:
| Node Architecture | #CPUs | Memory (GB) | Default Size (bytes) |
|---|---|---|---|
| AMD Opteron | 8 | 16 | 2,097,152 |
| Intel IA64 | 4 | 8 | 33,554,432 |
| Intel IA32 | 2 | 4 | 2,097,152 |
| IBM Power5 | 8 | 32 | 196,608 |
| IBM Power4 | 8 | 16 | 196,608 |
| IBM Power3 | 16 | 16 | 98,304 |
This example demonstrates how to query and set a thread's stack size.
| Thread Management |
| pthread_self () pthread_equal (thread1,thread2) |
| pthread_once (once_control, init_routine) |
pthread_once_t once_control = PTHREAD_ONCE_INIT;
| Mutex Variables |
| Thread 1 | Thread 2 | Balance |
|---|---|---|
| Read balance: $1000 | | $1000 |
| | Read balance: $1000 | $1000 |
| | Deposit $200 | $1000 |
| Deposit $200 | | $1000 |
| Update balance $1000+$200 | | $1200 |
| | Update balance $1000+$200 | $1200 |
| Mutex Variables |
| pthread_mutex_init (mutex,attr) pthread_mutex_destroy (mutex) pthread_mutexattr_init (attr) pthread_mutexattr_destroy (attr) |
Usage:
The mutex is initially unlocked.
Note that not all implementations may provide the three optional mutex attributes.
| Mutex Variables |
| pthread_mutex_lock (mutex) pthread_mutex_trylock (mutex) pthread_mutex_unlock (mutex) |
Usage:
| Thread 1 | Thread 2 | Thread 3 |
|---|---|---|
| Lock | Lock | |
| A = 2 | A = A+1 | A = A*B |
| Unlock | Unlock | |
| Question: When more than one thread is waiting for a locked mutex, which thread will be granted the lock first after it is released? |
| Condition Variables |
| Main Thread |
| Thread A | Thread B |
| Main Thread: Join / Continue |
| Condition Variables |
| pthread_cond_init (condition,attr) pthread_cond_destroy (condition) pthread_condattr_init (attr) pthread_condattr_destroy (attr) |
Usage:
Note that not all implementations may provide the process-shared attribute.
| Condition Variables |
| pthread_cond_wait (condition,mutex) pthread_cond_signal (condition) pthread_cond_broadcast (condition) |
Usage:
Proper locking and unlocking of the associated mutex variable is essential when using these routines.
| LLNL Specific Information and Recommendations |
This section describes details specific to Livermore Computing's systems.
Implementations:
Compiling:
Mixing MPI with Pthreads:
| Topics Not Covered |
Several features of the Pthreads API are not covered in this tutorial. These are listed below. See the Pthread Library Routines Reference section for more information.
| Pthread Library Routines Reference |
pthread_atfork
pthread_attr_destroy
pthread_attr_getdetachstate
pthread_attr_getguardsize
pthread_attr_getinheritsched
pthread_attr_getschedparam
pthread_attr_getschedpolicy
pthread_attr_getscope
pthread_attr_getstack
pthread_attr_getstackaddr
pthread_attr_getstacksize
pthread_attr_init
pthread_attr_setdetachstate
pthread_attr_setguardsize
pthread_attr_setinheritsched
pthread_attr_setschedparam
pthread_attr_setschedpolicy
pthread_attr_setscope
pthread_attr_setstack
pthread_attr_setstackaddr
pthread_attr_setstacksize
pthread_barrier_destroy
pthread_barrier_init
pthread_barrier_wait
pthread_barrierattr_destroy
pthread_barrierattr_getpshared
pthread_barrierattr_init
pthread_barrierattr_setpshared
pthread_cancel
pthread_cleanup_pop
pthread_cleanup_push
pthread_cond_broadcast
pthread_cond_destroy
pthread_cond_init
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_condattr_destroy
pthread_condattr_getclock
pthread_condattr_getpshared
pthread_condattr_init
pthread_condattr_setclock
pthread_condattr_setpshared
pthread_create
pthread_detach
pthread_equal
pthread_exit
pthread_getconcurrency
pthread_getcpuclockid
pthread_getschedparam
pthread_getspecific
pthread_join
pthread_key_create
pthread_key_delete
pthread_kill
pthread_mutex_destroy
pthread_mutex_getprioceiling
pthread_mutex_init
pthread_mutex_lock
pthread_mutex_setprioceiling
pthread_mutex_timedlock
pthread_mutex_trylock
pthread_mutex_unlock
pthread_mutexattr_destroy
pthread_mutexattr_getprioceiling
pthread_mutexattr_getprotocol
pthread_mutexattr_getpshared
pthread_mutexattr_gettype
pthread_mutexattr_init
pthread_mutexattr_setprioceiling
pthread_mutexattr_setprotocol
pthread_mutexattr_setpshared
pthread_mutexattr_settype
pthread_once
pthread_rwlock_destroy
pthread_rwlock_init
pthread_rwlock_rdlock
pthread_rwlock_timedrdlock
pthread_rwlock_timedwrlock
pthread_rwlock_tryrdlock
pthread_rwlock_trywrlock
pthread_rwlock_unlock
pthread_rwlock_wrlock
pthread_rwlockattr_destroy
pthread_rwlockattr_getpshared
pthread_rwlockattr_init
pthread_rwlockattr_setpshared
pthread_self
pthread_setcancelstate
pthread_setcanceltype
pthread_setconcurrency
pthread_setschedparam
pthread_setschedprio
pthread_setspecific
pthread_sigmask
pthread_spin_destroy
pthread_spin_init
pthread_spin_lock
pthread_spin_trylock
pthread_spin_unlock
pthread_testcancel
This completes the tutorial.
| References and More Information |