Threading comes with many concepts, theories, and terms that a thread programmer needs to understand. Such terms are listed here to give a basic understanding. Some of these concepts will be explained in the coming articles, and some may not be, depending on their importance in thread programming.
Thread: A thread is a sequence of instructions to be executed in a particular order within the context of a process. If many such tasks (sequences of instructions) are to be executed, they are executed by the processor in time-shared mode, or simultaneously on different processors if available. When a processor has to execute different threads in time-shared mode, it performs thread switching: it pauses the execution of a thread, saves that thread's state and data so its execution can continue later, and executes the next thread in the queue. Threads are considered lightweight because their scope and capabilities are limited and they are dependent on processes.
Process: A process is a high-level job or application running with the support of the operating system to accomplish a set of tasks. Processes are considered heavyweight because they have more responsibilities and capabilities than threads; they are more independent and keep track of many more resources and much more context information. A process may have anywhere from one to any number of threads to perform its tasks.
Thread Programming: Real-world applications and systems will in general have multiple threads executing different jobs in parallel for optimal utilization of resources, including processing power. The applications therefore have to manage those threads effectively to perform their tasks efficiently, without catastrophic situations such as data corruption. Such programming, covering the management of threads, their associations, the execution of code, and thread operations, is thread programming. In the simplest case, thread programming amounts to adopting the threading model defined by the libraries used; some CORBA implementations, for example, only require you to define the desired model, and the CORBA server provides the corresponding threading implementation. Apart from application developers, systems developers and library developers also do thread programming, to provide abstractions for threading. For example, MFC simplifies thread programming considerably by wrapping the Windows threading libraries. Typically, thread programming is done using the threading APIs provided by the respective library, but in systems programming it may be done at a low level, for example when implementing threading inside an operating system.
Thread Library: A library that offers the APIs and runtime environment for doing thread programming is generally called a thread library. Pthreads and MFC are examples of such libraries.
Single-threaded model: Allows only one thread of execution at a particular time. This is generally a restriction enforced by the threading library, the application, or the operating system. All tasks are pipelined to execute one after the other. In such a model, many thread-related issues, such as synchronization and safety, do not have to be considered.
Multithreaded model: Allows two or more threads of execution at a particular time. Parameters of the multithreading model, such as the maximum number of threads and the thread scheduling model, are defined by the threading library or the operating system, and such parameters may be configurable. For example, an application server may offer a thread pool whose maximum number of threads can be configured. Different threads execute their own tasks in parallel or in time-shared mode. Here many issues have to be taken care of, such as thread synchronization and thread safety, because threads may share common resources like data, and some activities must be performed in a particular order.
Parallelism: The threading model in which at least two threads can be executed simultaneously. This can be seen in multiprocessor systems, where different threads execute on different processors at the same time. Parallelism can improve the throughput of the system many times over, because all the processors are utilized optimally, subject to factors like the devices and data being used by each thread.
Concurrency: The threading model in which at least two threads are making progress. It is a more generalized form of parallelism that can include time slicing as a form of virtual parallelism. Concurrency may improve performance when the jobs contain a lot of idle time, because during the idle time of one job another job can do some work with the same resources. Concurrency also helps when different activities have to be performed with timing-related dependencies. For example, activity A1 may have to be performed only after activity A2, with activity A3 performed after that; the activities run in an order that fulfills their preconditions, and resources are not wasted.
Thread States: Threads have a life cycle in which they move between different states, based on the set of possible thread states defined by the threading model. One library may define a few states and another may define more, depending on the implementation. The most common thread states are idle, running, waiting, and terminated.
Thread Safe: A piece of code, a module, or a class is said to be thread safe when more than one thread can execute it at the same time and nothing bad or unexpected happens, such as data corruption or deadlocks. Thread synchronization is one of the requirements for ensuring thread safety; for example, one thread may have to wait until another thread has finished using a variable.
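A minimal sketch of thread safety in Java (the class name and counts are illustrative): two threads increment a shared counter, and the synchronized methods ensure that no increments are lost.

```java
// Two threads increment a shared counter 100,000 times each. Without
// the synchronized keyword, concurrent count++ operations could
// interleave and lose updates.
public class SafeCounter {
    private long count = 0;

    // synchronized: only one thread at a time may execute this method
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    public static long run() {
        SafeCounter c = new SafeCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();  // wait for both threads to finish
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();  // 200,000 with synchronization in place
    }
}
```

Removing the synchronized keyword would make the final count nondeterministic, which is exactly the data corruption described above.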
Thread Aware: A piece of code, a module, or a class is said to be thread aware if it is designed with threading and its related issues in view.
User or Application-level threads: Threads managed by the thread library or by the application itself in user space (as opposed to kernel space), using tables built and managed by the library or application. These carry very little overhead, because scheduling and status tracking happen locally (mostly by the threads themselves). However, they generally carry disadvantages related to thread scheduling policies unless the library is carefully optimized to overcome them. Some Java implementations provide user threads, also called "green threads", where the JVM simulates multithreading if the operating system cannot support it.
Kernel Threads: Kernel threads are those maintained by the operating system kernel. Most POSIX-compliant and Windows operating systems have kernel threads by default. They carry performance penalties because the kernel uses resources for managing the thread tables and for scheduling and monitoring the threads, but they are reliable and efficient in implementing scheduling policies. Typically, the Java threading model is based on the kernel threads of the underlying operating system.
Daemon Threads: When exiting, a process checks for all the threads in its context; this is true for non-daemon (normal) threads. A process does not count daemon threads toward its state. Daemon threads are a special kind of thread that processes do not check before termination, so daemon threads do not affect process exit in any way.
Time Slice: A time slice is a small interval of time during which the processor executes the code/instructions of a particular thread.
Starving: Starving (starvation) is the state in which a thread keeps waiting for its chance to execute for a long period of time.
Preemptive Multithreading: The threading model in which each thread is executed for a time slice and then the next thread in the queue is given the next time slice, whether or not the executing thread volunteers to give up control. Even if some threads misbehave or do not give up control, the scheduler manages to give all threads their time slices, stopping some threads by force in the process. Typically a scheduler is used to manage the time slices and their allocation and deallocation to threads.
Non-Preemptive Multithreading: In this threading model, each thread is responsible for giving up control to another thread. If a thread is greedy or misbehaves, the rest of the threads keep waiting indefinitely, which is called starving.
Deadlock: Deadlock is a condition between two threads in which each one has locked some resource and both are waiting for the resource locked by the other. Each thread waits indefinitely, hoping that the other will release the resource it holds, and so both wait forever. In some cases more than two threads may be involved in a deadlock. Intelligent multithreading systems resolve such conflicts according to policy.
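A common way to prevent the circular wait described above is to make every thread acquire the locks in the same order. A hedged Java sketch (class and lock names are illustrative):

```java
// Deadlock needs a cycle: thread 1 holds A and wants B while thread 2
// holds B and wants A. If both threads always lock A before B, that
// cycle cannot form and the program always completes.
public class LockOrdering {
    private static final Object resourceA = new Object();
    private static final Object resourceB = new Object();

    // Both threads acquire the locks in the SAME order: A, then B.
    static void useBoth(StringBuilder log, String who) {
        synchronized (resourceA) {
            synchronized (resourceB) {
                synchronized (log) { log.append(who); }
            }
        }
    }

    public static String run() {
        StringBuilder log = new StringBuilder();
        Thread t1 = new Thread(() -> useBoth(log, "1"));
        Thread t2 = new Thread(() -> useBoth(log, "2"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString();  // "12" or "21", but never a hang
    }
}
```

If one of the threads instead locked resourceB first, the two threads could block each other forever, which is the deadlock defined above.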
Parent thread: If thread T1 executes instructions that result in the creation of thread T2, then T1 is said to be the parent of T2.
Child thread: If thread T1 executes instructions that result in the creation of thread T2, then T2 is said to be the child of T1.
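The parent/child relationship can be seen in a short Java sketch (names are illustrative): the thread that constructs and starts a Thread object is the parent, and the started thread is its child.

```java
public class ParentChild {
    public static String run() {
        StringBuilder out = new StringBuilder();
        // The thread executing this code is the parent; the Thread it
        // constructs and starts below is its child.
        Thread child = new Thread(() -> {
            synchronized (out) { out.append("child"); }
        });
        child.start();
        try {
            child.join();  // parent waits for its child to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }
}
```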
Priorities: Threads are generally given execution time slices based on the thread priority values (as numbers, enumerated values, etc.) associated with them. When allocating time slices, the priority values of the threads generally play a major role in deciding which thread gets the next time slice. POSIX and Windows have different models for thread priorities. Java too has its own priority model, which maps to the threading priorities of the underlying operating system.
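In Java, for instance, priorities are integers between Thread.MIN_PRIORITY (1) and Thread.MAX_PRIORITY (10), set with setPriority; they are hints that the JVM maps onto the operating system's own priority model. A minimal sketch:

```java
public class PrioritySketch {
    public static int run() {
        Thread t = new Thread(() -> {});
        // Priorities range from MIN_PRIORITY (1) to MAX_PRIORITY (10),
        // with NORM_PRIORITY (5) as the default. They are scheduling
        // hints, not guarantees: how strongly they influence scheduling
        // depends on the underlying OS.
        t.setPriority(Thread.MAX_PRIORITY);
        return t.getPriority();
    }
}
```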
POSIX threads: Threads implemented following the POSIX standards for multithreading. Most Unix and Linux operating systems follow these standards, along with their own custom threading features implemented as non-standard extensions.
Java Threads: Java Virtual Machines offer threading support in different ways. Generally Java maps Java threads to the operating system's threads, but in a few implementations Java simulates the threads itself; these are called "green threads".
Windows/MFC Threads: The Windows threading model differs across Windows operating systems. On the Windows server operating systems, the threading model is similar to that of Unix, though not as sophisticated as that of popular Unix flavors. MFC provides object-oriented APIs/wrappers for the underlying Windows threading. Even though the terminology may differ, the concepts are the same in MFC.
COM/Apartment Threading Model: COM has devised an abstraction for threading called apartments. This threading model evolved with the component model in mind, along with other considerations such as data marshalling and process scope. COM apartments are built on top of Windows threads, so thread programming is still possible, and necessary, for COM applications.
Multithreaded Debugging: While debugging a typical application, the code is executed sequentially, step by step, giving the user the minute details of execution. This is the simple case. In a multithreaded application, multiple threads of control execute the code either in parallel or in time-shared mode. Debugging then happens in the context of each thread, and multiple threads are debugged with access to the execution details of some or all of the threads. This is called multithreaded debugging.
Thread Scheduling: When multiple threads are being executed, they may not all be executing at the same time; on single-processor machines, only one thread executes at a particular time. In such a case each thread must be given the chance to run for a time period, after which the next thread gets its chance. Many types of policies are possible here, based on thread priority, waiting time, dependency on resources, and so on. Thread scheduling is about giving threads the opportunity to run in specific time slices according to the policy; it can also be thought of as deciding which thread runs when, and for how long. Who does the thread scheduling is specific to the threading implementation.
Synchronization: Thread synchronization is coordinating the execution of threads in relation to other threads, to avoid the problems of resource sharing or timing conflicts. In POSIX, MFC, and Java thread programming, different techniques are available for synchronization.
Mutual exclusion lock: A lock that can be applied to code or to a resource to ensure that only one thread executes the code or accesses the resource at a particular time. Threads that need to use the locked code or resource are generally made to wait until the first thread releases the lock it has acquired. Some libraries offer this across processes and some limit it to a single process; the implementation differs from library to library. Java offers this, limited to a single process, with the synchronized statement and its object locking facility; POSIX provides it through functions, and MFC has classes for it.
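Besides the synchronized statement, Java also offers an explicit mutual exclusion lock, java.util.concurrent.locks.ReentrantLock. A sketch of the usual acquire/release discipline (class name and counts are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class MutexSketch {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int shared = 0;

    static void add(int n) {
        lock.lock();        // block until the mutex is acquired
        try {
            shared += n;    // only one thread can execute this at a time
        } finally {
            lock.unlock();  // always release, even if an exception occurs
        }
    }

    public static int run() {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) add(1);
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared;  // 4 threads x 10,000 increments
    }
}
```

The lock/try/finally/unlock pattern is the idiomatic way to make sure the mutex is released on every code path.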
Mutex: A short name for "mutual exclusion lock"; see above.
Critical Section: A critical section can be thought of as a scaled-down mutex limited to a process boundary. Java offers this with the synchronized statement and its object locking facility. In POSIX it can be achieved with the mutex functions, using flags that limit their scope to a process, and MFC has classes for it.
Counting Semaphore: A mechanism in which memory or data is used to lock resources. When a thread starts using a resource, the lock count changes. The policy regarding the number of threads that can use a resource at once is not standardized; it may be one or many, depending on the implementation. POSIX has functions for using semaphores and MFC has classes. Java originally had no semaphores built in (java.util.concurrent.Semaphore was added in Java 5).
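A sketch of a counting semaphore using java.util.concurrent.Semaphore (class name and counts are illustrative): with two permits, at most two of the five worker threads can hold the resource at the same time.

```java
import java.util.concurrent.Semaphore;

public class SemaphoreSketch {
    public static int run() {
        // Two permits: at most two threads may use the resource at once.
        Semaphore permits = new Semaphore(2);
        int[] current = {0};  // threads currently inside
        int[] peak = {0};     // highest value current ever reached
        Runnable worker = () -> {
            try {
                permits.acquire();            // take a permit (count--)
                synchronized (peak) {
                    current[0]++;
                    peak[0] = Math.max(peak[0], current[0]);
                }
                Thread.sleep(50);             // pretend to use the resource
                synchronized (peak) { current[0]--; }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                permits.release();            // return the permit (count++)
            }
        };
        Thread[] ts = new Thread[5];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(worker);
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak[0];  // never exceeds the 2 permits
    }
}
```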
Semaphore: A short name for "counting semaphore"; see above.
Read-write locks: When multiple threads are running, the integrity of shared data needs to be protected. Read-write locks are typically functions that allow multiple read-only accesses to shared data but only a single access for modification of that data. Not all systems offer read-write locks as a standardized implementation.
Condition variables: In some situations, threads may have to be programmed to behave according to the state of the system. Condition variables are typically functions that block the execution of threads based on the values of variables or conditions that represent the state of the system. If a mutex is associated with a condition variable, the lock is released while the thread is blocked.
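A Java sketch of a condition variable paired with a mutex, using ReentrantLock and its Condition (names are illustrative): the waiting thread blocks on await(), which releases the lock while blocked, exactly as described above.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionSketch {
    private static final ReentrantLock lock = new ReentrantLock();
    private static final Condition ready = lock.newCondition();
    private static boolean flag = false;
    private static String result = "";

    public static String run() {
        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                while (!flag) {     // loop guards against spurious wakeups
                    ready.await();  // releases the lock while blocked
                }
                result = "woken";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        waiter.start();
        lock.lock();
        try {
            flag = true;    // change the state the condition represents
            ready.signal(); // wake a thread waiting on the condition
        } finally {
            lock.unlock();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```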
Object Locking: Java follows a sophisticated, object-oriented approach to multithreading. In Java, objects have monitors, which are synonymous with locks. Java's Object class has methods that let a thread acquire a lock on an object by waiting for it, release locks on objects when instructed, and in turn notify the threads waiting for the lock on the object.
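A sketch of the wait()/notify() methods on an object's monitor (class and field names are illustrative): the main thread waits on the monitor, releasing it, and the worker thread acquires the same monitor and notifies.

```java
public class MonitorSketch {
    private static final Object monitor = new Object();
    private static boolean done = false;

    public static boolean run() {
        Thread worker = new Thread(() -> {
            synchronized (monitor) {  // acquire the object's monitor
                done = true;
                monitor.notify();     // wake one thread waiting on it
            }
        });
        synchronized (monitor) {
            worker.start();
            // The worker cannot enter its synchronized block until
            // wait() below releases this monitor.
            while (!done) {           // guard against spurious wakeups
                try {
                    monitor.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return done;
    }
}
```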
Thread Local Storage: Some threading models define a feature whereby memory can be allocated and used within the scope of a single thread; it cannot be shared with other threads. This can be considered a low-level feature, because many threading systems do it behind the scenes and the user does not have to care much about it. Java, MFC, and POSIX support this feature in different ways.
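Java exposes this feature through the java.lang.ThreadLocal class. A sketch (class name is illustrative): each thread sees its own copy of the variable, so a value set by the main thread is invisible to a child thread.

```java
public class ThreadLocalSketch {
    // Each thread gets its own copy, initialized to 0.
    private static final ThreadLocal<Integer> local =
            ThreadLocal.withInitial(() -> 0);

    public static int run() {
        local.set(1);  // this only changes the main thread's copy
        int[] childValue = new int[1];
        Thread t = new Thread(() -> {
            // The child thread sees the initial value, not main's 1.
            childValue[0] = local.get();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Encode both copies: main's copy (1) and the child's copy (0).
        return local.get() * 10 + childValue[0];
    }
}
```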
Thread Group: A thread group bundles a set of threads that share something in common. The advantage is that many operations can be performed on the thread group as a whole instead of on the threads individually. Java has good support for thread groups.
Thread pool: A common practice in thread programming to optimize the create-destroy cycle of threads. In a thread pool, a predefined number of threads is created initially and added to the pool. Clients of the thread pool then request threads from it to perform jobs: if threads are available, the clients get them, the threads are used, and they are returned to the pool; if not, the clients may wait for threads to become available, try later, or do something else. Many policies can be applied here, such as "what to do if the threads are not sufficient" and "can the number of threads in the pool be increased if they are not sufficient".
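A sketch of a thread pool using java.util.concurrent (class name and counts are illustrative): a fixed pool of three threads services ten jobs, reusing threads instead of creating and destroying one per job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    public static int run() {
        // Three pooled threads will service all ten jobs; each pooled
        // thread is reused for several jobs.
        ExecutorService pool = Executors.newFixedThreadPool(3);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown();  // stop accepting new jobs
        try {
            // wait for the already-submitted jobs to finish
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

The "what to do if threads are not sufficient" policy mentioned above corresponds to the queueing behavior here: jobs beyond the pool size simply wait in the executor's internal queue until a pooled thread is free.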
Inter Thread Communication: Communication between threads can happen in the form of data or control. Existing infrastructure for control propagation, such as signals and events, enables threads to talk; in this case the mechanism for event or signal propagation must be proven thread safe. Data communication between threads can happen through shared data resources such as pipes, shared memory, and data files. Again, such devices and their access must be thread safe.
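A sketch of data communication between threads in Java, using a thread-safe java.util.concurrent.BlockingQueue as the shared channel (class name, values, and the -1 end marker are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSketch {
    public static int run() {
        // A bounded, thread-safe queue as the channel between a
        // producer thread and a consumer thread.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(4);
        int[] sum = {0};
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) channel.put(i); // blocks if full
                channel.put(-1);                             // end-of-data marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int v = channel.take();  // blocks if empty
                    if (v == -1) break;
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];  // 1 + 2 + 3 + 4 + 5
    }
}
```

Because the queue itself handles the locking, neither thread needs any explicit synchronization, which is exactly the "thread safe device" requirement stated above.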