Threading comes with many concepts, theories, and terms that a thread programmer needs to understand. Such terms are listed here to give a basic understanding. Some of these concepts will be explained in the coming articles, and some may not be, depending on their importance in thread programming.
Thread: A thread is a sequence of instructions to be executed in a particular order within the context of a process. If many such tasks (sequences of instructions) are to be executed, they are executed by the processor in time-shared mode, or simultaneously on different processors if available. When a processor has to execute different threads in time-shared mode, it performs thread switching: it pauses the execution of a thread, saves that thread's state and data so its execution can continue later, and executes the next thread in the queue. Threads are considered lightweight because their scope and capabilities are limited and they are dependent on processes.
Process: A process is a high-level job or application running with the support of the operating system to accomplish a set of tasks. Processes are considered heavyweight because they have more responsibilities and capabilities than threads; they are more independent and keep track of many more resources and much more context information. A process may have anywhere from one to any number of threads to perform its tasks.
Thread Programming: Real-world applications and systems will in general have multiple threads executing different jobs in parallel for optimal utilization of resources, including processing power. The applications therefore have to manage those threads effectively to perform their tasks efficiently, without catastrophic situations such as data corruption. Such programming, covering the management of threads, their associations, the execution of code, and thread operations, is thread programming. In the simplest case, thread programming amounts to adopting the threading model defined by the libraries used; some CORBA implementations, for example, only require you to define the desired model, and the CORBA server provides the corresponding threading implementation. Apart from application developers, systems developers and library developers also do thread programming, to provide abstractions for threading. For example, MFC simplifies thread programming considerably by wrapping the Windows threading libraries. Typically, thread programming is done using the threading APIs provided by the respective library, but in systems programming it may be done at a low level, for example when implementing threading inside an operating system.
Thread Library: A library that offers the APIs and runtime environment for doing thread programming is generally called a thread library. Pthreads and MFC are examples of such libraries.
Single-threaded model: Allows only one thread of execution at a particular time. This is generally a restriction enforced by the threading library, the application, or the operating system. All tasks are pipelined to execute one after the other. In such a model, many thread-related issues, such as synchronization and safety, do not have to be considered.
Multithreaded model: Allows two or more threads of execution at a particular time. Parameters of the multithreading model, such as the maximum number of threads and the thread scheduling model, are defined by the threading library or the operating system, and such parameters may be configurable. For example, an application server may offer a thread pool whose maximum number of threads can be configured. Different threads execute their own tasks in parallel or in time-shared mode. Here many issues have to be taken care of, such as thread synchronization and thread safety, because threads may share common resources like data, and some activities must be performed in a particular order.
Parallelism: The threading model in which at least two threads can be executed simultaneously. This can be seen in multiprocessor systems, where different threads execute on different processors at the same time. Parallelism can improve the throughput of the system many times over, because all the processors are utilized optimally, subject to factors like the devices and data being used by each thread.
Concurrency: The threading model in which at least two threads are making progress. It is a more generalized form of parallelism that can include time slicing as a form of virtual parallelism. Concurrency may improve performance when the jobs contain a lot of idle time, because during the idle time of one job another job can do some work with the same resources. Concurrency also helps when different activities have to be performed with timing-related dependencies. For example, activity A1 may have to be performed only after activity A2, with activity A3 performed after that; the activities run in an order that fulfills their preconditions, and resources are not wasted.
Thread States: Threads have a life cycle in which they move between different states, based on the set of possible thread states defined by the threading model. One library may define a few states and another may define more, depending on the implementation. The most common thread states are idle, running, waiting, and terminated.
Thread Safe: A piece of code, a module, or a class is said to be thread safe when more than one thread can execute it at the same time and nothing bad or unexpected happens, such as data corruption or deadlocks. Thread synchronization is one of the requirements for ensuring thread safety; for example, one thread may have to wait until another thread has finished using a variable.
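A minimal sketch of thread safety in Java (the class name and counts are illustrative): two threads increment a shared counter, and the synchronized methods ensure that no increments are lost.

```java
// Two threads increment a shared counter 100,000 times each. Without
// the synchronized keyword, concurrent count++ operations could
// interleave and lose updates.
public class SafeCounter {
    private long count = 0;

    // synchronized: only one thread at a time may execute this method
    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }

    public static long run() {
        SafeCounter c = new SafeCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();  // wait for both threads to finish
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();  // 200,000 with synchronization in place
    }
}
```

Removing the synchronized keyword would make the final count nondeterministic, which is exactly the data corruption described above.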
Thread Aware: A piece of code, a module, or a class is said to be thread aware if it is designed with threading and its related issues in view.
User or Application-level threads: Threads managed by the thread library or by the application itself in user space (as opposed to kernel space), using tables built and managed by the library or application. These carry very little overhead, because scheduling and status tracking happen locally (mostly by the threads themselves). However, they generally carry disadvantages related to thread scheduling policies unless the library is carefully optimized to overcome them. Some Java implementations provide user threads, also called "green threads", where the JVM simulates multithreading if the operating system cannot support it.
Kernel Threads: Kernel threads are those maintained by the operating system kernel. Most POSIX-compliant and Windows operating systems have kernel threads by default. They carry performance penalties because the kernel uses resources for managing the thread tables and for scheduling and monitoring the threads, but they are reliable and efficient in implementing scheduling policies. Typically, the Java threading model is based on the kernel threads of the underlying operating system.
Daemon Threads: When exiting, a process checks for all the threads in its context; this is true for non-daemon (normal) threads. A process does not count daemon threads toward its state. Daemon threads are a special kind of thread that processes do not check before termination, so daemon threads do not affect process exit in any way.
Time Slice: A time slice is a small interval of time during which the processor executes the code/instructions of a particular thread.
Starving: Starving (starvation) is the state in which a thread keeps waiting for its chance to execute for a long period of time.
Preemptive Multithreading: The threading model in which each thread is executed for a time slice and then the next thread in the queue is given the next time slice, whether or not the executing thread volunteers to give up control. Even if some threads misbehave or do not give up control, the scheduler manages to give all threads their time slices, stopping some threads by force in the process. Typically a scheduler is used to manage the time slices and their allocation and deallocation to threads.
Non-Preemptive Multithreading: In this threading model, each thread is responsible for giving up control to another thread. If a thread is greedy or misbehaves, the rest of the threads keep waiting indefinitely, which is called starving.
Deadlock: Deadlock is a condition between two threads in which each one has locked some resource and both are waiting for the resource locked by the other. Each thread waits indefinitely, hoping that the other will release the resource it holds, and so both wait forever. In some cases more than two threads may be involved in a deadlock. Intelligent multithreading systems resolve such conflicts according to policy.
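A common way to prevent the circular wait described above is to make every thread acquire the locks in the same order. A hedged Java sketch (class and lock names are illustrative):

```java
// Deadlock needs a cycle: thread 1 holds A and wants B while thread 2
// holds B and wants A. If both threads always lock A before B, that
// cycle cannot form and the program always completes.
public class LockOrdering {
    private static final Object resourceA = new Object();
    private static final Object resourceB = new Object();

    // Both threads acquire the locks in the SAME order: A, then B.
    static void useBoth(StringBuilder log, String who) {
        synchronized (resourceA) {
            synchronized (resourceB) {
                synchronized (log) { log.append(who); }
            }
        }
    }

    public static String run() {
        StringBuilder log = new StringBuilder();
        Thread t1 = new Thread(() -> useBoth(log, "1"));
        Thread t2 = new Thread(() -> useBoth(log, "2"));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString();  // "12" or "21", but never a hang
    }
}
```

If one of the threads instead locked resourceB first, the two threads could block each other forever, which is the deadlock defined above.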
Parent thread: If thread T1 executes instructions that result in the creation of thread T2, then T1 is said to be the parent of T2.
Child thread: If thread T1 executes instructions that result in the creation of thread T2, then T2 is said to be the child of T1.
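The parent/child relationship can be seen in a short Java sketch (names are illustrative): the thread that constructs and starts a Thread object is the parent, and the started thread is its child.

```java
public class ParentChild {
    public static String run() {
        StringBuilder out = new StringBuilder();
        // The thread executing this code is the parent; the Thread it
        // constructs and starts below is its child.
        Thread child = new Thread(() -> {
            synchronized (out) { out.append("child"); }
        });
        child.start();
        try {
            child.join();  // parent waits for its child to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }
}
```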
Priorities: Threads are generally given execution time slices based on the thread priority values (as numbers, enumerated values, etc.) associated with them. When allocating time slices, the priority values of the threads generally play a major role in deciding which thread gets the next time slice. POSIX and Windows have different models for thread priorities. Java too has its own priority model, which maps to the threading priorities of the underlying operating system.
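In Java, for instance, priorities are integers between Thread.MIN_PRIORITY (1) and Thread.MAX_PRIORITY (10), set with setPriority; they are hints that the JVM maps onto the operating system's own priority model. A minimal sketch:

```java
public class PrioritySketch {
    public static int run() {
        Thread t = new Thread(() -> {});
        // Priorities range from MIN_PRIORITY (1) to MAX_PRIORITY (10),
        // with NORM_PRIORITY (5) as the default. They are scheduling
        // hints, not guarantees: how strongly they influence scheduling
        // depends on the underlying OS.
        t.setPriority(Thread.MAX_PRIORITY);
        return t.getPriority();
    }
}
```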
POSIX threads: Threads implemented following the POSIX standards for multithreading. Most Unix and Linux operating systems follow these standards, along with their own custom threading features implemented as non-standard extensions.
Java Threads: Java Virtual Machines offer threading support in different ways. Generally Java maps Java threads to the operating system's threads, but in a few implementations Java simulates the threads itself; these are called "green threads".
Windows/MFC Threads: The Windows threading model differs across Windows operating systems. On the Windows server operating systems, the threading model is similar to that of Unix, though not as sophisticated as that of popular Unix flavors. MFC provides object-oriented APIs/wrappers for the underlying Windows threading. Even though the terminology may differ, the concepts are the same in MFC.
COM/Apartment Threading Model: COM has devised an abstraction for threading called apartments. This threading model evolved with the component model in mind, along with other considerations such as data marshalling and process scope. COM apartments are built on top of Windows threads, so thread programming is still possible, and necessary, for COM applications.
Multithreaded Debugging: While debugging a typical application, the code is executed sequentially, step by step, giving the user the minute details of execution. This is the simple case. In a multithreaded application, multiple threads of control execute the code either in parallel or in time-shared mode. Debugging then happens in the context of each thread, and multiple threads are debugged with access to the execution details of some or all of the threads. This is called multithreaded debugging.
Thread Scheduling: When multiple threads are being executed, they may not all be executing at the same time; on single-processor machines, only one thread executes at a particular time. In such a case each thread must be given the chance to run for a time period, after which the next thread gets its chance. Many types of policies are possible here, based on thread priority, waiting time, dependency on resources, and so on. Thread scheduling is about giving threads the opportunity to run in specific time slices according to the policy; it can also be thought of as deciding which thread runs when, and for how long. Who does the thread scheduling is specific to the threading implementation.
Synchronization: Thread synchronization is coordinating the execution of threads in relation to other threads, to avoid the problems of resource sharing or timing conflicts. In POSIX, MFC, and Java thread programming, different techniques are available for synchronization.
Mutual exclusion lock: A lock that can be applied to code or to a resource to ensure that only one thread executes the code or accesses the resource at a particular time. Threads that need to use the locked code or resource are generally made to wait until the first thread releases the lock it has acquired. Some libraries offer this across processes and some limit it to a single process; the implementation differs from library to library. Java offers this, limited to a single process, with the synchronized statement and its object locking facility; POSIX provides it through functions, and MFC has classes for it.
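Besides the synchronized statement, Java also offers an explicit mutual exclusion lock, java.util.concurrent.locks.ReentrantLock. A sketch of the usual acquire/release discipline (class name and counts are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class MutexSketch {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int shared = 0;

    static void add(int n) {
        lock.lock();        // block until the mutex is acquired
        try {
            shared += n;    // only one thread can execute this at a time
        } finally {
            lock.unlock();  // always release, even if an exception occurs
        }
    }

    public static int run() {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) add(1);
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared;  // 4 threads x 10,000 increments
    }
}
```

The lock/try/finally/unlock pattern is the idiomatic way to make sure the mutex is released on every code path.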
Mutex: A short name for "mutual exclusion lock"; see above.
Critical Section: A critical section can be thought of as a scaled-down mutex limited to a process boundary. Java offers this with the synchronized statement and its object locking facility. In POSIX it can be achieved with the mutex functions, using flags that limit their scope to a process, and MFC has classes for it.
Counting Semaphore: A mechanism in which memory or data is used to lock resources. When a thread starts using a resource, the lock count changes. The policy regarding the number of threads that can use a resource at once is not standardized; it may be one or many, depending on the implementation. POSIX has functions for using semaphores and MFC has classes. Java originally had no semaphores built in (java.util.concurrent.Semaphore was added in Java 5).
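A sketch of a counting semaphore using java.util.concurrent.Semaphore (class name and counts are illustrative): with two permits, at most two of the five worker threads can hold the resource at the same time.

```java
import java.util.concurrent.Semaphore;

public class SemaphoreSketch {
    public static int run() {
        // Two permits: at most two threads may use the resource at once.
        Semaphore permits = new Semaphore(2);
        int[] current = {0};  // threads currently inside
        int[] peak = {0};     // highest value current ever reached
        Runnable worker = () -> {
            try {
                permits.acquire();            // take a permit (count--)
                synchronized (peak) {
                    current[0]++;
                    peak[0] = Math.max(peak[0], current[0]);
                }
                Thread.sleep(50);             // pretend to use the resource
                synchronized (peak) { current[0]--; }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                permits.release();            // return the permit (count++)
            }
        };
        Thread[] ts = new Thread[5];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(worker);
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak[0];  // never exceeds the 2 permits
    }
}
```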
Semaphore: A short name for "counting semaphore"; see above.
Read-write locks: When multiple threads are running, the integrity of shared data needs to be protected. Read-write locks are typically functions that allow multiple read-only accesses to shared data but only a single access for modification of that data. Not all systems offer read-write locks as a standardized implementation.
Condition variables: In some situations, threads may have to be programmed to behave according to the state of the system. Condition variables are typically functions that block the execution of threads based on the values of variables or conditions that represent the state of the system. If a mutex is associated with a condition variable, the lock is released while the thread is blocked.
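A Java sketch of a condition variable paired with a mutex, using ReentrantLock and its Condition (names are illustrative): the waiting thread blocks on await(), which releases the lock while blocked, exactly as described above.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionSketch {
    private static final ReentrantLock lock = new ReentrantLock();
    private static final Condition ready = lock.newCondition();
    private static boolean flag = false;
    private static String result = "";

    public static String run() {
        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                while (!flag) {     // loop guards against spurious wakeups
                    ready.await();  // releases the lock while blocked
                }
                result = "woken";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        waiter.start();
        lock.lock();
        try {
            flag = true;    // change the state the condition represents
            ready.signal(); // wake a thread waiting on the condition
        } finally {
            lock.unlock();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```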
Object Locking: Java follows a sophisticated, object-oriented approach to multithreading. In Java, objects have monitors, which are synonymous with locks. Java's Object class has methods that let a thread acquire a lock on an object by waiting for it, release locks on objects when instructed, and in turn notify the threads waiting for the lock on the object.
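A sketch of the wait()/notify() methods on an object's monitor (class and field names are illustrative): the main thread waits on the monitor, releasing it, and the worker thread acquires the same monitor and notifies.

```java
public class MonitorSketch {
    private static final Object monitor = new Object();
    private static boolean done = false;

    public static boolean run() {
        Thread worker = new Thread(() -> {
            synchronized (monitor) {  // acquire the object's monitor
                done = true;
                monitor.notify();     // wake one thread waiting on it
            }
        });
        synchronized (monitor) {
            worker.start();
            // The worker cannot enter its synchronized block until
            // wait() below releases this monitor.
            while (!done) {           // guard against spurious wakeups
                try {
                    monitor.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return done;
    }
}
```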
Thread Local Storage: Some threading models define a feature whereby memory can be allocated and used within the scope of a single thread; it cannot be shared with other threads. This can be considered a low-level feature, because many threading systems do it behind the scenes and the user does not have to care much about it. Java, MFC, and POSIX support this feature in different ways.
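Java exposes this feature through the java.lang.ThreadLocal class. A sketch (class name is illustrative): each thread sees its own copy of the variable, so a value set by the main thread is invisible to a child thread.

```java
public class ThreadLocalSketch {
    // Each thread gets its own copy, initialized to 0.
    private static final ThreadLocal<Integer> local =
            ThreadLocal.withInitial(() -> 0);

    public static int run() {
        local.set(1);  // this only changes the main thread's copy
        int[] childValue = new int[1];
        Thread t = new Thread(() -> {
            // The child thread sees the initial value, not main's 1.
            childValue[0] = local.get();
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Encode both copies: main's copy (1) and the child's copy (0).
        return local.get() * 10 + childValue[0];
    }
}
```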
Thread Group: A thread group bundles a set of threads that share something in common. The advantage is that many operations can be performed on the thread group as a whole instead of on the threads individually. Java has good support for thread groups.
Thread pool: A common practice in thread programming to optimize the create-destroy cycle of threads. In a thread pool, a predefined number of threads is created initially and added to the pool. Clients of the thread pool then request threads from it to perform jobs: if threads are available, the clients get them, the threads are used, and they are returned to the pool; if not, the clients may wait for threads to become available, try later, or do something else. Many policies can be applied here, such as "what to do if the threads are not sufficient" and "can the number of threads in the pool be increased if they are not sufficient".
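A sketch of a thread pool using java.util.concurrent (class name and counts are illustrative): a fixed pool of three threads services ten jobs, reusing threads instead of creating and destroying one per job.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    public static int run() {
        // Three pooled threads will service all ten jobs; each pooled
        // thread is reused for several jobs.
        ExecutorService pool = Executors.newFixedThreadPool(3);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown();  // stop accepting new jobs
        try {
            // wait for the already-submitted jobs to finish
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

The "what to do if threads are not sufficient" policy mentioned above corresponds to the queueing behavior here: jobs beyond the pool size simply wait in the executor's internal queue until a pooled thread is free.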
Inter Thread Communication: Communication between threads can happen in the form of data or control. Existing infrastructure for control propagation, such as signals and events, enables threads to talk; in this case the mechanism for event or signal propagation must be proven thread safe. Data communication between threads can happen through shared data resources such as pipes, shared memory, and data files. Again, such devices and their access must be thread safe.
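A sketch of data communication between threads in Java, using a thread-safe java.util.concurrent.BlockingQueue as the shared channel (class name, values, and the -1 end marker are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSketch {
    public static int run() {
        // A bounded, thread-safe queue as the channel between a
        // producer thread and a consumer thread.
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(4);
        int[] sum = {0};
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) channel.put(i); // blocks if full
                channel.put(-1);                             // end-of-data marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int v = channel.take();  // blocks if empty
                    if (v == -1) break;
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum[0];  // 1 + 2 + 3 + 4 + 5
    }
}
```

Because the queue itself handles the locking, neither thread needs any explicit synchronization, which is exactly the "thread safe device" requirement stated above.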