The program that we run after successfully compiling our source code is executed via a process. Internally,
each process has its own address space (where the binary counterpart to our source code lives) and memory
assigned to it (as well as access to the "heap" for additional, dynamically allocated memory). As you are
likely well aware, most computers today are quite capable of handling multiple processes at once, although
you can often see performance issues crop up once you pass a certain threshold of processes. This is because
the operating system is attempting to distribute resources (including CPU cycles) evenly among all the
processes, which can cause delays, stutters, or lag for one or more high-demand processes. So while the upper
bound on the number of active processes at any one time may be theoretically infinite, the practical
usefulness of extremely resource-starved applications approaches null.
Threads represent the tasks any given process is undertaking as part of its
design. Generally speaking, every process starts out with one thread, executing the code from within Main.
You can even think of the time it takes to complete a program as the length of a finitely long string or
thread. What we can do with multithreading is cut up that single, long thread into multiple threads of
varying sizes, all starting from time == 0. The goal here is to identify parts of a program that can be done
in parallel, to then reduce the execution time. It isn't always possible to apply multithreading to every
one of our programs, but if you're consistently searching for opportunities to apply it, you can often
provide dramatic improvements to your software's potential optimal execution. This can still be
ruined by computer systems without enough resources to support it in the first place, though!
It's important to note that multithreading and parallel programming are functionally similar, but executed
differently. Multithreading provides ways to optimize the computational performance of a single process being
executed, while parallel programming builds the source code in a fundamentally different way to allow for
multiple processes (each potentially utilizing multithreading) to tackle the program all at the same time.
You can think of it like one chef in a kitchen chopping vegetables while a steak gets seared on a stovetop
and potatoes are baking in the oven (multithreading), as compared to several chefs all doing this
(parallel programming). There's an entire course worth of material to consider when diving into the topic of
parallel programming and high-performance computing, which we regrettably can't fit into this class.
Let's first consider an example where multithreading doesn't provide an improvement to performance. I've
built a piece of software that can automatically design a CS major's Plan document, based on which CSCI
courses they've previously completed, what emphasis they've chosen, etc. Until these parameters are defined,
though, the Plan document cannot begin to be generated, so this part of the overall process has to
wait — generally speaking, this is what tells us multithreading won't optimize our application.
Once these parameters are defined, though, the process by which the Plan's courses are selected is highly
iterative: that is, the selection of what class to include next depends on which other courses have been
picked previously. Therefore, the program cannot do any part of this in parallel, because it has
to know what came immediately before it first. There might be a way to optimize this that I haven't thought
of, but (1) the application doesn't take longer than a second to conclude and (2) the implementation of
multithreading may cause more problems than the marginal benefit it provides. So why even bother?
Next, an application that could be optimized with multithreading: chat rooms or online messaging. If you
consider each of the features and functions of online messaging, you can see how there are events often
happening in parallel. While I'm looking for an appropriate gif response (don't bother telling me this is
pronounced like "Jif peanut butter", you godless heathen) to what my friend said last (one thread),
the chat window gets updated with another incoming message (another thread), which I then quickly type out a
response to in the text field (a third thread). None of these events directly relies on the conclusion
of any other event that's been initiated or is waiting to receive additional input, so they can all run in
parallel. This optimizes the execution of this single process.
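The three chat events above can be sketched as three independent Thread objects. This is a minimal sketch: the thread names and the "work" they do (just recording that they finished) are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// A thread-safe collection to record each event as it completes.
var completed = new ConcurrentBag<string>();

// Hypothetical stand-ins for the three independent chat events.
Thread gifSearch   = new Thread(() => completed.Add("gif found"));
Thread feedUpdate  = new Thread(() => completed.Add("chat window updated"));
Thread typingReply = new Thread(() => completed.Add("reply sent"));

// None of these depends on another's completion, so all three can
// be started immediately and run in parallel.
gifSearch.Start();
feedUpdate.Start();
typingReply.Start();

// Wait for all three to finish before reporting.
gifSearch.Join();
feedUpdate.Join();
typingReply.Join();

Console.WriteLine($"{completed.Count} independent events finished.");
```

The order the three entries land in the bag is unpredictable — which is exactly the point: no event had to wait on another.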
Finally, we'll consider an application that parallel programming in particular lends itself to, if
just to solidify our understanding of the difference between multithreading and parallel programming, and
when we should consider using either one. A perfect number is one where the factors of the number (
excluding the number itself) add up to that number. For example, 6 is divisible by 1, 2, and 3, which add up
to 6. A quadperfect number is one where the sum of all its factors (including the number itself) adds up to
four times the original number (likewise, a triperfect number's factors, including itself, sum to three
times the number). There's a strong suspicion that there are finitely many (exactly 6) triperfect numbers,
but perhaps infinitely many quadperfect numbers.
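The quadperfect test is easy to state in code. Here's a sketch of a k-perfect checker (perfect numbers are the k = 2 case, quadperfect is k = 4); the function names are my own, not from any particular library.

```csharp
using System;

// Sum of ALL divisors of n, including n itself (the sigma function).
// Only trial-divides up to sqrt(n), adding each divisor and its pair.
static long DivisorSum(long n)
{
    long sum = 0;
    for (long d = 1; d * d <= n; d++)
    {
        if (n % d != 0) continue;
        sum += d;
        if (d != n / d) sum += n / d;  // count the paired divisor once
    }
    return sum;
}

// k-perfect test: all divisors (including n) sum to k * n.
static bool IsKPerfect(long n, long k) => DivisorSum(n) == k * n;

Console.WriteLine(IsKPerfect(6, 2));      // 6 is perfect: 1+2+3+6 = 12 = 2*6
Console.WriteLine(IsKPerfect(30240, 4));  // 30240 is quadperfect
```

Note the convention difference: "perfect" is usually defined with the number excluded, but including it (as the quadperfect definition does) just shifts the target from 1×n to 2×n, so k = 2 covers ordinary perfect numbers.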
So, imagine you're on a hunt for the next quadperfect number. You write up a program that determines whether
the command line argument (a number) is quadperfect or not. You might even optimize it using multithreading.
But you also have access to a supercomputer with hundreds of CPU cores, each capable of running multiple
instances of this program. Running the program by itself, iterating from 1 trillion to 2 trillion,
would take a substantial amount of time (even if every execution took only 0.01 seconds), so instead
you design a "foreman" program that partitions your original list of numbers evenly among the thousand-plus
instances of the original "worker" program running across hundreds of CPU cores. Thus, you reduce the overall
runtime by orders of magnitude by utilizing parallel programming.
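The heart of that "foreman" is the partitioning step. Here's one way it might look — a sketch, not the actual foreman program, with the chunk layout (earlier chunks absorb the remainder) chosen for illustration:

```csharp
using System;
using System.Collections.Generic;

// Split the inclusive range [start, end] into `workers` near-equal
// chunks, one per worker instance. When the range doesn't divide
// evenly, the earlier chunks each take one extra number.
static List<(long Start, long End)> Partition(long start, long end, int workers)
{
    var chunks = new List<(long Start, long End)>();
    long total = end - start + 1;
    long baseSize = total / workers;
    long remainder = total % workers;
    long cursor = start;
    for (int i = 0; i < workers; i++)
    {
        long size = baseSize + (i < remainder ? 1 : 0);
        chunks.Add((cursor, cursor + size - 1));
        cursor += size;
    }
    return chunks;
}

// 1 trillion to 2 trillion, split across 400 worker instances.
var work = Partition(1_000_000_000_000, 2_000_000_000_000, 400);
Console.WriteLine($"{work.Count} chunks; first = {work[0]}, last = {work[^1]}");
```

Each worker then only needs its own (Start, End) pair — the foreman never touches the quadperfect math itself.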
To implement multithreading in our applications, we will create
Thread objects of
one flavor or another, with a void method (that takes no arguments) as the argument to the
Thread constructor. For example:
Thread worker = new Thread(doingStuffMethod);
worker.Start(); // Effectively runs the "doingStuffMethod" method in a separate thread
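Put together as a complete (if tiny) program, that might look like the following — DoingStuffMethod here is a placeholder body I've invented to match the snippet above:

```csharp
using System;
using System.Threading;

static void DoingStuffMethod()
{
    Console.WriteLine($"Working on thread {Thread.CurrentThread.ManagedThreadId}");
}

// Pass the method itself (a ThreadStart delegate) -- no parentheses,
// since we're handing the method to the Thread, not calling it here.
Thread worker = new Thread(DoingStuffMethod);
worker.Start();  // DoingStuffMethod now runs on its own thread
worker.Join();   // wait for it to finish before Main continues
Console.WriteLine("Main thread continues.");
```

The common beginner mistake is writing new Thread(DoingStuffMethod()) with parentheses, which calls the method immediately on the current thread instead of scheduling it on a new one.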
There are also two states in which threads can be executed: Background or Foreground, which you can toggle
via the property
public bool IsBackground. A process is unconditionally
terminated once there are no foreground threads active, despite any active background threads (which are
terminated when the process is). Put another way, foreground threads keep the process alive even after Main
returns. Generally speaking, we want outcomes and output produced by the program that the user interacts
with to be handled by foreground threads, with "work" being relegated to background threads.
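Here's a small sketch of that rule in action — the "logger" chore is hypothetical, but the behavior is real: because the looping thread is marked background, it cannot keep the process alive once Main returns.

```csharp
using System;
using System.Threading;

// A long-running chore we do NOT want to keep the process alive.
Thread logger = new Thread(() =>
{
    while (true) Thread.Sleep(1000);  // pretend to flush logs forever
});
logger.IsBackground = true;  // set before Start so it never counts as foreground
logger.Start();

// Main is the only foreground thread here. When it returns, the process
// exits immediately and the background logger is terminated with it.
Console.WriteLine($"logger.IsBackground = {logger.IsBackground}");
```

Try flipping IsBackground to false: the program will then hang forever, because the infinite-loop thread is now a foreground thread holding the process open.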
You can also assign values to a Thread's
public string Name
(to assist with error logs and debugging) and
public ThreadPriority Priority,
where
ThreadPriority is an enumeration with values Highest, AboveNormal, Normal,
BelowNormal, and Lowest (in descending order). These will (theoretically) give precedence to some threads
over others.
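Both properties can be set in an object initializer right alongside the constructor. The name "plan-builder" below is just an example; note the comment on Priority — it is a scheduling hint, not a guarantee.

```csharp
using System;
using System.Threading;

Thread worker = new Thread(() => Thread.Sleep(10))
{
    Name = "plan-builder",                 // shows up in debuggers and error logs
    Priority = ThreadPriority.BelowNormal  // a hint to the scheduler, not a promise
};
worker.Start();
worker.Join();
Console.WriteLine($"{worker.Name} ran at {worker.Priority} priority");
```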
You might also use the
public static void Sleep(int milliseconds) method to
suspend the execution of the current thread. Or
public void Join() to pause
execution of (block) the calling thread until the thread that
Join was called on has
terminated. For example, I might do something like this in Main:
// code to get all set up
Thread worker = new Thread(doingStuffMethod);
worker.Start(); // launch the "doingStuffMethod" thread
worker.Join(); // block the Main thread until "doingStuffMethod" finishes. Then resume
// code to clean everything up and exit
[You]: But wait, wouldn't that be exactly the same thing as simply calling "doingStuffMethod" directly from
the Main thread?
[Me]: You must be a delight at parties.