Threads

The program that we run after successfully compiling our source code is executed as a process. Internally, each process has its own address space (where the binary counterpart to our source code lives) and memory assigned to it (as well as access to the "heap" for additional, dynamically allocated memory). As you are likely well aware, most computers today are quite capable of handling multiple processes at once, although you can often see performance issues crop up once you pass a certain threshold of processes. This is because the operating system is attempting to distribute resources (including CPU cycles) evenly among all the processes, which can cause delays, stutters, or lag in one or more high-demand processes. So while the upper bound on the number of active processes at any one time may be theoretically infinite, the practical usefulness of extremely resource-starved applications approaches null.

Threads represent the tasks any given process is undertaking as part of its design. Generally speaking, every process starts out with one thread, executing the code within Main. You can even think of the time it takes to complete a program as the length of a finitely long string, or thread. What we can do with multithreading is cut that single, long thread into multiple threads of varying sizes, all starting from time == 0. The goal here is to identify parts of a program that can be done in parallel, and thereby reduce the execution time. It isn't always possible to apply multithreading to every one of our programs, but if you're consistently searching for opportunities to apply it, you can often provide dramatic improvements to your software's potential optimal execution. This can still be ruined by computer systems without enough resources to support it in the first place, though!

It's important to note that multithreading and parallel programming are functionally similar, but executed differently. Multithreading provides ways to optimize the computational performance of a single process being executed, while parallel programming builds the source code in a fundamentally different way, allowing multiple processes (each potentially utilizing multithreading) to tackle the problem all at the same time. You can think of it like one chef in a kitchen chopping vegetables while a steak sears on the stovetop and potatoes bake in the oven (multithreading), as compared to several chefs all doing this at once (parallel programming). There's an entire course's worth of material to consider when diving into the topic of parallel programming and high-performance computing, which we regrettably can't fit into this class.

Let's first consider an example where multithreading doesn't provide an improvement to performance. I've built a piece of software that can automatically design a CS major's Plan document, based on which CSCI courses they've previously completed, what emphasis they've chosen, etc. Until these parameters are defined, though, the Plan document cannot begin to be generated, so this part of the overall process has to wait; generally speaking, that kind of waiting is what tells us multithreading won't optimize our application. Once these parameters are defined, the process by which the Plan's courses are selected is highly iterative: that is, the selection of which class to include next depends on which other courses have been picked previously. Therefore, the program cannot do any part of this in parallel, because it has to know what came immediately before it first. There might be a way to optimize this that I haven't thought of, but (1) the application doesn't take longer than a second to conclude and (2) the implementation of multithreading may cause more problems than the marginal benefit it provides. So why even bother?

Next, an application that could be optimized with multithreading: chat rooms or online messaging. If you consider each of the features and functions of online messaging, you can see how there are events often happening in parallel. While I'm looking for an appropriate gif response (don't bother telling me this is pronounced like "Jif peanut butter", you godless heathen) to what my friend said last (one thread), the chat window has been updated with another message sent (another thread), which I then quickly type out a response to in the text field (a third thread). None of these events directly relies on the conclusion of any other event that's been initiated or is waiting to receive additional input, so they can all run in parallel. This optimizes the execution of this single process.

Finally, we'll consider an application that parallel programming in particular lends itself to, if just to solidify our understanding of the difference between multithreading and parallel programming, and when we should consider using either one. A perfect number is one where the factors of the number (excluding the number itself) add up to that number. For example, 6 is divisible by 1, 2, and 3, which add up to 6. A quadperfect number is one where the sum of all its factors (including the number itself) adds up to four times the original number; a triperfect number, analogously, has factors summing to three times the original. There's a strong suspicion that there are finitely many (exactly 6) triperfect numbers, but perhaps infinitely many quadperfect numbers.

So, imagine you're on a hunt for the next quadperfect number. You write up a program that determines whether the command line argument (a number) is quadperfect or not. You might even optimize it using multithreading. But you also have access to a supercomputer with hundreds of CPU cores, each capable of running multiple instances of this program. Running the program by itself, iterating from 1 trillion to 2 trillion, would take a substantial amount of time (even if every execution took only 0.01 seconds), so instead you design a "foreman" program that partitions your original list of numbers evenly among the thousand-plus instances of the original "worker" program running across hundreds of CPU cores. Thus, you reduce the overall runtime by orders of magnitude by utilizing parallel programming.
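
To make the worker's half of that concrete, here's a sketch of the divisor-sum check it might perform (IsQuadperfect is just a name I'm inventing for illustration, and the foreman's partitioning logic is left out entirely):

using System;

class QuadperfectWorker
{
    // A number n is quadperfect if the sum of ALL its divisors
    // (including n itself) equals exactly 4 * n.
    static bool IsQuadperfect(long n)
    {
        long sum = 0;
        for (long d = 1; d * d <= n; d++)
        {
            if (n % d == 0)
            {
                sum += d;                       // d divides n...
                if (d != n / d) sum += n / d;   // ...and so does its pair n/d
            }
        }
        return sum == 4 * n;
    }

    static void Main(string[] args)
    {
        long n = long.Parse(args[0]);  // the number to test, from the command line
        Console.WriteLine(IsQuadperfect(n) ? $"{n} is quadperfect!" : $"{n} is not quadperfect.");
    }
}

(The smallest quadperfect number is 30240, if you want something to test against.)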

To implement multithreading in our applications, we will create Thread objects of one flavor or another, with a void method (that takes no arguments) as the argument to the Thread constructor. For example:

Thread worker = new Thread(doingStuffMethod);
worker.Start(); // Effectively runs the "doingStuffMethod" method in a separate thread
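
For the complete picture, a minimal runnable sketch might look like this (doingStuffMethod here just prints which thread it's running on):

using System;
using System.Threading;

class ThreadDemo
{
    // The method handed to the Thread constructor: void, no arguments
    static void doingStuffMethod()
    {
        Console.WriteLine("Doing stuff on thread " + Thread.CurrentThread.ManagedThreadId);
    }

    static void Main()
    {
        Thread worker = new Thread(doingStuffMethod);
        worker.Start();  // "doingStuffMethod" now runs on its own thread
        Console.WriteLine("Main continues on thread " + Thread.CurrentThread.ManagedThreadId);
    }
}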


There are also two states in which threads can execute: background or foreground, which you can toggle via the thread's public bool IsBackground property. A process is unconditionally terminated once there are no foreground threads still active, regardless of any active background threads (which are terminated when the process is). Foreground threads, by contrast, keep the process alive: the application won't actually exit until every foreground thread has finished. Generally speaking, we want outcomes and output produced by the program that the user interacts with to be handled by foreground threads, with "work" being relegated to background threads.
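
A quick sketch of the difference (reusing the doingStuffMethod pattern from above):

Thread worker = new Thread(doingStuffMethod);
worker.IsBackground = true;  // background: this thread won't keep the process alive
worker.Start();
// If Main returns now, the process exits and "worker" is killed mid-task.
// With IsBackground = false (the default), the process stays alive until "worker" finishes.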

You can also assign values to a Thread's public string Name (to assist with error logs and debugging) and public ThreadPriority Priority, where ThreadPriority is an enumeration with the values Highest, AboveNormal, Normal, BelowNormal, and Lowest (in descending order). These will (theoretically) give precedence to some threads over others.
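
For instance (the name and priority values here are arbitrary):

Thread worker = new Thread(doingStuffMethod);
worker.Name = "PlanGenerator";                 // shows up in debuggers and error logs
worker.Priority = ThreadPriority.BelowNormal;  // a hint to the scheduler, not a guarantee
worker.Start();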

You might also use the public static void Sleep(int milliseconds) method to suspend the execution of the current thread. Or public void Join() to pause (block) the execution of the calling thread until the thread on which Join was called has terminated. For example, I might do something like this in Main:

// code to get everything set up
Thread worker = new Thread(doingStuffMethod);
worker.Start(); // begin running "doingStuffMethod" on its own thread
worker.Join();  // pause the Main thread until that thread finishes, then resume
// code to clean everything up and exit


[You]: But wait, wouldn't that be exactly the same thing as simply calling "doingStuffMethod" directly from the Main thread?
[Me]: You must be a delight at parties.
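
(To be fair, in that exact form it would be. Join only earns its keep when Main has work of its own to do between Start and Join, so the two threads genuinely overlap; doOtherSetupWork below is a hypothetical stand-in for that work:)

Thread worker = new Thread(doingStuffMethod);
worker.Start();      // worker runs in parallel...
doOtherSetupWork();  // ...while Main does its own (hypothetical) work
worker.Join();       // now block until the worker is done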