Unix Processes

The Program / Process Distinction

A program can be any of several things:

A file containing instructions and data used to initialize a process;
An algorithm represented in the source code of some programming language, probably stored in a file;

A process, briefly, is a running program. Processes are resources that are managed by the operating system (OS). We use system calls to create and terminate processes, to query their state, and to communicate between them. The Unix process model is very simple: compared to many other OSes, it's quite easy to manipulate Unix processes, and the processes themselves are a very lightweight resource (even a very small, PC-class Unix machine can run hundreds of them at a time).

The Components of a Process

A process has three components:

the text segment
the user data segment (on modern Unix systems divided into the initialized and uninitialized (called bss) data segments)
the system data segment

The text segment contains the instructions and is sharable between multiple processes. The user data segment contains the data structures of the process; the process can request more space from the OS and can return space to the OS. The system data segment is not in the process's address space, but in the kernel: it consists of resources managed by the OS on the process's behalf, such as file pointers, file descriptors, the environment, etc. The process can only manipulate the system data via system calls.

You can see the initial sizes of the various components of a process by examining its program file. The Unix size program takes a filename as a command-line argument and prints out the size of the text segment, the initialized user data segment (labelled "data"), and the uninitialized user data segment (labelled "bss"), along with their sum in decimal and hex. The total amount of space taken up by N running copies of a process is 1*text + N*data + N*bss, because the text segment is shared and the data segments are not.

$ file /bin/date; ls -l /bin/date; size /bin/date
/bin/date:      sparc pure dynamically linked executable
-rwxr-xr-x  1 root     staff        7472 Jul 23  1992 /bin/date*
text    data    bss     dec     hex
5944    1496    80      7520    1d60
$ file /local/bin/xmosaic ; ls -l /local/bin/xmosaic ; size /local/bin/xmosaic
/local/bin/xmosaic:     sparc demand paged dynamically linked executable
-rwxr-xr-x  2 chas     sourcers  2514944 Jul 11 20:04 /local/bin/xmosaic*
text    data    bss     dec     hex
2269184 245760  377408  2892352 2c2240
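
To make the 1*text + N*data + N*bss formula concrete, here is a small C sketch that applies it to the numbers size printed for /bin/date. The struct and function names are invented for this illustration, and the result covers only the initial sizes, since a running process can grow its data segment.

```c
/* Segment sizes as reported by size(1), in bytes. */
struct seg_sizes {
    unsigned long text;
    unsigned long data;
    unsigned long bss;
};

/* Memory needed by n running copies of one program: the text segment
   is shared, so it is counted once; data and bss are private, so they
   are counted once per copy. */
unsigned long total_for_copies(struct seg_sizes s, unsigned long n)
{
    if (n == 0)
        return 0;
    return s.text + n * (s.data + s.bss);
}
```

With the /bin/date figures, three copies need 5944 + 3*(1496 + 80) = 10672 bytes, against 3*7520 = 22560 bytes if the text segment were not shared.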

Since a process can change the size of its user data segment as it runs, the numbers from the size program only specify the initial memory use for a process. You can examine the real memory use of any process with the sps program. Here's the output of the Unix sps command for a copy of xmosaic which the OS has swapped out (since I'm not running it at the moment):

Ty User     Status Fl Nice Prv  Shr  Res %M  Time Child %C Proc# Command
   keith    select        2344+2704    0  0  19.4        0  1742 xmosaic

The Prv (for "private") field represents the size of the user data segment; the Shr field (for "shared") represents the size of the text segment. These values don't match the output of size exactly because size describes the sizes as stored in the program file. The Res field (for "resident") represents how much memory the process is actually taking up in real memory (as opposed to virtual memory), and the %M field represents the percent of real memory allocated to the process; these values are zero because the process is swapped out. All these numbers (except %M) are in kilobytes.

I can change the resident memory values (and maybe the Prv value as well) by making xmosaic fetch a page of hypertext:

Ty User     Status Fl Nice Prv  Shr  Res %M  Time Child %C Proc# Command
   keith    SELECT        2940+2704 2284  5  23.7       15  1742 xmosaic

Note that this action swapped in the process, and that it took up 2,284K of real memory, or 5% of the real memory available on my machine.

The `fork` and `exec` System Calls

The exec system call is the only way a process begins execution; the fork system call is the only way to create a new process. These two system calls are somewhat puzzling when first encountered, as it will seem that they ought to be combined into one: each, taken on its own, may seem almost pointless. But the separation of the two is a key idea in Unix and simplifies programming with processes tremendously. Note: the exec system call comes in several flavors, all fundamentally the same; the most commonly used flavor is called execl.

When a process executes the exec system call, the kernel replaces the code and user data segments of the running process with the code and data segments from a program stored in a file. The process remains the same: no new kernel data structures are allocated, and the process has the same process-ID as before. The process's system data segment is also almost the same as it was before the exec; the only things that change are:

Signals that had handlers are reset to their defaults.
The user-ID and group-ID of the process may be different, if the setuid or setgid bits were on in the inode of the program file.
Profiling is turned off.

But if exec only reinitializes an existing process, then it provides no way to create a new process. For this, we need fork.

The fork system call creates a new process, but it does not initialize it from a new program. When a process executes fork, the kernel clones the running process to make the new one. The text segment, user data segment and system data segment are all copied almost exactly: execution continues in both the old and new process from the same exact point!

The crucial difference is, the fork system call itself returns different values in parent and child. This allows the cloned code to execute differently in the two processes.

With exec, the new program uses the original, mostly unchanged, system data segment of the original program. With fork, the child process gets a nearly identical copy of the parent's system data segment. The only things that are different are:

The process-ID and parent-process-ID.
Process execution times are set to zero, since the child is a new process.

fork and exec are usually used in tandem to allow a process to create new processes. The classic model is as follows:

The parent forks a new child process.
The parent goes to sleep, waiting for its child to finish execution.
The child, meanwhile, uses exec to run a new program.

The overall effect is of the parent forking off a new program. The different return values from fork let the same code take different actions in parent and child; in sketchy C:

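/* ignoring errors ... */
if (fork() == 0) {
    /* i am the child: become the new program */
    execl("/path/to/program", "program", (char *)0);
    _exit(127);  /* reached only if execl failed */
} else {
    /* i am the parent: sleep until the child terminates */
    wait(NULL);
}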
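
For readers who want something they can compile, here is a fuller version of the same pattern with the error checks put back. Treat it as one possible sketch: the function name run_and_wait is invented, it uses execv (another flavor of exec, which takes the arguments as an array), and the exit status 127 for a failed exec is a shell convention rather than anything the kernel requires.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] (a full pathname) with the given
   NULL-terminated argument list, and wait for it to finish.
   Returns the child's exit status, or -1 if fork or wait failed or
   the child was killed by a signal. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* could not fork */
    if (pid == 0) {
        /* child: replace this process image with the new program */
        execv(argv[0], argv);
        _exit(127);                 /* reached only if execv failed */
    }
    /* parent: sleep until this particular child terminates */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the argument list {"/bin/sh", "-c", "exit 3", NULL} should return 3, and a pathname that does not exist should return 127.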

Since the parent and child are nearly identical, they can communicate with one another via a pipe (remember, the child gets a copy of all the parent's open file descriptors). So instead of just waiting for the child to terminate, the parent and child may act together.

The `exit` System Call

The exit system call terminates the execution of a process. All open file descriptors are closed. If the exiting process had any children, they no longer have a parent and so the kernel sets their parent-process-IDs to 1, which is the process-ID of the init process. These child processes are called orphans and init is said to have adopted them.

The exit system call returns an integer value to the kernel, which makes this value available to the parent process via the wait system call. This value includes the one-byte exit status of the child, and so can be used to pass a very small message back to the parent.
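
The one-byte limit is easy to see in a short C sketch. The function name is hypothetical; the child calls _exit (the raw system call, which skips the C library's cleanup) and the parent extracts the status with the standard WEXITSTATUS macro.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with `code`; return the exit status
   the parent sees through wait, or -1 on error. */
int status_seen_by_parent(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: hand `code` to the kernel */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);     /* only the low 8 bits survive */
}
```

A child that exits with 7 is seen as 7, but one that exits with 261 is seen as 5, because 261 is 0x105 and only the low byte is kept.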

The `wait` System Call

The wait system call causes a process to sleep until a child terminates. If there are no children, wait returns immediately. wait returns a small amount of info to the parent, including the child's process-ID (so the parent knows which child terminated), exit status, reason for termination, and some resource use statistics.

A process can terminate for either of two reasons:

it executes the exit system call;
it is killed by a signal;

these cases (including the identity of the signal) can be distinguished by the parent via the return value from wait.
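
In C the distinction is made with the standard macros WIFEXITED, WEXITSTATUS, WIFSIGNALED and WTERMSIG, applied to the status word that wait fills in. The sketch below assumes nothing beyond those macros; the enum and both function names are made up for the demonstration, and spawn_and_collect exists only to produce a child that ends in a chosen way.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum how_ended { ENDED_EXIT, ENDED_SIGNAL, ENDED_UNKNOWN };

/* Decode a status word filled in by wait or waitpid. On return,
   *detail holds the exit status or the signal number. */
enum how_ended decode_status(int status, int *detail)
{
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return ENDED_EXIT;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return ENDED_SIGNAL;
    }
    return ENDED_UNKNOWN;
}

/* Demo helper: fork a child that exits with `code` when sig is 0, or
   otherwise kills itself with signal `sig`. Stores the raw status in
   *status and returns 0, or returns -1 on error. */
int spawn_and_collect(int code, int sig, int *status)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);             /* does not return for a fatal signal */
        _exit(code);
    }
    return waitpid(pid, status, 0) < 0 ? -1 : 0;
}
```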

When a process terminates without its parent having waited for it, the process is called a zombie. It's good practice to wait for your children, but not required.

Getting and Setting IDs

pid
id process
id process parent

The pid command returns, as a decimal number, the process id of the current process. This is useful for creating unique temp file names:

set fp [open "/tmp/foo[pid]" w]

The Extended Tcl id command has several subcommands for accessing process, parent process, user and group ids.
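
For comparison, the underlying C calls are getpid and getppid. Here is a rough C counterpart of the temp file idiom; the function name and the /tmp/foo prefix simply mirror the Tcl example and are not part of any library.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build the name "/tmp/foo<pid>" in buf, using this process's ID.
   Returns 0 on success, -1 if buf is too small. */
int temp_name(char *buf, size_t size)
{
    int n = snprintf(buf, size, "/tmp/foo%ld", (long)getpid());

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```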

The `nice` System Call

nice ?increment?

Every process has a priority that determines how much of the CPU it is allowed to hog. This priority is called the nice value, since if you lower your own priority you are being nice to other processes. The nice value is an integer ranging (traditionally) from -20 to 19. Higher nice values are nicer (to other processes). With no argument, the nice command returns the current nice value; a numeric argument is added to the current nice value. Only root processes can decrement their nice value.
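
At the C level the same two operations are available through the nice and getpriority calls. The sketch below is one way to wrap them, with function names of my own choosing; the errno dance is needed because -1 is a legitimate nice value as well as the error return.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Store the calling process's nice value in *value.
   Returns 0 on success, -1 on error. */
int current_nice(int *value)
{
    int v;

    errno = 0;
    v = getpriority(PRIO_PROCESS, 0);   /* 0 means "this process" */
    if (v == -1 && errno != 0)
        return -1;
    *value = v;
    return 0;
}

/* Add `increment` to the nice value and store the new value in *value.
   A negative increment normally fails unless the caller is root.
   Returns 0 on success, -1 on error. */
int add_nice(int increment, int *value)
{
    errno = 0;
    if (nice(increment) == -1 && errno != 0)
        return -1;
    return current_nice(value);
}
```

Note that there is no undo for an unprivileged process: once add_nice(1, &v) has succeeded, the process cannot lower its nice value again.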

The `execl` Command

execl ?-argv0 argv0? prog ?arglist?

Extended Tcl's execl command implements the Unix execl system call (actually, it implements execlp): it reinitializes the running Tcl process with a new program. execl only returns in the event of an error, so the usual use is like this:

execl /bin/date
puts stderr "execl failed!"
exit 1

If prog is given with a relative pathname, execl will search your PATH for it. Arguments to the new program can also be specified:

execl cal 7 1959

and the -argv0 option allows you to change the name under which the program is called.

The `fork` Command

fork

The Extended Tcl fork command implements the Unix fork system call. It takes no arguments, and forks a new process as described above. fork returns 0 in the child, and non-zero in the parent. The non-zero value is in fact the process-ID of the child. Any open file descriptors that have been written to in the parent should probably be flushed (with flush) before forking. fork can return a Tcl error if there are not enough resources to fork a new process.

if {[fork] == 0} {
    # i am the child
    execl /bin/date
    puts stderr "execl failed!"
    exit 1
}
# i am the parent
wait

Remember, the child can never execute the parent code because execl won't return (unless there's an error).

Note carefully the output in this annotated version of the above:

if {[fork] == 0} {
    # i am the child
    puts "Child: my pid is [pid]"
    execl /bin/date
    puts stderr "execl failed!"
    exit 1
}
# i am the parent
puts "Parent: [wait]"
-------------------------------
Child: my pid is 2804
Tue Aug 23 12:23:41 CDT 1994
Parent: 2804 EXIT 0
-------------------------------

Note that the output of date appears because after the execl, the date program's standard output is the same as the Tcl child process's and the Tcl parent process's standard output.

The `wait` Command

wait ?-nohang? ?-untraced? ?-pgroup? ?pid?

The Extended Tcl wait command, when invoked with no arguments, sleeps until a child terminates (if there are no children, a Tcl error results). wait then returns a list of three elements. The first element is always the process-ID of the child that terminated. The second and third elements differ depending on the reason for termination.

If the child exited, the second element is the string EXIT and the third element is the exit status of the child. If the child was killed by a signal, the second element is the string SIG and the third element is the signal name.

The `pipe` Command

pipe ?read write?

The Extended Tcl pipe command creates a pipe. A pipe is a pair of file descriptors, one for reading and one for writing, connected through a buffer in the kernel. Remember that a pipe automatically synchronizes two processes, making for easy interprocess communication.

To create a pipe, you simply invoke the pipe command with two variable names to hold the read and write file descriptors. Any data written to the write file descriptor is available for reading on the read file descriptor. Don't forget to flush!

pipe r w
puts $w Hey!; flush $w
gets $r
=> Hey!

Remember that the size of the pipe buffers in the kernel is small (on the order of 4K), and a write that's bigger than the buffer size will fill the buffer and then cause the writing process to block until the reading process reads some of the contents. In a single process, this is a problem! Try writing a hundred K or so of data in place of Hey! in the above and you'll see the problem. This problem goes away when the pipe is connecting two processes.
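
The same single-process round trip looks like this in C, using the pipe system call directly. It is a sketch with an invented function name, and it refuses messages over 512 bytes, a deliberately conservative limit I have assumed so that the write can never block for the reason just described.

```c
#include <string.h>
#include <unistd.h>

/* Write msg into a fresh pipe and read it back in the same process.
   On success, out holds the NUL-terminated message and the number of
   bytes read is returned; on error, -1 is returned. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t outsize)
{
    int fds[2];                 /* fds[0]: read end, fds[1]: write end */
    size_t len = strlen(msg);
    ssize_t n;

    if (len > 512 || len >= outsize)
        return -1;              /* too big to be safe in one process */
    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* no more data is coming */
    n = read(fds[0], out, outsize - 1);
    close(fds[0]);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

There is no flush here because write is a system call and goes straight to the kernel; the flush in the Tcl version is needed only because Tcl buffers output in user space.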

Although it works, a pipe is an IPC mechanism and so is not very useful in a single process. But it's just what you need to communicate between a parent and child. The parent creates the pipe, and since file descriptors are shared with the child, they can send messages back and forth across the pipe. Note that for one-way communication, each process only needs one of the two pipe file descriptors, so the usual practice is for each to close the one it does not use, since file descriptors are a finite per-process resource. Here's an example of one-way communication where the parent sends the child a message:

pipe read write
if {[fork] == 0} {
    # child
    close $write
    gets $read message
    puts "Child: got $message"
} else {
    # parent
    close $read
    puts $write "Hey, child!"
    flush $write
}

Remember, you should always flush after each write to a pipe (or turn off buffering). In the above, the parent can write any size message to the child.
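
Here is roughly how the same one-way conversation looks in C. This is a sketch under a few assumptions of mine: the function name and the 256-byte message limit are invented, and since the child has no other channel in this small example, it reports whether it received the right message through its exit status.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_MSG 256             /* arbitrary limit for this sketch */

/* Send msg to a forked child over a pipe. The child reads until end
   of file and exits 0 if what it read matches msg, 1 otherwise.
   Returns the child's exit status, or -1 on error. */
int send_to_child(const char *msg)
{
    int fds[2], status;
    size_t len = strlen(msg);
    ssize_t written;
    pid_t pid;

    if (len > MAX_MSG || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: reads only, so close the write end first */
        char buf[MAX_MSG];
        size_t got = 0;
        ssize_t n;

        close(fds[1]);
        while (got < sizeof buf &&
               (n = read(fds[0], buf + got, sizeof buf - got)) > 0)
            got += (size_t)n;
        _exit(got == len && memcmp(buf, msg, len) == 0 ? 0 : 1);
    }

    /* parent: writes only, so close the read end first */
    close(fds[0]);
    written = write(fds[1], msg, len);
    close(fds[1]);              /* the child's read now sees end of file */

    if (waitpid(pid, &status, 0) < 0 || written != (ssize_t)len)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing the unused ends matters for more than tidiness here: the child's read loop only sees end of file once every copy of the write end, including the child's own, has been closed.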

A single pipe has two ends, and so can be used for two way communication, but a tricky problem exists, so we normally use two separate pipes; careful naming of the file descriptors minimizes confusion:

pipe childread parentwrite
pipe parentread childwrite
if {[fork] == 0} {
    # child
    close $parentwrite
    close $parentread
    gets $childread message
    puts $childwrite "Child: got $message"
    flush $childwrite
} else {
    # parent
    close $childread
    close $childwrite
    puts $parentwrite "Hey, child!"
    flush $parentwrite
    gets $parentread message
    puts "Parent: Child sez: \"$message\""
}

The problem with a single pipe arises because a single buffer is being used for both directions. Suppose the parent writes to the pipe; the child is supposed to read the data in the buffer, then send a message back to the parent in the same buffer. The problem is, after the parent writes, it does a read on the pipe; if the parent beats the child to the read, the parent will read its own data back!

The `dup` Command

dup fileId ?targetFileId?

The dup command implements the dup system call, which duplicates one desired open file descriptor into another. This can be used to connect standard input or standard output to a pipe. This sample code shows how a parent process can fork the standard Unix sort command and then feed it data to be sorted. A simple extension would allow the child to write the results back to the parent.

pipe read write
if {[fork] == 0} {
    # child
    dup $read stdin
    close $write
    execl sort
    puts stderr "Can't execl!"
    exit 1
}
close $read
foreach word [list zoo ylem quark flake dog aarhus] {
    puts $write $word
    flush $write
}
close $write
wait
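
As a sketch of that extension, here is a C version that uses two pipes and the dup2 system call, so the parent can both feed sort and read the sorted result back. The function name is made up, and the code makes two simplifying assumptions: sort can be found on the PATH, and the input is at most 512 bytes, so that the parent can safely write everything before it reads anything.

```c
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pass `input` through sort(1) and store the sorted text in `out` as a
   NUL-terminated string. Returns the number of bytes stored, or -1. */
ssize_t sort_lines(const char *input, char *out, size_t outsize)
{
    int to_child[2], from_child[2], status;
    size_t len = strlen(input), total = 0;
    ssize_t n;
    void (*old_handler)(int);
    pid_t pid;

    if (len > 512 || outsize <= len)
        return -1;
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: wire the pipes to stdin and stdout, then become sort */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execlp("sort", "sort", (char *)NULL);
        _exit(127);             /* reached only if execlp failed */
    }

    /* parent: keep only the ends it uses */
    close(to_child[0]);
    close(from_child[1]);

    /* if sort could not be started, the write must not kill us */
    old_handler = signal(SIGPIPE, SIG_IGN);
    n = write(to_child[1], input, len);
    signal(SIGPIPE, old_handler);
    close(to_child[1]);         /* sort sees end of file and can finish */

    if (n == (ssize_t)len) {
        ssize_t r;
        while (total < outsize - 1 &&
               (r = read(from_child[0], out + total,
                         outsize - 1 - total)) > 0)
            total += (size_t)r;
    }
    out[total] = '\0';
    close(from_child[0]);

    if (waitpid(pid, &status, 0) < 0 || n != (ssize_t)len ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

For larger inputs the write-then-read order would deadlock once both pipe buffers fill, which is exactly the blocking behavior described for pipes above; a real program would interleave the reads and writes or use a second child.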

The `exec` Command

The standard Tcl exec command encapsulates several complex yet common combinations of fork, execl, and pipe into one handy command.
