Linux Programming Notes

To Read

  7. Todo: async-signal safe

Processes, Process Groups & Sessions




A process has its own independent address space, isolating it from all other processes in the system, i.e., a process cannot access the memory of another process directly. Each process in the system is assigned a unique integer to identify it, called the Process IDentifier, or PID. The first process in a Linux system is the init process, with a PID of 1.

Processes are created in Linux by fork()ing an existing process. Historically, a fork() would copy the process in its entirety: the parent process' memory would be cloned for the new child process and the page tables for the child would be created to "point" correctly to the new memory. That's expensive as the system has to copy a potentially large amount of memory. For example, if a huge process using, say, 1.5 GB of RAM just wanted to exec a really small utility, the 1.5 GB of memory would be copied only to be immediately replaced by a program requiring minimal memory, say 5 MB! What a waste of time!

That is why modern Linux uses copy-on-write pages. This way the memory of the parent process is not copied up front: the parent and child share the same physical pages until one of them tries to write to them. In the above example, the parent and child therefore share the same memory until the child execs another program, and the potentially huge memory copy is avoided. Should the child modify the shared memory, a copy of the affected memory page(s) is created for the child, but only the modified pages need be copied, so it is again as efficient as possible.

... Under Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child. However, in the bad old days a fork(2) would require making a complete copy of the caller's data space, often needlessly...

-- Linux man page for vfork
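The copy-on-write mechanism itself is invisible to a program, but its effect can be observed: after fork() a write in the child must not show up in the parent. The following is a minimal sketch of that, not a measurement of the copying cost. The names ForkResult and forkAndModify are hypothetical and exist only for this illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ForkResult {
    int parentValue;  // what the parent sees after the child has written
    int childValue;   // what the child reported after writing
};

// Hypothetical helper: fork, let the child modify a variable, and report
// what each side sees. Both fields are -1 if something failed.
ForkResult forkAndModify() {
    ForkResult result{-1, -1};
    int value = 1;
    int fds[2];
    if (pipe(fds) != 0) return result;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    if (pid == 0) {
        // Child: this write only affects the child's private copy of the page.
        close(fds[0]);
        value = 2;
        ssize_t n = write(fds[1], &value, sizeof(value));
        _exit(n == static_cast<ssize_t>(sizeof(value)) ? 0 : 1);
    }

    // Parent: read the child's value back through the pipe.
    close(fds[1]);
    int fromChild = -1;
    if (read(fds[0], &fromChild, sizeof(fromChild)) ==
        static_cast<ssize_t>(sizeof(fromChild))) {
        result.childValue = fromChild;
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    result.parentValue = value;  // untouched by the child's write
    return result;
}
```

Calling forkAndModify() from main() and printing both fields should show the parent still holding its original value while the child reports the modified one.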

Create Daemons



What Is A Daemon?

A daemon is a Linux process that runs "in the background". This means that it is not visible to the user: it has no controlling terminal and so does not output anything to the screen, for example. It is also typically a direct child of init (it is re-parented when its original parent exits) so that it is not dependent on any other process staying alive (at least directly).
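One common way to get these properties is the classic "double fork". The sketch below is a rough illustration under that assumption, not a complete daemon: it skips things like umask handling, PID files and signal setup. The function name daemonizeSketch is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper. Returns 0 in the detached process, 1 in the
// original caller, and -1 in the original caller on error.
int daemonizeSketch() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid > 0) {
        // Original process: reap the short-lived intermediate child.
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
    }

    // Intermediate child: a new session has no controlling terminal.
    if (setsid() < 0) _exit(1);

    pid_t grandchild = fork();
    if (grandchild < 0) _exit(1);
    if (grandchild > 0) _exit(0);  // exiting orphans the grandchild

    // Grandchild: not a session leader, so it cannot reacquire a terminal.
    if (chdir("/") != 0) _exit(1);  // do not pin the starting directory

    // Point stdin, stdout and stderr at /dev/null.
    int devNull = open("/dev/null", O_RDWR);
    if (devNull < 0) _exit(1);
    dup2(devNull, STDIN_FILENO);
    dup2(devNull, STDOUT_FILENO);
    dup2(devNull, STDERR_FILENO);
    if (devNull > STDERR_FILENO) close(devNull);
    return 0;
}
```

The caller would branch on the return value much like with fork(): the process that gets 0 carries on as the background worker, while the original process is free to exit.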

File System Notifications


You can use the inotify APIs to "listen" for events relating to individual files or even directories.

You create an inotify handle to which you can add watches. This handle can then be used to receive events for all of the files/directories that you are watching.

So let's, for example, watch a directory. You can get the example code here. I won't splurge it all out here; we'll just look at the important bits.

To start receiving events relating to a file or directory you need to create an inotify file descriptor:

int inotifyFd = inotify_init();

To tie a directory/file to this file descriptor use the following:

watchDescriptor = inotify_add_watch(inotifyFd, argv[1], IN_ALL);

In the example code I do no command line checking, so the first argument to the program is taken to be the file or directory being watched. The macro IN_ALL is my own macro that is just a combination of all the types of events that can be received (the header <sys/inotify.h> also provides IN_ALL_EVENTS for this purpose).
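As a rough illustration of how such an event mask is built up and read back, the sketch below ORs a few of the standard IN_* flags together and turns a mask into readable text. The names kExampleMask and describeMask are hypothetical and are not taken from the example code.

```cpp
#include <sys/inotify.h>
#include <cstdint>
#include <string>

// Hypothetical mask: a hand-picked subset of the events in IN_ALL_EVENTS.
constexpr std::uint32_t kExampleMask =
    IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVED_FROM | IN_MOVED_TO;

// Hypothetical helper: list the flags set in a mask, joined with '|'.
std::string describeMask(std::uint32_t mask) {
    struct Flag {
        std::uint32_t bit;
        const char *name;
    };
    static const Flag flags[] = {
        {IN_ACCESS, "IN_ACCESS"},
        {IN_MODIFY, "IN_MODIFY"},
        {IN_ATTRIB, "IN_ATTRIB"},
        {IN_CLOSE_WRITE, "IN_CLOSE_WRITE"},
        {IN_CLOSE_NOWRITE, "IN_CLOSE_NOWRITE"},
        {IN_OPEN, "IN_OPEN"},
        {IN_MOVED_FROM, "IN_MOVED_FROM"},
        {IN_MOVED_TO, "IN_MOVED_TO"},
        {IN_CREATE, "IN_CREATE"},
        {IN_DELETE, "IN_DELETE"},
        {IN_DELETE_SELF, "IN_DELETE_SELF"},
        {IN_MOVE_SELF, "IN_MOVE_SELF"},
        {IN_ISDIR, "IN_ISDIR"},  // set by the kernel when the subject is a directory
    };

    std::string text;
    for (const Flag &flag : flags) {
        if (mask & flag.bit) {
            if (!text.empty()) text += '|';
            text += flag.name;
        }
    }
    return text;
}
```

A helper like this is handy when printing the mask field of each received event while experimenting with which events a given file operation generates.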

To receive events you must read() from the inotify file descriptor:

bytesRead = read(inotifyFd, buffer, sizeof(buffer));

The main point to note here is that the size of the buffer is much larger than sizeof(struct inotify_event)! The reason for this is that the inotify_event structure contains as its last member an unsized array (a "flexible array member"). This is a C99 feature and is not part of standard C++, but GCC and Clang accept it as an extension, and I'm not getting any errors or warnings, so it looks fine.

The last member, name, is an unsized array, which means that the actual size of an event is sizeof(struct inotify_event) + inotify_event.len, where the len field gives the byte-length of name (including all null bytes after the string).

This is why I read data into buffer. To read at least one event, buffer needs to be at least sizeof(struct inotify_event) + NAME_MAX + 1 bytes in size. Note that a read will only return entire events: it will never split an event structure across two reads, so you can be certain to only ever read a whole number of events.

So the buffer has one or possibly more events in it. Hence once read() fills the buffer we must traverse across all the inotify_events contained within:

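const char *const bufferEnd = buffer + bytesRead;
char *cursor = buffer;  // walk with a cursor so buffer keeps pointing at the allocation
while (cursor < bufferEnd) {
   struct inotify_event *iNotifyEvent = reinterpret_cast<struct inotify_event *>(cursor);
   // ... handle iNotifyEvent here ...
   cursor += sizeof(struct inotify_event) + iNotifyEvent->len;
}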

This code does a reinterpret_cast, which means the buffer must be correctly aligned for struct inotify_event: if you statically allocate a buffer you must make sure it is correctly aligned yourself (for example with alignas). To work around this I've dynamically allocated the buffer, which guarantees alignment suitable for any fundamental type.

The address of buffer[0] is the start of the first event struct. To get to the start of the next structure we advance the pointer by sizeof(struct inotify_event) + iNotifyEvent->len bytes. This is the size of the structure plus the size of the file name string and all of the null bytes after it: the name has the terminating null byte but also as many extra null bytes as are required to pad the subsequent structure to the correct alignment. Thus we can increment the pointer in this way without worrying about alignment within the buffer. Happy days!

To clean up we must remove the watch on the directory/file and then close the inotify descriptor:

inotify_rm_watch(inotifyFd, watchDescriptor);
close(inotifyFd);
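Putting the steps above together, here is a minimal, self-contained sketch of the whole lifecycle: create the descriptor, add a watch, read and traverse the events, then clean up. It is not the example code linked earlier. It watches a freshly created temporary directory for IN_CREATE only, uses a stack buffer aligned with alignas rather than a dynamic allocation, and the names readEventNames and watchDemo, the directory template and the file name hello.txt are all made up for the illustration.

```cpp
#include <sys/inotify.h>
#include <fcntl.h>
#include <unistd.h>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: read one batch of events and collect their names.
std::vector<std::string> readEventNames(int inotifyFd) {
    // Room for several events, aligned so the cast below is valid.
    alignas(inotify_event) char buffer[4 * (sizeof(inotify_event) + NAME_MAX + 1)];
    std::vector<std::string> names;

    ssize_t bytesRead = read(inotifyFd, buffer, sizeof(buffer));  // blocks
    if (bytesRead <= 0) return names;

    const char *const bufferEnd = buffer + bytesRead;
    for (char *cursor = buffer; cursor < bufferEnd;) {
        auto *event = reinterpret_cast<inotify_event *>(cursor);
        if (event->len > 0) names.emplace_back(event->name);
        cursor += sizeof(inotify_event) + event->len;  // len includes padding
    }
    return names;
}

// Hypothetical demo: watch a temporary directory, create a file in it and
// return the names reported by inotify (empty on any failure).
std::vector<std::string> watchDemo() {
    std::vector<std::string> names;
    char dirTemplate[] = "/tmp/inotify-sketch-XXXXXX";
    if (mkdtemp(dirTemplate) == nullptr) return names;

    int inotifyFd = inotify_init();
    if (inotifyFd < 0) {
        rmdir(dirTemplate);
        return names;
    }

    int watchDescriptor = inotify_add_watch(inotifyFd, dirTemplate, IN_CREATE);
    if (watchDescriptor >= 0) {
        const std::string path = std::string(dirTemplate) + "/hello.txt";
        int fd = open(path.c_str(), O_CREAT | O_WRONLY, 0600);
        if (fd >= 0) {
            close(fd);
            // Only read once we know an event is queued, else read() would block.
            names = readEventNames(inotifyFd);
            unlink(path.c_str());
        }
        inotify_rm_watch(inotifyFd, watchDescriptor);
    }

    close(inotifyFd);
    rmdir(dirTemplate);
    return names;
}
```

A real program would normally loop on the read and keep the watch alive; the single read here is only to keep the sketch short.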

Select, Poll, EPoll


The Linux man page says that:

select() ... allow[s] a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation ... (e.g. without blocking or a sufficiently small write) ...

Select will wait on a set of file descriptors with a timeout. You specify three sets of file descriptors to watch for three different kinds of event:

  1. File descriptors becoming ready for reading,
  2. File descriptors becoming ready for writing,
  3. File descriptors suffering exceptional conditions.

The sets of file descriptors being watched for events are described by fd_sets. A set is created and manipulated as follows:

fd_set fds;                              // named fds so the variable does not hide the fd_set type
FD_ZERO(&fds);                           // CLEAR the set
FD_SET(file_descriptor, &fds);           // ADD a file descriptor to the set
FD_CLR(file_descriptor, &fds);           // REMOVE a file descriptor from the set
if (FD_ISSET(file_descriptor, &fds))     // TEST if fd is part of set
   ; // file_descriptor is part of fds

Note that select() overwrites the fd_set variables you pass it, so if you use it in a loop, remember to re-initialise the sets each time!
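The re-initialisation point can be sketched as follows. This is a minimal example that assumes a single descriptor watched for readability and a fixed number of attempts. The function name waitReadable and its parameters are hypothetical. On Linux select() may also modify the timeout it is given, so the sketch rebuilds that on each pass too.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: returns true if fd becomes readable within
// `attempts` waits of `timeoutMs` milliseconds each, false otherwise.
bool waitReadable(int fd, int timeoutMs, int attempts) {
    if (fd < 0 || fd >= FD_SETSIZE) return false;  // fd_set cannot hold it

    for (int i = 0; i < attempts; ++i) {
        // Rebuild the set every time: select() overwrites it with the result.
        fd_set readFds;
        FD_ZERO(&readFds);
        FD_SET(fd, &readFds);

        // Rebuild the timeout as well, since select() may have updated it.
        struct timeval timeout;
        timeout.tv_sec = timeoutMs / 1000;
        timeout.tv_usec = (timeoutMs % 1000) * 1000;

        // The first argument is the highest-numbered descriptor plus one.
        int ready = select(fd + 1, &readFds, nullptr, nullptr, &timeout);
        if (ready < 0) return false;  // error (including EINTR) ends the wait
        if (ready > 0 && FD_ISSET(fd, &readFds)) return true;
        // ready == 0 means this attempt timed out, so go round again.
    }
    return false;
}
```

For example, calling waitReadable() on the read end of an empty pipe should time out, and should succeed once something has been written to the other end.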



Select Vs Poll

I found Daniel Stenberg's analysis a very good read for this.