...making Linux just a little more fun!

<-- prev | next -->

Inter-Process Communication - Part II

By Hiran Ramankutty

I hope everyone enjoyed Part I of this article, where we had an introductory discussion of IPC mechanisms and then continued with detailed descriptions of mechanisms such as pipes, fifos and shared memory. We shall now continue with the other IPC mechanisms such as memory mapping, semaphores, message queues, and sockets.

Memory Mapping

We have seen that fifos are basically named pipes. Similarly, mapped memory is like shared memory with a name. With this little bit of information, we can conclude that there is some sort of relationship between files in the file system and memory mapping techniques - and, in fact, there is. Simply put, the memory mapping mechanism deals with the mapping of a file in the file system to a portion of memory.

Say, for example, we have a file that contains some data in a specific format that the user knows. That file may be required by some application. We will have to write some parsing function for the application that will search specific data in the file. To be more specific, we will have the parsing function return some data to the application whenever any requests for specific data come from the application. Then, our parsing function will have to access the file for every request.

This is only acceptable if the application makes requests for data a few times. But this is time-consuming when the application sends frequent requests for data and processing. One may wonder what makes it time-consuming.

Let us look at the operations involved in reading data from a file in the file system in the hard drive. The major operations involved are as follows:

  1. invoking system calls like open, read, write and close, and
  2. I/O operations by accessing a device (the hard drive).

The two operations mentioned above consume considerable system services as compared to simply reading the data from memory. A program working in user space, when invoking a system call, transfers control from the user program to the kernel, which places it in the kernel space. That is, kernel space is consumed in the execution of the system calls. Also, accessing the disk (hard drive) frequently will be comparatively slow, since it requires I/O operations on a physical device.

The overhead caused by frequently invoking system calls and accessing disk to obtain data can be avoided by the memory mapping technique, which involves logically associating the file with a portion of the virtual memory or the program's user space. Any file I/O operations can then be replaced with simple memory access.

Linux provides system calls such as mmap and munmap for the purpose of using the memory mapping mechanism, which allows the sharing of mapped files. We have to associate the file in such a manner that the mapping is accessible to any process. We may have an application or a program running several processes, and one or more of the processes may access data from one or more files. So if we have a mapping for a file with sharing rights for any process, we can use the same mapping in all processes. See the manual page for mmap for more details.

Again, synchronization problems can arise. Will the reads happen before or after the writes? An entirely unrelated process might have memory-mapped the file and might be modifying it, unknown to us, at the same time as our process. A number of problems may come from the design of the application and the usage of the mechanism as well. So, let's take a look at the proper usage of this mechanism. (See listing1.c.txt)

The logic behind the program is simple. The code first creates a temporary file with name test_file and writes the string HELLO_STR (defined as macro) into the file. We then map the file test_file to memory using mmap. The format for this call, as described in the manual page for mmap is void * mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset). The first field, which specifies the starting address, is usually given as '0'. Of course, the starting address of the memory location from which length bytes (second field) of the file are mapped is non-zero. Instead, mmap returns the starting address. The third argument is the protection that is to be provided to the mapped memory region. This should be same as the mode with which the file was opened. We have given here PROT_READ, which gives read permissions to the memory page to which mapping has been done; the various memory protection options are further detailed in the above man page. The fourth argument is a flag which specifies the type of the mapped object. The flag MAP_FLAG specifies that the mapping is for a regular file or a character special device file. The fifth argument is a valid file descriptor of the file to be mapped, and the sixth argument is the offset, which has to be a multiple of the page size as returned by getpagesize() (see manual page for details) or is the starting offset for the file. A zero as the sixth argument means that the starting offset for reading the file is zero (i.e., the file is to be read starting from byte 0.) One notable point in the mmap man page is that the memory mapped using mmap is preserved across fork calls with the same attributes. This feature can be useful when we have more than one process accessing some file regularly.

So, just how much benefit do we gain from memory mapping versus reading a file? We can demonstrate the gains by comparing this program against the use of the open, read and close calls that we mentioned above. (See listing2.c.txt)

This program gives the output in the same format as the previous one. One can compare the performance of both the programs by running their respective executables with the time command, that is, if test1 is the executable for listing1.c then run the program as time ./test1. Do the same for listing2.c. Note the difference between the performance of the two programs. My Pentium III machine with a processor speed of 733MHz and a 128MB SD RAM showed following results:

#Time for listing1.c
real    0m20.360s
user    0m1.600s
sys     0m18.670s

and

#Time for listing2.c
real    0m24.325s
user    0m1.990s
sys     0m22.300s

It is obvious that the results may vary from machine to machine. It is also possible that a machine with a similar configuration like that of my test machine gives different results. One can also do the testing with different values of NTHLOOP, which will help you to get a better understanding of the difference in performance as the number of accesses to a file increases.

What I have shown here is a very simple scenario. In practical cases, we will often deal with files that are larger in size and have more and more complex data. The efficiency of the mapping method becomes much more apparent as files sizes go up and the complexity of parsing increases.

One thing to note is that when the process terminates the mapped region is automatically unmapped. Closing the file descriptor does not unmap the region automatically. To unmap a memory region we can use the system call munmap. The address and the offset are the two arguments needed to unmap a memory region.

Experimenting with the third and the fourth arguments to mmap can be very useful, and will promote a deeper understanding of where the use of memory mapping mechanism will be most effective.

Message Queues

Message queues are one of the three different types of System V IPC mechanisms. This mechanism enables processes to send information to each other asynchronously. The word asynchronous in the present context signifies that the sender process continues with its execution without waiting for the receiver to receive or acknowledge the information. On the other side, the receiver does not wait if no messages are there in the queue. The queue being referred to here is the queue implemented and maintained by the kernel.

Let us now take a look at the system calls associated with this mechanism.

  1. msgget: This, in a way similar to shmget, gets a message queue identifier. The format is
    int msgget(ket_t key, int msgflg);
    
    The first argument is a unique key, which can be generated by using ftok algorithm. The second argument is the flag which can be IPC_CREAT, IPC_PRIVATE, or one of the other valid possibilities (see the man page); the permissions (read and/or write) are logically ORed with the flags. msgget returns an identifier associated with the key. This identifier can be used for further processing of the message queue associated with the identifier.
  2. msgctl: This controls the operations on the message queue. The format is
    int msgctl(int msqid, int cmd, struct msqid_ds *buf);
    
    Here msqid is the message queue identifier returned by msgget. The second argument is cmd, which indicates which action is to be taken on the message queue. The third argument is a buffer of type struct msqid_ds. Each message queue has this structure associated with it; it is composed of records for queues to be identified by the kernel. This structure also defines the current status of the message queue. If one of the cmds is IPC_SET, some fields in the msqid_ds structure (pointed by the third argument) will be set to the specified values. See the man page for the details.
  3. msgsnd: This is for sending messages. The format is
    int msgsnd(int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg);
    
    The first argument is the message queue identifier returned by msgget. The second argument is a structure that the calling process allocates. A call to msgsnd appends a copy of the message pointed to by msgp to the message queue identified by msqid. The third argument is the size of the message text within the msgbuf structure. The fourth argument is the flag that specify one of several actions to be taken as and when a specific situation arises.
  4. msgrcv: This is for receiving messages. The format is
    ssize_t  msgrcv(int msqid, struct msgbuf *msgp, size_t msgsz, long msgtyp, int msgflg);
    
    Besides the four arguments mentioned above for msgsnd, we also have msgtyp, which specifies the type of message requested. See the man page for the options.

Let's take a look at an example of using message queues. (See send.c.txt)

First, we create a message queue using msgget with the flag set to IPC_CREAT and permissions set to read/write. Once we get the message queue identifier, we send the messages that in the above code are read from standard input. Note the output of the command ipcs before and after running the above program. Try to interpret the output. One command that will be of use in doing this is ipcrm; see its man page for more details.

The code for receiving the sent messages is given below. (See recv.c.txt)

Note that here we are using msgget without the field IPC_CREAT. This is because we have not destroyed the message queue associated with the key 9999. This can be seen in send.c. When we call msgrcv, we get the messages that were typed into the send program. This is confirmed by writing the received messages to the standard output.

Experimenting with message queues can be fun, but you have to realize their limitations. Only repeated experimentation with this mechanism will enable one to actually learn it. Using message queues in applications has to be done only after careful consideration. Before thinking more on this, let us move on to the last of the System V IPC mechanisms, the Semaphores.

Semaphores

A semaphore is best defined as a synchronization variable among processes communicating with each other. The processes may communicate data among themselves, or it may be communicated by a shared data object. The application should then see that the shared data object is not being written simultaneously by two or more process. A simple solution will be to have a locking mechanism that will prevent simultaneous changes in the shared data object. To be more precise, if we have a semaphore variable - say, S - then give a process access to the shared data object if S is not set. Once access is granted, S is set, and is reset once operations on the shared data object are over.

The setting and resetting of the semaphore variable are commonly known as wait and post operations. The pseudo code for the wait and post operations are given below:

wait(S) {
	while(S <= 0) 
		; /* some operations. */
	S--;
}


post(S) {
	S++;
}

Linux provides three system calls that enable the use of the semaphore mechanism.

  1. semget: Counterpart of shmget and msgget, the format is
    int semget(key_t key, int nsems, int semflg);
    
    This returns a semaphore identifier for a set of nsems number of semaphores for a set, associated with the key where semflg determines permissions and action to be taken (like IPC_CREAT etc.)
  2. semop: This is for controlling operations on the semaphore. The format is
    int semop(int semid, struct sembuf *sops, unsigned nsops);
    
    The semid is an identifier that points to the semaphore set. The second field is a structure that describes the operations. The structure as defined in /usr/include/sys/sem.h is:
    struct sembuf {
    	unsigned short int sem_num;   /* semaphore number */
    	short int sem_op;             /* semaphore operation */
    	short int sem_flg;            /* operation flag */
    };
    
    The third argument nsops is the index in the array pointed to by sops. It is set to 1 if there is one semaphore in the complete set.
  3. semctl: Counterpart of shmctl and msgctl, the format is
    int semctl(int semid, int semnum, int cmd, ...);
    
    This is the administrator of the semaphore set. It performs operations specified by cmd on the nth semaphore where n is same as semnum and the semaphore set is identified by semid.
(See semaphore.c.txt)

If you take a look at the 'keyboard-hit' program that I used in my previous article to explain the use of pipes, fifos and shared memory mechanisms, the code above is essentially the same. We accept input from the keyboard and check whether the input is a digit between 0 and 9 inclusive; however, we now have two processes trying to take inputs from the keyboard. Since this can lead to conflict, we keep a lock, so that when one process is active, the other one waits.

First, we register a signal SIGINT, so that whenever the signal is raised, the signal handler process_exit is executed, bringing about the termination of the process for which the signal is registered. We then have init_semaphore, which creates a semaphore set with NUM members in the set (here it is 1). On successful creation of the semaphore set, semget returns a semaphore set identifier, which will be used for further operations on the semaphore set. We then invoke set_sem_val. Initially, this may seem useless, but it can actually be used to confirm that you can run your program with the code above and with the code that does not call set_sem_val. The manual page for semctl gives a brief description on the union semun. The union as defined is

union semun {
	int val;                  /* value for SETVAL */
	struct semid_ds *buf;     /* buffer for IPC_STAT, IPC_SET */
	unsigned short int *array;    /* array for GETALL, SETALL */
	struct seminfo *__buf;    /* buffer for IPC_INFO */
};

We set the field val in the union to 1. This signifies that access to the critical resource can be provided only once (or one process). It the value is set as 2, then only two process can hold access to the shared resource at a time.

Next, we initialize two processes that take the same action. We fork it in such a manner that only the child processes run. If semaphore operation for the lock is successful, then we invoke critical_resource; otherwise we try again (much like an infinite loop). If a lock is successful, then that process is provided access to critical_resource. Once it comes out of the critical_resource, we remove the lock. Note that we have a sleep(1) in the process, which enables the scheduling of the two process to provide enough time slices to both the process. Confusing, right? Try running the code without that sleep; the results are educational.

The critical_resource simply reads keyboard characters other than '\n' and when a digit between 0 and 9 is read, it asks the user to raise the keyboard interrupt.

This may look as simple as the concept explained in almost all textbooks on Operating Systems. But this is because we are dealing with only one member in the semaphore set. The complexity increases as the number of semaphores increases. One may also wonder, why not use a simple flag kept in shared memory instead? That would be fine if we were dealing with programs as simple as the one given as example here, but would restrict the usage and features that are available with semaphores. I suggest experimenting with this mechanism, with more complex programs where you deal with more than just one semaphore in the set.

Sockets

Sockets are the most popular of all the IPC mechanisms for a simple reason: other mechanisms do not support communication between two machines. Sockets provide a full-duplex communication channel between processes that do not necessarily run on the same machine.

Generally, sockets are associated with the concept of network communication in the form of client-server programming; a pair of processes of which one will be a client and one a server. The client process will send requests to the server, and the server process will send replies as an acknowledgment to the client requests. Of course, when creating a socket, we have to specify the type of communication that will be needed, since different modes of communication requires different protocols.

The different system calls associated with this form of IPC mechanism are given below:

  1. socket: This system call returns the descriptor, which is used for further communication. The format is
    int socket(int domain, int type, int protocol);
    
    The first argument domain specifies the communication domain. The domain here indicates which protocol family is to be used for communication and the third argument protocol specifies a particular protocol to be used with the socket. See the manual page for socket for more details on the protocols supported. The second argument type specifies the properties, features or limitations of the socket.
  2. bind: This binds the socket with a name. The format is
    int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
    
    Here, the address my_addr of length addrlen is given to the socket whose descriptor is sockfd. The address is initialized before a call to bind is made.
  3. listen: listen for connections on a socket. The format is
    int listen(int s, int backlog);
    
    This tells the kernel to queue up a maximum of backlog number of connections for the socket with the descriptor s. If the queue becomes full, then appropriate action is taken based on the protocol being used.
  4. accept: accept the connection on a socket. The format is
    int accept(int s, struct sockaddr *addr, socklen_t *addrlen);
    
    This accepts connections and fills up the addr with the socket address of the client. Sockets support connection-oriented as well as connectionless communication. This system call is associated only with connection-oriented communication.
    For more details on connection-oriented and connectionless communication, please refer the book Computer Networks by Andrew S Tanenbaum.
  5. connect: initiate a connection in a socket. The format is
    int  connect(int  sockfd,  const  struct sockaddr *serv_addr, socklen_t addrlen);
    
    This is invoked by the client to connect to a socket with descriptor sockfd on a server with the address serv_addr. Once connect call succeeds, it returns zero, otherwise errno is set appropriately.
(See client.c.txt and server.c.txt)

The structure of the two programs client.c and server.c are similar; both programs initially create sockets. The sockets are created using the AF_INET family of protocols, which is same as PF_INET. The field SOCK_STREAM indicates that it will be a sequenced, reliable, full-duplex and connection-based socket. The protocol specified during creation of socket is '0'. This can also be IPPROTO_TCP for tcp sockets and IPPROTO_UDP for udp sockets.

In the case of the server, we name the sockets using the bind system call. We have a structure sockaddr and sockaddr_in. The former specifies the generic socket address, whereas the latter describes the internet socket address. The structure as one can see in the manual page for ip is

struct sockaddr_in {
	sa_family_t    sin_family; /* address family: AF_INET */
	u_int16_t      sin_port;   /* port in network byte order */
	struct in_addr  sin_addr;  /* internet address */
}
and in_addr is
struct in_addr {
	u_int32_t s_addr;	/* address in network byte order */
}

Here, the sin_family field is same as the family of protocols being used, that is, AF_INET. The sin_port field is the port number (in network byte order), which will be used by the socket. Finally, sin_addr field is the internet address (again in network byte order), which here I have given as the loopback address. The socket is bound to these details, once a successful call is made on bind.

The listen system call specifies that the created socket is willing to accept incoming connections, but with a restriction that there will be a limitation for the number of connections. In our code, the limitation is given as '1', which specifies the queue size for the number of pending connections.

An accept call returns the descriptor of the accepted connection. If the server needs to serve n > 1 number of connections, then we have to make repeated calls to accept.

After that, we can do the usual file operations (e.g., read and write) on the descriptor returned by the accept system call. Alternate read and write operations will help the server to communicate with the client.

Coming to the structure of the client program, once a socket is created, the client tries to connect to the server using the connect system call. This, again, is done after setting the internet socket address before binding the socket to a name. Once the call to connect is successful, the client program can start communicating with the server. Note: obviously, the client program can only connect to a running server. However, for testing purposes, please run the client program without running the server.

The example shown here illustrates the use of sockets for loopback communication. The client and server programs may be run on different machines by changing the internet address to the required ones. There are also several other families of protocols which support communication via sockets. Some of the interesting ones are PF_LOCAL for local communication, PF_PACKET for low-level packet interface, PF_NETLINK for communication between the kernel and the user, etc. It's a good idea to try out these options to understand the flexibility and domains (in the application level) where one can use this mechanism. The example above is for connection oriented communication. Exploring the use of sockets in an connectionless-oriented communication is left as an exercise to the readers.

Conclusion

Finally, I come to an end of my article. I have tried to present the concepts of Inter-Process Communication in a manner which will make it easier for novice programmers to write their own code whenever they come across any of the problems that I have mentioned here. This two-part article will not, of course, make you an expert in this field, but should form a good base for future exploration.

 


[BIO] I completed my B. Tech in Computer Science & Engineering from a small town called Trichur, in Kerala, God's Own Country in India. Presently I am working in Naturesoft Pvt. Ltd, Chennai, India as a Programmer. I spend my free time reading books on Linux and exploring the same. Also, I have a good appetite for Physics. My motive in life is to go forward, onward and upward.

Copyright © 2004, Hiran Ramankutty. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 105 of Linux Gazette, August 2004

<-- prev | next -->
Tux