LINUX GAZETTE
...making Linux just a little more fun!
Process Tracing Using Ptrace - Part III
By Sandeep S

The basic features of ptrace were explained in Part I. In Part II we saw a small program which accessed the registers of a process and modified them so as to change the output of that process, by injecting some extra code. This time we are going to access the memory of a process. The purpose of this article is to introduce a method for infecting binaries at runtime. There are many possible areas of use for this technique.


1. Introduction.

We are now familiar with ptrace: we know how to attach to a process, how to trace it and finally how to release it. We also have an idea about the structure of the Linux binary format, ELF.

Our plan is to fetch/modify a running binary, so we have to locate the symbols inside the binary. For that we need link_map, the dynamic linker's internal structure with which it keeps track of loaded libraries and the symbols within them.

The format of link_map is (from /usr/include/link.h):

struct link_map
  {
    ElfW(Addr) l_addr;      /* Base address shared object is loaded at.  */
    char *l_name;           /* Absolute file name object was found in.  */
    ElfW(Dyn) *l_ld;        /* Dynamic section of the shared object.  */
    struct link_map *l_next, *l_prev; /* Chain of loaded objects.  */
  };

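A short explanation of the fields: l_addr is the base address at which the object is loaded, l_name is the absolute file name of the object, l_ld points to its dynamic section, and l_next and l_prev link it to its neighbours in the chain.

The link_map is a linked list in which each item describes one loaded library. What we have to do is follow this chain, go through every library and look for our symbol. That raises a question: where can we find this link_map?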

Every dynamically linked object has a global offset table (GOT), a table of addresses that the dynamic linker fills in at load time. On 32-bit x86, the second entry of the GOT is reserved for the link_map. So we get the address of link_map from GOT[1] and then go on to search for our symbol.
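To get a feel for the chain before chasing it through another process, a program can walk its own list. The sketch below is not taken from Ptrace.c: it reaches the list through the DT_DEBUG entry of the program's own dynamic section (a different route from GOT[1]) and assumes a dynamically linked glibc program. The helper names own_link_map, count_objects and find_object are made up for illustration.

```c
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Start of this program's dynamic section; defined by the linker for
 * dynamically linked programs (assumption: not a static binary). */
extern ElfW(Dyn) _DYNAMIC[];

/* Head of the calling process's own link_map chain, or NULL if the
 * DT_DEBUG entry is missing or has not been filled in. */
struct link_map *own_link_map(void)
{
    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_DEBUG && d->d_un.d_ptr != 0)
            return ((struct r_debug *) d->d_un.d_ptr)->r_map;
    }
    return NULL;
}

/* Follow l_next from head and count the objects on the chain. */
size_t count_objects(const struct link_map *head)
{
    size_t n = 0;
    for (const struct link_map *m = head; m != NULL; m = m->l_next)
        n++;
    return n;
}

/* First object whose l_name contains needle, or NULL if none does. */
const struct link_map *find_object(const struct link_map *head,
                                   const char *needle)
{
    assert(needle != NULL);
    for (const struct link_map *m = head; m != NULL; m = m->l_next) {
        if (m->l_name != NULL && strstr(m->l_name, needle) != NULL)
            return m;
    }
    return NULL;
}
```

Calling find_object(own_link_map(), "libc") would be one way to locate the C library's entry. In a traced process the same walk applies, except that every node has to be copied out with ptrace before its l_next can be followed.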

2. Straight to code.

Now we have collected the basic information needed to access the memory. Let's start. First of all we attach to the process 'pid' for tracing. Next we locate the link_map we require. You will find functions such as read_data and read_str in the code. These are helper functions that make working with ptrace easier, and they are self-explanatory.
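The helpers themselves live in Ptrace.c and are not reproduced in this article. As a rough idea of what such helpers might look like, here is a minimal sketch built on PTRACE_PEEKDATA, which returns one machine word per call. The names peek_data and peek_str, their signatures and their return conventions are assumptions for illustration, not the author's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Copy len bytes from address addr in the traced process pid into buf,
 * one word at a time. Returns 0 on success, -1 on failure. */
int peek_data(pid_t pid, unsigned long addr, void *buf, size_t len)
{
    unsigned char *out = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        /* -1 is also a valid word, so errno is the only error signal. */
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *) (addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;
        size_t n = len - done < sizeof word ? len - done : sizeof word;
        memcpy(out + done, &word, n);
        done += n;
    }
    return 0;
}

/* Copy a NUL-terminated string from the traced process into buf,
 * truncating it to size - 1 characters. Returns 0 on success, -1 on failure. */
int peek_str(pid_t pid, unsigned long addr, char *buf, size_t size)
{
    size_t done = 0;

    if (buf == NULL || size == 0)
        return -1;
    while (done < size - 1) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *) (addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;
        const char *bytes = (const char *) &word;
        for (size_t i = 0; i < sizeof word && done < size - 1; i++) {
            buf[done] = bytes[i];
            if (bytes[i] == '\0')
                return 0;
            done++;
        }
    }
    buf[done] = '\0';
    return 0;
}
```

Both functions only work on a process that is already attached and stopped; on any other pid the first ptrace call fails and they return -1.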

The function for locating the link_map is:

struct link_map *locate_linkmap(int pid)
{
    Elf32_Ehdr *ehdr = malloc(sizeof(Elf32_Ehdr));
    Elf32_Phdr *phdr = malloc(sizeof(Elf32_Phdr));
    Elf32_Dyn *dyn = malloc(sizeof(Elf32_Dyn));
    Elf32_Word got;
    struct link_map *l = malloc(sizeof(struct link_map));
    unsigned long phdr_addr, dyn_addr, map_addr = 0;

    /* The ELF header sits at the conventional 32-bit x86 load address. */
    read_data(pid, 0x08048000, ehdr, sizeof(Elf32_Ehdr));
    phdr_addr = 0x08048000 + ehdr->e_phoff;
    printf("program header at %p\n", (void *) phdr_addr);
    read_data(pid, phdr_addr, phdr, sizeof(Elf32_Phdr));

    while (phdr->p_type != PT_DYNAMIC) {
        read_data(pid, phdr_addr += sizeof(Elf32_Phdr), phdr,
                             sizeof(Elf32_Phdr));
    }
    
    read_data(pid, phdr->p_vaddr, dyn, sizeof(Elf32_Dyn));
    dyn_addr = phdr->p_vaddr;

    while (dyn->d_tag != DT_PLTGOT) {
        read_data(pid, dyn_addr += sizeof(Elf32_Dyn), dyn, sizeof(Elf32_Dyn));
    }

    got = (Elf32_Word) dyn->d_un.d_ptr;
    got += 4;           /* second GOT entry, remember? */

    read_data(pid, (unsigned long) got, &map_addr, 4);
    read_data(pid, map_addr, l, sizeof(struct link_map));
    free(phdr);
    free(ehdr);
    free(dyn);
    return l;
}

We start at address 0x08048000, the conventional load address of a 32-bit x86 executable, to get the ELF header of the process we are tracing. From the fields of the ELF header we get the location of the program header table. (The fields of the headers were discussed in Part II.) We then step through the program headers until we find the one of type PT_DYNAMIC, which describes the dynamic linking information. Its p_vaddr field gives the address of the dynamic section, and we search the entries there until we find DT_PLTGOT, which holds the base address of the global offset table.
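The loops in locate_linkmap trust that the memory at the start address really is an ELF header and that a PT_DYNAMIC segment exists. A more defensive variant might check the magic number first and bound the search by the header count (e_phnum). The sketch below works on copies that have already been read into local memory; is_elf32 and find_segment are hypothetical names, not functions from Ptrace.c.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Non-zero if ehdr starts with the ELF magic and is a 32-bit object. */
int is_elf32(const Elf32_Ehdr *ehdr)
{
    assert(ehdr != NULL);
    return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0 &&
           ehdr->e_ident[EI_CLASS] == ELFCLASS32;
}

/* First program header of the given type among count entries,
 * or NULL if there is none. count would come from e_phnum. */
const Elf32_Phdr *find_segment(const Elf32_Phdr *phdrs, size_t count,
                               Elf32_Word type)
{
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == type)
            return &phdrs[i];
    }
    return NULL;
}
```

With these, a caller could give up cleanly when is_elf32 fails or find_segment(phdrs, ehdr->e_phnum, PT_DYNAMIC) returns NULL, instead of reading past the end of the table.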

Now that we have the address of the GOT, we read its second entry, which holds the address of the link_map. We then read the link_map structure from that address and return it.

We have the struct link_map; next we have to get symtab and strtab. For this, we go to the l_ld field of link_map and traverse the dynamic section entries until DT_SYMTAB and DT_STRTAB have been found. Finally we can look up our symbol in the symbol table. DT_SYMTAB and DT_STRTAB give the addresses of the symbol table and the string table respectively.

The function resolv_tables, which stores its results in the global variables symtab, strtab and nchains, is:

void resolv_tables(int pid, struct link_map *map)
{
    Elf32_Dyn *dyn = malloc(sizeof(Elf32_Dyn));
    unsigned long addr;
    addr = (unsigned long) map->l_ld;
    read_data(pid, addr, dyn, sizeof(Elf32_Dyn));
    while (dyn->d_tag) {
        switch (dyn->d_tag) {
        case DT_HASH:
            read_data(pid, dyn->d_un.d_ptr + map->l_addr + 4, 
                       &nchains, sizeof(nchains));
            break;
        case DT_STRTAB:
            strtab = dyn->d_un.d_ptr;
            break;
        case DT_SYMTAB:
            symtab = dyn->d_un.d_ptr;
            break;
        default:
            break;
        }
        addr += sizeof(Elf32_Dyn);
        read_data(pid, addr, dyn, sizeof(Elf32_Dyn));
    }
    free(dyn);
}

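What we actually do here is read the dynamic section entries one by one and check whether the tag is DT_STRTAB or DT_SYMTAB. If it is, we take the corresponding pointer and assign it to strtab or symtab. For DT_HASH we read the second word of the hash table, nchains, which equals the number of entries in the symbol table. Once we reach the terminating entry, whose tag is zero, we stop.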
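The same scan can be written without global variables. The following sketch assumes the dynamic section has already been copied into a local array and collects the three addresses into a struct; struct dyn_tables and collect_tables are names invented for this illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Addresses gathered from one object's dynamic section (0 if absent). */
struct dyn_tables {
    Elf32_Addr symtab;
    Elf32_Addr strtab;
    Elf32_Addr hash;
};

/* Scan at most max_entries entries, stopping at DT_NULL.
 * Returns 0 if both DT_SYMTAB and DT_STRTAB were found, -1 otherwise. */
int collect_tables(const Elf32_Dyn *dyn, size_t max_entries,
                   struct dyn_tables *out)
{
    assert(dyn != NULL && out != NULL);
    out->symtab = out->strtab = out->hash = 0;

    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
        switch (dyn[i].d_tag) {
        case DT_HASH:
            out->hash = dyn[i].d_un.d_ptr;
            break;
        case DT_STRTAB:
            out->strtab = dyn[i].d_un.d_ptr;
            break;
        case DT_SYMTAB:
            out->symtab = dyn[i].d_un.d_ptr;
            break;
        default:
            break;
        }
    }
    return (out->symtab != 0 && out->strtab != 0) ? 0 : -1;
}
```

Returning a status lets the caller notice an object that lacks a symbol table instead of carrying on with stale values.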

Our next step is getting the value of the symbol from the symbol table. For this we take the symbol table entries one by one and check whether each is a function. (We are interested in finding the value of a library function.) If it is, its name is compared with the function name we supplied. If they match, the value of the symbol is returned.
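The lookup described above can be sketched as a plain loop over symbol table entries. This version assumes the symbol table and string table have already been copied out of the traced process into local buffers; lookup_function is a hypothetical name and the real code in Ptrace.c may be organised differently.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Search nsyms symbol table entries for a function called name.
 * strtab must be a NUL-terminated string table of strtab_size bytes.
 * Returns the symbol's st_value, or 0 if no such function is found. */
Elf32_Addr lookup_function(const Elf32_Sym *symtab, size_t nsyms,
                           const char *strtab, size_t strtab_size,
                           const char *name)
{
    assert(symtab != NULL && strtab != NULL && name != NULL);

    for (size_t i = 0; i < nsyms; i++) {
        const Elf32_Sym *sym = &symtab[i];

        if (ELF32_ST_TYPE(sym->st_info) != STT_FUNC)
            continue;                 /* not a function */
        if (sym->st_name >= strtab_size)
            continue;                 /* name offset outside the table */
        if (strcmp(strtab + sym->st_name, name) == 0)
            return sym->st_value;
    }
    return 0;
}
```

For a symbol defined in a shared library, the value found this way is normally relative to the library's load address, so the address inside the traced process would be l_addr plus the returned value.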

Now we have the value of the symbol we were after. What can we do with this value? The answer depends upon the reader. As I have already stated, we may use this for both good and evil purposes.

You might be thinking that everything is over. But there is one step we must not forget: detaching from the traced process. Skipping it may leave the process in a stopped state for ever; the consequences were discussed in Part I. So our last and final step is to detach from the traced process.
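One way to make the detach hard to forget is to wrap attach and detach around a callback, so that every path that attached also detaches. This is only a sketch of that idea, not how Ptrace.c is written; with_traced, trace_fn and count_call are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Work to perform while the target is attached and stopped. */
typedef int (*trace_fn)(pid_t pid, void *ctx);

/* Attach to pid, wait for it to stop, run work, then always detach.
 * Returns -1 if attaching or detaching fails, else the result of work. */
int with_traced(pid_t pid, trace_fn work, void *ctx)
{
    int status;

    assert(work != NULL);
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    /* The target is only safe to inspect once it has actually stopped. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }

    int result = work(pid, ctx);

    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    return result;
}

/* Example callback: counts how many times it has been invoked. */
int count_call(pid_t pid, void *ctx)
{
    (void) pid;
    ++*(int *) ctx;
    return 0;
}
```

The symbol lookup would then go inside the callback, and the process is released whether the lookup succeeds or not.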

The program may be obtained from Ptrace.c. Almost the whole code is self-explanatory.

Compile it by typing

#cc Ptrace.c -o symtrace

Now we want to test the program. Run some process in another console, then come back and type the command below. (Here my test program is emacs and the symbol I give is strcpy.) Instead of emacs you may trace any program that is traceable, and any symbol you want to inspect.

#./symtrace `ps ax | grep 'emacs' | cut -f 2 -d " "` strcpy
and watch what is going on.

3. Conclusion.

So we come to the end of a series of three articles covering basic programming with ptrace. Once you have understood the basic concepts, it is not difficult to take further steps on your own. More details on ptrace and ELF are available at www.phrack.org. One more thing I have to mention is that we got this far without touching a major topic: one major feature of ptrace is its ability to intercept system calls. User Mode Linux uses this feature on a large scale. I am busy with my classes and final year project, but I promise that, if time permits, we will continue this series and have a look at those features of ptrace.

All suggestions, criticisms, contributions etc. are welcome. You can contact me at busybox@sancharnet.in


Copyright © 2002, Sandeep S. Copying license http://www.linuxgazette.net/copying.html
Published in Issue 85 of Linux Gazette, December 2002
