Contents (hide)

Lab 9 (ASM) - Reading Material

Introduction

In this lab you will create a simple virus. Your virus will only need to be able to infect very simple ELF executables. Naturally, you will need to write it in strictly position-independent code. You will need to learn to manipulate the ELF header and the program headers in an ELF executable, make sure you fully understand these parts of the ELF specification. You will be using Linux system calls for all access to the files, make sure you know how to use them: open, read, write, lseek, close, and exit. In order to simplify your task we have provided some skeleton code with macros for calling these system services (see task 0b).

From the Specification

Program headers and static program loading are described in pages 2-1 up to 2-9 of the ELF file specification.

Some additional information is given below.

Virtual Memory

Modern operating systems employ a scheme called Virtual Memory. This scheme enables each process to have its own view of memory, independent of other processes. The operating system (with help from the hardware) maps process memory (virtual pages) into real memory (real pages).

This way, each process can pretend it is the only process running on the system, and trust the operating system to ensure its memory does not collide with other processes.

The Linker's Job

The introduction of virtual memory makes life much easier on the compiler and linker. The compiler generates code which thinks it is located at the beginning of memory (address 0x0), and leaves information for the linker on where corrections need be made. The linker is in charge of choosing the memory layout of the program, and can decide to which address in virtual memory the code will be loaded to.

A simple example will help illustrate the point. Look at the following code:

char *message = "hello linker";

void foo(){
    printf("%s\n",message);
}

Compiling this code to produce an object file ends up looking like this:

lab9/object.jpg

When the linker is asked to link this code, and make it an executable, it needs to decide on the memory layout first: what will the virtual address of foo() will be, and where in virtual memory will message be located?

After deciding on the memory layout, the linker needs to inform the loader how to load the executable. This is done by using program headers in the ELF format.

The Static Loader

The loader runs when the exec system call is invoked, i.e. whenever an executable file is run. The job of the Loader is to load the executable into main memory. It does so by reading the program headers located in the ELF formatted executable, and acting accordingly. Your virus should change a program header in the ELF executable file it infects so as to make the loader load your virus code when the infected ELF executable is run. Let us take a look at the program header structure in an ELF file:

typedef struct {
        Elf32_Word      p_type;         /* entry type */
        Elf32_Off       p_offset;       /* file offset */
        Elf32_Addr      p_vaddr;        /* virtual address */
        Elf32_Addr      p_paddr;        /* physical address */
        Elf32_Word      p_filesz;       /* file size */
        Elf32_Word      p_memsz;        /* memory size */
        Elf32_Word      p_flags;        /* entry flags */
        Elf32_Word      p_align;        /* memory/file alignment */
} Elf32_Phdr;

  • p_type: The type of the entry. We are only interested in PT_LOAD, which means the loader must load the appropriate data from the file into memory.
  • p_offset: The offset in the file, from which we start to load data.
  • p_vaddr: The virtual address to which we load the data.
  • p_paddr: The physical address. On x86 we can safely ignore this.
  • p_filesz: Total amount of data which need to be mapped from the file.
  • p_memsz: Total amount of data which needs to be mapped (can differ from p_filesz).
  • p_flags: The flags:
    • PF_R: map for reading
    • PF_w: map for writing
    • PF_X: map for execution
  • p_align: The alignment needed. The linker must make sure this section's virtual address equals 0 module p_align.

One remarks is in order: p_filesz can be different from p_memsz. This can happen when, for example, we need to allocate space for uninitialized variables in memory. There is no point in wasting space in the executable file for such variables (the section which holds these variables is traditionally called the ".bss" section). But in this lab, p_filesz should be the same as p_memsz.

Use this picture, which illustrates the ELF format of executable files.

Make sure that you understand this picture, which describes the structure of the memory when infected file is run.