CSE 643: Computer Security (Fall 2019)

Basic Information

Lab 1: Environment Variables and Set-UID Programs

User id & privilege

In Linux, each user is associated with a unique user id. The root user, who has the greatest privilege, has user id 0. The permission-granting mechanism checks a user’s id, not the user name, when deciding what is allowed. Therefore, anyone who can somehow change their user id to 0 can exercise the rights of root.

In Linux, each running process has three types of user ids (real, effective and saved). We are only interested in the real user id and the effective user id. The real user id identifies the user who owns the process, i.e. the user who started it; the effective user id identifies the user whose rights are given to the process. Usually, the real user id is equal to the effective user id.

One can find out one’s own user id by typing id in the terminal.

[09/02/19]seed@VM:~$ id
uid=1000(seed) gid=1000(seed) groups=1000(seed),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare)
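
The same information is available to a program through the standard getuid() and geteuid() calls. The sketch below is only an illustration: the helper names ids_differ and print_ids are hypothetical and chosen for these notes.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 when the effective user id differs from the real user id,
 * and 0 in the usual case where the two are equal. */
int ids_differ(void)
{
    return getuid() != geteuid();
}

/* Prints both ids of the calling process. */
void print_ids(void)
{
    printf("real uid = %ld, effective uid = %ld\n",
           (long)getuid(), (long)geteuid());
}
```

For an ordinary program started from the shell, both numbers printed by print_ids() are expected to be the same.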

Linux file permission

Linux describes a file’s type and permissions with a 10-character mode string. If we type the command ls -l in a terminal, we get the permission information of the files in the current folder.

[09/02/19]seed@VM:~$ ls -l
total 1688
drwxrwxr-x 4 seed seed    4096 May  1  2018 android
drwxrwxr-x 2 seed seed    4096 Jan 14  2018 bin
drwxrwxr-x 2 seed seed    4096 Jan 14  2018 Customization
drwxr-xr-x 3 seed seed    4096 Aug 31 10:37 Desktop
drwxr-xr-x 2 seed seed    4096 Jul 25  2017 Documents
drwxr-xr-x 2 seed seed    4096 May  9  2018 Downloads
-rw-r--r-- 1 seed seed    8980 Jul 25  2017 examples.desktop
-rw-rw-r-- 1 seed seed 1661676 Jan  2  2019 get-pip.py
drwxrwxr-x 3 seed seed    4096 May  9  2018 lib
drwxr-xr-x 2 seed seed    4096 Jul 25  2017 Music
drwxr-xr-x 3 seed seed    4096 Jan 14  2018 Pictures
drwxr-xr-x 2 seed seed    4096 Jul 25  2017 Public
drwxrwxr-x 4 seed seed    4096 May  9  2018 source
drwxr-xr-x 2 seed seed    4096 Jul 25  2017 Templates
drwxr-xr-x 2 seed seed    4096 Jul 25  2017 Videos

In the output:

The first column indicates permission. We can split them into four segments as follows.

d             | rwx                 | rwx                         | rwx
is directory? | permission of owner | permission of owner’s group | permission of others

Notice that the meaning of each bit is different for files and directories. For normal files, “r” indicates the right to read, “w” the right to write and “x” the right to execute the file. For directories, “r” indicates the right to list its contents, “w” the right to create, delete or rename entries inside it, and “x” the right to enter it (e.g. with cd). We can compress the last nine permission bits into three octal digits by treating each “rwx” group as the three-bit binary representation of one digit. For example, “rwxrwxr-x” is 111 111 101 in binary and can be expressed as “775”.
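
The conversion from the nine permission characters to a numeric mode can be sketched in C as follows. The function name perm_string_to_mode is hypothetical, and the sketch handles only the plain “r”, “w”, “x” and “-” characters.

```c
#include <string.h>

/* Converts a nine-character permission string such as "rwxrwxr-x" into
 * its numeric mode (0775 for that example). Returns -1 on malformed
 * input. Special bits are not handled by this sketch. */
int perm_string_to_mode(const char *perm)
{
    static const char expected[] = "rwxrwxrwx";
    int mode = 0;

    if (perm == NULL || strlen(perm) != 9)
        return -1;

    for (int i = 0; i < 9; i++) {
        mode <<= 1;                  /* make room for the next bit */
        if (perm[i] == expected[i])
            mode |= 1;               /* permission granted */
        else if (perm[i] != '-')
            return -1;               /* unexpected character */
    }
    return mode;
}
```

In C, a leading 0 marks an octal literal, so the result for “rwxrwxr-x” compares equal to 0775.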

There are also three special permission bits (Set-UID, Set-GID and the sticky bit) that are not part of the nine bits above. Here, we are only interested in the Set-UID bit. If this bit is set on a program, then when someone runs the program, its effective user id equals the user id of the program’s owner instead of the user id of whoever runs it (i.e. the effective user id is set to the owner’s user id, rather than to the real user id).

To make a program a Set-UID program, we can use the following command:

[09/02/19]seed@VM:~$ chmod 4755 progname

The “4” preceding the normal file permissions sets the Set-UID bit (in this demonstration, the command also sets the file permission to 755). When ls is run with colored output in a folder, Set-UID programs are highlighted with a red background. For example:

[09/02/19]seed@VM:~$ cd /bin
[09/02/19]seed@VM:/bin$ ls
bash             date           hostname    mountpoint        ntfswipe    sleep                           uname
bash_shellshock  dd             ip          mt                open        ss                              uncompress
bunzip2          df             journalctl  mt-gnu            openvt      static-sh                       unicode_start
busybox          dir            kbd_mode    mv                pidof       stty                            vdir
bzcat            dmesg          kill        nano              ping        su                              wdctl
bzcmp            dnsdomainname  kmod        nc                ping6       sync                            which
bzdiff           domainname     less        nc.openbsd        plymouth    systemctl                       whiptail
bzegrep          dumpkeys       lessecho    netcat            ps          systemd                         ypdomainname
bzexe            echo           lessfile    netstat           pwd         systemd-ask-password            zcat
bzfgrep          ed             lesskey     networkctl        rbash       systemd-escape                  zcmp
bzgrep           efibootmgr     lesspipe    nisdomainname     readlink    systemd-hwdb                    zdiff
bzip2            egrep          ln          ntfs-3g           red         systemd-inhibit                 zegrep
bzip2recover     false          loadkeys    ntfs-3g.probe     rm          systemd-machine-id-setup        zfgrep
bzless           fgconsole      login       ntfs-3g.secaudit  rmdir       systemd-notify                  zforce
bzmore           fgrep          loginctl    ntfs-3g.usermap   rnano       systemd-tmpfiles                zgrep
cat              findmnt        lowntfs-3g  ntfscat           run-parts   systemd-tty-ask-password-agent  zless
chacl            fuser          ls          ntfscluster       rzsh        tailf                           zmore
chgrp            fusermount     lsblk       ntfscmp           sed         tar                             znew
chmod            getfacl        lsmod       ntfsfallocate     setfacl     tempfile                        zsh
chown            grep           mkdir       ntfsfix           setfont     touch                           zsh5
chvt             gunzip         mknod       ntfsinfo          setupcon    true
cp               gzexe          mktemp      ntfsls            sh          udevadm
cpio             gzip           more        ntfsmove          sh_backup   ulockmgr_server
dash             hciconfig      mount       ntfstruncate      sh.distrib  umount

When the file permission details are displayed, an “s” appears in place of the owner’s usual “x”.

[09/02/19]seed@VM:/bin$ ls -l mount
-rwsr-xr-x 1 root root 34812 Dec 16  2016 mount
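
A program can test for the same bit with stat() and the S_ISUID constant from <sys/stat.h>. This is a minimal sketch; the function names mode_has_setuid and file_is_setuid are hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the Set-UID bit is present in a mode value, else 0. */
int mode_has_setuid(mode_t mode)
{
    return (mode & S_ISUID) != 0;
}

/* Returns 1 if the file at `path` has the Set-UID bit set, 0 if it
 * does not, and -1 if the file cannot be examined. */
int file_is_setuid(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return mode_has_setuid(st.st_mode);
}
```

With the listing above, file_is_setuid("/bin/mount") would be expected to report 1 on that machine.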

Password dilemma & Set-UID programs

In Linux, the password entries of all users are stored in /etc/shadow, which is owned by the root user and has permission 640. That is to say, normal users cannot read or modify this file. However, normal users should be able to change their own passwords freely. But if we allow them to access /etc/shadow, they may break the security of the system by changing other users’ password entries. This is the password dilemma: if we let users access the password file, they may compromise system security; but if no access is allowed, they are unable to change their own passwords.

Linux resolves this dilemma with the Set-UID mechanism, which was invented by Dennis Ritchie at Bell Labs [wiki]. As mentioned above, a Set-UID program runs with its owner’s privilege. In Linux, the password utility passwd is a Set-UID program owned by root. That is, when a normal user runs the program, the program itself has root privilege, which means it can modify the password file (and any other file on the system). But (ideally) this does not create a security hazard, because the passwd program is written to handle the password file carefully, and only the password file. That is, it only updates the password file when the user provides the correct old password, and it only changes that user’s entry, leaving the others untouched. It never reads or modifies any unrelated file.

Potential attacks on Set-UID programs

Because Set-UID programs allow a normal user to gain (limited) root privilege, attacks on them are very appealing. We can first analyze the attack surface of Set-UID programs. The attack surface of a software environment is the sum of the different points (the “attack vectors”) where an unauthorized user (the “attacker”) can try to enter data into or extract data from the environment. For Set-UID programs, the attack surface is the sum of the places where the program gets its inputs.

User inputs

If a Set-UID program fails to sanitize inputs from a user, it may create a security loophole. In Linux, user information is stored in /etc/passwd, where each line represents one user. Two sample lines are shown below. Each line has multiple fields, which are separated by colons. Here we are only interested in the last field, which specifies the default shell program of the user, i.e. the first program that is run after logging in.

[09/02/19]seed@VM:~$ sudo cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
seed:x:1000:1000:seed,,,:/home/seed:/bin/bash

Linux provides a utility called chsh to modify the default shell program. If it fails to sanitize user input properly, a user can provide a new shell string with a line break character inside, effectively creating a new user in the system. By assigning 0 as the user id of that new entry, a normal user can even plant a new root account in the system.
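
The kind of check that prevents this can be sketched as below. This is not how chsh is actually implemented; it is a minimal illustration, and the function name shell_field_is_safe is hypothetical. The idea is to reject any character that could end the field (a colon) or the line (a line break).

```c
#include <stddef.h>

/* Returns 1 if `shell` can be stored as one field of an /etc/passwd
 * line, 0 otherwise. The value must be an absolute path and must not
 * contain ':' (field separator) or a line break (entry separator). */
int shell_field_is_safe(const char *shell)
{
    if (shell == NULL || shell[0] != '/')
        return 0;

    for (const char *p = shell; *p != '\0'; p++) {
        if (*p == ':' || *p == '\n' || *p == '\r')
            return 0;
    }
    return 1;
}
```

An input such as "/bin/bash" followed by a line break and a forged entry is rejected as a whole instead of being written to the file.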

System inputs

Sometimes even system inputs can be the point of attack on Set-UID programs. Suppose a privileged Set-UID program needs to write to the file /tmp/abc. Because the /tmp folder is writable by everyone, an attacker can create a symlink in /tmp that is named abc but actually points to /etc/shadow. When the Set-UID program writes its output, it actually destroys the password file.
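
One common way to avoid this trap is to insist on creating the file freshly. With the flags O_CREAT and O_EXCL together, open() fails if the name already exists, including when the name is a symbolic link. The sketch below assumes this is acceptable for the program; the function name open_new_file_safely is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Creates `path` and opens it for writing. Fails with -1 if the name
 * already exists in any form, so a symlink planted by another user is
 * never followed. */
int open_new_file_safely(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

If an attacker has already created /tmp/abc as a symlink, the call returns -1 and the privileged program can stop instead of writing through the link.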

Environment variables

If a Set-UID program needs to read the values of environment variables directly, it is very important to sanitize the values. However, threats from environment variables can occur even when the Set-UID program does not read them directly.

  1. When invoking other programs.

    Problems can occur when a Set-UID program calls external programs with system(). For example, a program can call system("date") to show the date. This call asks the shell (/bin/sh) to search for a program named date in the directories listed in the PATH environment variable. However, if the user puts a folder containing his/her own version of date at the front of PATH, that version gets invoked instead. If the calling program happens to be a privileged Set-UID program, the wrong date program is executed with root privilege.

  2. During dynamic linking.

    In Linux, the dynamic linker searches for shared libraries in the folders listed in LD_LIBRARY_PATH, and first loads any libraries named in LD_PRELOAD. If one points these two environment variables at malicious versions of shared libraries, one can change the behavior of the program.

Capability leaking

It is very common for a privileged program to relinquish its privileges after completing certain operations. A common mistake made by programmers is capability leaking. It happens when the programmer forgets to release some resource that was acquired while the program still had the privileges, even though he/she has explicitly downgraded the program’s rights. For example, consider a program that opens an important file while privileged, gives up its privileges, and then calls fork.
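
A minimal C sketch of this pattern follows. It is an illustration written under stated assumptions: the function name leak_demo and the path passed to it are hypothetical, and when the code runs without Set-UID the call setuid(getuid()) changes nothing, so only the inheritance of the descriptor is demonstrated.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens `path` (while privileged, in a real Set-UID program), drops
 * privileges, then forks. Returns 1 if the child could still write
 * through the inherited descriptor, 0 if it could not, and -1 on
 * setup errors. */
int leak_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* Give up the owner's privileges permanently ... */
    if (setuid(getuid()) != 0) {
        close(fd);
        return -1;
    }

    /* ... but fd is still open: this is the leaked capability. */
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: unprivileged, yet the inherited descriptor works. */
        ssize_t n = write(fd, "leaked\n", 7);
        _exit(n == 7 ? 0 : 1);
    }

    int status = 0;
    close(fd);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Moving close(fd) to just before the fork() call removes the leak, at which point the child’s write fails.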

At first glance, everything looks fine. However, that is not the case. When fork is called, the child process gets a copy of all of the parent process’s file handles, including the one that gives access to the important file. Although the programmer explicitly relinquished the privileges before forking, indicating that he/she does not want the child process to access the file, the attempt fails. Therefore, it is important for programmers to release privileged resources as soon as possible, i.e. in this case, to close the file before forking.

Invoking other programs

We have discussed using system() to call an external program above. We now elaborate on this. A call to system(command) actually runs /bin/sh -c command. For a privileged program, this can be very dangerous. Consider a program that scans any given file/directory on the system for viruses. Because it has to be able to open any file on the system, it is a privileged Set-UID program. Suppose this program takes a path as input from users and calls system("ls " + path) to check the content of the folder. A normal user can then easily run a root shell by passing "/abc;/bin/sh" as the path. Here, /abc is just a path name that is not relevant. What matters is the semicolon, which acts as a separator between two commands in shell syntax, and /bin/sh, which invokes the shell. Together with the environment variable vulnerability introduced above, this shows that system() is a very dangerous way of invoking other programs.

A safer approach is to use the execve() function. It takes three arguments: the filename of the executable, the argument vector passed to the executable, and the environment variables defined for the new program. Instead of running /bin/sh -c command, execve() is a system call that asks the kernel to run the program directly, with no shell involved, which eliminates many potential safety hazards. In this case, the full path of the ls executable would be passed as the filename, and the argument vector would hold "ls" followed by the user-supplied path. If one tries to pass "/abc;/bin/sh" to the virus scanner program, this entire string is treated as a single path argument, which stops the attack. The ability to specify the environment variables explicitly also provides stronger security.
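
A minimal sketch of the execve approach is given below. It assumes that ls is installed at /bin/ls; the function name list_directory and the fixed PATH value are hypothetical choices made for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /bin/ls on `path` without involving a shell. Returns the exit
 * status of ls, or -1 if the child could not be started or waited for. */
int list_directory(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* argv[0] is the program name, argv[1] is pure data. */
        char *const argv[] = { "ls", (char *)path, NULL };
        /* A fixed, minimal environment instead of the caller's. */
        char *const envp[] = { "PATH=/usr/bin:/bin", NULL };

        execve("/bin/ls", argv, envp);
        _exit(127);                  /* reached only if execve failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling list_directory("/abc;/bin/sh") makes ls look for a file literally named "/abc;/bin/sh"; ls reports an error and no shell is started.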

The principle of isolation

The difference between system() and execve() reflects the principle of isolation in computer security, which states that data should be clearly isolated from code. In the virus scanner example above, "ls" is code, which specifies what program we would like to call and should not be changed; path is data, which determines which folder to scan. The system() approach violates this principle by blending the data and the code together into one string, which introduces many security problems.

The principle of least privilege

The principle of least privilege was introduced by J.H. Saltzer and M.D. Schroeder in 1975. It states that every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job. In Linux, privileged Set-UID programs violate this principle because they have the power of the root user, which has every possible privilege. Modern operating systems like Android provide fine-grained privileges. When we open the application settings page on an Android phone, we can set the privilege for a program to access location, camera, microphone, etc.

Another implication derived from this principle is that if a privileged program does not need some privileges for part of its execution, it should disable the privileges either temporarily or permanently, depending on whether this privilege needs to be reused later on. For example, if we close the privileged file handle as soon as possible in the capability leaking case, then this problem will not occur.

Lab 2: Buffer-Overflow Vulnerability

Process memory layout

Thanks to virtual memory, each Linux process has its own address space. The layout of the address space can be seen as below.

Process memory layout

There are mainly five segments:

  1. Text segment: stores the executable code of the program. This segment is usually read-only.
  2. Data segment: stores (initialized) global and static variables.
  3. BSS segment: stores uninitialized global and static variables.
  4. Heap: provides space for dynamic memory allocation. The heap grows from low address to high address.
  5. Stack: stores local variables and maintains the structure of the program. The stack grows from high address to low address.
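
To see these segments on a real machine, one can print a sample address from each of them. The sketch below is only an illustration; the names print_segment_addresses, initialized_global and uninitialized_global are hypothetical, and the exact numbers printed differ between systems and runs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;     /* lives in the data segment */
int uninitialized_global;        /* lives in the BSS segment  */

/* Prints one sample address per segment. Returns 0 on success and
 * -1 if the heap allocation fails. */
int print_segment_addresses(void)
{
    int local = 0;                              /* on the stack */
    int *dynamic = malloc(sizeof *dynamic);     /* on the heap  */

    if (dynamic == NULL)
        return -1;

    /* The function's own address is a sample from the text segment. */
    printf("text : %p\n", (void *)(uintptr_t)print_segment_addresses);
    printf("data : %p\n", (void *)&initialized_global);
    printf("bss  : %p\n", (void *)&uninitialized_global);
    printf("heap : %p\n", (void *)dynamic);
    printf("stack: %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```

On a typical Linux system the stack address is by far the largest of the five, which matches the layout described above.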

x86 stack layout

Each function has its corresponding stack frame, which stores its local variables and other important status. When a new function is called, a new stack frame is created, and the stack grows (from high address to low address).

void bar(){
    // <--here
}

void foo(){
    bar();
}

int main(){
    foo();
    return 0;
}

For the code above, when the execution goes inside bar(), the stack frames can be illustrated as below.

Stack overview when bar() is executed
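
The direction of stack growth can also be observed from within a program by comparing the addresses of local variables in a caller and its callee. The sketch below assumes GCC or Clang (for the noinline attribute) and a machine whose stack grows downward, such as x86; the names addr_in_foo, addr_in_bar and bar_frame_is_below_foo are hypothetical.

```c
#include <stdint.h>

static uintptr_t addr_in_foo, addr_in_bar;

/* noinline keeps bar() a real call with its own stack frame. */
__attribute__((noinline)) static void bar(void)
{
    volatile int marker = 0;
    addr_in_bar = (uintptr_t)&marker;    /* address inside bar's frame */
}

__attribute__((noinline)) static void foo(void)
{
    volatile int marker = 0;
    addr_in_foo = (uintptr_t)&marker;    /* address inside foo's frame */
    bar();
    marker = 1;    /* work after the call, so foo's frame stays live */
}

/* Returns 1 if the local variable of bar() sits at a lower address
 * than the local variable of foo(), i.e. the stack grew downward. */
int bar_frame_is_below_foo(void)
{
    foo();
    return addr_in_bar < addr_in_foo;
}
```

The frame created last (for bar) ends up at the lowest address, which is the ordering described for the nested calls above.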

Inside the stack frame are arguments, local variables and other values that keep the program running correctly.

int func(int a, int b){
    int x, y;
}

For the func(int, int) above, its stack frame is shown as below. The arguments of this function are at the top of the frame as drawn, i.e. at the highest addresses. The return address records where execution should continue after func() returns. The %ebp register (aka “stack base pointer”) points to the base of the stack frame, allowing the program to access the saved %ebp, the return address and the arguments at fixed offsets. The saved %ebp slot holds the %ebp value of func()’s caller; when func() returns, %ebp is restored to this saved value. Below %ebp are the local variables of func(). Their layout is determined entirely by the compiler, so we do not know exactly where x and y are. We only know that they are roughly in that region.

Stack layout of func()

There is also a “stack pointer”, stored in register %esp, that points to the top of the stack, i.e. the lowest address currently in use.

Buffer overflow attack

Copying data to buffer

Usually, we use the strcpy(char *dest, const char *src) function to copy strings from source to destination. We do not need to specify how long src is, since the copying is automatically terminated when a '\0' is encountered in src.

Buffer overflow

Since the strcpy function does not compare the length of src and dest, it may be the case that the length of src exceeds the maximum length of dest. This is called buffer overflow. This is dangerous, because depending on the layout, other local variables can be overwritten. In the worst case, even the saved %ebp and return address will be modified. When the return address is changed, the program will almost always malfunction since the execution flow is broken. Usually, there are four consequences:

  1. Jumping to an invalid address: the memory protection mechanism of the OS prevents the program from accessing unallocated memory space.
  2. Jumping to a protected address: the target memory space is allocated, but it is protected (e.g. it is reserved for the kernel). Usually these memory access violations cause segmentation faults.
  3. Invalid instruction: the target memory space is allocated, and it is accessible. However, the data there is not a valid machine instruction.
  4. Normal execution: the target memory space is allocated, and it is accessible. The data there happens to be valid machine instructions.

By making use of buffer overflow carefully, an attacker can deliberately change the return address so that it points to some malicious code in another memory location, thus compromising the logic of the program.
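
For contrast, a copy that does compare the two lengths can be sketched as follows. The function name copy_checked is hypothetical; the point is only that the destination size is passed in and checked before any byte is written.

```c
#include <stddef.h>
#include <string.h>

/* Copies `src` into `dest` only if it fits, terminating '\0' included.
 * Returns 0 on success and -1 if `src` is too long, in which case
 * `dest` is left untouched. */
int copy_checked(char *dest, size_t dest_size, const char *src)
{
    size_t needed = strlen(src) + 1;     /* characters plus '\0' */

    if (needed > dest_size)
        return -1;
    memcpy(dest, src, needed);
    return 0;
}
```

With this check in place, an oversized input is refused, so nothing beyond the end of dest is overwritten.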

The vulnerable program

Since analyzing complicated programs for buffer overflow vulnerabilities is difficult, we will try our attacks on the following program. The program has a buffer overflow problem, since it copies str, a string of up to 517 bytes, into buffer, which holds only 24 bytes.

/* stack.c */

/* This program has a buffer overflow vulnerability. */
/* Our task is to exploit this vulnerability */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int bof(char *str)
{
    char buffer[24];

    /* The following statement has a buffer overflow problem */
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char str[517];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 517, badfile);
    bof(str);

    printf("Returned Properly\n");
    return 1;
}

Now we analyze how to exploit the buffer overflow vulnerability of this program. The bof function’s stack frame is shown in the figure below. When we call the strcpy function, it copies data from str[0] to buffer[0], str[1] to buffer[1], etc. If str is longer than the 24 bytes of buffer, the copy overwrites what lies above buffer, namely the saved %ebp, the return address and the arguments. We may also modify the main function’s stack frame.

Stack layout of bof()

Our goal is to modify the return address slot so that when the execution of bof finishes, we can mislead the program to “return” to our own malicious code segment. It is worth noticing that our malicious code segment is transferred into the victim computer using the input buffer (in this case, a file named badfile; but it could be input acquired from standard input). This immediately brings us the two major challenges of a buffer overflow attack:

  1. How to determine the offset of the return address? Since we need to overwrite the old return address with our own desired one, we need to guess where to place the return address value in our input buffer. If there is a misalignment, the attack will not succeed.
  2. How to determine which address to return to? The return address will point to the absolute location of the code that will be executed next. In order to trick the program into running our own code, we have to know the absolute location of where our input will be put into the memory.

It is very difficult to meet these two requirements when we do not have access to the vulnerable program’s source code. However, as we will see, there are ways to make these two conditions “fuzzy”, i.e. sometimes we can launch the attack successfully without knowing the exact address.
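
When the program can be inspected in a debugger, the first question reduces to simple arithmetic on 32-bit x86: the saved %ebp occupies the 4 bytes at the address held in %ebp, and the return address sits directly above it. The sketch below assumes that layout; the function name return_address_offset is hypothetical, and the addresses used in the usage note are made-up values, not measurements from the lab machine.

```c
#include <stdint.h>

/* Distance in bytes from the start of the buffer to the return address
 * slot on 32-bit x86, given the value of %ebp inside the function and
 * the address of the buffer (both read, for example, in a debugger). */
uint32_t return_address_offset(uint32_t ebp, uint32_t buffer_addr)
{
    /* (ebp - buffer_addr) bytes reach the saved %ebp slot, and 4 more
     * bytes skip over that slot to the return address. */
    return (ebp - buffer_addr) + 4;
}
```

For example, with the made-up values %ebp = 0xbfffeb08 and buffer at 0xbfffeae8, the distance to the saved %ebp is 0x20 = 32 bytes, so the return address starts 36 bytes into the input.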

Conducting a buffer overflow attack

To conduct a (toy-level) buffer overflow attack, we need to follow these steps:

  1. Turn off countermeasures

    There are a lot of countermeasures already implemented in the Linux system to defend against buffer overflow attacks. We need to disable all of them to make our life easier.

    • Turn off address space randomization

      Address space randomization will shuffle the address space, which makes it more difficult to guess the correct value of return address. It can be turned off using the following command:

      [09/13/19]seed@VM:~$ sudo sysctl -w kernel.randomize_va_space=0
      kernel.randomize_va_space = 0
      
    • Turn off compiler’s protections

      We need to compile stack.c with the following command:

      [09/13/19]seed@VM:~/.../lab2$ gcc -z execstack -fno-stack-protector -o stack stack.c
      

      The -z execstack flag disables the non-executable stack protection. Because our malicious code is stored inside the stack, this protection would otherwise stop the code from executing.

      The -fno-stack-protector flag disables the stack protector, which is extra code the compiler adds to the program to detect stack overflows. Again, if it is not disabled, our attack will not succeed.
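
The address space randomization setting changed in the first bullet can also be read back from a program, since sysctl values under kernel.* are exposed as files below /proc/sys/kernel. The sketch below is an illustration; the function name read_aslr_setting is hypothetical.

```c
#include <stdio.h>

/* Returns the current value of kernel.randomize_va_space, or -1 if it
 * cannot be read (for example on a system without /proc). A value of 0
 * means that address space randomization is turned off. */
int read_aslr_setting(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

After the sysctl command shown above has been run, the function would be expected to return 0.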

Index

attack surface · buffer overflow · capability leaking · effective user id · file permission · memory layout · principle of isolation · principle of least privilege · real user id · Set-UID programs · stack base pointer · stack frame · stack layout · stack pointer