UNIX VIRUSES - Silvio Cesare

CONTENTS
--------

    IMPROVING THIS MANUAL
    THE UNIX-VIRUS MAILING LIST
    INTRODUCTION
    THE NON ELF INFECTOR FILE VIRUS (FILE INFECTION)
    MEMORY LAYOUT OF AN ELF EXECUTABLE
    ELF INFECTION
    THE TEXT SEGMENT PADDING VIRUS (PADDING INFECTION)
    INFECTING INFECTIONS
    THE DATA SEGMENT VIRUS (DATA INFECTION)
    VIRUS DETECTION
    THE TEXT SEGMENT VIRUS (TEXT INFECTION)
    INFECTION USING OBJECT CODE PARASITES
    OBJECT CODE LINKING
    THE IMPLEMENTED INFECTOR
    NON (NOT AS) TRIVIAL PARASITE CODE
    BEYOND ELF PARASITES AND ENTER VIRUS IN UNIX
    THE LINUX PARASITE VIRUS
    DEVELOPMENT OF THE LINUX VIRUS
    IMPROVING THE LINUX VIRUS
    VIRUS DETECTION
    EVADING VIRUS DETECTION IN ELF INFECTION
    CONCLUSION
    SOURCE (UUENCODED)

IMPROVING THIS MANUAL

For any comments or suggestions (even just to say hi) please contact the author, Silvio Cesare. Future revisions of this paper are planned to include more parasite techniques and shared object infection. More to come.

THE UNIX-VIRUS MAILING LIST

This is the charter for the unix-virus mailing list. Unix-virus was created to discuss viruses in the unix environment from the point of view of the virus creator, and the security developer writing anti-virus software. Anything related to viruses in the unix environment is open for discussion. Low level programming is commonly seen on the list, including source code. The emphasis is on expanding the knowledge of virus technology and not on the distribution of viruses, so binaries are discouraged but not totally excluded.

The list is archived at http://virus.beergrave.net and it is recommended that the new subscriber read the existing material before posting.

To subscribe to the list send a message to majordomo@virus.beergrave.net with 'subscribe unix-virus' in the body of the message.

INTRODUCTION

This paper documents the algorithms and implementation of UNIX parasite and virus code using ELF objects. Brief introductions on UNIX virus detection and evading such detection are given.
An implementation of various ELF parasite infectors for UNIX is provided, and an ELF virus for Linux on x86 architecture is also supplied.

Elementary programming and UNIX knowledge is assumed, and an understanding of Linux x86 architecture is assumed for the Linux implementation. ELF understanding is not required but will help.

This paper does not document any significant virus programming techniques except those that are only applicable to the UNIX environment. Nor does it try to replicate the ELF specifications. The interested reader is advised to read the ELF documentation if this paper is unclear in ELF specifics.

THE NON ELF INFECTOR FILE VIRUS (FILE INFECTION)

An interesting, yet simple idea for a virus starts from the observation that when you append one executable to another, the original executable executes, but the appended executable is still intact and retrievable, and even executable if copied to a new file and executed.

    # cat host >> parasite
    # mv parasite host
    # ./host
    PARASITE Executed

Now, if the parasite keeps track of its own length, it can copy the original host to a new file, then execute it as normal, making a working parasite and virus. The algorithm is as follows:

    * execute parasite work code
    * lseek to the end of the parasite
    * read the remaining portion of the file
    * write to a new file
    * execute the new file

The downfall of this approach is that the resulting executable is no longer strip safe. This will be explained further on when a greater understanding of the ELF format is obtained, but to summarize, the ELF headers no longer account for every portion of the file, and strip removes unaccounted portions. This is the premise of virus detection with this type of virus.

This same method can be used to infect LKMs following similar procedures.

MEMORY LAYOUT OF AN ELF EXECUTABLE

A process image consists of a 'text segment' and a 'data segment'.
The text segment is given the memory protection r-x (from this it is obvious that self modifying code cannot be used in the text segment). The data segment is given the protection rw-.

The segment as seen from the process image is typically not all in use, as the memory used by the process rarely ends on a page border (that is, its size is rarely a multiple of the page size). Padding completes the segment, and in practice looks like this.

    key:
        [...]   A complete page
        M       Memory used in this segment
        P       Padding

    Page Nr
    #1  [PPPPMMMMMMMMMMMM]  \
    #2  [MMMMMMMMMMMMMMMM]   |- A segment
    #3  [MMMMMMMMMMMMPPPP]  /

Segments are not bound to use multiple pages, so a single page segment is quite possible.

    Page Nr
    #1  [PPPPMMMMMMMMPPPP]  <- A segment

Typically, the data segment directly follows the text segment. The text segment always starts on a page boundary, but the data segment may not. The memory layout for a process image is thus.

    key:
        [...]   A complete page
        T       Text
        D       Data
        P       Padding

    Page Nr
    #1  [TTTTTTTTTTTTTTTT]  <- Part of the text segment
    #2  [TTTTTTTTTTTTTTTT]  <- Part of the text segment
    #3  [TTTTTTTTTTTTPPPP]  <- Part of the text segment
    #4  [PPPPDDDDDDDDDDDD]  <- Part of the data segment
    #5  [DDDDDDDDDDDDDDDD]  <- Part of the data segment
    #6  [DDDDDDDDDDDDPPPP]  <- Part of the data segment

    pages 1, 2, 3 constitute the text segment
    pages 4, 5, 6 constitute the data segment

From here on, the segment diagrams may use single pages for simplicity. eg

    Page Nr
    #1  [TTTTTTTTTTTTPPPP]  <- The text segment
    #2  [PPPPDDDDDDDDPPPP]  <- The data segment

For completeness, on x86, the stack segment is located after the data segment, giving the data segment enough room for growth. Thus the stack is located at the top of memory (remembering that it grows down).

In an ELF file, loadable segments are present physically in the file, and completely describe the text and data segments for process image loading. A simplified ELF format for an executable object relevant in this instance is.

    ELF Header
    .
    .
    Segment 1   <- Text
    Segment 2   <- Data
    .
    .
Each segment has a virtual address associated with its starting location. Absolute code that references within each segment is permissible and very probable.

ELF INFECTION

To insert parasite code means that the process image must load it so that the original code and data are still intact. This means that inserting a parasite requires the memory used in the segments to be increased.

The text segment comprises not only code, but also the ELF headers, including such things as dynamic linking information. It may be possible to keep the text segment as is, and create another segment consisting of the parasite code, however introducing an extra segment is certainly questionable and easy to detect.

Page padding at segment borders however provides a practical location for parasite code, provided the parasite is small enough to fit. This space will not interfere with the original segments, requiring no relocation. Following the guideline just given of preferring the text segment, we can see that the padding at the end of the text segment is a viable solution.

Extending the text segment backwards is a viable solution and is documented and implemented further in this article.

Extending the text segment forward or extending the data segment backward will probably overlap the segments. Relocating a segment in memory will cause problems with any code that absolutely references memory.

It is possible to extend the data segment, however this isn't preferred, as it is not portable to UNIX systems that properly implement execute memory protection. An ELF parasite however is implemented using this technique and is explained later in this article.

THE EXECUTABLE AND LINKAGE FORMAT

A more complete ELF executable layout is (ignoring section content - see below).

    ELF Header
    Program header table
    Segment 1
    Segment 2
    Section header table (optional)

In practice, this is what is normally seen.

    ELF Header
    Program header table
    Segment 1
    Segment 2
    Section header table
    Section 1
    .
    .
    Section n

Typically, the extra sections (those not associated with a segment) are such things as debugging information, symbol tables etc.

From the ELF specifications:

    "An ELF header resides at the beginning and holds a ``road map''
    describing the file's organization. Sections hold the bulk of object
    file information for the linking view: instructions, data, symbol
    table, relocation information, and so on.

    ...
    ...

    A program header table, if present, tells the system how to create a
    process image. Files used to build a process image (execute a program)
    must have a program header table; relocatable files do not need one. A
    section header table contains information describing the file's
    sections. Every section has an entry in the table; each entry gives
    information such as the section name, the section size, etc. Files used
    during linking must have a section header table; other object files may
    or may not have one.

    ...
    ...

    Executable and shared object files statically represent programs. To
    execute such programs, the system uses the files to create dynamic
    program representations, or process images. A process image has
    segments that hold its text, data, stack, and so on. The major sections
    in this part discuss the following.

    Program header. This section complements Part 1, describing object file
    structures that relate directly to program execution. The primary data
    structure, a program header table, locates segment images within the
    file and contains other information necessary to create the memory
    image for the program."

An ELF object may also specify an entry point of the program, that is, the virtual memory location that assumes control of the program. Thus to activate parasite code, the program flow must include the new parasite. This can be done by patching the entry point in the ELF object to point (jump) directly to the parasite.
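For anyone inspecting a file, the entry point is the first field worth reading back. The following is a minimal sketch, not part of the supplied source, of pulling e_entry and the size of the program header table out of an in-memory copy of a file. It assumes the glibc <elf.h> header on Linux and a file whose byte order matches the host; the function name elf32_entry_info is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Inspect the start of a file image held in buf (len bytes).
 * On success returns 1 and stores the entry point virtual address in
 * *entry and the byte size of the program header table in *phdr_bytes.
 * Returns 0 if buf does not begin with a 32-bit ELF header.
 * Assumption: the file uses the host's byte order.
 */
int elf32_entry_info(const unsigned char *buf, size_t len,
                     Elf32_Addr *entry, size_t *phdr_bytes)
{
    Elf32_Ehdr eh;

    assert(buf != NULL && entry != NULL && phdr_bytes != NULL);

    if (len < sizeof eh)
        return 0;
    memcpy(&eh, buf, sizeof eh);        /* avoids unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                       /* wrong magic number */
    if (eh.e_ident[EI_CLASS] != ELFCLASS32)
        return 0;                       /* not a 32-bit object */

    *entry = eh.e_entry;
    *phdr_bytes = (size_t)eh.e_phnum * eh.e_phentsize;
    return 1;
}
```

A caller would read at least sizeof(Elf32_Ehdr) bytes from the start of the file into a buffer and pass it in; the returned table size is how many bytes to read from offset e_phoff to obtain all program headers.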
It is then the parasite's responsibility that the host code be executed - typically, by transferring control back to the host once the parasite has completed its execution.

From /usr/include/elf.h

    typedef struct
    {
      unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
      Elf32_Half    e_type;             /* Object file type */
      Elf32_Half    e_machine;          /* Architecture */
      Elf32_Word    e_version;          /* Object file version */
      Elf32_Addr    e_entry;            /* Entry point virtual address */
      Elf32_Off     e_phoff;            /* Program header table file offset */
      Elf32_Off     e_shoff;            /* Section header table file offset */
      Elf32_Word    e_flags;            /* Processor-specific flags */
      Elf32_Half    e_ehsize;           /* ELF header size in bytes */
      Elf32_Half    e_phentsize;        /* Program header table entry size */
      Elf32_Half    e_phnum;            /* Program header table entry count */
      Elf32_Half    e_shentsize;        /* Section header table entry size */
      Elf32_Half    e_shnum;            /* Section header table entry count */
      Elf32_Half    e_shstrndx;         /* Section header string table index */
    } Elf32_Ehdr;

e_entry is the entry point of the program given as a virtual address. For knowledge of the memory layout of the process image and the segments that comprise it, as stored in the ELF object, see the Program Header information below.

e_phoff gives us the file offset for the start of the program header table. Thus to read the header table (and the associated loadable segments), you may lseek to that position and read the e_phnum*sizeof(Elf32_Phdr) bytes that make up the program header table.

It can also be seen that the section header table file offset is given. It was previously mentioned that the section table resides at the end of the file, so after inserting data at the end of the segment in the file, the offset must be updated to reflect the new position.

    /* Program segment header.
    */

    typedef struct
    {
      Elf32_Word    p_type;     /* Segment type */
      Elf32_Off     p_offset;   /* Segment file offset */
      Elf32_Addr    p_vaddr;    /* Segment virtual address */
      Elf32_Addr    p_paddr;    /* Segment physical address */
      Elf32_Word    p_filesz;   /* Segment size in file */
      Elf32_Word    p_memsz;    /* Segment size in memory */
      Elf32_Word    p_flags;    /* Segment flags */
      Elf32_Word    p_align;    /* Segment alignment */
    } Elf32_Phdr;

Loadable program segments (text/data) are identified in a program header by a p_type of PT_LOAD (1). As with e_shoff in the ELF header, the file offset (p_offset) must be updated in later phdrs to reflect their new position in the file.

p_vaddr identifies the virtual address of the start of the segment. Together with the entry point mentioned above, it is now possible to identify where program flow begins, by using p_vaddr as the base and calculating the offset to e_entry.

p_filesz and p_memsz are the sizes that the segment occupies in the file and in memory respectively. Having separate file and memory sizes means that memory which need not be loaded from disk can still be reserved in the process image. The .bss section (see below for section definitions), which holds uninitialized data in the data segment, is one such case. It is not desirable that uninitialized data be stored in the file, but the process image must allocate enough memory. The .bss section resides at the end of the segment, and any memory size past the end of the file size is assumed to be part of this section.

    /* Section header.
    */

    typedef struct
    {
      Elf32_Word    sh_name;        /* Section name (string tbl index) */
      Elf32_Word    sh_type;        /* Section type */
      Elf32_Word    sh_flags;       /* Section flags */
      Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
      Elf32_Off     sh_offset;      /* Section file offset */
      Elf32_Word    sh_size;        /* Section size in bytes */
      Elf32_Word    sh_link;        /* Link to another section */
      Elf32_Word    sh_info;        /* Additional section information */
      Elf32_Word    sh_addralign;   /* Section alignment */
      Elf32_Word    sh_entsize;     /* Entry size if section holds table */
    } Elf32_Shdr;

sh_offset is the file offset that points to the actual section. The shdr should correlate to the segment it is located in. It is highly suspicious if the vaddr of the section differs from the address implied by the segment's view.

THE TEXT SEGMENT PADDING VIRUS (PADDING INFECTION)

The resulting segments after parasite insertion into text segment padding look like this.

    key:
        [...]   A complete page
        V       Parasite code
        T       Text
        D       Data
        P       Padding

    Page Nr
    #1  [TTTTTTTTTTTTVVPP]  <- Text segment
    #2  [PPPPDDDDDDDDPPPP]  <- Data segment

    ...

After insertion of parasite code, the layout of the ELF file will look like this.

    ELF Header
    Program header table
    Segment 1   - The text segment of the host
                - The parasite
    Segment 2
    Section header table
    Section 1
    .
    .
    Section n

Thus the parasite code must be physically inserted into the file, and the text segment extended to see the new code.

To insert code at the end of the text segment thus leaves us with the following to do so far.
    * Increase e_shoff in the ELF header to account for the new code
    * Locate the text segment program header
        * Increase p_filesz to account for the new code
        * Increase p_memsz to account for the new code
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset to reflect the new position after insertion
    * For each shdr whose section resides after the insertion
        * Increase sh_offset to account for the new code
    * Physically insert the new code into the file - text segment
      p_offset + p_filesz (original)

There is one hitch however. Following the ELF specifications, p_vaddr and p_offset in the Phdr must be congruent to each other, modulo the page size.

    key:
        ~= denotes congruency.

    p_vaddr (mod PAGE_SIZE) ~= p_offset (mod PAGE_SIZE)

This means that the amount of data inserted at the end of the text segment in the file must be a multiple of the page size, so that the congruence still holds for everything that follows. This does not mean the text segment must be increased by such a number, only that the physical file must be.

This also has an interesting side effect in that often a complete page must be used as padding because the required vaddr isn't available. The following may thus happen.

    key:
        [...]   A complete page
        T       Text
        D       Data
        P       Padding

    Page Nr
    #1  [TTTTTTTTTTTTPPPP]  <- Text segment
    #2  [PPPPPPPPPPPPPPPP]  <- Padding
    #3  [PPPPDDDDDDDDPPPP]  <- Data segment

This can be taken advantage of in that it gives the parasite code more space, but such a spare page cannot be guaranteed.

To take the congruency of p_vaddr and p_offset into account, our algorithm is modified to appear as this.
    * Increase e_shoff by PAGE_SIZE in the ELF header
    * Locate the text segment program header
        * Increase p_filesz to account for the new code
        * Increase p_memsz to account for the new code
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset by PAGE_SIZE
    * For each shdr whose section resides after the insertion
        * Increase sh_offset by PAGE_SIZE
    * Physically insert the new code and pad to PAGE_SIZE, into the
      file - text segment p_offset + p_filesz (original)

Now that the process image loads the new code, running it before the host code is a simple matter of patching the ELF entry point and the point in the virus that jumps to the host code.

The new entry point is determined by the text segment p_vaddr + p_filesz (original), since all that is being done is that the new code is directly appended to the original host segment.

For complete infection code then.

    * Increase e_shoff by PAGE_SIZE in the ELF header
    * Patch the insertion code (parasite) to jump to the entry point
      (original)
    * Locate the text segment program header
        * Modify the entry point of the ELF header to point to the new
          code (p_vaddr + p_filesz)
        * Increase p_filesz to account for the new code (parasite)
        * Increase p_memsz to account for the new code (parasite)
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset by PAGE_SIZE
    * For each shdr whose section resides after the insertion
        * Increase sh_offset by PAGE_SIZE
    * Physically insert the new code (parasite) and pad to PAGE_SIZE,
      into the file - text segment p_offset + p_filesz (original)

This, while perfectly functional, can arouse suspicion because the new code at the end of the text segment isn't accounted for by any sections. It is an easy matter to associate the entry point with a section by extending that section's size, but the last section in the text segment is then going to look suspicious.
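Both properties discussed here, the offset/address congruency and whether the entry point is accounted for by a section, can be checked mechanically by a tool examining a file. The following is a minimal sketch of the two checks, not taken from the supplied source; it assumes the glibc <elf.h> header, and the function names offsets_congruent and section_covering are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Returns 1 if p_vaddr and p_offset of the given program header are
 * congruent modulo page_size, 0 otherwise.
 */
int offsets_congruent(const Elf32_Phdr *ph, Elf32_Word page_size)
{
    assert(ph != NULL && page_size != 0);
    return (ph->p_vaddr % page_size) == (ph->p_offset % page_size);
}

/*
 * Returns the index of the first allocated section whose address range
 * [sh_addr, sh_addr + sh_size) contains addr, or -1 if no section in the
 * table accounts for that address.
 */
int section_covering(const Elf32_Shdr *shdr, size_t shnum, Elf32_Addr addr)
{
    size_t i;

    assert(shdr != NULL || shnum == 0);

    for (i = 0; i < shnum; i++) {
        /* Sections without SHF_ALLOC are not part of the process image. */
        if (!(shdr[i].sh_flags & SHF_ALLOC) || shdr[i].sh_size == 0)
            continue;
        if (addr >= shdr[i].sh_addr &&
            addr - shdr[i].sh_addr < shdr[i].sh_size)
            return (int)i;
    }
    return -1;
}
```

Calling section_covering with e_entry as the address and getting -1 back corresponds to the situation described above, where the code at the entry point is not accounted for by any section.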
Associating the new code with a section must be done however, as programs such as 'strip' use the section header tables and not the program headers. The final algorithm using this information is.

    * Increase e_shoff by PAGE_SIZE in the ELF header
    * Patch the insertion code (parasite) to jump to the entry point
      (original)
    * Locate the text segment program header
        * Modify the entry point of the ELF header to point to the new
          code (p_vaddr + p_filesz)
        * Increase p_filesz to account for the new code (parasite)
        * Increase p_memsz to account for the new code (parasite)
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset by PAGE_SIZE
    * For the last shdr in the text segment
        * Increase sh_size by the parasite length
    * For each shdr whose section resides after the insertion
        * Increase sh_offset by PAGE_SIZE
    * Physically insert the new code (parasite) and pad to PAGE_SIZE,
      into the file - text segment p_offset + p_filesz (original)

infect-elf-p is the supplied program (complete with source) that implements the ELF infection using text segment padding as described.

INFECTING INFECTIONS

In the parasite described, infecting infections isn't a problem at all. By skipping executables that don't have enough padding for the parasite, this is solved implicitly. Multiple parasites may exist in the host, but there is a limit on how many, depending on the size of the parasite code.

THE DATA SEGMENT VIRUS (DATA INFECTION)

The new method of ELF infection, as briefly described in the last section, means that the data segment is extended and the parasite is located in the new extended space. On x86 architecture, at least, code that is in the data segment may be executed.

To extend the data segment means we simply have to extend the program header in the ELF executable. Note must be taken though, that the .bss section normally ends the data segment. This section is used for uninitialized data and occupies no file space but does occupy memory space.
If we extend the data segment we have to leave space for the .bss section. The memory layout is as follows.

    original:
        [text]
        [data]

    parasite:
        [text]
        [data]
        [parasite]

The algorithm for the data segment parasite is shown below.

    * Patch the insertion code (parasite) to jump to the entry point
      (original)
    * Locate the data segment
        * Modify the entry point of the ELF header to point to the new
          code (p_vaddr + p_memsz)
        * Increase p_filesz to account for the new code and .bss
        * Increase p_memsz to account for the new code
        * Find the length of the .bss section (p_memsz - p_filesz)
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset to reflect the new position after insertion
    * For each shdr whose section resides after the insertion
        * Increase sh_offset to account for the new code
    * Physically insert the new code into the file

The algorithm shown works for an ELF executable, but the parasite inserted into the host becomes strip unsafe because no section matches the parasite. A new section could be created for this purpose to become strip safe again. This however has not been implemented.

This type of virus is easy to spot if you know what you're looking for. For starters, no section matches the entry point, and more suspect is the fact that the entry point is in the data segment.

VIRUS DETECTION

The detection of the data segment virus is extremely easy, taking into account that the entry point of the ELF image is in the data segment, not in the text segment.

An implementation of a simple virus scanner is supplied.

THE TEXT SEGMENT VIRUS (TEXT INFECTION)

The text segment virus works under the premise that the text segment can be extended backwards and new parasite code can run in the extension.

The memory layout is as follows.
    original:
        [text]
        [data]

    parasite:
        [parasite]  (new start of text)
        [text]
        [data]

The algorithm is as follows:

    * Patch the insertion code (parasite) to jump to the entry point
      (original)
    * Locate the text segment
    * For each phdr whose segment is after the insertion (text segment)
        * Increase p_offset to reflect the new position after insertion
    * For each shdr whose section resides after the insertion
        * Increase sh_offset to account for the new code
    * Physically insert the new code into the file

INFECTION USING OBJECT CODE PARASITES

It is often desirable not to use assembler for parasite code but to use C code directly instead. This can make writing a pure C virus possible, avoiding the messy steps of converting code to asm which require extra time and skill.

This can be achieved through the use of relocatable or object code. We can't just extract an executable image as the parasite image, because such an image is fixed at a certain memory location. Instead we can use a relocatable image and link it into the desired location.

OBJECT CODE LINKING

ELF is the typical standard used to represent object code on Linux. The paper will thus only refer to linking using ELF objects.

An object code file is referred to as relocatable code when using ELF because that summarizes what it is. It is not fixed to any memory position. It is the responsibility of linking to make an executable image out of a relocatable object and to bind symbols to addresses.

Linking of code is done by relocating the code to a fixed position. For the most part, the object code does not need to be changed heavily.

Consider the following C code.

    #include
    #include

    static inline _syscall3(ssize_t, write, int, fd, const void *, buf,
                            size_t, count);

    int main()
    {
        write(1, "INFECTED Host\n", 14);
    }

The string literal, being part of the relocatable object, has no known absolute position in memory at compile time. Likewise, an externally defined symbol (printk, for example) has an address that is also not known at compile time.
Relocation sections in the ELF object are used for describing what needs to be modified (relocated) in the object. In the above case, a relocation entry would be made for the string's reference, and likewise for any reference to an external symbol such as printk.

The format for an ELF relocatable object (object code) is as follows.

    ELF header
    Program header table
    Section 1
    Section n
    Section header table

From the ELF specifications.

    "String Table

    String table sections hold null-terminated character sequences,
    commonly called strings. The object file uses these strings to
    represent symbol and section names. One references a string as an index
    into the string table section. The first byte, which is index zero, is
    defined to hold a null character. Likewise, a string table's last byte
    is defined to hold a null character, ensuring null termination for all
    strings. A string whose index is zero specifies either no name or a
    null name, depending on the context. An empty string table section is
    permitted; its section header's sh_size member would contain zero.
    Non-zero indexes are invalid for an empty string table."

    .
    .
    .

    "Symbol Table

    An object file's symbol table holds information needed to locate and
    relocate a program's symbolic definitions and references. A symbol
    table index is a subscript into this array. Index 0 both designates the
    first entry in the table and serves as the undefined symbol index. The
    contents of the initial entry are specified later in this section."

    /* Symbol table entry. */

    typedef struct
    {
      Elf32_Word    st_name;    /* Symbol name (string tbl index) */
      Elf32_Addr    st_value;   /* Symbol value */
      Elf32_Word    st_size;    /* Symbol size */
      unsigned char st_info;    /* Symbol type and binding */
      unsigned char st_other;   /* No defined meaning, 0 */
      Elf32_Section st_shndx;   /* Section index */
    } Elf32_Sym;

    #define SHN_UNDEF 0 /* No section, undefined symbol. */

    /* How to extract and insert information held in the st_info field.
    */

    #define ELF32_ST_TYPE(val)          ((val) & 0xf)
    #define ELF32_ST_INFO(bind, type)   (((bind) << 4) + ((type) & 0xf))

    /* Legal values for ST_BIND subfield of st_info (symbol binding). */

    #define STB_LOCAL   0   /* Local symbol */
    #define STB_GLOBAL  1   /* Global symbol */
    #define STB_WEAK    2   /* Weak symbol */
    #define STB_NUM     3   /* Number of defined types. */
    #define STB_LOPROC  13  /* Start of processor-specific */
    #define STB_HIPROC  15  /* End of processor-specific */

From the ELF specifications.

    "A relocation section references two other sections: a symbol table and
    a section to modify. The section header's sh_info and sh_link members,
    described in ``Sections'' above, specify these relationships.
    Relocation entries for different object files have slightly different
    interpretations for the r_offset member.

    In relocatable files, r_offset holds a section offset. That is, the
    relocation section itself describes how to modify another section in
    the file; relocation offsets designate a storage unit within the second
    section."

From /usr/include/elf.h

    /* Relocation table entry without addend (in section of type SHT_REL). */

    typedef struct
    {
      Elf32_Addr    r_offset;   /* Address */
      Elf32_Word    r_info;     /* Relocation type and symbol index */
    } Elf32_Rel;

    /* How to extract and insert information held in the r_info field. */

    #define ELF32_R_SYM(val)            ((val) >> 8)
    #define ELF32_R_TYPE(val)           ((val) & 0xff)
    #define ELF32_R_INFO(sym, type)     (((sym) << 8) + ((type) & 0xff))

These selected paragraphs and sections from the ELF specifications and header files give us a good high level concept of how a relocatable ELF file can be linked to produce an image capable of being executed.

The process of linking the image is as follows.
    * Identify the file as being in relocatable ELF format
    * Load each relevant section into memory
    * For each PROGBITS section set the section address in memory
    * For each REL (relocation) section, carry out the relocation
    * Assemble the executable image by copying the sections into their
      respective positions in memory

The relocation step may be expanded into the following algorithm.

    * Evaluate the target section of the relocation entry
    * Evaluate the symbol table section of the relocation entry
    * Evaluate the location in the section that the relocation is to apply
    * Evaluate the address of the symbol that is used in the relocation
    * Apply the relocation

The actual relocation is best presented by looking at the source. For more information on the relocation types refer to the ELF specifications. Note that we ignore the global offset table completely, along with any relocation types of that nature.

    switch (ELF32_R_TYPE(rel->r_info)) {
    case R_386_NONE:
        break;

    case R_386_PLT32:
    case R_386_PC32:
        *loc -= dot;    /* *loc += addr - dot */
        /* fall through */

    case R_386_32:
        *loc += addr;
        break;
    }

THE IMPLEMENTED INFECTOR

The implemented infector must use C parasite code that avoids libc and uses Linux syscalls exclusively. This means that plt/got problems are avoided. Likewise the parasite code must end in the following asm:

    loop1:
        popl %eax
        cmpl $0x22223333, %eax
        jne loop1
        popl %edx
        popl %ecx
        popl %ebx
        popl %eax
        popl %esi
        popl %edi
        movl $0x11112222, %ebp
        jmp *%ebp

This is so it can jump back to the host correctly. It uses a little trickery to do this properly.

Why the popl loop? The jump back to the host is taken _before_ the end of main, so there are still some values to be popped before you are back where you started. You don't know how many values have been pushed, so a unique magic number is used to mark the start/end of them - check the initcode in relocater.c.

The movl $0x11112222, %ebp? Well..
you don't know whereabouts this jmp (back to the host) is going to be in the code, so you substitute a unique magic number where you want the host entry point to go. Then you search the object code for the magic number and replace it.

NON (NOT AS) TRIVIAL PARASITE CODE

Parasite code that needs working memory must, naturally, use the stack manually. No bss section can be used from within the virus code in the padding and text infectors because it can only use part of the text segment. It is strongly suggested that rodata not be used; in fact, it is strongly suggested that no location specific data residing outside the parasite at infection time be used at all.

Thus, if initialized data is to be used, it is best to place it in the text segment, ie at the end of the parasite code - see below on calculating address locations of initialized data that are not known at compile/infection time.

If the heap is to be used, then it will be operating system dependent. In Linux, this is done via the 'brk' syscall.

The use of any shared library calls from within the parasite should be removed, to avoid any linking problems and to maintain a portable parasite in files that use varying libraries. It is thus naturally recommended to avoid using libc.

Most importantly, the parasite code must be relocatable. It is possible to patch the parasite code before inserting it, however the cleanest approach is to write code that doesn't need to be patched.

In x86 Linux, some syscalls require the use of an absolute address pointing to initialized data. This can be made relocatable by using a common trick used in buffer overflow code.

        jmp A
    B:
        pop %eax    ; %eax now has the address of the string
        .           ; continue as usual
        .
        .
    A:
        call B
        .string "hello"

By making a call directly preceding the string of interest, the address of the string is pushed onto the stack as the return address.
BEYOND ELF PARASITES AND ENTER VIRUS IN UNIX

In a UNIX environment the most probable method for a typical garden variety virus to spread is through infecting files that it has permission to modify.

A simple method of locating new files to infect is to scan the current directory for writable files. This has the advantage of being relatively fast (in comparison to large tree walks) but finds only a small percentage of infectable files.

Directory searches are however very slow regardless, even without large tree walks. If parasite code does not fork, it is very quickly noticed what is happening. In the sample virus supplied, only a small random set of files in the current directory is searched.

Forking, as mentioned, easily solves the problem of slowing the startup of the host code, however new processes on the system can be spotted as abnormal if careful observation is used.

The parasite code, as mentioned, must be completely written in machine code. This does not however mean that development must be done like this. Development can easily be done in a high level language such as C and then compiled to asm to be used as parasite code.

A bootstrap process can be used for initial infection of the virus into a host program that can then be distributed. That is, the ELF infector code is used, with the virus as the parasite code to be inserted.

THE LINUX PARASITE VIRUS

This virus implements the ELF infection described by utilizing the padding at the end of the text segment. Into this padding, the virus in its entirety is copied, and the appropriate entry points patched.

At the end of the parasite code are the instructions.

    movl $XXXX, %ebp
    jmp *%ebp

XXXX is patched to the host entry point when the virus replicates. This approach does have the side effect of trashing the ebp register, which may or may not be destructive to programs whose entry points depend on ebp being set on entry.
In practice, I have not seen this happen (the implemented Linux virus uses the ebp approach), but extensive replicating has not been performed.

On execution of an infected host, the virus copies the parasite (virus) code contained in itself (the file) into memory.

The virus then scans randomly (random enough for this instance) through the current directory, looking for ELF files of type ET_EXEC or ET_DYN to infect. It will infect up to Y_INFECT files, and scan up to N_INFECT files in total.

If a file can be infected, i.e. it is of the correct ELF type and the padding can sustain the virus, a modified copy of the file incorporating the virus is made. The copy is then renamed to the file being infected, and thus the file is infected.

Due to the rather large size of the virus (approx 2.3k) in comparison to the page size, not all files can be infected; in fact, only about half on average.

DEVELOPMENT OF THE LINUX VIRUS

The Linux virus was written completely in C, and is strongly based around the ELF infector code. The C code is supplied as elf-p-virus.c. The code requires no libraries, and avoids libc by using a scheme similar to the _syscall declarations Linux employs, modified not to use errno.

Heap memory, obtained using 'brk', was used for dynamic allocation of the phdr and shdr tables.

Linux has some syscalls which require the address of an initialized string to be passed to them, notably open, rename, and unlink. This requires initialized data storage. As stated before, rodata cannot be used, so this data was placed at the end of the code. Making it relocatable required the above-mentioned technique of using call to push the address (the return address) onto the stack. To assist in the asm conversion, extra variables were declared to leave room on the stack for these addresses, as in some cases an address was used more than once.
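As an aside on the ELF type test mentioned in the description of the virus (only objects of type ET_EXEC or ET_DYN are considered), the check itself is a plain header inspection, the same one a file-identification or scanning tool would perform. The following is a minimal sketch and is not taken from elf-p-virus.c; the function names elf_type and elf_is_exec_or_dyn are hypothetical, and the buffer is assumed to hold at least the first 18 bytes of the file.

```c
#include <stddef.h>
#include <string.h>

/* e_type values from the ELF specification. */
enum { ELF_ET_EXEC = 2, ELF_ET_DYN = 3 };

/*
 * Hypothetical helper (not from the paper's source): return the e_type
 * field of the ELF header held in buf, or -1 if buf does not look like
 * an ELF header. e_type is the 16-bit field that directly follows the
 * 16-byte e_ident array, for both 32-bit and 64-bit ELF files.
 */
int elf_type(const unsigned char *buf, size_t len)
{
    if (len < 18 || memcmp(buf, "\177ELF", 4) != 0)
        return -1;

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}

/* Non-zero if the header describes an executable or a shared object. */
int elf_is_exec_or_dyn(const unsigned char *buf, size_t len)
{
    int type = elf_type(buf, len);

    return type == ELF_ET_EXEC || type == ELF_ET_DYN;
}
```

A caller would read the start of a file into a buffer and pass it in. The sketch deliberately reads the field byte by byte rather than casting to an Elf32_Ehdr, so it does not depend on <elf.h> or on the byte order of the machine running it.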
The C form of the virus allowed for a debugging version which produces verbose output, and allows argv[0] to be given as argv[1]. This is advantageous because you can set up a pseudo-infected host which is non-replicating, then run the virus with argv[0] set to the name of the pseudo-infected host. It would replicate the parasite from that host. Thus it was possible to test without having a binary version of a replicating virus.

The C code was converted to asm using the C compiler gcc, with the -S flag to produce assembler. Modifications were made so that the use of rodata for initialized data (the strings for open, unlink, and rename) was replaced with relocatable data using the call address technique.

Most of the registers were saved on virus startup and restored on exit (transfer of control to the host).

The asm version of the virus can be improved tremendously in regard to efficiency, which will in turn improve the expected lifetime and replication of the virus (a smaller virus can infect more objects, where previously the padding would dictate that the larger virus couldn't infect them). The asm virus was written with development time as the primary concern, and hence almost zero time was spent on hand optimization of the code gcc generated from the C version. In actual fact, less than 5 minutes were spent on asm editing, which indicates that extensive asm-specific skills are not required for a non-optimised virus.

The edited asm code was compiled (elf-p-virus-egg.c), and then, using objdump with the -D flag, the address of the parasite start and the required offsets for patching were recorded. The asm was then edited again using the new information. The executable produced was then patched manually for any bytes needed. elf-text2egg was used to extract hex codes for the complete length of the parasite code in a form usable in a C program, a la the ELF infector code. The ELF infector was then recompiled using the virus parasite.

# objdump -D elf-p-virus-egg . . 08048143