Hello World x86
2024-02-01
Hello, World!
Programming is one of the most important skills in computer science. Typically, beginners take their first step into a new programming language by writing the iconic “Hello, World!” program. To maintain this tradition and encourage beginners to establish a strong foundation for their future work, my first blog post is about a “Hello, World!” program, this time written in x86 assembly.
Source Code
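To be precise, this program is written in C with inline x86 assembly. Besides demonstrating a “Hello, World!” program in x86, it also gives you an idea of how to add inline assembly to C code. The __asm block is Microsoft-style inline assembly, so this example assumes a 32-bit (x86) build with a compiler that accepts that syntax, such as Microsoft Visual C++. The source code is shown below: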
#include <stdio.h>

int main()
{
    __asm {
        ; Step 1: Push the string "Hello, World!\n" onto the stack in reverse order
        push 0x0a21     ; Push the null terminator and "\n!"
        push 0x646c726f ; Push "dlro"
        push 0x57202c6f ; Push "W ,o"
        push 0x6c6c6548 ; Push "lleH"

        ; Step 2: Prepare the input for printf
        mov eax, esp    ; Move the stack pointer to eax (which now points to the string)
        push eax        ; Push the address of the string onto the stack

        ; Step 3: Call printf
        call printf     ; Call printf function

        ; Step 4: Clean up
        add esp, 20     ; 4 bytes for the address of the stack string + 16 bytes for the string parts
    }
    return 0;
}
Explanations
If you are a beginner, you might have many questions about this small program. I added comments to the source code to help you follow what is going on. This post will not answer most of your questions, but I can point you in the right direction to find the answers.
Keywords
To understand the source code, you will need to know about the following topics (keywords):
x86 assembly language: learn the instructions push, mov, call, and add.
stack: a fundamental concept in computer science. You can learn the general definition of a stack and then focus on the program stack to understand why steps 1-4 are laid out the way they are.
stack string: to understand step 1 in the source code. Note that this is not a recommended way to allocate and store strings, but it has been used by malware developers.
cdecl calling convention: this explains how we pass the input string to printf, and why we need to clean the stack in step 4. There are other calling conventions in x86, such as stdcall and fastcall.
Notes
You might be puzzled by step 4, where I remove 4 bytes for the address of the stack string and 16 bytes for the string parts. Look closely at step 1 and count: the string is only 14 bytes long!?
That is a legitimate concern. You can verify how many bytes were pushed on the stack by stepping through the program in a debugger or by using a disassembler to disassemble it.
I included a screenshot of the disassembled main function below:
The right-hand side looks familiar: it is our x86 code, which on its own obviously won’t settle the question. The new information is on the left-hand side, the opcode bytes (i.e., the machine code). They show that the first push instruction actually pushes four bytes, 21 0a 00 00 in memory order (the 32-bit value 0x00000a21), instead of only the two bytes 21 0a. The two trailing zero bytes are the two mysterious bytes.
Open-ended
A saying often attributed to Aristotle, the ancient Greek philosopher and polymath, goes: “The more you know, the more you realize you don’t know.” Similarly, the more you study this simple Hello, World! program, the more you will realize how complicated it is. You will never know everything, but it helps to know more. I hope you learn something new from this post!