Hello World x86

2024-02-01

Hello, World!

Programming is one of the most important skills in computer science. Typically, beginners take their first step into a new programming language by writing the iconic “Hello, World!” program. To maintain this tradition and encourage beginners to establish a strong foundation for their future work, my first blog post will be about a “Hello, World!” program, this time written in x86 assembly.

Source Code

To be precise, this program is written in C with inline x86 assembly. It demonstrates a “Hello, World!” program in x86 and also gives you an idea of how to add inline assembly to C code. Note that the __asm block is Microsoft Visual C++ syntax and only compiles for 32-bit x86 targets. The source code is shown below:

#include <stdio.h>

int main()
{
    __asm {
        ; Step 1: Push the string "Hello, World!\n" onto the stack in reverse order
        push 0x0a21    ; Push the null terminator and "\n!"
        push 0x646c726f; Push "dlro"
        push 0x57202c6f; Push "W ,o"
        push 0x6c6c6548; Push "lleH"

        ;Step 2: Prepare the input for printf
        mov eax, esp   ; Move the stack pointer to eax(which now points to the string)
        push eax       ; Push the address of the string onto the stack

        ;Step 3: Call printf
        call printf    ; Call printf function

        ;Step 4: Clean up
        add esp, 20    ; 4 bytes for the address of the stack string + 16 bytes for the string parts
    }
    return 0;
}

Explanations

If you are a beginner, you might have many questions about this small program. I added comments to the source code to help you follow what is going on. This post will not answer most of your questions, but it can point you in the right direction to find the answers.

Keywords

To understand the source code, you will need some knowledge of the following topics (keywords):

  • x86 assembly language: learn the instructions push, mov, call, add.
  • stack: a fundamental concept in computer science. Learn the general definition of a stack first, then focus on the program stack to understand why steps 1-4 are laid out the way they are.
  • stack string: explains step 1 in the source code. Note that this is not a recommended way to allocate and store strings, but it is a technique used by malware developers.
  • cdecl calling convention: explains how we pass the input string to printf, and why the caller needs to clean the stack in step 4. There are other calling conventions in x86, such as stdcall and fastcall.

Notes

Step 4 might look wrong to you: it removes 4 bytes for the address of the stack string and 16 bytes for the string parts, yet counting the bytes written in step 1 gives only 14 (2 + 4 + 4 + 4)!?

That is a legitimate concern. You can verify how many bytes were pushed on the stack by stepping through the program in a debugger or by disassembling it.

I included a screenshot of the disassembled main function below:

disassembled-code

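The right-hand side looks familiar. It shows our x86 code, which on its own does not settle the question. The new information is on the left-hand side: the machine code bytes (the opcode followed by its operand). They show that the first push instruction carries a full four-byte operand, 21 0a 00 00, which is the 32-bit value 0x00000a21 stored least significant byte first, rather than only the two bytes 21 0a. This reveals the two mysterious bytes, both zero, and the first of them is the string's null terminator.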

Open-ended

A saying often attributed to Aristotle, the ancient Greek philosopher and polymath, goes, “The more you know, the more you realize you don’t know.” Similarly, the more you study this simple “Hello, World!” program, the more you will realize how complicated it is. You will never know everything, but it helps to know more. I hope you learn something new from this post!


