Introduction
Assembly is rarely used for day-to-day programming nowadays. Low-level calls are handled by libraries, and compilers optimize code quite well.
So well, in fact, that merely switching between versions of your favorite compiler can speed up heavy programs (like FFmpeg) several times over, especially on older hardware.
When you do write assembly, you usually break the task into small pieces and pick the ones that can work directly with registers or memory to speed up the algorithm. This is common in vector and matrix operations, processor initialization in a kernel, and so on.
But if you want to write a lot of code entirely in assembly, macros can make it easier. Macros can make your code harder for someone else to read, but when they mimic high-level language constructs, that gap largely closes.
What is missing?
Assembly language doesn't provide luxuries like variables, loops, if-else, classes, and structures in a friendly way. You work with memory and registers, and you have to keep track of everything yourself: what is stored where, address offsets, variable types, and so on.
Modern assemblers provide some of these luxuries, but they still leave the checking to you. For example, they don't check types, so you can add a number to an ASCII character. Memory bounds aren't checked either, so you can overwrite another variable.
Calculating addresses by hand is a chore, so assemblers provide labels, which stand for the offset of a chosen instruction or line.
One thing they don't provide as a built-in feature is functions and local variables. You can write them, but there is no keyword for them.
Writing our helpers
I assume that you know a bit about how the compiler and linker work together ;)
NASM (and YASM) provide extensive macro support, extensive enough that we can create a pseudo-HLL wrapper.
What NASM macros allow us to do:
- Contexts with local labels and single-line macros
- Constant definitions
- Single-line macros
- Macros with variable parameters and default values
- Loops
- Conditions
- String operations
These features will allow us to write code like this:
Function functionName, firstParameter, secondParameter
Locals firstVariable, secondVariable
mov eax, %$pfirstParameter
add eax, %$lsecondVariable
End
1. Function block
In assembly, a function block consists of a label marking the first instruction and a return instruction. Defining a macro just to emit a label or a single short instruction looks like a waste of time and space, but we will need to add more instructions for variables and parameters later.
%macro Function 1
%1:
%endmacro
%macro End 0
ret
%endmacro
In this example, we created a macro called Function that accepts one parameter, the name of the function. %1 expands to the value of the first parameter, which produces a syntactically correct label for the subroutine. The End macro doesn't accept any parameters.
Now, our code will look like this:
Function Start
mov eax, 0x04
add eax, 0x02
End
Calling convention
High-level languages pass parameters to functions and place local variables in a consistent, agreed-upon way, called a calling convention. For compatibility with binaries compiled from C, we will use the cdecl calling convention. It's easy to implement, and it uses only the stack and a return register.
cdecl defines these rules:
- Parameters are passed on the stack in reverse order (right to left)
- The caller cleans up the stack for parameters
- The callee cleans up the stack for local variables
- Three registers are saved by the caller if it needs their values (EAX, ECX, EDX)
- Three registers must be saved by the callee if it uses them (EBX, ESI, EDI)
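Once a subroutine has set up its frame and is ready to use parameters and local variables, the stack looks like this, from higher to lower addresses: the parameters (the first one at EBP + 0x08), the return address (at EBP + 0x04), the saved EBP (at EBP itself), and below that the local variables (the first one at EBP - 0x04).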
How it works
For example, we want to implement a function that checks if a number is in bounds.
First, let's push the parameters onto the stack in reverse order, so that the first parameter is pushed last:
mov eax, 0x08 ; Value, calculated earlier
push eax ; Third parameter, value
push dword 0x10 ; Second parameter, right bound
push dword 0x02 ; First parameter, left bound
call boundsCheck
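Then, once the function has saved the old EBP and pointed EBP at its own frame, we can access our parameters and write the algorithm:
Function boundsCheck
push ebp ; Save the caller's frame pointer
mov ebp, esp ; Parameters now start at EBP + 0x08
mov eax, dword [ss:ebp + 0x08] ; Left bound
mov ecx, dword [ss:ebp + 0x10] ; Value (ECX is caller-saved, so we may clobber it)
cmp ecx, eax
jg .leftOK ; Value > left bound
mov eax, 0x00
jmp .fail
.leftOK:
mov eax, dword [ss:ebp + 0x0C] ; Right bound
cmp ecx, eax
jl .rightOk ; Value < right bound
mov eax, 0x00
jmp .fail
.rightOk:
mov eax, 0x01
.fail:
pop ebp ; Restore the caller's frame pointer
End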
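As a cross-check, the logic of boundsCheck can be written in C. This is only a sketch: the name bounds_check_ref is hypothetical, and it assumes signed 32-bit comparisons with both bounds exclusive, which is what jg and jl implement in the listing above.

```c
/* Hypothetical C reference for the boundsCheck routine above.
 * Parameter order follows the pushes: left bound, right bound, value. */
int bounds_check_ref(int left, int right, int value)
{
    if (value <= left)      /* jg .leftOK not taken: value is not above left */
        return 0;
    if (value >= right)     /* jl .rightOk not taken: value is not below right */
        return 0;
    return 1;               /* left < value < right */
}
```

With the values pushed earlier, bounds_check_ref(0x02, 0x10, 0x08) returns 1; a value equal to either bound returns 0.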
After we return from the subroutine, we must clean up the stack by returning the stack pointer to its previous value. The stack grows downward, so we add back the 12 bytes we pushed:
add esp, 0x0C
2. Improving our code
Caller side
First, we will write a macro to call a function. But wait, isn't call sufficient? Well, no. It doesn't know about our stack manipulations.
Let's define a macro that accepts one mandatory parameter (the name of the function) and any number of optional parameters (that is what the * in 1-* means).
%macro @ 1-*
%endmacro
Not all functions have parameters, so we need to check the total number of parameters (%0 tells us that):
%macro @ 1-*
%if %0 > 1
%endif
%endmacro
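Then, we will push the parameters onto the stack. NASM lets us rotate the parameter list, so we don't need to track indexes or build a parameter reference by concatenation (although you can). Each %rotate -1 moves the last parameter into %1, so the loop pushes the parameters from last to first, and one more %rotate -1 after the loop brings the function name back into %1.
%assign pCount (%0 - 1)
%rep pCount
%rotate -1 ; Last not-yet-pushed parameter becomes %1
push DWORD %1
%endrep
%rotate -1 ; Function name is %1 again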
Finally, we call the subroutine and, if we passed parameters, add cleanup code for the stack:
call %1
%if %0 > 1
add esp, (pCount * 0x04)
%endif
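To see why the parameters are pushed from last to first, here is a toy model in C of what the expanded call macro does to the stack. ToyStack and its helpers are hypothetical names invented for this sketch, and the model assumes 4-byte slots and does no overflow checking.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of a downward-growing 32-bit stack (hypothetical type). */
typedef struct {
    uint32_t words[STACK_WORDS];
    size_t sp;  /* index of the top element; STACK_WORDS means empty */
} ToyStack;

void toy_init(ToyStack *s)
{
    s->sp = STACK_WORDS;
}

/* Like "push": move the stack pointer down, then store the value.
 * The sketch assumes the caller never pushes more than STACK_WORDS values. */
void toy_push(ToyStack *s, uint32_t value)
{
    s->words[--s->sp] = value;
}

/* Push argc arguments from last to first, as the %rep loop does.
 * Returns the number of bytes the caller adds back to ESP afterwards. */
uint32_t toy_push_args(ToyStack *s, const uint32_t *args, size_t argc)
{
    for (size_t i = argc; i > 0; --i)
        toy_push(s, args[i - 1]);
    return (uint32_t)(argc * 4);
}
```

After pushing the three values 0x02, 0x10, 0x08, the first argument sits at the top of the stack (the lowest address), and the function returns 12, which is the value the macro adds back to ESP.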
Function side
As with the macro that calls a function, the macro that defines a function should accept the function's parameters and their names:
%macro Function 1-*
To be able to use our function from other object files, we need to tell the linker about it:
type %1 function ; Type of symbol
global %1:function ; Declaring symbol as global
Next, we will create a local context for the macro. It's required if we call a function from another function. The context isolates locally defined macros, so they won't conflict with each other.
%push %1
Writing [ss:ebp + N] by hand is tiresome and error-prone, so we can use a loop and defines to create a context-local macro for each parameter, much like the loop we wrote earlier:
%if %0 > 1
%assign pCount (%0 - 1)
%assign pIdx 0
%rotate 1 ; Skip the function name: the first parameter is now %1
%rep pCount
%xdefine %$p%1 [ss:ebp + 0x08 + 0x04 * pIdx]
%rotate 1 ; Next parameter becomes %1
%assign pIdx (pIdx + 1)
%endrep
%endif
%$ creates a single-line macro that is local to the current context.
0x08 is the offset past two values on the stack: the return address and the saved EBP. Each value pushed to the stack is a doubleword (4 bytes) on x86. You can use prefixes to enable 16-bit stack values, but that will make things even harder.
After that, we can place the label for the subroutine, save the old EBP, and copy the current stack pointer to EBP:
%1:
push ebp
mov ebp, esp
%endmacro
In the End macro, we need to remove the local variables from the stack by moving the EBP value to ESP, restore the saved EBP, return, and close the macro context:
%macro End 0
mov esp, ebp
pop ebp
ret
%pop
%endmacro
Local variables
As a final touch, we will add local variables. They work like function parameters, except that they are laid out in declaration order below EBP, so their offsets are negative:
%macro Locals 1-*
%assign lIdx 1
%assign lCount %0
%rep lCount
%xdefine %$l%1 [ss:ebp - 0x04 * lIdx]
%rotate 1
%assign lIdx (lIdx + 1)
%endrep
sub esp, (lCount * 0x04) ; Equals to push, but several at once
%endmacro
We subtract from EBP because local variables live where newly pushed values would go, below the saved EBP. Previously, we added an offset to EBP because the values were already on the stack. EBP (in our case) is the address that divides the stack into two halves: the parameters, return address, and old EBP on one side, and the local variables and any values pushed afterwards on the other.
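The offset arithmetic done by the %xdefine lines in Function and Locals can be checked with a few lines of C. The function names below are hypothetical; the constants mirror the macros (0x08 for the return address plus the saved EBP, 0x04 per stack slot).

```c
#include <stdint.h>

/* Offset from EBP of a parameter; idx is 0-based, like pIdx in Function. */
int32_t param_offset(uint32_t idx)
{
    return 0x08 + 0x04 * (int32_t)idx;
}

/* Offset from EBP of a local variable; idx is 1-based, like lIdx in Locals. */
int32_t local_offset(uint32_t idx)
{
    return -0x04 * (int32_t)idx;
}

/* Number of bytes reserved by "sub esp, (lCount * 0x04)". */
uint32_t locals_size(uint32_t count)
{
    return count * 0x04;
}
```

For a function with three parameters and one local, the parameters land at EBP + 8, EBP + 12, and EBP + 16, and the local at EBP - 4.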
Calling our function from C
We will be testing three scenarios:
- Function that returns a constant number
- Function that returns a sum of constant and a parameter
- Function that returns a sum of parameter and product of other two
To compile our code, we will use these commands (on a 64-bit host, gcc also needs the -m32 option, because -f elf produces a 32-bit object file):
$ yasm -f elf -o test.o test.asm
$ gcc main.c -o main test.o
1. Returning a constant number
cdecl requires a function to place its return value in the EAX register, so our function will be very simple.
section .text
Function testResult
mov eax, 0xDEAD
End
In C, we need to declare the function so that the compiler knows it exists (as a header file would):
int testResult();
After that, we can simply call our function and receive a result:
int result = testResult();
printf("Return = 0x%X\n", result);
2. Returning a sum of constant and a parameter
In assembly, things are quite simple now. We add a parameter named val and use the local macro %$pval to access it. Most x86 instructions can take a memory location as an operand, so we don't always need to move a variable from the stack to a register.
section .text
Function testSum, val
mov eax, 1000
add eax, %$pval
End
As with the previous function, we need to declare it and call it:
int testSum(int value);
...
result = testSum(236);
printf("1000 + 236 = %d\n", result);
3. Returning a sum of parameter and product of other two
To test everything together (parameters and local variables), we will first multiply two parameters and then add the third to the product.
section .text
Function testMulSum, First, Second, Third
Locals mul
push ebx
mov eax, %$pFirst
mov ebx, %$pSecond
mul ebx
mov %$lmul, eax
pop ebx
mov edx, %$pThird
add %$lmul, edx
mov eax, %$lmul
End
Because we use EBX, we need to save it on the stack and restore it afterwards. The multiplication result is stored in the local variable and finally moved to EAX as the return value.
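When testing, it can help to have plain C versions of the three routines to compare results against. These are sketches with hypothetical _ref names, assuming a 32-bit int; testMulSum_ref uses unsigned arithmetic because mul is an unsigned multiply and only the low 32 bits left in EAX are kept.

```c
/* Hypothetical C references for the three assembly test functions. */
int testResult_ref(void)
{
    return 0xDEAD;
}

int testSum_ref(int val)
{
    return 1000 + val;
}

int testMulSum_ref(int first, int second, int third)
{
    /* Unsigned math wraps around the same way the 32-bit registers do. */
    unsigned int product = (unsigned int)first * (unsigned int)second;
    return (int)(product + (unsigned int)third);
}
```

Calling testMulSum_ref(16, 3, 11) gives 59, the same value the assembly version is expected to return for those inputs.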
Calling C function from assembly
We will be testing two scenarios:
- Function that calls getpid and returns its value
- Function that calls printf with a format string defined as a constant (in assembly), a string and a number taken from parameters, and an immediate constant
1. getpid
Now, we will do everything in reverse. We need to tell the assembler that there is a function that is implemented elsewhere and needs to be linked in.
extern getpid
After that, we can call getpid. We won't write any other code, because it returns the process ID in EAX, and our function's return value is also in EAX.
section .text
Function testCall
@ getpid
End
2. printf
The previous example used a function that doesn't accept any parameters. To prove that we can pass values from assembly, let's use printf:
extern printf
...
section .text
Function testPrint, str, num
@ printf, helloStr, %$pstr, %$pnum, 0xBEEF
End
...
section .data
helloStr: db "Hello, world! Your name is %s, your age is %d. Constant is 0x%X", 0x0A, 0
helloStr is a global variable that contains the format string. 0x0A is a newline character, and 0 is the string terminator. After it, we pass the parameters that contain a string and a number. As the last parameter, we pass a constant value.
Results
As we can see, everything works correctly. To further investigate how it works and how the compiler calls our assembly functions, you can run objdump -d main -M intel and look at the disassembly of the final executable.
toreonify@localhost:/local$ ./main
Return = 0xDEAD
1000 + 236 = 1236
(16 * 3) + 11 = 59
PID = 28486
Hello, world! Your name is Sam, your age is 22. Constant is 0xBEEF
Conclusion
Now you know how to spend less time writing boilerplate code in assembly!
One downside of this calling convention is the number of instructions it takes. Those instructions also access memory, so they aren't cheap. If you want to write performance-optimized software, you may need to modify the macros to pass parameters in registers instead of on the stack.
On x86-64, Linux and other *nix systems use the System V AMD64 ABI calling convention, which passes the first six integer or pointer parameters in registers and any further ones on the stack.
Git repository with code: Bitbucket