toreonify's notes

Making life easier with macros

Introduction

Nowadays, assembly is rarely used for day-to-day programming. Low-level calls are handled by libraries, and compilers optimize code pretty well these days.

In fact, so well that switching between versions of your favorite compiler can speed up heavy programs (like FFmpeg) several times over, especially on older hardware.

When writing in assembly, you usually break the task down into small pieces and find those that can work directly with registers or memory to speed up the algorithm. This approach is used in vector and matrix operations, processor initialization in kernels, etc.

But if you want to write a lot of code entirely in assembly, macros can make it easier. They can make your code harder for someone else to read, but the resemblance to high-level languages that macros create helps close that gap.

What is missing?

Assembly language doesn't provide luxuries like variables, loops, if-else, classes and structures in a friendly way. You work with memory and registers, and you need to keep track of everything yourself: what is stored where, address offsets, variable types, etc.

Modern assemblers provide some of these luxuries, but they still require you to check everything. For example, they don't do type checking, so you can add a number to an ASCII character. Memory bounds are not checked either, so you can overwrite another variable.

Calculating addresses can be a chore, so assemblers provide labels, which give a name to the address of a chosen instruction or line.

One thing they don't provide as a built-in feature is functions with local variables. You can write them, but there is no keyword for that.

Writing our helpers

I assume that you know a bit about how the compiler and linker work together ;)

NASM (and YASM) provide extensive macro support. So extensive that we can create a pseudo-HLL wrapper.

What NASM macros allow us to do:

  • Contexts with local labels and single-line macros
  • Constant definitions
  • Single-line macros
  • Macros with variable parameters and default values
  • Loops
  • Conditions
  • String operations

These features will allow us to write code like this:

Function functionName, firstParameter, secondParameter
Locals firstVariable, secondVariable

    mov eax, %$pfirstParameter
    add eax, %$lsecondVariable

End

1. Function block

In assembly, a function block is made up of a label marking the first instruction and a return instruction. Defining a macro just for a single short instruction or for a name with a colon looks like a waste of time and space, but we will need to add more instructions for variables and parameters later.

%macro Function 1
    %1:
%endmacro

%macro End 0
    ret
%endmacro

In this example, we created a macro called Function that accepts one parameter: the name of the function. %1 expands to the value of the first parameter, which produces a syntactically correct label for a subroutine. The macro End doesn't accept any parameters.

Now, our code will look like this:

Function Start
    mov eax, 0x04
    add eax, 0x02
End

Calling convention

High-level languages follow a fixed set of rules for passing parameters to functions and for storing local variables. This is called a calling convention. For compatibility with binaries compiled from C, we will use the cdecl calling convention. It's easy to implement, and it uses only the stack and a return register.

cdecl defines these rules:

  • Parameters are passed on the stack in reverse order
  • The caller cleans up the stack for parameters
  • The callee cleans up the stack for local variables
  • Three registers (EAX, ECX, EDX) must be saved by the caller if it needs their values after the call
  • Three registers (EBX, ESI, EDI) must be saved by the callee if it uses them

Here is an example of the stack once a subroutine is ready to use its parameters and local variables:

Stack

How it works

For example, say we want to implement a function that checks whether a number is within bounds.

First, let's push the parameters onto the stack:

mov eax, 0x08 ; Value, calculated earlier 

push eax ; Third parameter, value
push dword 0x10 ; Second parameter, right bound
push dword 0x02 ; First parameter, left bound
call boundsCheck

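Then we can access our parameters and write the algorithm. The offsets below assume that EBP has already been set up as the frame pointer (push ebp / mov ebp, esp), which the improved Function macro in the next section takes care of: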

Function boundsCheck
    mov eax, dword [ss:ebp + 0x08] ; Left bound
    mov edx, dword [ss:ebp + 0x10] ; Value (EDX is caller-saved, unlike EBX)

    cmp edx, eax
    jg .leftOK

    mov eax, 0x00
    jmp .done

    .leftOK:
    mov eax, dword [ss:ebp + 0x0C] ; Right bound
    cmp edx, eax
    jl .rightOK

    mov eax, 0x00
    jmp .done

    .rightOK:
    mov eax, 0x01

    .done:
End

After we return from the subroutine, we must clean up the stack by returning the stack pointer to its previous value (releasing 12 bytes). The stack grows downward, so we add to ESP:

add esp, 0x0C

2. Improving our code

Caller side

First, we will write a macro to call a function. But wait, isn't call sufficient? Well, no: it knows nothing about our stack manipulations.

Let's define a macro that accepts one mandatory parameter (the name of a function) and any number of optional parameters (that is what * means).

%macro @ 1-*

%endmacro

Not all functions have parameters, so we need to check the total number of parameters (%0 holds it):

%macro @ 1-*
    %if %0 > 1

    %endif
%endmacro

Then we push the parameters onto the stack. NASM can rotate the parameter list, so you don't need to track indexes or build a parameter reference from a computed index (although you can).

    %assign pCount (%0 - 1)
    %rep pCount
        ; Bring the last parameter that has not been pushed yet to the front
        %rotate -1
        push DWORD %1
    %endrep

    ; One more rotation puts the function name back in first place
    %rotate -1

Finally, we can call our subroutine and, if we passed parameters, add cleanup code for the stack:

  call %1
  %if %0 > 1
       add esp, (pCount * 0x04)
  %endif

Function side

As with the macro that calls a function, the macro that defines a function should be able to declare parameters and their names:

%macro Function 1-*

To be able to use our function from other object files, we need to tell the linker about it:

type %1 function ; Type of symbol
global %1:function ; Declaring symbol as global

Next, we create a local context for the macro. It's required if we are going to call a function from another function. A context isolates locally defined macros so that they don't conflict with each other.

%push %1

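Writing [ss:ebp + N] is quite tiresome and error-prone. So we can use loops and defines to create a context-local macro for each parameter, as we did earlier:

   %if %0 > 1
     %assign pCount (%0 - 1)
     %assign pIdx 0

     ; Move the function name to the end of the list,
     ; so the first parameter name comes first
     %rotate 1
     %rep pCount
       %xdefine %$p%1 [ss:ebp + 0x08 + 0x04 * pIdx]
       %rotate 1
       %assign pIdx (pIdx + 1)
     %endrep
     ; After pCount more rotations the function name is first again
   %endif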

%$ creates a single-line macro that is local to the current context.

0x08 is the size of the two values that sit on the stack between EBP and the parameters: the saved EBP and the return address. Each value pushed onto the stack is a doubleword (4 bytes) on 32-bit x86. You can use prefixes to push 16-bit values, but that would make things even harder.

After that, we can place the label for the subroutine, save the old EBP and copy the current stack pointer into EBP:

%1:
     push ebp
     mov ebp, esp

%endmacro

In the macro End we need to remove local variables from the stack by copying EBP into ESP, restore the saved EBP, return, and close the macro context:

%macro End 0
    mov esp, ebp
    pop ebp
    ret
    %pop
%endmacro

Local variables

As a final touch, we will add local variables. It is similar to function parameters, but the names are taken in normal order:

%macro Locals 1-*
   %assign lIdx 1
   %assign lCount %0

   %rep lCount
     %xdefine %$l%1 [ss:ebp - 0x04 * lIdx]
     %rotate 1
     %assign lIdx (lIdx + 1)
   %endrep

   sub esp, (lCount * 0x04) ; Equals to push, but several at once
%endmacro

We subtract from EBP because pushing a new value moves the stack pointer toward lower addresses. Previously, we added an offset to EBP because the parameters were already on the stack. EBP (in our case) is the address that divides the stack frame into two halves: parameters, return address and old EBP on one side; local variables and any values pushed afterwards on the other.

Calling our function from C

We will be testing three scenarios:

  • Function that returns a constant number
  • Function that returns a sum of constant and a parameter
  • Function that returns a sum of parameter and product of other two

To compile our code we will use these commands (on a 64-bit host, gcc additionally needs the -m32 option to build a 32-bit executable):

$ yasm -f elf -o test.o test.asm
$ gcc main.c -o main test.o

1. Returning a constant number

cdecl requires a function to place its return value in the EAX register. So our function will be very simple.

section .text

Function testResult
        mov eax, 0xDEAD
End

In C we need to declare the function for the compiler (as in a header file):

int testResult(void);

After that, we can simply call our function and receive the result:

int result = testResult();

printf("Return = 0x%X\n", result);

2. Returning a sum of constant and a parameter

In assembly, things are quite simple now. We add a parameter named val and use the local macro %$pval to access it. Most x86 instructions can take a memory location as an operand, so we don't always need to move a variable from the stack into a register.

section .text

Function testSum, val
        mov eax, 1000
        add eax, %$pval
End

As with the previous function, we need to declare it and call it:

int testSum(int value);

...

result = testSum(236);

printf("1000 + 236 = %d\n", result);

3. Returning a sum of parameter and product of other two

To test everything together (parameters and local variables), we will first multiply two parameters and then add the third to the product.

section .text

Function testMulSum, First, Second, Third
Locals mul
        push ebx
        mov eax, %$pFirst
        mov ebx, %$pSecond
        mul ebx
        mov %$lmul, eax
        pop ebx

        mov edx, %$pThird
        add %$lmul, edx

        mov eax, %$lmul
End

Because we are using EBX, we need to save it on the stack and restore it afterwards. Also, the multiplication result is stored in a local variable and then moved to EAX as the return value.

Calling C function from assembly

We will be testing two scenarios:

  • Function that calls getpid and returns its value
  • Function that calls printf with a format string defined as a constant (in assembly), a string and a number from parameters, and an immediate constant

1. getpid

Now we will do everything in reverse. We need to tell the assembler that there is a function that is implemented elsewhere and needs to be linked.

extern getpid

After that, we can call getpid. We don't need any other code, because getpid returns the process ID in EAX, and our function's return value is also passed in EAX.

section .text

Function testCall
        @ getpid
End

2. printf

The previous example used a function that doesn't accept any parameters. To show that we can pass values from assembly, let's use printf:

extern printf

...
section .text

Function testPrint, str, num
        @ printf, helloStr, %$pstr, %$pnum, 0xBEEF
End

...

section .data
        helloStr: db "Hello, world! Your name is %s, your age is %d. Constant is 0x%X", 0x0A, 0

helloStr is a global variable that contains the format string. 0x0A is a newline character, and 0 is the string terminator. After the format string we pass the parameters that contain a string and a number. As the last parameter we pass a constant value.

Results

As we can see, everything works correctly. To investigate further how it works and how the compiler calls our assembly functions, you can run objdump -d main -M intel and look at the disassembly of the final executable.

toreonify@localhost:/local$ ./main
Return = 0xDEAD
1000 + 236 = 1236
(16 * 3) + 11 = 59
PID = 28486
Hello, world! Your name is Sam, your age is 22. Constant is 0xBEEF

Conclusion

Now you know how to spend less time writing boilerplate code in assembly!

One downside of this calling convention is the number of instructions it takes. Those instructions access memory, so they are not cheap in time. If you want to write performance-optimized software, this approach may need modifications to pass parameters in registers rather than on the stack.

On x86-64, Linux and other *nix systems use a calling convention (the System V AMD64 ABI) that passes the first few parameters in registers and any remaining ones on the stack.

Git repository with code: Bitbucket

References

The 32 bit x86 C Calling Convention — Aaron Bloomfield

YASM User Manual
