- What is Assembly Language?
Assembly language is a low-level programming language, often described as the native language of computers. It is a close approximation of binary machine code, and a program written in it is referred to as assembly code.
It corresponds closely to machine language, except that commands are written as letter sequences, which are easier to memorize and understand, instead of numbers. It maps human-readable mnemonics to machine instructions, and so allows machine-level programming without writing in machine language directly.
- History of Assembly Language
Early computer systems were programmed literally by hand. Front-panel switches served as the input device for entering instructions and data. The switches represented addresses, data, and other significant functions in the system, and specific switches were toggled to operate the machine.
For example, to run a specific program, a switch representing a certain address had to be toggled. After that, another switch representing the data to be used at that address was toggled. Once all the preparations were made, the final run switch was toggled to start the program.
Programming back then demanded a good memory and close focus, since the programmer had to remember every procedure that a program needed. The programmer also needed to know the processor's entire instruction set in order to convert those instructions into bit patterns and set the panel switches correctly.
Because everything was manipulated manually, programs were very prone to errors. Entering and running them was also slow, because every step depended on manual effort.
With the advent of new technologies and larger memories, programs were written to take over those manual entries. Small monitor programs that used hex keypads or terminals to enter instructions became popular, as did paper tape and punch cards, which were used as storage media for programs.
Since programs were still hand-coded, conversion from mnemonics to instructions was still performed manually. To work more efficiently, programmers came up with the idea of writing a program that translates another program, which was a major breakthrough. This program acted as a translator from mnemonics to machine instructions. The advantages were fewer errors, faster translation and easier editing.
Nathaniel Rochester wrote one of the first assemblers, for the IBM 701, in 1954.
Assemblers are programs which generate machine code instructions from a source code program written in assembly language. The features provided by an assembler are:
- allows the programmer to use mnemonics when writing source code programs
- variables are represented by symbolic names, not as memory locations
- symbolic code is easier to read and follow
- error checking is provided
- changes can be quickly and easily incorporated with a re-assembly
- programming aids are included for relocation and expression evaluation
In writing assembly language programs for microcomputers, it is essential that a standardized format be followed. Most manufacturers provide an assembler for their processors, which generates the machine code instructions that the processor executes.
The assembler converts the assembly language source program into a format that runs on the processor. In the source, each machine code instruction (a binary or hex value) is represented by a mnemonic, an abbreviation that stands for the actual instruction.
Binary | Hex | Mnemonic
01001111 | 4F | CLRA
00110110 | 36 | PSHA
01001101 | 4D | TSTA
CLRA - Clears the A accumulator
PSHA - Saves A accumulator on Stack
TSTA - Test A accumulator for 0
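The table above amounts to a lookup from mnemonic to opcode byte. A minimal Python sketch of that lookup, using only the three instructions listed here (the names `OPCODES`, `encode` and `show` are illustrative and not part of any real assembler):

```python
# Mnemonic-to-opcode table, taken from the three rows listed above.
OPCODES = {
    "CLRA": 0x4F,  # clear the A accumulator
    "PSHA": 0x36,  # save the A accumulator on the stack
    "TSTA": 0x4D,  # test the A accumulator for zero
}


def encode(mnemonic):
    """Return the opcode byte for a mnemonic (case-insensitive)."""
    return OPCODES[mnemonic.upper()]


def show(mnemonic):
    """Format one table row as 'binary hex mnemonic'."""
    byte = encode(mnemonic)
    return f"{byte:08b} {byte:02X} {mnemonic.upper()}"
```

Calling `show("clra")` rebuilds the first row of the table from the mnemonic alone, which is the translation an assembler does for every instruction.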
Mnemonics are used because they:
- are more meaningful than hex or binary values
- reduce the risk of committing errors
- are easier to remember than bit values
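Assemblers also accept certain characters to represent number bases and addressing modes:
- $ prefix or H suffix for hexadecimal numbers
- D suffix for decimal numbers, e.g. 24D (a number with no marker, such as 67, is typically also read as decimal)
- # prefix for immediate addressing
- ,X suffix for indexed addressing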
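As a rough illustration of the number-base markers, the sketch below converts a literal to an integer in Python. The function name `parse_number` is hypothetical, and treating an unmarked number as decimal is an assumption, since assemblers differ on their default base.

```python
def parse_number(token):
    """Convert an assembler numeric literal to an int.

    Assumed conventions: '$' prefix or 'H' suffix means hexadecimal,
    'D' suffix means decimal, and a bare number is read as decimal
    (an assumption; real assemblers vary).
    """
    text = token.strip().upper()
    if text.startswith("$"):
        return int(text[1:], 16)   # $24 -> hexadecimal 24
    if text.endswith("H"):
        return int(text[:-1], 16)  # 24H -> hexadecimal 24
    if text.endswith("D"):
        return int(text[:-1], 10)  # 24D -> decimal 24
    return int(text, 10)           # 67  -> decimal 67
```

The `$` and `H` checks come before the `D` check, so a hex literal that happens to end in the digit D, such as `0DH`, is still read as hexadecimal.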
Assembly language statements are written one per line. An assembly language program consists of a sequence of such statements, each of which contains a mnemonic. Each line of an assembly language program is split into four fields, as shown below:
“LABEL” “OPCODE” “OPERAND” “COMMENTS”
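The four-field layout can be sketched as a small Python function that splits one source line. This assumes the conventions this article describes for labels and comments (a label ends with a colon, a comment starts with a semicolon); the name `split_line` is hypothetical.

```python
def split_line(line):
    """Split one source line into (label, opcode, operand, comment).

    Assumes a label ends with ':', a comment starts with ';', and the
    opcode is separated from its operand by spaces or tabs. Missing
    fields come back as empty strings.
    """
    # Everything after the first semicolon is the comment.
    code, _, comment = line.partition(";")
    code = code.strip()
    label = ""
    if ":" in code:
        # Text before the colon is the label being declared.
        label, _, code = code.partition(":")
        label = label.strip()
        code = code.strip()
    parts = code.split(None, 1)
    opcode = parts[0] if parts else ""
    operand = parts[1].strip() if len(parts) > 1 else ""
    return label, opcode, operand, comment.strip()
```

For instance, `split_line("START: LDAA #24H ;load A")` returns `("START", "LDAA", "#24H", "load A")`, and a line holding only a comment returns empty strings for the first three fields.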
The label field is optional. A label is an
identifier (or text string symbol). Labels are used extensively in programs to
reduce reliance upon programmers remembering where data or code is located. A
label can be used to refer to:
- a memory location
- the value of a piece of data
- the address of a program, sub-routine, code portion etc.
The maximum length of a label differs between assemblers. Some accept labels up to 32 characters long, while others accept only four characters. A label, when declared, is suffixed by a colon and begins with a valid character (A..Z). Consider the following example.
START: LDAA #24H
Here, the label START is equal to the address of the instruction LDAA #24H. The label is used in the program as a reference, e.g.,
JMP START
This would result in the processor jumping to the location (address)
associated with the label START, thus executing the instruction LDAA #24H
immediately after the JMP instruction. When a label is referenced later on in
the program, it is done so without the colon suffix.
An advantage of using labels is that inserting or re-arranging code statements does not necessitate re-working actual machine instructions. A simple re-assembly is all that is required. In hand-coding, such changes can take hours to perform.
Each instruction consists of an opcode and possibly one or more operands. In the instruction JMP START, the opcode is JMP and the operand is the address of the label START.
The opcode field contains a mnemonic. Opcode stands for operation code, i.e., a machine code instruction. The opcode may also require additional information (operands). This additional information is separated from the opcode by a space (or tab stop).
The operand field consists of additional information or data that the opcode
requires. In certain types of addressing modes, the operand is used to specify:
- constants or labels
- immediate data
- data contained in another accumulator or register
- an address
Examples of operands are:
- TAB ; operand specified by opcode
- LDAA 0100H ; two byte operand
- LDAA START ; label operand
- LDAA #0FH ; immediate operand
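The operand forms above can be told apart by their notation alone. The following Python sketch shows one way an assembler might classify them; the function name `addressing_mode` and the category names it returns are illustrative, not taken from any particular assembler.

```python
def addressing_mode(operand):
    """Classify an operand string by the notation it uses."""
    text = operand.strip().upper()
    if not text:
        return "implied"    # e.g. TAB: the opcode itself names its operands
    if text.startswith("#"):
        return "immediate"  # e.g. #0FH: the operand is the data itself
    if text.endswith(",X"):
        return "indexed"    # e.g. 5,X: an offset from the index register
    return "address"        # e.g. 0100H or START: an address or a label
```

An address literal and a label land in the same category here because, once the assembler has replaced the label with its address, the two are encoded the same way.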
The comment field is optional and is used by the programmer to explain
how the coded program works. Comments are preceded by a semi-colon. The
assembler, when generating instructions from the source file, ignores all
comments. Consider the following examples:
        ORG  0100H      ;H means hexadecimal values
                        ;This program starts at address 0100 hex
STATUS: DFB  23H        ;This byte is identified as STATUS, and is
                        ;initialized to a value of 23 hex
CODE:   LDAA STATUS     ;The label called CODE is identified as a
                        ;machine code instruction which loads the
                        ;A accumulator with the contents of the
                        ;memory location associated with the label
                        ;STATUS, i.e., the value 23 hex
        JMP  CODE       ;Jump to the address associated with CODE
Note that the programmer does not need to worry about bit patterns, hex values, and the addresses of STATUS or CODE. The assembler, when fed the above program, will generate the correct code. The code output from the assembler will be:
Memory location | Byte value
Location 0100 holds the value associated with the label STATUS
Locations 0101 to 0103 perform the LDAA STATUS instruction
Locations 0104 to 0106 perform the JMP CODE instruction
The statement ORG 0100H in the above program is not a machine code instruction. It is an instruction to the assembler, telling it to generate code that runs at the designated origin address. Instructions to assemblers are called pseudo-ops. These are used for:
- reserving memory for data variables, arrays and structures
- determining the start address of the program
- determining the entry address of the program
- initializing variable values
The assembler does not generate any machine code instructions for pseudo-ops or comments. Assemblers scan the source program, generating machine instructions as they go. Sometimes the assembler reaches a reference to a variable which has not yet been defined. This is referred to as the forward reference problem. The assembler can tackle this problem in a number of ways. A two-pass assembler resolves it as follows:
On the first pass, the assembler simply reads the source file, counts the number of locations that each instruction will take, and builds a symbol table in memory that lists every defined variable alongside its memory address.
On the second pass, the assembler substitutes opcodes for the mnemonics and replaces variable names with the memory locations obtained from the symbol table.
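The two passes can be sketched in Python for the small STATUS/CODE program shown earlier. This is only an illustration under stated assumptions: the opcode bytes 0xB6 (LDAA) and 0x7E (JMP) follow Motorola 6800 extended addressing, which the mnemonics in this article resemble but which the article itself does not specify, and the names `INSTRUCTIONS`, `parse_line`, `hex_value` and `assemble` are hypothetical.

```python
# Mnemonic -> (opcode byte, instruction size in bytes).
# ASSUMPTION: Motorola 6800 extended-addressing encodings.
INSTRUCTIONS = {"LDAA": (0xB6, 3), "JMP": (0x7E, 3)}


def parse_line(line):
    """Return (label, opcode, operand) with any comment stripped."""
    code = line.split(";", 1)[0].strip()
    label = ""
    if ":" in code:
        label, code = (part.strip() for part in code.split(":", 1))
    fields = code.split()
    opcode = fields[0].upper() if fields else ""
    operand = fields[1] if len(fields) > 1 else ""
    return label, opcode, operand


def hex_value(text):
    """Parse an H-suffixed hexadecimal literal such as 0100H."""
    return int(text.rstrip("Hh"), 16)


def assemble(source):
    """Two-pass assembly; returns ({address: byte}, symbol table)."""
    lines = [parse_line(raw) for raw in source.splitlines()]

    # Pass 1: count locations and record the address of every label.
    symbols, address = {}, 0
    for label, opcode, operand in lines:
        if opcode == "ORG":
            address = hex_value(operand)
            continue
        if label:
            symbols[label] = address
        if opcode == "DFB":
            address += 1
        elif opcode in INSTRUCTIONS:
            address += INSTRUCTIONS[opcode][1]

    # Pass 2: emit bytes, replacing each label with its address.
    memory, address = {}, 0
    for label, opcode, operand in lines:
        if opcode == "ORG":
            address = hex_value(operand)
        elif opcode == "DFB":
            memory[address] = hex_value(operand)
            address += 1
        elif opcode in INSTRUCTIONS:
            byte, size = INSTRUCTIONS[opcode]
            target = symbols[operand]
            memory[address] = byte
            memory[address + 1] = target >> 8    # high address byte
            memory[address + 2] = target & 0xFF  # low address byte
            address += size
    return memory, symbols


SOURCE = """
        ORG 0100H
STATUS: DFB 23H
CODE:   LDAA STATUS
        JMP CODE
"""
memory, symbols = assemble(SOURCE)
```

Under the assumed opcodes, `memory` holds 23 at location 0100, B6 01 00 at locations 0101 to 0103, and 7E 01 01 at locations 0104 to 0106, which matches the layout described above. Because the symbol table is complete before any bytes are emitted, a jump to a label defined further down the file resolves the same way.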
- What do CS instructors say about Assembly Programming?
- What can we say about Assembly?
"Assembly
Language is like studying. It takes so much time. There are other programming languages that are easier, faster and more comfortable to use, but we just have to deal with it. We can't settle for taking shortcuts."
- Mary Grace R. Lumenario
“Assembly Language is sooooooo tiring… and at the same time mind bugging because of those mnemonics. Several lines of code in assembly language can be just a single line or even a single built-in function in a certain high-level language. I don’t want assembly..:D”
- Angelo Paolo V. Aruta
include '%fasminc%/win32ax.inc'
.code
start:
invoke MessageBox,HWND_DESKTOP,"Hello World!","Win32 Assembly",MB_OK
invoke ExitProcess,0
.end start
CMSC 124 T-4L
Aruta, Angelo Paolo V. 2007-17678
Lumenario, Mary Grace R. 2007-63926