Tuesday, September 18, 2012

NO Detours, NO Shortcuts - Just Assembly.

  • What is Assembly Language?

Assembly language is a low-level programming language, often described as the native language of computers. It is a close, human-readable approximation of binary machine code, and programs written in it are referred to as assembly code.

It is essentially machine language written with letters instead of numbers: short letter sequences called mnemonics, which are easier to memorize and understand, are used to write commands. Because each mnemonic maps to a machine instruction, assembly language allows machine-level programming without writing raw machine code.

  •  History of Assembly Language
Early computer systems were programmed literally by hand. Front panel switches were used as the input device for entering instructions and data. They represented addresses, data, and other important functions of the system, and programming meant toggling specific switches.

For example, to run a specific program, the programmer toggled the switches representing a particular address, then toggled the switches representing the data to be stored at that address. When all of the preparations had been made, the final run switch was toggled to start the program.


Programming back then therefore required a talent for memorization and a great deal of focus, since the programmer had to remember every procedure a program needed. The programmer also needed to know the processor's entire instruction set in order to convert each instruction into the bit pattern to which the panel switches would be set.


Because everything was done manually, programs were very prone to errors. Entering them was also slow, since every instruction had to be keyed in by hand.

With the advent of new technologies and larger memories, programs were written to take over these manual entries. Small monitor programs that accepted instructions from hex keypads or terminals became popular, as did paper tape and punched cards, which were used as storage media for programs.

Since programs were still hand-coded, the conversion from mnemonics to instructions was still performed manually. Programmers looked for ways to make this work more efficient, and the idea of writing a program to translate another program was a major breakthrough. Such a program would act as a translator from mnemonics to machine instructions. Its advantages were fewer errors, faster translation, and easier editing.

  • Who's Who?    
Nathaniel Rochester
               Nathaniel Rochester wrote the first assembler, used on the IBM 701, in 1954.

Stan Poley
                  Stan Poley is the author of SOAP (Symbolic Optimal Assembly Program), which he wrote in 1955 as the assembler for the IBM 650.

  • What does an assembler do?
          Assemblers are programs that generate machine code instructions from a source program written in assembly language. The features provided by an assembler are:

  • mnemonics can be used when writing source code programs
  • variables are represented by symbolic names, not as memory locations
  • symbolic code is easier to read and follow
  • error checking is provided
  • changes can be quickly and easily incorporated with a re-assembly
  • programming aids are included for relocation and expression evaluation

           When writing assembly language programs for microcomputers, it is essential to follow a standardized format. Most manufacturers provide assemblers for their processors, which generate the machine code instructions that the processor actually executes.

           The assembler converts the assembly language source program into a format that runs on the processor. In the source program, each machine code instruction (its binary or hex value) is written as a mnemonic, an abbreviation that represents the actual instruction.

         Binary      Hex   Mnemonic   Meaning
         01001111    4F    CLRA       Clear the A accumulator
         00110110    36    PSHA       Save (push) the A accumulator on the stack
         01001101    4D    TSTA       Test the A accumulator for 0
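For these instructions, an assembler's core job is a table lookup in each direction. The following is a minimal Python sketch, assuming only the three single-byte Motorola 6800 instructions from the table; assemble and disassemble are hypothetical helper names, not part of any real assembler. An unknown mnemonic simply raises KeyError, which stands in for an assembler's error checking.

```python
# Hypothetical lookup tables built from the three instructions in the table.
OPCODES = {"CLRA": 0x4F, "PSHA": 0x36, "TSTA": 0x4D}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(mnemonics):
    """Translate a list of mnemonics into machine code bytes."""
    return bytes(OPCODES[m.upper()] for m in mnemonics)


def disassemble(code):
    """Translate machine code bytes back into mnemonics."""
    return [MNEMONICS[b] for b in code]


program = assemble(["clra", "psha", "tsta"])
print(program.hex(" "))                        # 4f 36 4d
print(" ".join(f"{b:08b}" for b in program))   # 01001111 00110110 01001101
print(disassemble(program))                    # ['CLRA', 'PSHA', 'TSTA']
```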

            Mnemonics are used because they:
          · are more meaningful than hex or binary values
          · can reduce the risk of committing errors
          · are easier to remember than bit values

            Assemblers also accept certain characters to represent number bases and addressing modes.

           $ prefix or h suffix for hexadecimal
           $24 or 24h

           D for decimal numbers
           24D 67

           B for binary numbers
           0101111B

           O or Q for octal numbers
           377O 232Q

           # for immediate addressing
           LDAA #$34

           ,X for indexed addressing
           LDAA 01,X
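To show how an assembler might interpret these notations, here is a small Python sketch. The helper names parse_number and parse_operand are hypothetical; the sketch assumes the $ prefix and the H, D, B, O and Q suffixes listed above, treats a number with no suffix as decimal, and recognizes only the # and ,X addressing forms. Real assemblers differ in the details.

```python
def parse_number(token):
    """Convert a literal such as $24, 24h, 24D, 67, 0101111B, 377O or 232Q to an int."""
    t = token.strip().upper()
    if t.startswith("$"):                      # $ prefix: hexadecimal
        return int(t[1:], 16)
    bases = {"H": 16, "D": 10, "B": 2, "O": 8, "Q": 8}
    if t[-1] in bases:                         # one-letter suffix gives the base
        return int(t[:-1], bases[t[-1]])
    return int(t, 10)                          # no suffix: assumed decimal


def parse_operand(operand):
    """Return (addressing_mode, value) for a simple operand."""
    op = operand.strip().upper()
    if op.startswith("#"):                     # #value: immediate data
        return "immediate", parse_number(op[1:])
    if op.endswith(",X"):                      # offset,X: indexed addressing
        return "indexed", parse_number(op[:-2])
    return "address", parse_number(op)         # anything else: a memory address


print(parse_number("$24"), parse_number("0101111B"), parse_operand("01,X"))
# 36 47 ('indexed', 1)
```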

            Assembly language statements are written one per line. An assembly language program is a sequence of such statements, each of which contains a mnemonic. Each line of an assembly language program is split into four fields, as shown below:

           LABEL    OPCODE    OPERAND    COMMENTS
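A minimal Python sketch of splitting one line into these four fields, assuming the conventions this post uses in its examples: a declared label ends with a colon, a comment starts with a semicolon, and the opcode and operand are separated by whitespace. The Statement type and split_fields function are hypothetical names for illustration, not taken from any real assembler.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """The four fields of one assembly language line."""
    label: Optional[str]
    opcode: Optional[str]
    operand: Optional[str]
    comment: Optional[str]


def split_fields(line: str) -> Statement:
    """Split one source line into LABEL, OPCODE, OPERAND and COMMENTS."""
    code, has_comment, comment = line.partition(";")   # comments start at ';'
    label = None
    head, has_colon, rest = code.partition(":")         # a declared label ends with ':'
    if has_colon:
        label, code = head.strip(), rest
    parts = code.split(maxsplit=1)                       # opcode, then the operand
    opcode = parts[0] if parts else None
    operand = parts[1].strip() if len(parts) > 1 else None
    return Statement(label, opcode, operand, comment.strip() if has_comment else None)


print(split_fields("START: LDAA #24H ;load A with 24 hex"))
# Statement(label='START', opcode='LDAA', operand='#24H', comment='load A with 24 hex')
print(split_fields("       JMP START"))
# Statement(label=None, opcode='JMP', operand='START', comment=None)
```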

           The label field is optional. A label is an identifier (or text string symbol). Labels are used extensively in programs to reduce reliance upon programmers remembering where data or code is located. A label can be used to refer to:
  • a memory location 
  • the value of a piece of data 
  • the address of a program, sub-routine, code portion etc.

          The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, while others accept only four characters. When declared, a label is suffixed by a colon and begins with a letter (A..Z). Consider the following example.

           START: LDAA #24H

           Here, the label START is equal to the address of the instruction LDAA #24H. The label is used elsewhere in the program as a reference, e.g.,
  
           JMP START


         This causes the processor to jump to the location (address) associated with the label START, so the next instruction executed after the JMP is LDAA #24H. When a label is referenced elsewhere in the program, it is written without the colon suffix.

         An advantage of using labels is that inserting or rearranging code statements does not require reworking the actual machine instructions, such as recalculating jump addresses; a simple re-assembly is all that is required. In hand-coding, such changes can take hours to perform.

         Each instruction consists of an opcode and possibly one or more operands. In the instruction:

           JMP START

        - the opcode is JMP and the operand is the address of the label START

        The opcode field contains a mnemonic. Opcode stands for operation code, i.e., the part of a machine code instruction that specifies the operation to perform. The opcode may also require additional information (operands), which is separated from the opcode by a space (or tab stop).

         The operand field contains the additional information or data that the opcode requires. Depending on the addressing mode, the operand is used to specify:
  •    constants or labels
  •    immediate data
  •    data contained in another accumulator or register
  •    an address

         Examples of operands are:

  •    TAB ; operand specified by opcode 
  •    LDAA 0100H ; two byte operand
  •    LDAA START ; label operand
  •    LDAA #0FH ; immediate operand

         The comment field is optional and is used by the programmer to explain how the program works. Comments are preceded by a semicolon, and the assembler ignores all comments when generating instructions from the source file. Consider the following example:

                  ORG   0100H    ;H means hexadecimal values
                                 ;This program starts at address 0100 hex
         STATUS:  DFB   23H      ;This byte is identified as STATUS, and is
                                 ;initialized to a value of 23 hex
         CODE:    LDAA  STATUS   ;The label called CODE is identified as a
                                 ;machine code instruction which loads the
                                 ;A accumulator with the contents of the
                                 ;memory location associated with the label
                                 ;STATUS, i.e., the value 23 hex
                  JMP   CODE     ;Jump to the address associated with CODE


          Note that the programmer does not need to worry about bit patterns, hex values, and the addresses of STATUS or CODE. The assembler, when fed the above program, will generate the correct code. The code output from the assembler will be:


         Memory location (hex)    Byte value (hex)
         0100                     23
         0101                     B6
         0102                     01
         0103                     00
         0104                     7E
         0105                     01
         0106                     01

         Location 0100 holds the value associated with the label STATUS
         Locations 0101 to 0103 hold the LDAA STATUS instruction
         Locations 0104 to 0106 hold the JMP CODE instruction
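The listing also shows how each 3-byte instruction is laid out: the opcode byte comes first, followed by the 16-bit address with its high byte first (B6 01 00 for LDAA STATUS, with STATUS at 0100). A minimal Python sketch of that encoding, using the two opcodes from the listing; encode_extended is a hypothetical helper name, and the high-byte-first order is taken from this listing (it matches the Motorola 6800), not assumed for every processor.

```python
def encode_extended(opcode: int, address: int) -> bytes:
    """Encode opcode + 16-bit address, high byte first, as in the listing."""
    return bytes([opcode, (address >> 8) & 0xFF, address & 0xFF])


LDAA_EXT, JMP_EXT = 0xB6, 0x7E    # opcode bytes as they appear in the listing
STATUS, CODE = 0x0100, 0x0101     # label addresses as they appear in the listing

print(encode_extended(LDAA_EXT, STATUS).hex(" "))   # b6 01 00
print(encode_extended(JMP_EXT, CODE).hex(" "))      # 7e 01 01
```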

         The statement ORG 0100H in the above program is not a machine code instruction. It is an instruction to the assembler, telling it to generate code that runs starting at the designated origin address (here, 0100 hex). Instructions to the assembler are called pseudo-ops; DFB in the same program is another example. Pseudo-ops are used for:

  •   reserving memory for data variables, arrays and structures
  •   determining the start address of the program
  •   determining the entry address of the program
  •   initializing variable values

         The assembler does not generate machine code instructions for pseudo-ops or comments. As it scans the source program generating machine instructions, it sometimes reaches a reference to a variable or label that has not yet been defined. This is known as the forward reference problem. An assembler can tackle it in a number of ways; a two-pass assembler resolves it as follows:

         On the first pass, the assembler reads the source file, counts the number of memory locations each instruction will take, and builds a symbol table in memory that lists every defined label and variable together with its memory address.

         On the second pass, the assembler replaces the mnemonics with opcodes and replaces variable names with the memory addresses obtained from the symbol table.
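The two passes can be sketched in Python. This is a hypothetical, minimal assembler, not any real product: it understands only ORG, DFB, LDAA and JMP, always uses the 3-byte form with the opcodes B6 and 7E taken from the listing above, reads only H-suffixed hex or plain decimal numbers, and does no error checking. Assembling the example program reproduces the byte listing shown earlier. A forward reference such as JMP LATER also works, because pass 1 records LATER in the symbol table before pass 2 needs it.

```python
# Hypothetical two-pass assembler sketch (ORG, DFB, LDAA, JMP only).
OPCODES = {"LDAA": 0xB6, "JMP": 0x7E}      # 3-byte opcodes from the listing
SIZES = {"DFB": 1, "LDAA": 3, "JMP": 3}    # bytes each statement occupies


def parse_line(line):
    """Return (label, mnemonic, operand) with the comment removed."""
    code = line.split(";", 1)[0]
    label = None
    if ":" in code:
        label, code = code.split(":", 1)
        label = label.strip()
    parts = code.split()
    mnemonic = parts[0].upper() if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def number(text):
    """Parse 0100H-style hex or plain decimal."""
    text = text.upper()
    return int(text[:-1], 16) if text.endswith("H") else int(text)


def assemble(source):
    statements = [parse_line(line) for line in source.splitlines()]

    # Pass 1: count bytes and record each label's address (the symbol table).
    symbols, pc = {}, 0
    for label, mnemonic, operand in statements:
        if mnemonic == "ORG":
            pc = number(operand)
        if label:
            symbols[label] = pc
        pc += SIZES.get(mnemonic, 0)

    # Pass 2: emit bytes; every label, even a forward one, is now known.
    memory, pc = {}, 0
    for _label, mnemonic, operand in statements:
        if mnemonic == "ORG":
            pc = number(operand)
        elif mnemonic == "DFB":
            memory[pc] = number(operand)
            pc += 1
        elif mnemonic in OPCODES:
            address = symbols[operand] if operand in symbols else number(operand)
            for byte in (OPCODES[mnemonic], address >> 8, address & 0xFF):
                memory[pc] = byte
                pc += 1
    return symbols, memory


EXAMPLE = """\
        ORG  0100H     ;program starts at 0100 hex
STATUS: DFB  23H       ;one data byte
CODE:   LDAA STATUS    ;load A from STATUS
        JMP  CODE      ;jump back to CODE
"""

symbols, memory = assemble(EXAMPLE)
for address in sorted(memory):
    print(f"{address:04X}  {memory[address]:02X}")
```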

  • What do CS instructors say about Assembly Programming?









  • What can we say about Assembly? 
"Assembly Language is like studying. It takes so much time. There are other programming languages that are easier, faster and more comfortable to use, but we just have to deal with it. We can't settle for shortcuts."
      Mary Grace R. Lumenario

“Assembly Language is sooooooo tiring… and at the same time mind-boggling because of those mnemonics. Several lines of code in assembly language can be just a single line, or even a single built-in function, in a high-level language. I don’t want assembly..:D”
      - Angelo Paolo V. Aruta

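  • Sample Hello World
This sample is written for the flat assembler (FASM) on 32-bit Windows. It shows a message box and then exits.

include '%fasminc%/win32ax.inc'   ; FASM's Win32 macros; %fasminc% names its include folder

.code                             ; begin the code section

  start:                          ; entry point label
        ; show a message box with an OK button
        invoke  MessageBox,HWND_DESKTOP,"Hello World!","Win32 Assembly",MB_OK
        ; terminate the process with exit code 0
        invoke  ExitProcess,0

.end start                        ; win32ax.inc macro: makes start the entry point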



  • Bloggers
CMSC 124 T-4L
Aruta, Angelo Paolo V.              2007-17678
Lumenario, Mary Grace R.       2007-63926

