
ECSE 426 Fall 2015

Microprocessor Systems

Dept. Electrical and Computer Engineering


McGill University

Instructor & TAs


Prof. Mark Coates

Office: McConnell Eng Bldg, Room 759


Phone: 514-398-7137
Email: mark.coates@mcgill.ca
Office hours: Mon. 11:30-12:30 or by appointment

Teaching Assistants
Ashraf Suyyagh, Zaid Al-bayati, Harsh Aurora
Andrey Tolstikhin, Loren Lugosch

Mark Coates, Fall 2015

ECSE 426, Lecture 1

Course Format
Lectures:
Monday, 10:05 AM - 11:25 AM, Trottier 0060

Tutorials:
Friday, 11:35 AM - 12:55 PM, Trottier 0060
Start: Sep. 11

Labs

Monday-Friday, Trottier 4160


Demos on Friday (usually, although earlier slots are possible)
TAs available for 2-3 hours Mon.-Thurs. afternoon (calendar in
myCourses)
Start: Sep. 21 (Tutorial on Sep. 18)


Other Logistics
Course communications all through myCourses
Lectures, labs, manuals, discussion
Respect others, use common sense
If you do feel the need to email me directly about course-related
matters, please include ECSE-426 in the subject line

Lab Groups and Room Access:


The 4 experiments will be conducted in pairs.
Reserve time slots for demonstrations.
The project will be conducted in groups of 4.


(No) Textbook; References

No textbook. Class notes, manuals.


Useful:
J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4
Processors, Newnes, 2013.
A. Tanenbaum and T. Austin, Structured Computer Organization,
sixth edition, Prentice Hall, 2012.

Grading
Labs 48%
4 labs, done in pairs: demonstrations and reports
Late reports: 5 percent penalty per day (Friday to Monday counts as one day)
Missed demo: may be rescheduled, for 65 percent of the grade

Project 40%
Group of 4: demonstration and report

Quizzes 12%
Four 15-minute quizzes in class
Short answer & multiple choice
Cover the current lab and tutorial and recent lectures

Grading
Demos

performance, robustness, code quality, performance testing


individual grade for the demonstration

Reports or Lab notes

concise but comprehensive, 1 per group


detailed report guidelines will be posted. Follow these closely.
common grade for reports

Generally grades of group members will be very similar

Differentiation: response to questions or quality of components for which each member is responsible.


Academic Integrity
McGill University values academic integrity. Therefore, all students
must understand the meaning and consequences of cheating,
plagiarism and other academic offences under the Code of Student
Conduct and Disciplinary Procedures.
See http://www.mcgill.ca/integrity/ for more information.

What does this mean for this course?


Feel free to discuss solutions with classmates
You may also consult and use online resources, but
Do your own programming & write your own reports


Avoid Plagiarism
Please make sure to reference any text, code, or online resources
you use to develop your solution
If you reproduce any figures, clearly state this in the figure caption:
Reproduced from [1]
If you re-use code from another source, clearly indicate this in your
code with appropriate comments


What Is This Course About?


Microprocessors
Enabling technology for general purpose computers and embedded
systems
Many, many applications
General purpose computers
PCs, workstations, servers, supercomputers
Embedded systems
Phones, medical, aviation, automobile, industrial
Real-time systems

10-Sep-15


Course Goals
Provide the necessary understanding and skills to design and build
microprocessor systems.
By the end of the course, you should:
understand the organization and design principles of modern
microprocessor-based systems;
be proficient in assembly and high-level (C language) programming
for embedded systems;
understand the performance impact of embedded software,
including energy- and memory-constrained design techniques;
know how to connect peripheral devices and networking interfaces,
and how to write programs that use these interfaces efficiently;
have experience in developing a realistic embedded system solution
through teamwork.

Course Goals (Restated)


Understand microprocessor-based systems
Become familiar with basic development tools
Develop skills in machine interfacing, assembler and embedded C
programming
Design a sizeable embedded system
Previous projects: Music player, file swapping system, PDAs (with
handwriting recognition), wireless data collection systems
Our project: indoor tracking
Build teamwork skills


Lecture Structure
Background
Computer architecture basics

Microprocessor Instruction Set Architecture


Embedded Processors
Embedded System Design
Hardware and software techniques

Building Real Systems


Techniques and tools

Lab Structure
Four experiments + final project.

Experiment 1: Assembly and C
Experiment 2: Intro. to hardware interfacing; drivers, timing
Experiment 3: I/O, Interrupts, Servo-motors, Advanced Sensor Use
Experiment 4: Real-time OS, Networking

Project:

Tracking through dead-reckoning


Drawing on a map
Wireless communication
LCD display, keypad interface


Schedule of Lectures and Labs

On average, the course involves 1 lecture hour, 1 tutorial hour, 4 lab hours, and 3 preparation hours per week. Over the semester there will be 10 lectures, 5 tutorials, 4 labs, and a project.

Class Schedule

Week | Lecture Material | Tutorials | Labs and Project
1 (Sep 7) | Introduction | | Form lab groups
2 (Sep 14) | Assemblers, Lab Intro | Tutorial A: Assembly and C; Tutorial 1: Introduction to IDE and assembly |
3 (Sep 21) | Linker, loader, processor architecture | | Lab 1
4 (Sep 28) | Processor Microarchitecture; Q1 | Tutorial 2: Introduction to embedded C, IDE and drivers | Lab 1 and Demo
5 (Oct 5) | Embedded Processors | | Lab 2
6 (Oct 12) | IO/Processor Interfacing; Q2 | Tutorial 3: Introduction to timers, interrupts and MEMS | Lab 2 and Demo
7 (Oct 19) | Buses, Networking, Operating System; Q3 | | Lab 3
8 (Oct 26) | Embedded OS Services | Tutorial 4: Real-time Operating Systems | Lab 3 and Demo
9 (Nov 2) | Real-time processing; Q4 | Tutorial 5: Wireless and writing drivers | Lab 4 and Demo
10 (Nov 9) | Project Intro | | Project
11 (Nov 16) | | | Project
12 (Nov 23) | | | Project
13 (Nov 30) | | | Project
14 (Dec 7) | | | Project Demo

As with any plan, this schedule is subject to some change.

Intro Material


Computer Architecture
Application of design principles to a state-of-the-art architecture

ARM Cortex-M processor family

The course focuses primarily on experimental work.


Present microprocessor principles mainly by example


Applications
Deciding Factors: cost, size, power, quantity


Example: Camera
Computer system with:
Image control
Hardware (lenses, motors)
Interfaces
Added sophistication to
consumer electronics
Expandability (of functions)
Connectivity

Figure 9.2. A simplified block diagram of a digital camera: lens, optical sensors, A/D conversion, motor, user switches, image storage, system controller, LCD screen, flash unit, and computer interface (cable to PC).


Views of Computers: Levels of Abstraction


Logic Level - Circuits
Logic functions implemented by gates (interfaces, buses, etc)

Architectural Level - Microarchitecture


Operations performed by resources (ALUs, registers, etc)

Instruction Set Level - Instructions


Program execution

Operating System Level - Complete system


System operation


Layered Computer Architecture

Problem-oriented Language Level
    temp := v[k];
    v[k] := v[k+1];
    v[k+1] := temp;
Translation (Compiler)
Assembly Language Level
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
Translation (Assembler)
Operating System Machine Level
OS - Partial Interpretation
Instruction Set Architecture Level
    (the same instructions as binary machine code; figure not reproduced)
Interpreter
Microarchitecture Level
Hardware
Digital Logic Level
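To see what the assembly-level version of the swap does, the four MIPS instructions can be mimicked in Java. This is a minimal sketch, assuming a byte-addressed memory of 4-byte words stored in an int array and register $2 holding the byte address of v[k]; the class and helper names (SwapDemo, lw, sw) are hypothetical and chosen only to mirror the mnemonics.

```java
import java.util.Arrays;

// Hypothetical model of the four MIPS instructions that swap v[k] and v[k+1].
class SwapDemo {
    static int[] reg = new int[32];   // MIPS has 32 general-purpose registers
    static int[] mem;                 // word storage; index = byte address / 4

    // lw rt, offset(rs): load the word at byte address reg[rs] + offset
    static void lw(int rt, int offset, int rs) {
        reg[rt] = mem[(reg[rs] + offset) / 4];
    }

    // sw rt, offset(rs): store reg[rt] at byte address reg[rs] + offset
    static void sw(int rt, int offset, int rs) {
        mem[(reg[rs] + offset) / 4] = reg[rt];
    }

    // Swap v[k] and v[k+1], where array v starts at byte address 0
    static int[] swap(int[] v, int k) {
        mem = v.clone();
        reg[2] = 4 * k;       // $2 = byte address of v[k]
        lw(15, 0, 2);         // lw $15, 0($2)   : $15 = v[k]
        lw(16, 4, 2);         // lw $16, 4($2)   : $16 = v[k+1]
        sw(16, 0, 2);         // sw $16, 0($2)   : v[k] = $16
        sw(15, 4, 2);         // sw $15, 4($2)   : v[k+1] = $15
        return mem;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(swap(new int[] {10, 20, 30}, 1)));
        // prints [10, 30, 20]
    }
}
```

Each lw/sw forms its address as a base register plus a constant offset, which is why v[k+1] is reached with offset 4 rather than 1.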

Computer Organization
Processor
Microprocessor

Memory
Peripherals
Common Bus

Reproduced from Tanenbaum & Austin

Microprocessor Operation
Reset: start here at power-on or when a reset signal is received

Fetch:
1. Output instruction address on the address bus
2. Read instruction pattern from memory onto the data bus
3. Increment the instruction pointer (program counter)

Decode: determine what type of instruction was fetched

Execute:
1. If necessary, read data from memory
2. Execute the instruction
3. If necessary, write results to memory

Repeat this process until power is turned off or the processor is halted.

Common Processors
Mainly von Neumann
architecture
Arithmetic-logic unit
Registers
Auxiliary registers


Reproduced from Tanenbaum & Austin

Processor Execution - Java code

    public class Interp {
        static int PC;                  // program counter holds address of next instr
        static int AC;                  // the accumulator, a register for doing arithmetic
        static int instr;               // a holding register for the current instruction
        static int instr_type;          // the instruction type (opcode)
        static int data_loc;            // the address of the data, or -1 if none
        static int data;                // holds the current operand
        static boolean run_bit = true;  // a bit that can be turned off to halt the machine

        public static void interpret(int memory[], int starting_address) {
            PC = starting_address;
            while (run_bit) {
                instr = memory[PC];                       // fetch next instruction into instr
                PC = PC + 1;                              // increment program counter
                instr_type = get_instr_type(instr);       // determine instruction type
                data_loc = find_data(instr, instr_type);  // locate data (-1 if none)
                if (data_loc >= 0)                        // if data_loc is -1, there is no operand
                    data = memory[data_loc];              // fetch the data
                execute(instr_type, data);                // execute instruction
            }
        }

        private static int get_instr_type(int instr) { ... }
        private static int find_data(int instr, int type) { ... }
        private static void execute(int type, int data) { ... }
    }
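The interpreter above leaves get_instr_type, find_data, and execute elided, since they depend on the instruction set. To watch the fetch-decode-execute loop actually run, the sketch below fills them in for a hypothetical toy accumulator machine. The opcodes (LOAD, ADD, STORE, HALT) and the encoding instr = opcode * 256 + address are invented for illustration; they are not taken from Tanenbaum & Austin or from any real processor.

```java
// Hypothetical toy accumulator machine following the fetch-decode-execute loop above.
// Encoding (invented for illustration): instr = opcode * 256 + address.
class ToyInterp {
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    static int PC, AC, instr, instr_type, data_loc, data;
    static boolean run_bit;
    static int[] mem;

    static int[] interpret(int[] memory, int starting_address) {
        mem = memory;
        PC = starting_address;
        AC = 0;
        run_bit = true;
        while (run_bit) {
            instr = mem[PC];                          // fetch
            PC = PC + 1;                              // increment program counter
            instr_type = get_instr_type(instr);       // decode
            data_loc = find_data(instr, instr_type);  // -1 if no operand
            if (data_loc >= 0)
                data = mem[data_loc];                 // fetch operand
            execute(instr_type, data);                // execute
        }
        return mem;
    }

    static int get_instr_type(int instr) { return instr / 256; }

    static int find_data(int instr, int type) {
        // Only LOAD and ADD read an operand from memory
        return (type == LOAD || type == ADD) ? instr % 256 : -1;
    }

    static void execute(int type, int data) {
        switch (type) {
            case LOAD:  AC = data; break;
            case ADD:   AC = AC + data; break;
            case STORE: mem[instr % 256] = AC; break;  // target address comes from instr
            case HALT:  run_bit = false; break;
            default:    throw new IllegalStateException("bad opcode " + type);
        }
    }

    public static void main(String[] args) {
        // Program: AC = mem[10]; AC = AC + mem[11]; mem[12] = AC; halt
        int[] memory = new int[16];
        memory[0] = LOAD * 256 + 10;
        memory[1] = ADD * 256 + 11;
        memory[2] = STORE * 256 + 12;
        memory[3] = HALT * 256;
        memory[10] = 7;
        memory[11] = 35;
        System.out.println(interpret(memory, 0)[12]);  // prints 42
    }
}
```

Instructions and data share one memory array here, which is the von Neumann arrangement the slides describe.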


Microprocessor
[Figure: block diagram of a microprocessor showing the bus interface unit, instruction cache, data cache, instruction decoder, control unit, arithmetic & logic unit, floating-point unit, and registers, connected to RAM over the memory bus and to I/O over the system bus.]

Bus Interface Unit


Receives instructions & data from main memory
Instructions are then sent to the instruction cache,
data to the data cache
Also receives the processed data and sends it to the
main memory


Instruction Decoder
Receives the programming instructions
Decodes them into a form that is understandable by
the processing units, i.e., the ALU or FPU
Passes on the decoded instruction to the ALU or FPU


Arithmetic & Logic Unit (ALU)


Also known as the Integer Unit
Performs:
whole-number calculations (add, subtract, multiply, divide, etc.)
comparisons and logical operations (NOT, OR, AND, etc.)

More recent microprocessors:


multiple ALUs that can do calculations simultaneously


Floating-Point Unit (FPU)


Also known as the Numeric Unit
Performs calculations on numbers in scientific notation
(floating-point numbers)
Floating-point calculations are required in graphics,
engineering and science
The ALU can do these calculations as well (emulated in software), but very slowly


Registers
Small amount of super-fast private memory placed
right next to ALU & FPU for their exclusive use
The ALU & FPU store intermediate and final results
from their calculations in these registers
Processed data goes back to the data cache and
then to main memory from these registers


Control Unit
The brain of the microprocessor
Manages the whole uP
Tasks
fetching instructions & data
storing data
managing input/output devices


Enhancing the capability of a uP?


The computing capability of a uP can be enhanced in many
different ways:

Increasing the clock frequency

Increasing the word-width

More effective caching algorithm and the right cache size

Adding more functional units (e.g. ALUs, FPUs, etc.)

Improving the architecture
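A common way to compare these options is the processor performance equation, CPU time = instruction count × CPI / clock frequency, where CPI is the average number of clock cycles per instruction. A higher clock frequency shrinks the denominator, while extra functional units and architectural improvements mainly aim to lower CPI. The sketch below only evaluates that equation; the instruction count, CPI values, and clock rates are hypothetical.

```java
import java.util.Locale;

// Processor performance equation: time = instructions * CPI / clock rate.
// All example numbers are hypothetical, chosen only to illustrate the trade-off.
class CpuTime {
    /** Execution time in seconds. */
    static double seconds(long instructions, double cpi, double clockHz) {
        return instructions * cpi / clockHz;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;                          // one million instructions
        double base     = seconds(n, 2.0, 100e6);     // CPI 2 at 100 MHz
        double fasterF  = seconds(n, 2.0, 200e6);     // double the clock frequency
        double lowerCpi = seconds(n, 1.0, 100e6);     // halve CPI, e.g., more functional units
        System.out.printf(Locale.ROOT, "%.1f ms, %.1f ms, %.1f ms%n",
                base * 1e3, fasterF * 1e3, lowerCpi * 1e3);
        // prints 20.0 ms, 10.0 ms, 10.0 ms
    }
}
```

In practice the factors interact (a faster clock can raise CPI if memory does not keep up), so the equation is a guide rather than a prediction.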


Basic Arch. Concepts - Pipelining


Lets the processor run at a higher clock rate
But each instruction may take more clock cycles to complete

Trick: overlap execution


Some overhead with the first few instructions (while the pipeline fills)

Reproduced from Tanenbaum & Austin

Photo: ALCE / Fotolia.com

Pipelining - Reference
Pipeline
Connect data processing elements in series
Output of one element is input of the next
Execute in parallel (using time-slices)

Pipelining effects
Does not decrease processing time for a single datum
Increases throughput of the system when processing a stream of data
Using many pipeline stages increases latency

Requires more resources (processing units, memory) than executing one batch at a time
Stages cannot reuse the resources of a previous stage
Pipelining may increase the time required for an instruction to finish
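The throughput-versus-latency trade-off can be made concrete with the ideal case: a k-stage pipeline with no stalls needs k cycles for the first instruction and then completes one instruction per cycle, so n instructions take k + (n - 1) cycles instead of n · k. The sketch assumes a hypothetical 5-stage pipeline with no hazards, stalls, or branch penalties, which real pipelines rarely achieve.

```java
import java.util.Locale;

// Ideal pipeline timing (no stalls, hazards, or branch penalties assumed).
class PipelineTiming {
    // Without pipelining, each instruction occupies all k stages before the next starts.
    static long unpipelinedCycles(long n, int k) { return n * k; }

    // With pipelining, the first instruction needs k cycles to fill the pipeline;
    // after that, one instruction completes every cycle.
    static long pipelinedCycles(long n, int k) { return n == 0 ? 0 : k + (n - 1); }

    public static void main(String[] args) {
        int k = 5;  // five stages
        for (long n : new long[] {1, 10, 1000}) {
            long u = unpipelinedCycles(n, k), p = pipelinedCycles(n, k);
            System.out.println(n + " instr: " + u + " vs " + p + " cycles, speedup "
                    + String.format(Locale.ROOT, "%.2f", (double) u / p));
        }
        // prints:
        // 1 instr: 5 vs 5 cycles, speedup 1.00
        // 10 instr: 50 vs 14 cycles, speedup 3.57
        // 1000 instr: 5000 vs 1004 cycles, speedup 4.98
    }
}
```

A single instruction gains nothing (5 cycles either way), consistent with pipelining not decreasing the processing time for a single datum; for long instruction streams the speedup approaches k.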


Other Speedups: Multiple Units


Bottleneck: execution in a single pipeline unit
ALU, especially floating point

Resolution: provide multiple units

Reproduced from Tanenbaum & Austin


Superscalar Architectures
Common solution for modern processors
Multiple execution units

Reproduced from Tanenbaum & Austin


Multiple-core architectures

Intel slide


Multiple-core architectures


Intel slide


Memory
Hierarchy of memory units
Speed vs. size
Solutions
Caching
Virtual memory

Reproduced from Tanenbaum & Austin


The Main Memory Bottleneck


Modern uPs can process a huge amount of data in a short duration
They require quick access to data to maximize their performance
If data are unavailable, the uP must literally stop and wait; this results in
reduced performance and wasted power
Current uPs can process an instruction in about 1 ns
To fetch data from main memory (RAM): order of 10-100 ns


Solution to the Bottleneck Problem


Make the main memory faster
Problem: 1-ns memory is extremely expensive as compared with
currently popular 10-100 ns memory
Alternative:
Add a small amount of ultra-fast RAM right next to the uP on the same chip
Make sure that frequently used data and instructions reside in that ultra-fast memory

Advantage: much better overall performance due to fast access to frequently used data and instructions
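The benefit of a small fast memory can be estimated with the average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below plugs in a 1 ns hit time and a 100 ns miss penalty, in line with the processor and main-memory figures quoted earlier; the miss rates are hypothetical and the model ignores multiple cache levels.

```java
import java.util.Locale;

// Average memory access time (AMAT) for a single cache level.
// Hit time and miss penalty echo the slide figures; miss rates are hypothetical.
class Amat {
    static double amatNs(double hitTimeNs, double missRate, double missPenaltyNs) {
        return hitTimeNs + missRate * missPenaltyNs;
    }

    public static void main(String[] args) {
        for (double missRate : new double[] {0.02, 0.05, 0.20}) {
            System.out.printf(Locale.ROOT, "miss rate %.0f%% -> AMAT %.1f ns%n",
                    missRate * 100, amatNs(1.0, missRate, 100.0));
        }
        // prints:
        // miss rate 2% -> AMAT 3.0 ns
        // miss rate 5% -> AMAT 6.0 ns
        // miss rate 20% -> AMAT 21.0 ns
    }
}
```

Even a 5 percent miss rate gives an average of about 6 ns, far below the roughly 100 ns each access would cost if every request went to main memory.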


On-Chip Cache Memory (1)


On-Chip Cache Memory
Small amount of memory located on the same chip as the uP
May be multiple levels of caches

The uP stores a copy of frequently used data and instructions in its cache memory
When the uP wants some data:
it checks in the cache first;
only if the data are not in the cache (a cache miss) does the uP ask for them from the main memory
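The check-the-cache-first behaviour can be sketched as a tiny direct-mapped cache. This is an illustrative model, assuming one word per cache line and the line chosen by address modulo the number of lines; real caches use multi-word lines, associativity, and replacement policies that this sketch omits. The class name and the address trace are hypothetical.

```java
import java.util.Arrays;

// Hypothetical direct-mapped cache in front of a main-memory array.
// One word per cache line; line index = address % number of lines.
class DirectMappedCache {
    private final int[] memory;     // "main memory"
    private final int[] tags;       // which address block each line currently holds
    private final int[] values;     // cached copy of the data
    private final boolean[] valid;  // false until a line is first filled
    int hits, misses;

    DirectMappedCache(int[] memory, int lines) {
        this.memory = memory;
        this.tags = new int[lines];
        this.values = new int[lines];
        this.valid = new boolean[lines];
    }

    int read(int address) {
        int line = address % tags.length;
        int tag = address / tags.length;
        if (valid[line] && tags[line] == tag) {  // check the cache first
            hits++;
            return values[line];
        }
        misses++;                                // only then go to main memory
        values[line] = memory[address];
        tags[line] = tag;
        valid[line] = true;
        return values[line];
    }

    public static void main(String[] args) {
        int[] ram = new int[64];
        Arrays.setAll(ram, i -> i * 10);
        DirectMappedCache cache = new DirectMappedCache(ram, 4);
        int[] trace = {0, 1, 2, 0, 1, 2, 8, 0};  // address 8 maps to the same line as 0
        for (int a : trace) cache.read(a);
        System.out.println(cache.hits + " hits, " + cache.misses + " misses");
        // prints 3 hits, 5 misses
    }
}
```

In the trace, addresses 0 and 8 map to the same line, so reading 8 evicts 0 and the final read of 0 misses again; conflicts like this are one reason the speed advantage depends on how the cache is organized and managed.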

On-Chip Cache Memory (2)


Small size and proximity to the uP
make access times short, which boosts performance

Predict what data will be required for future calculations,
pre-fetch that data and place it in the cache,
so that it is available immediately when the need arises

Speed-advantage of cache memory


Depends heavily on caching algorithm


Expanded View of the Memory Systems


From fastest, smallest, and highest cost per bit to slowest, biggest, and lowest cost per bit:
Registers (in the processor datapath, alongside the control unit)
Cache
2nd Cache
Main Memory
Hard disk (Virtual Memory)

Cache is handled by hardware
Virtual memory is handled by the OS
Programmer sees only one memory and registers

Memory Organization - Standards


Computer Word
Basic unit of access

The same memory can be accessed in different ways
Reproduced from Tanenbaum & Austin


Little Endian vs. Big Endian


Matter of preference
Significant implications for compatibility
Some processors can support both
Reproduced from Tanenbaum & Austin
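The difference between the two byte orders shows up as soon as a multi-byte word is stored as individual bytes. The sketch below uses Java's java.nio.ByteBuffer, whose order(ByteOrder) method selects big- or little-endian layout, to store 0x12345678 both ways and to show what happens when bytes written in one order are read in the other; the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Lay out the same 32-bit word in memory under both byte orders.
class EndianDemo {
    // Returns the four bytes of 'word' as they would appear at addresses 0..3.
    static byte[] layout(int word, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(word).array();
    }

    // Reads four bytes back as a 32-bit word using the given byte order.
    static int read(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int word = 0x12345678;
        byte[] big = layout(word, ByteOrder.BIG_ENDIAN);
        byte[] little = layout(word, ByteOrder.LITTLE_ENDIAN);
        System.out.println("big-endian:    " + hex(big));     // 12 34 56 78
        System.out.println("little-endian: " + hex(little));  // 78 56 34 12
        // Reading little-endian bytes with the wrong order garbles the value
        System.out.printf("misread: 0x%08X%n", read(little, ByteOrder.BIG_ENDIAN));
        // prints misread: 0x78563412
    }
}
```

The misread value is exactly the compatibility problem: data exchanged between machines (or written to a file or network) must agree on a byte order.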


Standardization: ASCII Set


Standardized way to use bits for encoding

Characters
Display
Communication
File

Reproduced from Tanenbaum & Austin
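The encoding is simply a table from characters to small integers. In Java, char values coincide with ASCII codes for 0-127, and String.getBytes(StandardCharsets.US_ASCII) produces the ASCII bytes directly. The sketch prints a few codes and uses the fact that ASCII upper- and lower-case letters differ only in bit 5 (0x20); the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// ASCII assigns each character a 7-bit code; Java chars agree with ASCII for codes 0-127.
class AsciiDemo {
    static int[] codes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) out[i] = bytes[i];
        return out;
    }

    // In ASCII, upper- and lower-case letters differ only in bit 5 (0x20).
    static char toUpperAscii(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c & ~0x20) : c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codes("Hi!")));                  // [72, 105, 33]
        System.out.println(toUpperAscii('q'));                              // Q
        System.out.println((int) '0' + " " + (int) 'A' + " " + (int) 'a');  // 48 65 97
    }
}
```

Because both ends of a link or file agree on this table, the byte 72 means "H" whether it is being displayed, transmitted, or stored.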


Programmer's Model of a Microprocessor


Instruction Set:
ldr r0, [r2, #0]
add r2, r3, r4

Registers:
r0 - r3, pc

Addressing Modes (how to access data in registers and memory):
ldr r12, [r1, #0]
mov r1, r3

Memory:
80000004 ldr r0, [r2, #0]
80000008 add r2, r3, r4
8000000C 23456
80000010 AEF0
Memory mapped I/O
80000100 input
80000108 output

Software Build and Load


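Typical flow for desktop computer:
Compiler and Assembler produce Object Files
Linker combines the Object Files with the Run-Time Library into an Executable Image File
Loader copies the Executable Image File into Read-Write Memory (RAM)
Operating System Image: loaded into RAM by the Boot Process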

Example Program Creation & Run


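[Figure: hardware organization of a typical system. The CPU (register file, PC, ALU, bus interface) connects over the system bus to an I/O bridge, which connects over the memory bus to main memory and over the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (display), a disk controller (disk), and expansion slots for other devices such as network adapters. The hello executable is stored on disk.]

Reproduced from Tanenbaum & Austin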

Reading Hello Command


[Figure: same system organization as the previous figure; the user types "hello" at the keyboard. The hello executable is stored on disk.]

Loading Hello Program


[Figure: same system organization; the hello code and the "hello,world\n" string are loaded from the hello executable on disk into main memory.]

Finally: Program Running

[Figure: same system organization; the hello code runs from main memory, and the "hello,world\n" string is sent from main memory through the graphics adapter to the display.]

Instruction Set Architecture


Interface between HW and SW
Virtual machine: many possible implementations of the same ISA

Given by:
Resources: processor registers, execution units (operands and results stored)
Operations performed: instruction types, data types, addressing modes (i.e., where and how to address operands)

[Figure: CPU connected to memory by the address, data, and control buses]

Problem-oriented Language layer

Compiled to assembly or instruction set level


You will be using embedded C
How does this differ from the usual use of C?

Directly write to registers to control processor operation

All of the registers have been mapped to macros

Important bit combinations have macros: use these, please!

Registers are 32 bits, and the int type is 4 bytes

Register values may change without any explicit instruction from your program (e.g., status bits set by hardware)

Limited output system

Floating-point operations are inefficient; divide and square-root in particular should be avoided.
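Named bit masks let register manipulation read as intent rather than magic numbers. The read-modify-write idioms used with the provided C macros can be shown with Java's bitwise operators; the register layout below (ENABLE at bit 0, IRQ_ENABLE at bit 5, a 2-bit MODE field at bits 9:8) is hypothetical and does not come from any real device header. Java's int is always 32 bits, matching the 4-byte int on this platform.

```java
// Read-modify-write of a 32-bit control register value using named bit masks.
// The bit assignments are hypothetical, not taken from any real device header.
class RegisterBits {
    static final int ENABLE     = 1 << 0;    // bit 0: peripheral enable
    static final int IRQ_ENABLE = 1 << 5;    // bit 5: interrupt enable
    static final int MODE_SHIFT = 8;
    static final int MODE_MASK  = 0x3 << MODE_SHIFT;  // bits 9:8: 2-bit mode field

    static int setBits(int reg, int mask)   { return reg | mask; }
    static int clearBits(int reg, int mask) { return reg & ~mask; }
    static boolean isSet(int reg, int mask) { return (reg & mask) == mask; }

    // Replace a multi-bit field without disturbing the other bits.
    static int writeField(int reg, int mask, int shift, int value) {
        return (reg & ~mask) | ((value << shift) & mask);
    }

    public static void main(String[] args) {
        int reg = 0;                                      // reset value
        reg = setBits(reg, ENABLE | IRQ_ENABLE);          // 0x00000021
        reg = writeField(reg, MODE_MASK, MODE_SHIFT, 2);  // 0x00000221
        reg = clearBits(reg, IRQ_ENABLE);                 // 0x00000201
        System.out.printf("0x%08X enabled=%b%n", reg, isSet(reg, ENABLE));
        // prints 0x00000201 enabled=true
        System.out.println(Integer.SIZE + "-bit int, " + Integer.BYTES + " bytes");
        // prints 32-bit int, 4 bytes
    }
}
```

On the board itself these operations would be written in C against a memory-mapped register declared volatile, so that the compiler re-reads the hardware value on every access; that matters because register values can change without any instruction from your program. Note also that C, unlike Java, does not guarantee a 4-byte int on every platform.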


Assembly versus C
Efficiency of compiled code
Source code portability
Program maintainability
Typical bug rates (say, per thousand lines of code)
The amount of time it will take to develop the solution
Availability and cost of compilers and other development tools
Your personal experience (or that of the developers on your team)
with specific languages or tools
Don't rule out Java or C++ if you have the memory to play with.


Problem

Company Ostrich has recently re-developed the embedded software for their flagship products

Developed in assembly, 80 percent working, 2000 lines of code

Suddenly realized that the product isn't shippable

Bugs: system lock-ups indicative of major design flaws or implementation errors, and major product performance issues

Designer has left the company and provided few notes or comments

You are hired as a consultant. Do you:

Fix the existing code?

Perform a complete software redesign and implementation? In this case, which language?


Jobs for This Week


Form pairs before next Friday (random assignment
thereafter)
Tutorial next week
Sign up in groups on myCourses
