2 MARKS
Computer architecture describes the fundamental operation of the individual units of a computer. It is a
specification detailing how a set of software and hardware technology standards interact to form a
computer system or platform.
Computer hardware (H/W) is the electronic circuitry and electromechanical equipment that constitute the
computer. It is the collection of all the parts you can physically touch.
A cache is a memory that is smaller and faster than main memory and that is interposed between the CPU
and main memory. The cache acts as a buffer for recently used memory locations.
In I/O-mapped (isolated) I/O, a memory-reference instruction activates the READ or WRITE control line and
does not affect the I/O devices. Separate I/O instructions are required to activate the READ IO and
WRITE IO lines, which cause a word to be transferred between the addressed I/O port and the CPU.
RISC                                              | CISC
Simple instructions take one cycle per operation  | Complex instructions take multiple cycles per operation
Few instructions and addressing modes are used    | Many instructions and addressing modes are used
Fixed-format instructions are used                | Variable-format instructions are used
Complexity is in the compiler                     | Complexity is in the microprogram
Highly pipelined                                  | Less pipelined or not pipelined
Embedded computers are computers lodged inside other devices, where the presence of a computer is not
immediately obvious. These devices range from everyday machines to handheld digital devices. They
have a wide range of processing power and cost.
8. Define CPI
Clock cycles Per Instruction (CPI) is the average number of clock cycles each instruction takes to
execute.
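As a rough illustration, the average CPI of a program can be computed as a weighted sum over instruction classes. The instruction mix and per-class cycle counts below are made-up values, and the function name average_cpi is only illustrative.

```python
def average_cpi(mix):
    """Return the weighted-average CPI.

    mix is a list of (fraction_of_instructions, cycles_per_instruction)
    pairs; the fractions are assumed to sum to 1.
    """
    return sum(fraction * cycles for fraction, cycles in mix)


# Hypothetical mix: 50% ALU (1 cycle), 30% load/store (2 cycles),
# 20% branch (3 cycles).
mix = [(0.5, 1), (0.3, 2), (0.2, 3)]
print(f"Average CPI = {average_cpi(mix):.2f}")
```

A class that is both frequent and slow dominates the sum, which is why it is usually the first target for optimization.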
Pipelining is used to overlap the execution of instructions and improve performance. This potential
overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be
evaluated in parallel.
The simplest and most common way to increase ILP is to exploit the parallelism among iterations of a
loop. This type of parallelism is often called loop-level parallelism.
Statically scheduled
Dynamically scheduled
15. What is locality of reference?
Many instructions in localized areas of the program are executed repeatedly during some time period,
while the remainder of the program is accessed relatively infrequently. This behavior is called
locality of reference.
All cells in the selected row are read and refreshed during both read and write operations. To maintain
the contents of a DRAM, each row of cells must be accessed periodically, once every 2-16 ms.
The memory, arithmetic and logic, and input and output units store and process information and perform
input and output operations. The operations of these units must be coordinated in some way. This is the
task of the control unit, which is effectively the nerve center that sends control signals to the other
units.
An interrupt is an event that causes the execution of one program to be suspended and another program to
be executed.
In a pipeline, different steps complete different parts of different instructions in parallel. Each of
these steps is called a pipe stage, and the time required to move an instruction one step down the
pipeline is called a processor cycle.
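The overlap of pipe stages can be sketched as a small timing chart. The five stage names below (IF, ID, EX, MEM, WB) are an assumption taken from the classic textbook pipeline, not from these notes, and the chart ignores stalls entirely.

```python
# Assumed stage names of a classic five-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied
    during processor cycle c, or "" if the instruction is not in the
    pipeline in that cycle. Instruction i enters the pipeline in cycle i.
    """
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i
            in_pipe = 0 <= stage_index < len(stages)
            row.append(stages[stage_index] if in_pipe else "")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_chart(3)):
    print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading a column of the chart shows which pipe stage each instruction occupies in one processor cycle. Reading a row shows one instruction advancing by one stage per cycle.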
The term latency refers to the amount of time it takes to transfer a word of data to or from the
memory. For block transfers, it denotes the time it takes to transfer the first word of data. This time
is usually longer than the time needed to transfer each subsequent word of a block.
CPU time = Instruction Count * Clock Cycle Time * Cycles per Instruction
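A minimal sketch of this formula follows. The workload figures (instruction count, CPI, clock rate) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_cycle_time_s):
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cycles_per_instruction * clock_cycle_time_s


# Hypothetical workload: 2 billion instructions, CPI of 1.5,
# 2 GHz clock (cycle time of 0.5 ns).
example = cpu_time(2_000_000_000, 1.5, 0.5e-9)
print(f"CPU time = {example:.3f} s")
```

Because the three factors multiply, halving any one of them (fewer instructions, lower CPI, or a faster clock) halves the CPU time, provided the other two stay unchanged.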
Computer organization refers to the operational units and their interconnections that realize the
architectural specifications, e.g., hardware details transparent to the programmer, such as control
signals and the interfaces between the computer and its peripherals and memory.
Hardware-based speculation follows the predicted flow of data values to choose when to execute
instructions. This method of executing programs is essentially a data flow execution: operations execute
as soon as their operands are available.
24. Differentiate between Pipelining and ILP
Pipelining                                                      | ILP
Single functional unit                                          | Multiple functional units
Instructions are issued sequentially                            | Instructions are issued in parallel
Throughput is increased by overlapping instruction execution    | Instructions are not overlapped but executed in parallel in multiple functional units
Very little extra hardware is required to implement pipelining  | Multiple functional units within the CPU are required
Dynamic scheduling is a technique in which the hardware rearranges instruction execution to reduce
stalls, allowing instructions behind a stalled instruction to proceed.
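The ideal pipeline CPI is a measure of the maximum performance attainable. The overall pipeline CPI is
the sum of the ideal pipeline CPI and the structural, data hazard, and control stalls. Minimizing each
of these terms reduces the overall pipeline CPI and so increases the instructions per clock (IPC).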
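The terms in question can be made concrete with the usual textbook decomposition, Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls. The sketch below assumes that decomposition, and the stall values are invented for illustration.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls,
                 control_stalls):
    """Sum the ideal CPI and the average stall cycles per instruction."""
    return (ideal_cpi + structural_stalls + data_hazard_stalls
            + control_stalls)


# Hypothetical per-instruction stall contributions.
cpi = pipeline_cpi(ideal_cpi=1.0, structural_stalls=0.1,
                   data_hazard_stalls=0.3, control_stalls=0.2)
ipc = 1.0 / cpi  # instructions per clock is the reciprocal of CPI
print(f"Pipeline CPI = {cpi:.2f}, IPC = {ipc:.3f}")
```

With every stall term at zero the pipeline CPI equals the ideal CPI, which is the best the pipeline can do.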
Data dependence
Name dependence
Control dependence
A simple scheme for increasing the number of instructions relative to the branch and overhead
instructions is loop unrolling. It simply replicates the loop body multiple times, adjusting the loop
termination code.
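The transformation can be sketched at source level as follows. The unroll factor of 4 and the function names are arbitrary choices for illustration. In practice a compiler applies this to machine code, and Python itself gains little from it.

```python
def add_scalar_rolled(x, s):
    """Original loop: one element, one loop test and one index update
    per iteration."""
    for i in range(len(x)):
        x[i] = x[i] + s


def add_scalar_unrolled4(x, s):
    """Loop body replicated 4 times: one loop test and one index update
    per four elements."""
    n = len(x)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        x[i] = x[i] + s
        x[i + 1] = x[i + 1] + s
        x[i + 2] = x[i + 2] + s
        x[i + 3] = x[i + 3] + s
        i += 4
    while i < n:  # cleanup loop for the leftover n % 4 elements
        x[i] = x[i] + s
        i += 1


a = list(range(10))
b = list(range(10))
add_scalar_rolled(a, 3)
add_scalar_unrolled4(b, 3)
print(a == b)
```

The cleanup loop is the adjusted termination code: it handles trip counts that are not a multiple of the unroll factor.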
A vector processor is a central processing unit (CPU) that implements an instruction set containing
instructions that operate on one-dimensional arrays of data called vectors. In testing, by contrast, a
vector (test vector) is a set of inputs provided to a system in order to test that system.
29. What is a vector processor?
Vector processors are machines built to handle large scientific and engineering calculations. Their
performance derives from a heavily pipelined architecture that operations on vectors and matrices can
exploit effectively.
A mask register provides conditional execution of each element operation in a vector instruction. The
vector mask control uses a Boolean vector to control the execution of a vector instruction: an element
operation takes place only where the corresponding mask bit is set.
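A minimal software sketch of mask control is shown below, using the conditional update "if x[i] != 0 then x[i] = x[i] - y[i]" as a hypothetical example. Real hardware holds the Boolean vector in a mask register. Here it is just a Python list, and the function name is illustrative.

```python
def masked_vector_op(x, y, mask, op):
    """Apply op element-wise where mask is True; elements of x whose
    mask bit is False pass through unchanged."""
    return [op(a, b) if m else a for a, b, m in zip(x, y, mask)]


x = [4.0, 0.0, 9.0, 0.0]
y = [1.0, 2.0, 3.0, 4.0]

# The test x[i] != 0 produces the Boolean vector (the mask).
mask = [value != 0 for value in x]

result = masked_vector_op(x, y, mask, lambda a, b: a - b)
print(mask)
print(result)
```

The mask lets a loop containing an if statement run as a single vector operation instead of falling back to scalar code.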
Strip mining is the generation of code such that each vector operation is done for a size less than or
equal to the maximum vector length (MVL).
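Strip mining can be sketched as below for a DAXPY-style operation (a*x + y). The maximum vector length of 4 is a hypothetical value, and the list slices stand in for vector instructions. The leftover short strip is handled last here, although generated code may equally handle it first.

```python
MVL = 4  # hypothetical maximum vector length of the machine


def daxpy_strip_mined(a, x, y, mvl=MVL):
    """Compute a*x + y in strips of at most mvl elements.

    Returns the result and the list of strip lengths that were used.
    """
    n = len(x)
    result = list(y)
    strip_lengths = []
    for low in range(0, n, mvl):
        vl = min(mvl, n - low)  # vector length for this strip
        # One "vector operation" on vl <= mvl elements.
        result[low:low + vl] = [
            a * xi + yi
            for xi, yi in zip(x[low:low + vl], y[low:low + vl])
        ]
        strip_lengths.append(vl)
    return result, strip_lengths


out, strips = daxpy_strip_mined(2, list(range(10)), [1] * 10)
print(strips)
print(out)
```

For 10 elements and an MVL of 4 the loop runs as three strips, and no strip exceeds the maximum vector length.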
The analysis of loop-level parallelism determines whether data accesses in later iterations are
dependent on data values produced in earlier iterations. Such a dependence is called a loop-carried
dependence.
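The two loops below are a small, made-up contrast. The first has no loop-carried dependence, so its iterations could run in any order or in parallel. The second reads a value written by the previous iteration, so its iterations must run in order.

```python
def elementwise_sum(a, b):
    """Iteration i uses only a[i] and b[i]: no loop-carried dependence."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def running_sum(a):
    """Iteration i reads out[i - 1], which iteration i - 1 wrote:
    a loop-carried dependence."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = out[i - 1] + a[i]
    return out


print(elementwise_sum([1, 2, 3], [10, 20, 30]))
print(running_sum([1, 2, 3, 4]))
```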
SSE (Streaming SIMD Extensions) is a SIMD instruction set extension designed by Intel. The goal of these
extensions has been to accelerate carefully written libraries rather than to have the compiler generate
them.
Thread Block Scheduler - determines the number of thread blocks needed for the loop and keeps assigning
them to different multithreaded SIMD processors until the loop is completed.
Thread Scheduler - determines which threads of SIMD instructions are ready to run and then sends them
off to a dispatch unit to be run on the multithreaded SIMD processor.
A GPU uses a single instruction, multiple thread (SIMT) model, in which the individual scalar
instruction streams of each CUDA thread are grouped together for SIMD execution on hardware.
Multiprocessors are used to increase performance and improve availability. The different categories are
SISD, SIMD, MISD, and MIMD.
38. List parameters to optimize performance of vector architecture
These are multiple processes executing a single program and sharing the code and most of their address
space. When multiple processes share code and data in this way, they are called threads.
Consistency determines in what order a processor must observe the data writes of another processor.
Sequential consistency requires that the result of any execution be the same as if the memory accesses
executed by each processor were kept in order and the accesses among different processors were
interleaved.
Fine-grained multithreading switches between threads on each instruction, causing the execution of
multiple threads to be interleaved.
Coarse-grained multithreading switches threads only on costly stalls. Thus it is much less likely to
slow down the execution of an individual thread.
The first hurdle has to do with the limited parallelism available in programs, and the second arises
from the high cost of communications.
The second major challenge in parallel processing involves the large latency of remote accesses in a
parallel processor.
46. Which protocol is more suited for distributed shared memory architecture?
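The protocols that maintain coherence for multiple processors are called cache coherence protocols.
There are two classes of protocols, which use different techniques to track the sharing status:
Directory based
Snooping
Directory-based protocols are better suited to distributed shared memory architectures, since the
sharing status is kept in a directory rather than broadcast to every cache.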
47. What is cache miss and cache hit?
When the CPU finds a requested data item in the cache, it is called a cache hit. When the CPU does not
find the data item it needs in the cache, it is called a cache miss.
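Hits and misses can be counted with a toy cache model. The sketch below assumes a direct-mapped cache with 4 lines and 4-byte blocks. The notes do not fix these parameters, and the address trace is invented.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Count hits and misses for a sequence of byte addresses in a
    direct-mapped cache (assumed organization, for illustration)."""
    lines = [None] * num_lines  # tag held by each line; None means empty
    hits = misses = 0
    for address in addresses:
        block = address // block_size
        index = block % num_lines  # which cache line the block maps to
        tag = block // num_lines   # identifies the block within that line
        if lines[index] == tag:
            hits += 1              # requested item found in the cache
        else:
            misses += 1            # not found: fetch and replace the line
            lines[index] = tag
    return hits, misses


trace = [0, 1, 2, 3, 16, 0, 4, 5]
hits, misses = simulate_direct_mapped(trace)
print(f"hits = {hits}, misses = {misses}")
```

In this trace, addresses 0 and 16 map to the same line, so the second access to address 0 misses even though it was cached earlier.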
Miss rate is the fraction of cache accesses that result in a miss. Miss penalty is the additional time,
in clock cycles, needed to service a miss. The memory stall cycles depend on the number of misses and
the miss penalty.
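The arithmetic can be sketched as follows, assuming the usual relation that memory stall cycles equal the number of misses times the miss penalty. The access count, miss count, and the 100-cycle penalty are hypothetical.

```python
def miss_rate(misses, accesses):
    """Fraction of cache accesses that result in a miss."""
    return misses / accesses


def memory_stall_cycles(misses, miss_penalty_cycles):
    """Stall cycles = number of misses x penalty per miss (assumed model)."""
    return misses * miss_penalty_cycles


accesses, misses, penalty = 1000, 50, 100  # hypothetical values
rate = miss_rate(misses, accesses)
stalls = memory_stall_cycles(misses, penalty)
print(f"miss rate = {rate:.2%}, stall cycles = {stalls}")
```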
Spreading data over multiple disks is called striping, which automatically forces accesses to several
disks.
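One simple way to picture striping is a round-robin mapping of logical blocks to disks. This mapping and the 4-disk array below are assumptions for illustration. Real arrays may stripe at other granularities.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block to (disk number, block offset on that disk)
    under round-robin striping."""
    return logical_block % num_disks, logical_block // num_disks


NUM_DISKS = 4  # hypothetical array size
for block in range(8):
    disk, offset = stripe_location(block, NUM_DISKS)
    print(f"logical block {block} -> disk {disk}, offset {offset}")

# A request for 4 consecutive blocks touches every disk once.
disks_touched = {stripe_location(b, NUM_DISKS)[0] for b in range(4, 8)}
print(sorted(disks_touched))
```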
Hot spares are extra disks that are not used in normal operation. When a failure occurs, an idle hot
spare is pressed into service. Thus hot spares reduce the MTTR (mean time to repair).
Server utilization is the mean number of tasks being serviced divided by the service rate:
Server utilization = Arrival rate / Service rate
The value should be between 0 and 1. Otherwise there would be more tasks arriving than could be serviced.
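A small sketch of this calculation, including the range check, is given below. The arrival and service rates are hypothetical figures in requests per second.

```python
def server_utilization(arrival_rate, service_rate):
    """Utilization = arrival rate / service rate.

    Raises ValueError when the result falls outside [0, 1], i.e. when
    tasks arrive faster than the server can service them.
    """
    utilization = arrival_rate / service_rate
    if not 0 <= utilization <= 1:
        raise ValueError("arrival rate exceeds service rate")
    return utilization


# Hypothetical disk: 40 requests/s arrive, 50 requests/s can be serviced.
print(f"utilization = {server_utilization(40, 50):.2f}")
```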
Bus masters are devices that initiate read or write transactions. E.g., the CPU is always a bus master.
The bus can have many masters when there are multiple CPUs and when the input/output devices can
initiate bus transactions.
A simple mapping that works well is to spread the addresses of the blocks sequentially across the
banks. This is called sequential interleaving.
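With sequential interleaving, the bank holding a block is simply the block address modulo the number of banks. The sketch below assumes 4 banks for illustration.

```python
def bank_of(block_address, num_banks=4):
    """Bank that holds a block under sequential interleaving."""
    return block_address % num_banks


# Consecutive block addresses fall in consecutive banks.
banks = [bank_of(address) for address in range(8)]
print(banks)
```

Because consecutive blocks fall in different banks, a sequential access stream can keep all banks busy at once.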
A non-blocking cache allows the data cache to continue to supply cache hits during a miss. It requires a
CPU that supports out-of-order execution.
Access time - Access time is the time between when a read is requested and when the desired
word arrives.
Cycle Time - Cycle Time is the minimum time between requests.