
Week 1

Objectives What is Architecture? (Structure, Perspectives & Views)

What is Architecture?
- A system of inter-related pieces; a structure of parts
- Changes in one part impact the others
- The relationships are more important than the pieces

The fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution. - IEEE Std. 1471 definition
Environment: complex, uncertain, poorly structured
Quality attributes: non-functional aspects dealing with system behaviour

System Architect: Sole owner of designing the architecture and solving the problems with it. Design is iterative

Functionality
More concerned with:
- Multiple machines (distribution)
- Global accessibility (bandwidth, latency, language)
- Integration with other systems (version compatibility & variations)
Less concerned with:
- Widget selection
- Data formats
- Algorithm selection

Constraints
- Design & implementation decisions that have already been made
- Constraints flowing from one architectural element to another (data, network, interfaces)

Quality Attributes
- Performance (response & throughput)
  o Sensitivity of latency requirements (controls, web pages, reporting)
  o Typical loads, peak-hour loads and spike loads
  o Response time vs. throughput vs. scalability
- Availability & Reliability (How often does it work? What happens during a failure?)
  o How much tolerance for service outages (recovery time and criticality of safety)
  o Disaster-scale events
- Modifiability (How easy to change? Porting, protocols and adding of features)
- Security, Ease of Use and Time to Market

Quality Attributes by SEI

Concerns: parameters by which the attributes of a system are judged, specified and measured. Requirements are expressed as concerns.
Attribute-specific factors: properties of the system and environment that have an impact on the concerns. Attributes can be internal or external, based on whether the underlying properties are internal or external.
- Performance factors
- Dependability impairments (aspects of the system leading to a lack of dependability)
- Security factors (aspects contributing to security, including the environment and internal features)
- Safety impairments (aspects contributing to a lack of safety. Hazards are system states that may lead to mishaps: unplanned events with undesirable consequences)

Methods: how concerns are addressed

Performance
Smith's definition: performance refers to responsiveness, either the time required to respond to specific events or the number of events processed in a given time interval. This characterizes the timeliness of the service delivered by the system.

Performance is not Speed
- Poor performance is not salvaged just by using better processors. For many systems, faster alone is not enough to achieve timeliness, because execution speed is just one factor.
- The objective of fast computing is reducing the response time for a group of services; real-time computing is about the individual timing requirements of each service. Predictability, not speed, is foremost in real-time system design.
- Performance engineering is concerned with predictable performance, whether worst-case or average-case.
The Scheduling Problem
- How to allocate shared resources when multiple demands must be carried out on the same set of resources.
Performance Concerns
- Criteria to evaluate the schedule; timing constraints on responding to events
- Throughput: the number of event responses over a given observation interval
- Capacity: the maximum achievable throughput under ideal workload conditions (bandwidth, megabits/s)
- Modes: a mode is characterized by the state of demand placed on the system (a configuration of resources to satisfy demands)

Latency: time taken to respond to an event

Latency qualifiers:
- Precedence: specification for partial/total ordering of event responses
- Jitter: variation in response time from cycle to cycle
- Criticality: importance of the event response
A processing rate is not enough (also specify observation intervals): the throughput requirement, accompanied by the computed result, must not violate latency requirements*. Periods of reduced capacity and overload are commonly experienced.
*Utilization is the percentage of time a resource is busy. Schedulable utilization is the maximum utilization achievable by a system while still meeting timing requirements.
Performance Factors
- Behavioural patterns and intensity, resource usage and software descriptions, and jobs and operations characterize the demand on the system
- The execution environment and the numbers and types of machines characterize the system itself

Performance Methods: synthesis and analysis, drawing on queuing theory, scheduling theory and formal methods, used to understand the relationships between factors and concerns.

Factors affecting Performance


Demand
1. Arrival pattern: periodic or aperiodic; how events come in
2. Execution time: time required to respond to each event. Worst and best cases help define boundary-case behaviour
System Resources (used to execute event responses)
- Resource types: CPU, memory, I/O, backplane bus, network, data objects
- Software services: manage system resources and usually reside in the OS
Resource Allocation

Real-time OS: small, fast proprietary kernels; real-time extensions of commercial OSes; research-oriented OSes. Key concerns:
- Context switch times and the priority of OS services
- Interrupt latency: the time during which interrupts are disabled
- Virtual memory and bounds on the execution time of system calls

Resource allocation policies resolve contention for shared resources and influence the performance of the system.

Methods
- Synthesis: methods to synthesize real-time systems; these methodologies are intended to augment rather than supplant other engineering methodologies
- Analysis: 2 schools of thought in analysing system performance: queuing analysis & scheduling analysis

Queuing Theory: models systems as one or more service facilities performing services for a stream of arriving customers (a server plus a queue for waiting customers). Concerned with average-case aggregate behaviour; good for performance capacity planning and management information systems.
Scheduling Theory: rooted in job-shop scheduling; applicable to the performance analysis of real-time systems and offers valuable intuition. Computing utilization bounds and response times is key, so they can be compared to a theoretically derived bound. Timing requirements are guaranteed where utilization is kept beneath a specific bound.

Dependability
Property of a system such that reliance can justifiably be placed on the service it delivers.
- Availability: readiness for usage
- Reliability: continuity of service
- Safety: non-occurrence of catastrophic consequences on the environment
- Confidentiality: non-occurrence of unauthorized disclosure of information
- Integrity: non-occurrence of improper alterations of information
- Maintainability: aptitude to undergo repairs and evolution

Availability: measured as the limit of the probability that the system is functioning at time t.

Reliability: ability to continue operating over time, measured by MTTF, the expected life of the system.
Maintainability: aptitude to undergo repair and evolution. MTTR is a quantitative measure of maintainability but doesn't tell the entire story; built-in diagnostics can reduce MTTR at the possible cost of extra memory, run-time or development effort.
Safety: in terms of dependability, safety is the absence of catastrophic consequences on the environment.
Confidentiality: non-occurrence of unauthorized disclosure of information.
Integrity: non-occurrence of improper alteration of information.

Impairments to Dependability
1. Failures

Domain Failures
- Value failure: improper values are computed, inconsistent with proper execution of the system
- Timing failure: service is delivered too early or too late
- Halting failure: service is no longer delivered

Perception of Failures
- Consistent failure: all users have the same view of the failure
- Inconsistent failures: some users have a different perception of the failure; hardest to detect

Consequence on Environment: a system that can only fail in a benign manner is termed fail-safe.

2. Errors
An error is a system state that is at risk of leading to failure if not corrected; 3 factors determine whether it will lead to failure:
i. Redundancy (designed or inherent in the system)
ii. System activity (the error may go away before damage is caused)
iii. What the user deems acceptable behaviour (in data transmission there is a notion of an acceptable error rate)

3. Faults
The hypothesised cause of an error, classified according to:

- Cause
  o Physical: occurs because of adverse physical phenomena (e.g. lightning)
  o Human-made: human imperfection such as poor design, manufacture or misuse
- Nature
  o Accidental: created by chance
  o Intentional: deliberate, with or without malice
- Phase of creation: faults can be created at development time or arise during operational time
- Boundary
  o Internal: parts of the internal state of the system that, when invoked, will produce an error
  o External: induced from outside the system, e.g. by radiation
- Persistence
  o Temporary: the fault disappears over time. Transient faults come from the external physical environment; intermittent faults are temporary internal faults

Week 2
UML Distilled: Sequence Diagrams
1. Interaction Styles (Control)

- Participants: used in UML 2. If classes are used, follow object:class notation
- Found message: comes from an undetermined source; no participant
- Centralized control: one participant does all the processing while others supply data

- Distributed control: processing is split across participants, with each handling a little of the algorithm
- Distribution helps to localize the effects of CHANGE and to introduce POLYMORPHISM rather than conditional logic

2. Creating & Deleting Participants

2 types of deletion: self-deletion (the normal sort) and external deletion. In garbage-collected environments, deletion is not done directly, but the X is still useful to indicate when the object is no longer needed and is ready to be collected.

3. Interaction Frames

Useful to delimit a portion of the sequence diagram. Operators and guards control the logic: only frames whose guard is true are executed.

Asynchronous message: stick arrowhead; the caller does not wait for a response. Synchronous message: filled arrowhead; the caller waits for the routine to finish.

UML Distilled: Deployment Diagrams

A system's physical layout, showing which pieces of software run on which pieces of hardware.

Node: something capable of hosting software, connected by communication pathways. 2 forms:
- Device: hardware, such as a computer or simpler hardware connected to a system
- Execution environment: software that hosts or contains other software
Nodes contain artefacts (physical manifestations of software, namely files such as executables, data files, configuration files, HTML documents etc.). Listing an artefact within a node shows that it is deployed there at runtime.

Wikipedia: TCP
TCP provides reliable, ordered, error-checked delivery of a stream of octets between programs running on computers connected to a local area network, intranet or the public Internet. It resides at the transport layer.
- Provides a communications service between the application program and the IP layer; TCP handles breaking requests into IP-sized chunks
- Due to network congestion, traffic load-balancing and unpredictable network behaviour, packets (sequences of octets containing a header and body) can be lost. TCP detects these problems, requests retransmission of lost data and rearranges out-of-order data; these actions also help to reduce congestion by minimizing the occurrence of the other problems
- Abstracts application communication from network details
- Optimized for accurate rather than timely delivery; RTP over UDP is more suitable for real-time applications like VoIP (Voice over IP)

A. Segment Structure
TCP accepts data from a data stream, divides it into chunks and adds a TCP header, creating a TCP segment. The TCP segment is then encapsulated into an IP datagram and exchanged with peers. "Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP."[5] A TCP segment consists of a segment header and a data section. The TCP header contains 10 mandatory fields and an optional extension field (Options). The data section follows the header; its contents are the payload data carried for the application.

Source port (16 bits); Destination port (16 bits)
Sequence number (32 bits), with a dual role:
- If SYN flag = 1: the initial sequence number. The sequence number of the actual first data byte (and the acknowledged number in the corresponding ACK) is this sequence number plus 1.
- If SYN flag = 0: the accumulated sequence number of the first data byte of this segment for the current session.
Acknowledgement number (32 bits): if the ACK flag is set, the value of this field is the next sequence number the receiver is expecting, acknowledging receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.
Data offset (4 bits): the size of the TCP header in 32-bit words. The minimum header is 5 words and the maximum is 15 words, giving a minimum size of 20 bytes and a maximum of 60 bytes, allowing up to 40 bytes of options in the header. The field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.
Reserved (3 bits): for future use; set to zero.
Flags (9 bits), a.k.a. control bits; 9 1-bit flags:
1. NS: ECN-nonce concealment protection (added to the header by RFC 3540)
2. CWR: Congestion Window Reduced; set by the sending host to indicate that it received a TCP segment with the ECE flag set and responded with the congestion control mechanism (added by RFC 3168)
3. ECE: ECN-Echo. If SYN = 1, indicates that the TCP peer is ECN-capable; if SYN = 0, indicates that a packet with the Congestion Experienced flag set in the IP header was received during normal transmission (added by RFC 3168)
4. URG: indicates that the Urgent pointer field is significant
5. ACK: indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set
6. PSH: push function; asks to push the buffered data to the receiving application
7. RST: reset the connection
8. SYN: synchronize sequence numbers. Only the first packet sent from each end should have this flag set. Some other flags change meaning based on this flag; some are only valid when it is set, others when it is clear
9. FIN: no more data from sender
Window size (16 bits): the size of the receive window, which specifies the number of window-size units (by default, bytes), beyond the sequence number in the acknowledgment field, that the sender of this segment is currently willing to receive (see flow control and window scaling).
Checksum (16 bits): used for error-checking of the header and data.
Urgent pointer (16 bits): if the URG flag is set, this field is an offset from the sequence number indicating the last urgent data byte.
Options (variable, 0-320 bits, divisible by 32): the length of this field is determined by the data offset field. Options have up to three fields: Option-Kind (1 byte), Option-Length (1 byte), Option-Data (variable). Option-Kind indicates the type of option and is the only field that is not optional; depending on the kind of option, the next two fields may be set. Option-Length indicates the total length of the option; Option-Data contains the value of the option, if applicable.
Padding: used to ensure that the TCP header ends, and the data begins, on a 32-bit boundary. The padding is composed of zeros.
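As a rough illustration of how these fixed-offset fields sit in the first 20 bytes, the following Java sketch parses the mandatory header fields from a byte array with a ByteBuffer. The class name and the printed summary are my own illustrative assumptions, not part of the source.

import java.nio.ByteBuffer;

public class TcpHeader {
    // Assumes segment.length >= 20 (the mandatory header).
    public static void parse(byte[] segment) {
        ByteBuffer buf = ByteBuffer.wrap(segment);        // network byte order (big-endian) by default
        int srcPort   = buf.getShort(0)  & 0xFFFF;        // bytes 0-1
        int dstPort   = buf.getShort(2)  & 0xFFFF;        // bytes 2-3
        long seqNum   = buf.getInt(4)    & 0xFFFFFFFFL;   // bytes 4-7
        long ackNum   = buf.getInt(8)    & 0xFFFFFFFFL;   // bytes 8-11
        int offsetWords = (buf.get(12) >> 4) & 0x0F;      // upper 4 bits of byte 12: data offset
        int flags     = buf.getShort(12) & 0x01FF;        // lower 9 bits of bytes 12-13: NS..FIN
        boolean syn   = (flags & 0x002) != 0;             // SYN is bit 1 of the flags
        boolean ack   = (flags & 0x010) != 0;             // ACK is bit 4 of the flags
        int window    = buf.getShort(14) & 0xFFFF;        // bytes 14-15
        int headerLen = offsetWords * 4;                  // 20..60 bytes; the payload starts here
        System.out.printf("ports %d->%d seq=%d ack=%d hdr=%dB SYN=%b ACK=%b win=%d%n",
                srcPort, dstPort, seqNum, ackNum, headerLen, syn, ack, window);
    }
}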

B. Protocol Operation
3 phases:
1. Connection establishment: a multi-step handshake process
2. Data transfer
3. Connection termination: closes established virtual circuits and releases allocated resources

TCP states:
LISTEN (server): waiting for a connection request from any remote TCP and port
SYN-SENT (client): waiting for a matching connection request after having sent a connection request
SYN-RECEIVED (server): awaiting a confirming acknowledgement after having both received and sent a connection request
ESTABLISHED (server & client): an open connection; data received can be delivered to the user
FIN-WAIT-1 (server & client): waiting for a connection termination request from the remote TCP, or an acknowledgement of the termination request previously sent
FIN-WAIT-2 (server & client): waiting for a connection termination request from the remote TCP
CLOSE-WAIT (server & client): waiting for a connection termination request from the local user
CLOSING (server & client): waiting for a connection termination request acknowledgement from the remote TCP
LAST-ACK (server & client): waiting for an acknowledgement of the connection termination request previously sent to the remote TCP
TIME-WAIT (server & client): the maximum waiting time to be sure the remote TCP received the acknowledgement of its connection termination request
CLOSED (server & client): no connection state at all

Connection Termination

C. Data Transfer
There are a few key features that set TCP apart from User Datagram Protocol:

- Ordered data transfer: the destination host rearranges segments according to sequence number[2]
- Retransmission of lost packets: any cumulative stream not acknowledged is retransmitted[2]
- Error-free data transfer[14]
- Flow control: limits the rate at which a sender transfers data, to guarantee reliable delivery. The receiver continually hints to the sender how much data can be received (controlled by the sliding window). When the receiving host's buffer fills, the next acknowledgment contains a 0 in the window size, stopping transfer and allowing the data in the buffer to be processed[2]
- Congestion control[2]

Reliable Transmission TCP primarily uses a cumulative acknowledgment scheme, where the receiver sends an acknowledgment signifying that the receiver has received all data preceding the acknowledged sequence number. The sender sets the sequence number field to the sequence number of the first payload byte in the segment's data field, and the receiver sends an acknowledgment specifying the sequence number of the next byte they expect to receive. For example, if a sending computer sends a packet containing four payload bytes with a sequence number field of 100, then the sequence numbers of the four payload bytes are 100, 101, 102 and 103. When this packet arrives at the receiving computer, it would send back an acknowledgment number of 104 since that is the sequence number of the next byte it expects to receive in the next packet. Error Detection Sequence numbers allow receivers to discard duplicate packets and properly sequence reordered packets. Acknowledgments allow senders to determine when to retransmit lost packets.

Wikipedia: IP
The Internet Protocol (IP) is the principal communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet. IP, as the primary protocol in the Internet layer of the Internet protocol suite, has the task of delivering packets from the source host to the destination host solely based on the IP addresses in the packet headers. IP defines packet structures that encapsulate the data to be delivered. It also defines addressing methods that are used to label the datagram with source and destination information.

A. Datagram Construction
2 components: a header and a payload.

- The IP header is tagged with the source IP address, the destination IP address, and other metadata needed to route and deliver the datagram
- The payload is the data that is transported

B. IP Addressing & Routing


Assignment of IP addresses and associated parameters to host interfaces. The address space is divided into networks and sub-networks, involving the designation of network or routing prefixes.

IP routing is performed by all hosts, but most importantly by routers, which transport packets across network boundaries. Routers communicate with one another via specially designed routing protocols, either interior gateway protocols or exterior gateway protocols, as needed for the topology of the network.

IP provides only best-effort delivery and its service is characterized as unreliable, since it is a connectionless protocol. Routing is dynamic and each packet is independent; the network maintains no state based on the path of prior packets. Improper sequencing can occur when some packets are routed on a different path to their destination.

Lesson Notes
1. Internet Model

1. Application layer (user interface services and support services): applications create user data and communicate this data to other applications on another or the same host. The communication partners are often called peers. This is where the higher-level protocols such as SMTP, FTP, SSH, HTTP, etc. operate.
2. Transport layer (process-to-process): the transport layer constitutes the networking regime between two network processes, on either the same or different hosts, and on either the local network or remote networks separated by routers. Processes are addressed via "ports," and the transport layer header contains the port numbers. UDP is the basic transport layer protocol, providing communication between processes via port addresses in the header. Some OSI session layer services such as flow control, error correction, and connection establishment and teardown protocols also belong at the transport layer. In the Internet protocol suite, TCP provides flow control, connection establishment, and reliable transmission of data.
3. Network layer: the internet layer has the task of exchanging datagrams across network boundaries. It provides a uniform networking interface that hides the actual topology (layout) of the underlying network connections. It is therefore also referred to as the layer that establishes inter-networking; indeed, it defines and establishes the Internet. This layer defines the addressing and routing structures used for the TCP/IP protocol suite. The primary protocol in this scope is the Internet Protocol, which defines IP addresses. Its function in routing is to transport datagrams to the next IP router that has connectivity to a network closer to the final data destination.
4. Link layer: this layer defines the networking methods within the scope of the local network link on which hosts communicate without intervening routers. It describes the protocols used to describe the local network topology and the interfaces needed to effect transmission of Internet layer datagrams to next-neighbor hosts.

2. Network Diagram

IP is the address; it determines where the packets go. Much of replication, both for load balancing and fault tolerance, will depend on this underlying behavior.

TCP provides some reliability, and therefore requires a time-out. Time-outs come up all the time in fault tolerance; computers cannot distinguish failure from silence.

Week 3- Hardware Architecture & Load Balancing


Readings (1) Leaky Abstractions
IP: unreliable nature. TCP: reliable transmission that is ordered and accurate (not garbled or corrupted).
Leaky abstraction: TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from. All non-trivial abstractions, to some degree, are leaky. Sometimes the abstraction fails badly and there is a lot of leakage. E.g. some SQL queries are thousands of times slower than other logically equivalent queries; famously, some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c", even though the result set is the same.

The problem with abstractions is that, by encapsulating away the detail and structure beneath them, they leave developers who have only worked at the abstracted level without the underlying knowledge needed to resolve the bugs and problems that appear when leakages happen.

Readings (2) Little Man Computer


Characteristics of the LMC:
- CAN'T remember anything
- CAN'T multitask
- CAN'T understand anything more complicated than "Go there now" or "1+1"
Components of the LMC:
- 100 mailboxes: store values
- 1 accumulator: the only number the LMC currently remembers
- 1 input box: user input
- 1 output box: user output
- 1 instruction counter: needed for loops and conditionals
Commands:

Command | Instruction           | Description
ADD XX  | ADD                   | Adds the value stored in mailbox XX to the accumulator
SUB XX  | SUBTRACT              | Subtracts the value stored in mailbox XX from the accumulator
STA XX  | STORE                 | Stores the accumulator value in mailbox XX
LDA XX  | LOAD                  | Loads mailbox XX's value into the accumulator
BRA XX  | BRANCH                | Branches to a specific line of code; XX will be the next instruction executed
BRZ XX  | BRANCH (IF ZERO)      | Branches to a specific line of code IF the accumulator value is zero
BRP XX  | BRANCH (IF POSITIVE)  | Branches to a specific line of code IF the accumulator value is positive
INP     | INPUT                 | Asks for user input, and places the value in the accumulator
OUT     | OUTPUT                | Outputs the value in the accumulator
HLT     | HALT                  | Stops working

Boxes can be labelled. Labelling allows a more English-like version of looping: the branch can target a label instead of specifying a line number in BRA XX.

LMC in Pseudo-code

LMC Translating Code

Concurrency

Assuming mailbox 00 contains the value 1, the code can be broken into the following:
L1 LDA X    Load from box X
L2 BRP POS  Branch to POS if positive
L3 HLT      Stop the program, since we don't want POS to happen
L4 SUB 00   Subtract the value in box 00, which is 1
L5 STA X    Store the value into box X
Since the code is multi-threaded, each thread runs at the same time, but the LMC is incapable of multi-tasking, so it hops from thread to thread as it goes. This means it can switch at any time: at line 1, 2, 3 or 4. If it consistently switches only after line 5, all is well: the value in box X is always updated before the next thread loads X, and the final value of X is 0. If it switches at any other time, there is a problem: the value of box X is not updated before the switch, and the LMC's values go haywire!
Question: switching at lines 1, 2, 3 or 4 all give inaccurate results. What result would each give?
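The same hazard is easy to reproduce on a real machine. Below is a minimal Java sketch of my own (not from the notes): several threads each perform the LMC's load/test/store sequence on a shared variable, and without synchronization the final value is often wrong, for exactly the reason described above.

public class LmcRace {
    static int x = 5; // "box X"

    public static void main(String[] args) throws InterruptedException {
        Runnable decrement = () -> {
            int v = x;          // L1: LDA X  (load into this thread's "accumulator")
            if (v > 0) {        // L2: BRP POS (only proceed if positive)
                // A thread switch here means another thread loads the stale value of X.
                x = v - 1;      // L4/L5: SUB 00, STA X (store back)
            }
        };
        Thread[] threads = new Thread[5];
        for (int i = 0; i < threads.length; i++) threads[i] = new Thread(decrement);
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println("x = " + x); // 0 only if no unlucky switch occurred; sometimes > 0
    }
}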

Readings (3) Von Neumann Architecture


Stored program computer where an instructional fetch and a data operation cannot occur concurrently since they share a common bus

Includes by design an instruction set and can store a set of instructions in memory; this had to be done manually in the past. Allows for self-modifying code, which was important until the emergence of index registers and indirect addressing, because programs needed to increment or modify the address portion of instructions.
o Self-modifying code: code that alters its own instructions during execution, usually to reduce instruction path length and improve performance, or simply to reduce similarly repetitive code
o Absolute addressing: the address is the parameter itself; no modifications
o PC-relative instruction: next instruction address + offset parameter. For referencing code before and after the instruction; useful in connection with jumps to nearby instructions
o Indirect addressing: uses the contents of a register; the effect is to transfer control to the instruction whose address is in the specified register
o Index register: a processor register for modifying operand addresses during runtime, usually added to or subtracted from an address to give an effective address. In early systems without indirect addressing, such operations had to be done by modifying the instruction address, requiring several instructions and more computer memory

This model evolved: memory-mapped I/O allows input and output devices to be treated the same as memory, and a single system bus provides a modular system at lower cost. But this streamlining led to the Von Neumann bottleneck, where throughput (data transfer rate) between the CPU and memory is limited compared to the amount of memory:

1. Program & data memory cannot be accessed simultaneously
2. Throughput is smaller than the rate at which the CPU can work
3. The CPU is forced to wait for needed data to be transferred to/from memory
4. Effective processing speed is limited when the CPU must perform minimal processing on large amounts of data
5. Severity increases as CPU speeds & memory sizes continue to improve

Resolving the Bottleneck
- Caching: storing copies of data from frequently used main memory locations; the contents are accessed as if they were the memory itself, so the average latency of memory accesses is closer to the cache latency than to the main-memory latency.
- Modified Harvard architecture: cache & path separation of data and instruction memory. The word width, timing and implementation of data and program memory can differ, and 2 simultaneous memory fetches give greater and more predictable memory bandwidth.
- Branch predictor algorithms: for conditionals, it is not known whether a conditional jump will be taken until the condition has been calculated, i.e. the execution stage of the pipeline. The predictor guesses whether the jump is likely to happen and speculatively executes the branch most likely to be taken.
- Limited CPU stack / scratchpad memory: high-speed internal memory for temporary storage of calculations, data and other work in progress. Simplifies caching logic and guarantees the unit works without main-memory contention.

Lesson Notes: Hardware & Architecture


1. Registers
On-CPU storage: the only storage directly changed by the arithmetic and control units. Size in bits (how many bits for your laptop's CPU?)

Modern CPUs have several dozen:
- Instruction register
- Program counter
- Memory address register
- Memory register
- General purpose (used by applications)

2. Load Balancing
Problem: more requests coming in than the LMC can handle
Solution: have clusters of Little Men

Hardware: individual CPU cores execute very simple instructions, very fast, one at a time
- 1 addition ~ 3 instructions
- Allocating an array ~ a lot of instructions!

Registers are the only operational memory Speed of memory: Registers > RAM >>>>>>> Disk

Week 4- Parallelism
Readings: Parallel Computing
Software is traditionally written for serial computation: it runs on a single CPU, with the problem broken into a discrete series of instructions that are executed one after another, only 1 at any point in time.

1. Parallel Computing
The simultaneous use of multiple computing resources to solve a computational problem: the problem runs on multiple processors, its discrete parts are solved concurrently, and an overall control/coordination mechanism ties them together.

The problem should be able to be:
- Broken down into discrete parts capable of being solved simultaneously
- Executed as multiple program instructions at any point in time
- Solved in less time with multiple compute resources than with a single one

Why parallel computing?
- Saves time/money (parallel computers can be built from cheap commodity components)
- Solves larger problems (complex problems impossible within limited computer memory)
- Enables concurrency
- Use of non-local resources
- Limits to serial computing
LLNL (Lawrence Livermore National Laboratory) parallel computer:
- Compute node: each node is a multi-processor parallel computer itself
- InfiniBand switch: multiple compute nodes networked together
- Special-purpose nodes: multiprocessors meant for other purposes

2. Parallel Computer Architectures


Shared Memory: all processors share the same memory
- Uniform Memory Access (UMA): identical processors with equal access & access times to memory
- Non-Uniform Memory Access (NUMA): not all processors have equal access; access across links is slower
- Advantages: user-friendly programming perspective on memory; fast & uniform data sharing due to the proximity of memory to CPUs
- Disadvantages: lack of scalability between memory & CPUs; the programmer is responsible for ensuring correct access to global memory; expense

Distributed Memory: requires a communication network for inter-processor memory access
- Advantages: memory is scalable with the number of processors; no memory interference or overhead in keeping cache coherency; cost-effectiveness
- Disadvantages: the programmer is responsible for data communication between processors; difficult to map existing data structures to this memory organization

Hybrid Distributed-Shared Memory: used by the current largest and fastest computers, with a mix of the advantages and disadvantages of both; GPUs perform computationally intensive kernels with local on-node data, while MPI (a message-passing model) is used for communications between nodes

3. Designing Parallel Programs


3.1. Partitioning Breaking problems into discrete chunks that can be distributed to multiple tasks aka decomposition Domain Decomposition

The data associated with the problem is decomposed, and each parallel task works on a portion of the data.
Functional Decomposition

Good for problems that can be split into different tasks (especially the independent sequential ones)

3.2. Communications
This depends on the type of problem being solved. Embarrassingly parallel problems do not need any inter-task communication (e.g. converting image pixels). Factors to consider:
1. Cost of communications
a. Inter-task communication always implies overhead
b. Machine cycles and resources that could be used for computation are instead used to package and transmit data
c. Requires synchronization between tasks, resulting in some tasks waiting instead of doing work
d. Competing traffic can saturate the network and aggravate performance issues
2. Latency vs. bandwidth
a. Latency: the time taken to send a message from point A to B
b. Bandwidth: the amount of data that can be communicated per unit time
3. Visibility of communications
a. Message-passing model: communications are explicit and visible, under the programmer's control
b. Data-parallel model: communications are transparent, and inter-task communications are not exactly known
4. Synchronous vs. asynchronous
a. Synchronous requires some sort of handshaking to be accomplished
b. Referred to as blocking, since other work must wait until the communication is done
c. Asynchronous communications allow for independent data transfers; non-blocking
d. Interleaving computation with communication is the greatest benefit
5. Scope of communications
a. Knowing which tasks must communicate with each other is critical during design
b. Point-to-point: 2 tasks, with one acting as sender/producer of data and the other as receiver
c. Collective: data sharing across more than 2 tasks
6. Efficiency of communications
a. The programmer has a choice of factors that affect communications performance
b. Which implementation model should be used? Performance varies
c. What type of communication operations should be used? Async is faster
d. Network media: some platforms have more than one network for communications

3.3.

Synchronization
Lock/Semaphore
- Can involve any number of tasks
- Used to serialize access to global data or a section of code
- Only ONE task may own the lock at a time; others can attempt to acquire it but must wait until the current owner releases the lock
Synchronous communication operations
- Involve only the tasks executing a communication operation
- When a task performs a communication operation, some form of coordination is needed
Barrier
- Implies all tasks are involved
- Each task performs its work until the barrier is reached
- When the LAST task completes, all tasks are synchronized

3.4. Data Dependencies
A dependence exists between program statements when the order of execution affects the program's results. Data dependence results from multiple uses of the same location(s) in storage by different tasks.

The value of A(J-1) must be computed before the value of A(J); therefore A(J) exhibits a data dependency on A(J-1), and parallelism is inhibited. If task 2 has A(J) and task 1 has A(J-1), computing the correct value of A(J) necessitates:
o Distributed memory architecture: task 2 must obtain the value of A(J-1) from task 1 after task 1 finishes its computation
o Shared memory architecture: task 2 must read A(J-1) after task 1 updates it

Parallelism is likewise inhibited when the value of Y depends on:
o Distributed memory architecture: if or when the value of X is communicated between the tasks
o Shared memory architecture: which task last stores the value of X
Although all data dependencies are important to identify when designing parallel programs, loop-carried dependencies are particularly important, since loops are possibly the most common target of parallelization efforts.
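A minimal Java rendering of the contrast (the variable names follow the notes' A(J) example; the loops themselves are my own illustration):

public class Dependencies {
    static void demo(double[] a, double[] b) {
        int n = a.length;
        // Loop-carried dependence: each iteration reads the value the previous
        // iteration wrote, so iterations cannot be split across parallel tasks
        // without communication or synchronization.
        for (int j = 1; j < n; j++) {
            a[j] = a[j - 1] * 2.0;  // A(J) depends on A(J-1); parallelism inhibited
        }
        // No dependence: every element is computed independently,
        // so the iteration space can be partitioned freely across tasks.
        for (int j = 0; j < n; j++) {
            b[j] = 2.0 * j;
        }
    }
}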

Handling Data Dependencies


- Distributed memory architectures: communicate required data at synchronization points.
- Shared memory architectures: synchronize read/write operations between tasks.

3.5. Load Balancing
- Distributing approximately equal amounts of work amongst tasks so that all tasks are kept busy all the time
- Important where the slowest task is the performance bottleneck

How to achieve load balance?
Equal partitioning:
1. For arrays/matrices, equally distribute the data set across tasks
2. For loop iterations where the work is similar, equally split the iterations
3. For machines of varying performance, use an analysis tool to detect load imbalances
Dynamic work assignment:
- Some problems have load imbalances even if data is evenly distributed, such as sparse arrays where some tasks have data while others are mostly zeroes
- Scheduler/task-pool approach: as each task completes, it queues to get new work
- An algorithm to detect and handle load imbalances may be necessary where they occur dynamically within the code
3.6. Granularity
A qualitative measure of the ratio of computation to communication; periods of computation are separated from periods of communication by synchronization events.
Fine-grained parallelism:
- Small amounts of computational work between communication events
- Low computation-to-communication ratio
- Facilitates load balancing
- Implies high communication overhead and less opportunity for performance enhancement
- If too fine, the communication and synchronization overhead may exceed the computation

Coarse-grained parallelism:
- Large amounts of computational work between communication/synchronization events
- Advantageous when communication and synchronization overheads are high relative to computation
- Harder to load balance; fine-grained parallelism can help reduce the overheads due to load imbalance

3.7. I/O
- I/O operations are generally regarded as inhibitors to parallelism
- I/O operations require an order of magnitude (or greater) more time than memory operations
- Parallel I/O systems may be immature or not available for all platforms
- In an environment where all tasks see the same file space, write operations can result in file overwriting
- Read operations can be affected by the file server's ability to handle multiple read requests simultaneously
- I/O that must be conducted over the network (NFS, non-local) can cause severe bottlenecks and even crash file servers

4. Parallel Examples Array Processing

This example demonstrates calculations on 2-dimensional array elements, where the computation on each array element is independent of the other array elements. The serial program calculates one element at a time in sequential order. Serial code could be of the form:

do j = 1,n
  do i = 1,n
    a(i,j) = fcn(i,j)
  end do
end do

The calculation of elements is independent of one another - leads to an embarrassingly parallel situation. The problem should be computationally intensive.

Array Processing Parallel Solution 1


- Array elements are distributed so that each processor owns a portion of the array (a subarray).
- Independent calculation of array elements ensures there is no need for communication between tasks.
- The distribution scheme is chosen by other criteria, e.g. unit stride (stride of 1) through the subarrays; unit stride maximizes cache/memory usage.
- Since it is desirable to have unit stride through the subarrays, the choice of distribution scheme depends on the programming language. See the Block-Cyclic Distributions diagram for the options.
- After the array is distributed, each task executes the portion of the loop corresponding to the data it owns. For example, with Fortran block distribution:

do j = mystart, myend
  do i = 1,n
    a(i,j) = fcn(i,j)
  end do
end do

Notice that only the outer loop variables are different from the serial solution.

One Possible Solution:


Implement as a Single Program Multiple Data (SPMD) model. The master process initializes the array, sends info to the worker processes and receives results. Worker processes receive info, perform their share of the computation and send results to the master. Using the Fortran storage scheme, perform block distribution of the array. Pseudo code solution: red highlights changes for parallelism.
find out if I am MASTER or WORKER

if I am MASTER
  initialize the array
  send each WORKER info on part of array it owns
  send each WORKER its portion of initial array
  receive from each WORKER results
else if I am WORKER
  receive from MASTER info on part of array I own
  receive from MASTER my portion of initial array

  # calculate my portion of array
  do j = my first column, my last column
    do i = 1,n
      a(i,j) = fcn(i,j)
    end do
  end do
  send MASTER results
endif

Example MPI Program in C: mpi_array.c Example MPI Program in Fortran: mpi_array.f

Array Processing Parallel Solution 2: Pool of Tasks

The previous array solution demonstrated static load balancing:
o Each task has a fixed amount of work to do
o There may be significant idle time for faster or more lightly loaded processors; the slowest task determines overall performance
Static load balancing is not usually a major concern if all tasks are performing the same amount of work on identical machines. If you have a load balance problem (some tasks work faster than others), you may benefit from using a "pool of tasks" scheme.

Pool of Tasks Scheme:

Two processes are employed.
Master process:
o Holds the pool of tasks for worker processes to do
o Sends a worker a task when requested
o Collects results from workers
Worker process: repeatedly does the following
o Gets a task from the master process
o Performs the computation
o Sends results to the master
Worker processes do not know before runtime which portion of the array they will handle or how many tasks they will perform. Dynamic load balancing occurs at run time: the faster tasks get more work to do.

Pseudo code solution: red highlights changes for parallelism.


find out if I am MASTER or WORKER

if I am MASTER
  do until no more jobs
    if request send to WORKER next job
    else receive results from WORKER
  end do
else if I am WORKER
  do until no more jobs
    request job from MASTER
    receive from MASTER next job
    calculate array element: a(i,j) = fcn(i,j)
    send results to MASTER
  end do
endif

Discussion:

In the above pool-of-tasks example, each task calculated an individual array element as a job, so the computation-to-communication ratio is finely granular. Finely granular solutions incur more communication overhead in order to reduce task idle time. A more optimal solution might be to distribute more work with each job; the "right" amount of work is problem-dependent.

Parallel Examples

PI Calculation

The value of PI can be calculated in a number of ways. Consider the following method of approximating PI:
1. Inscribe a circle in a square
2. Randomly generate points in the square
3. Determine the number of points in the square that are also in the circle
4. Let r be the number of points in the circle divided by the number of points in the square
5. PI ~ 4 r
6. Note that the more points generated, the better the approximation
Serial pseudo code for this procedure:

npoints = 10000
circle_count = 0
do j = 1,npoints
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do
PI = 4.0*circle_count/npoints

Note that most of the time in running this program would be spent executing the loop. This leads to an embarrassingly parallel solution:
o Computationally intensive
o Minimal communication
o Minimal I/O

PI Calculation Parallel Solution

Parallel strategy: break the loop into portions that can be executed by the tasks. For the task of approximating PI:
o Each task executes its portion of the loop a number of times.
o Each task can do its work without requiring any information from the other tasks (there are no data dependencies).
o Uses the SPMD model. One task acts as master and collects the results.
Pseudo code solution: red highlights changes for parallelism.

npoints = 10000
circle_count = 0
p = number of tasks
num = npoints/p
find out if I am MASTER or WORKER
do j = 1,num
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do
if I am MASTER
  receive from WORKERS their circle_counts
  compute PI (use MASTER and WORKER calculations)
else if I am WORKER
  send to MASTER circle_count
endif

Example MPI Program in C: mpi_pi_reduce.c

Example MPI Program in Fortran: mpi_pi_reduce.f
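For comparison with the MPI pseudocode, here is a single-machine sketch of the same SPMD idea using Java threads: each "worker" counts hits for its share of the points, and the "master" (the main thread) combines the counts. This is my own illustrative rendering, not the course's MPI program.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

public class MonteCarloPi {
    public static void main(String[] args) throws InterruptedException {
        final long npoints = 10_000_000;
        final int p = Runtime.getRuntime().availableProcessors(); // number of "tasks"
        final long num = npoints / p;          // each task's share of the loop
        LongAdder circleCount = new LongAdder();

        Thread[] workers = new Thread[p];
        for (int i = 0; i < p; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (long j = 0; j < num; j++) {
                    double x = rnd.nextDouble(), y = rnd.nextDouble();
                    if (x * x + y * y <= 1.0) local++;  // point falls inside the circle
                }
                circleCount.add(local);        // "send to MASTER circle_count"
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();     // master collects the results
        System.out.println("PI ~ " + 4.0 * circleCount.sum() / (num * p));
    }
}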

Parallel Examples
Simple Heat Equation

Most problems in parallel computing require communication among the tasks. A number of common problems require communication with "neighbor" tasks. The heat equation describes the temperature change over time, given initial temperature distribution and boundary conditions. A finite differencing scheme is employed to solve the heat equation numerically on a square region. The initial temperature is zero on the boundaries and high in the middle. The boundary temperature is held at zero. For the fully explicit problem, a time stepping algorithm is used. The elements of a 2-dimensional array represent the temperature at points on the square. The calculation of an element is dependent upon neighbor element values.

A serial program would contain code like:

do iy = 2, ny - 1
  do ix = 2, nx - 1
    u2(ix, iy) = u1(ix, iy) +
      cx * (u1(ix+1,iy) + u1(ix-1,iy) - 2.*u1(ix,iy)) +
      cy * (u1(ix,iy+1) + u1(ix,iy-1) - 2.*u1(ix,iy))
  end do
end do

Simple Heat Equation Parallel Solution


I. Implement as an SPMD model
II. The entire array is partitioned and distributed as subarrays to all tasks; each task owns a portion of the total array
III. Determine data dependencies:
  o interior elements belonging to a task are independent of other tasks
  o border elements are dependent upon a neighbor task's data, necessitating communication
IV. The master process sends initial info to workers, and then waits to collect results from all workers
V. Worker processes calculate the solution within the specified number of time steps, communicating as necessary with neighbor processes
VI. Pseudo code solution: red highlights changes for parallelism.

find out if I am MASTER or WORKER

if I am MASTER
  initialize array
  send each WORKER starting info and subarray
  receive results from each WORKER
else if I am WORKER
  receive from MASTER starting info and subarray
  do t = 1, nsteps
    update time
    send neighbors my border info
    receive from neighbors their border info
    update my portion of solution array
  end do
  send MASTER results
endif

VII. Example MPI Program in C: mpi_heat2D.c
VIII. Example MPI Program in Fortran: mpi_heat2D.f

Lesson Notes
1. Levels of Parallelism

2. Large Parallelism
Practical problems:
- Launching masters/workers
- Allocating tasks to workers
- Tracking workers
- Handling master failures
- Handling worker failures (common)
- Dealing with network failures
- Getting data to workers
- Getting data between workers
Performance limitations:
- Data size >>> memory space
- The network is limited, and therefore slow
- The world's current fastest computer won through network innovations, not processors or memory
- Put processing with, or near, data; keep data local to its processing
Stragglers:
- In a large cluster, there is always some machine that is slow
- Stragglers can take up 30% of response time

3. Map Reduce

- Creates master & workers
- Tracks and restarts workers as needed
- Passes data between workers
- Ensures data locality
- Deals with stragglers
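To make the data flow concrete, here is a toy, single-process word-count sketch of the map/shuffle/reduce shape; it illustrates only the computation pattern, not the framework's fault-tolerance or data-locality machinery, and every name in it is my own.

import java.util.*;

public class ToyMapReduce {
    public static void main(String[] args) {
        List<String> documents = List.of("the cat", "the dog", "the cat sat");

        // Map: emit (word, 1) pairs; Shuffle: group the values by key.
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String doc : documents)
            for (String word : doc.split(" "))
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // Reduce: sum the grouped values for each key.
        Map<String, Integer> counts = new HashMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

        System.out.println(counts); // e.g. {the=3, cat=2, sat=1, dog=1} (order varies)
    }
}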

Week 5- Concurrency
Reading (1) Thread
A thread is the smallest sequence of programmed instructions that can be managed independently by an OS scheduler; it is a lightweight process. A thread is contained in a process along with other threads, and resources such as memory are shared between the threads of a process. On a single processor, multiplexing is used for time division, with context switching to change between tasks.

1. Threads vs. Processes


- Processes are independent; threads are subsets of processes
- Processes carry more state information; threads share it
- Threads share address spaces
- Processes interact only through OS-provided inter-process communication mechanisms
- Context switching between threads is faster

2. Multithreading
Allows processing across multiple cores and multiple CPUs in a cluster of machines, lending itself to truly concurrent processing of tasks. Race conditions must be watched for: a race condition is behaviour of a software system where the output depends on the sequence of uncontrollable events, and it becomes a bug when the events don't happen in the intended order (2 signals racing each other to influence the output first). Threads may require mutually exclusive operations to prevent concurrent modification of common data.
Preemptive multitasking: allows the OS to determine when a context switch should occur. An inappropriate switch may lead to lock convoys, priority inversion and other effects that can be avoided by cooperative multithreading.
Cooperative multithreading: relies on threads to relinquish control once they're at a stopping point; this creates problems if a thread is waiting for a resource to become available.

3. Processes, Kernel Threads, User Threads & Fibers


Process: the heaviest unit of kernel scheduling; owns resources allocated by the operating system such as memory, file handles, sockets, device handles and windows. Address spaces and file resources are not shared except through explicit methods such as inheriting file handles or mapping to the same file in a shared way. Typically preemptively multitasked.
Kernel thread: the lightest unit of kernel scheduling; at least one exists within each process. Preemptively multitasked if the OS process scheduler is preemptive.

Kernel threads do not own resources except for a stack, a copy of the registers (including the program counter) and thread-local storage.
User threads: sometimes implemented in user-space libraries, where the kernel is unaware of them. Green threads are user threads implemented by virtual machines. Generally fast to create and manage, but unable to take advantage of multiprocessing/multitasking if the associated kernel threads become blocked, even when user threads are ready to run.
Fibers: an even lighter unit of scheduling that is cooperatively scheduled: a fiber must explicitly yield to allow another fiber to run. A fiber can be scheduled to run in any thread in the same process. It is a system-level construct, similar to co-routines (a language-level construct).

4. Thread & Fiber Issues


4.1. Concurrency & Data Structures
- Sharing an address space allows tight coupling and exchange of data without IPC overhead
  o Inter-process communication is a set of methods for exchanging data amongst multiple threads in one or more processes
  o In Java, such mechanisms include pipes and sockets
- Prone to race conditions on data structures that require more than one CPU instruction to update
- Synchronization primitives such as mutexes help lock data structures against concurrent access (mutually exclusive access)
4.2. I/O and Scheduling
- For user-thread/fiber implementations existing entirely in user space, context switches are extremely efficient, with no interaction with the kernel
- Problems occur when a user thread performs a blocking system call, since the kernel cannot schedule the process's remaining user threads
- A common solution is a synchronous interface that uses non-blocking I/O internally

Reading (2) Concurrency


Property of systems in which several computations are executing simultaneously and potentially interacting with one another. Concurrent use of shared resources can be a source of indeterminacy, leading to issues such as deadlock and starvation.
o Deadlock: a situation where 2 or more competing actions are each waiting for the other to finish, so neither ever does
o Starvation: a case in multitasking where a process is perpetually denied resources, such that its task can never be completed

Reading (3) Context Switch


The process of storing and restoring the state (context) of a process so that execution can be resumed from the same point at a later time; usually computationally intensive, requiring actions such as:
- Saving & loading registers and memory maps
- Updating tables and lists

Reading (4) Java Concurrency


Process: a self-contained execution environment that communicates through IPC resources like pipes & sockets. A ProcessBuilder object is used to create new processes.
Thread: all applications start with 1 MAIN thread.

1. Thread Objects
Instantiate a Thread to directly control thread creation and management, or pass the application's tasks to an executor to abstract thread management away.
1.1. Defining & Starting a Thread
- Runnable implementation: more general; flexible, since the class is free to subclass something other than Thread
- Subclass Thread: Thread itself implements Runnable, though its run method does nothing; has methods useful for thread management and for querying the status of the thread
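Both styles side by side, in a minimal sketch (the class name is mine):

public class StartThreads {
    public static void main(String[] args) {
        // Style 1: implement Runnable and hand it to a Thread (more general).
        Thread viaRunnable = new Thread(() -> System.out.println("Hello from a Runnable"));
        viaRunnable.start();

        // Style 2: subclass Thread and override run().
        Thread viaSubclass = new Thread() {
            @Override public void run() { System.out.println("Hello from a Thread subclass"); }
        };
        viaSubclass.start();
    }
}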

1.2. Pausing Execution with Sleep
Thread.sleep suspends the current thread for a specified period, making the processor available for other uses by the application.
1.3. Interrupts
An interrupt indicates to a thread that it should stop its current work and do something else. Thread.interrupted() checks whether the current thread has been interrupted. Throwing an InterruptedException helps centralize interrupt handling in a catch clause.
Interrupt status flag: Thread.interrupt sets this flag; it is cleared when the Thread.interrupted method is invoked. The non-static isInterrupted method, used to check the status of other threads, does not change the flag.
1.4. Join
Allows one thread to wait for the completion of another until it ends; overloads allow specifying a waiting period. Join responds to an interrupt by exiting with an InterruptedException.
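A small sketch of my own combining the three mechanisms (the timings are arbitrary):

public class SleepInterruptJoin {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10_000);            // suspend this thread for up to 10s
            } catch (InterruptedException e) {
                // sleep responds to interrupt by throwing; clean up and exit
                System.out.println("interrupted, stopping early");
            }
        });
        worker.start();
        Thread.sleep(100);     // give the worker time to reach its sleep
        worker.interrupt();    // ask the worker to stop what it is doing
        worker.join(1_000);    // wait up to 1s for it to finish
        System.out.println("worker alive? " + worker.isAlive());
    }
}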

2. Synchronization
Threads communicate via sharing access to fields and the objects referenced to.

Sharing is extremely efficient but raises 2 problems: thread interference and memory consistency errors.

Synchronization prevents these errors but can lead to thread contention, where 2 or more threads try to access the same resource, leading to one being executed more slowly (or even suspended).
2.1. Thread Interference
Interleaving: 2 operations running in different threads, acting on the same data, interleave their steps.
2.2. Memory Consistency Errors
Different threads have inconsistent views of what should be the same data. Happens-before relationships guarantee that memory writes by one specific statement are visible to another statement. 2 actions that already create a happens-before relation:
1. Thread.start: every statement with a happens-before relation to that statement also has a happens-before relation to every statement executed by the new thread
2. Thread.join: all statements executed by the terminated thread have a happens-before relation with all statements following the successful join, so the effects of code in the thread are visible to the thread performing the join
2.3. Synchronized Methods
1. Not possible to interleave: when 1 thread executes a synchronized method, all others are blocked until the first is done with the object
2. Happens-before: when a thread exits the method, a happens-before relation is established with any subsequent invocation, guaranteeing that state changes are visible to all threads
2.4. Intrinsic Locks & Synchronization
Every object has an intrinsic lock associated with it, which enforces exclusive access to its state and establishes the happens-before relations essential to visibility. When a thread releases an intrinsic lock, a happens-before relation is established between that action and any subsequent acquisition of the same lock.
Synchronized statements: unlike synchronized methods, synchronized statements must specify the object providing the intrinsic lock.
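The canonical illustration of a synchronized method is a counter whose increment is a read-modify-write and therefore not atomic on its own; this is a standard pattern, paraphrased rather than copied from the notes:

public class SynchronizedCounter {
    private int c = 0;

    // Only one thread at a time may execute any synchronized method of this
    // object; all other callers block until the lock is released, so the
    // read-modify-write below can never interleave with another thread's.
    public synchronized void increment() { c++; }
    public synchronized void decrement() { c--; }
    public synchronized int  value()     { return c; }
}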

This is used to synchronize changes within the object while avoiding synchronizing invocations of other objects' methods (which can create liveness problems).
Reentrant synchronization: allows a thread to acquire a lock it already owns, e.g. when synchronized code directly or indirectly invokes a method that is already under its control. Without this, many precautions would be needed to avoid self-blocking.

2.5. Atomic Access
An atomic action effectively occurs all at once: it cannot stop in the middle, and it either happens completely or not at all. Side effects of the action are visible only upon its completion.
- Reads and writes are atomic for reference variables and most primitives
- Reads and writes are atomic for all variables declared volatile (including long and double)
  o Any write to a volatile variable establishes a happens-before relation with subsequent reads of it
  o Changes are always visible, along with the side effects of the code that led up to them

Simple atomic access is more efficient than synchronization but requires more care to avoid memory consistency errors.
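A sketch contrasting the two forms of atomic access (the field names are my own):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicAccess {
    // Writes to a volatile variable are atomic and establish a happens-before
    // with subsequent reads, so every thread sees the latest value...
    private volatile boolean shutdownRequested = false;

    // ...but volatile alone does not make compound actions atomic:
    // counter++ (read, add, write) still needs an atomic primitive.
    private final AtomicInteger counter = new AtomicInteger();

    public void requestShutdown() { shutdownRequested = true; }
    public boolean isShutdown()   { return shutdownRequested; }
    public int nextId()           { return counter.incrementAndGet(); }
}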

3. Liveness
A concurrent application's ability to execute in a timely manner.

3.1. Deadlock
Two or more threads are blocked forever, each waiting for the other. In the bowing example, both threads are likely to block when attempting to invoke the response, and neither will ever end, since each thread is waiting for the other to exit.

3.2. Starvation
A thread cannot gain regular access to shared resources that are monopolized by greedy threads, and hence is unable to make progress; blocking ensues.

3.3. Livelock
Threads often act in response to each other. If a thread's action is itself a response to the action of another thread, livelock can occur: both are too busy responding to each other to resume useful work.

4. Guarded Blocks
A guarded block polls a condition until it holds, before the block can proceed. Rather than busy-waiting, the guard should use the object's monitor methods:
- wait(): tells the calling thread to give up the monitor and sleep until another thread enters the same monitor and calls notify() or notifyAll().
- notify(): wakes up a single thread waiting on the same object's monitor (which thread is chosen is unspecified).
- notifyAll(): informs all threads waiting on the object's monitor that something important has happened.
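
A minimal guarded-block sketch (names are illustrative): the guard is re-tested inside a wait() loop rather than by polling.

    class GuardedBlock {
        private boolean ready = false;

        // The guard is re-tested in a loop after every wakeup, because
        // wait() can return even though the condition does not yet hold.
        public synchronized void awaitReady() throws InterruptedException {
            while (!ready) {
                wait(); // releases this object's monitor until notified
            }
            // proceed: the condition now holds
        }

        public synchronized void makeReady() {
            ready = true;
            notifyAll(); // wake every thread waiting on this monitor
        }
    }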

5. Immutable Objects
An object is immutable if its state cannot change after construction. Since immutable objects cannot change state, they cannot be corrupted by thread interference or observed in an inconsistent state.

5.1. Strategy to Define Immutable Objects
- Don't provide "setter" methods (methods that modify fields or objects referred to by fields).
- Make all fields final and private.
- Don't allow subclasses to override methods. The simplest way is to declare the class final; a more sophisticated approach is to make the constructor private and construct instances in factory methods.
- If the instance fields include references to mutable objects, don't allow those objects to be changed:
  - Don't provide methods that modify the mutable objects.
  - Don't share references to the mutable objects. Never store references to external, mutable objects passed to the constructor; if necessary, create copies and store references to the copies. Similarly, create copies of internal mutable objects when necessary to avoid returning the originals from your methods.
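
A minimal sketch following the strategy above (class name is illustrative):

    public final class ImmutablePoint { // final: no subclass can override methods
        private final int x;            // fields are private and final
        private final int y;

        public ImmutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; } // accessors only; no setters
        public int getY() { return y; }

        // "Modification" returns a new instance rather than mutating state.
        public ImmutablePoint translate(int dx, int dy) {
            return new ImmutablePoint(x + dx, y + dy);
        }
    }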

6. High Level Concurrency

- Lock Objects: locking idioms that simplify many concurrent applications
- Executors: launching and managing threads
- Concurrent Collections: manage large collections of data and reduce the need for synchronization
- Atomic Variables: features that help minimize memory consistency errors
- ThreadLocalRandom: efficient generation of pseudorandom numbers from multiple threads

6.1. Lock Objects
Safelock (below) implements a deadlock-free version of the bowing example using ReentrantLock. Unlike an implicit lock, a Lock object can back out of an attempt to acquire it: tryLock returns immediately if the lock is unavailable, and lockInterruptibly backs out if another thread sends an interrupt before the lock is acquired.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.Random;

public class Safelock {
    static class Friend {
        private final String name;
        private final Lock lock = new ReentrantLock();

        public Friend(String name) {
            this.name = name;
        }

        public String getName() {
            return this.name;
        }

        public boolean impendingBow(Friend bower) {
            boolean myLock = false;
            boolean yourLock = false;
            try {
                myLock = lock.tryLock();
                yourLock = bower.lock.tryLock();
            } finally {
                // Back out if we could not acquire both locks.
                if (!(myLock && yourLock)) {
                    if (myLock) {
                        lock.unlock();
                    }
                    if (yourLock) {
                        bower.lock.unlock();
                    }
                }
            }
            return myLock && yourLock;
        }

        public void bow(Friend bower) {
            if (impendingBow(bower)) {
                try {
                    System.out.format("%s: %s has bowed to me!%n",
                            this.name, bower.getName());
                    bower.bowBack(this);
                } finally {
                    lock.unlock();
                    bower.lock.unlock();
                }
            } else {
                System.out.format("%s: %s started to bow to me, but saw that"
                        + " I was already bowing to him.%n",
                        this.name, bower.getName());
            }
        }

        public void bowBack(Friend bower) {
            System.out.format("%s: %s has bowed back to me!%n",
                    this.name, bower.getName());
        }
    }

    static class BowLoop implements Runnable {
        private Friend bower;
        private Friend bowee;

        public BowLoop(Friend bower, Friend bowee) {
            this.bower = bower;
            this.bowee = bowee;
        }

        public void run() {
            Random random = new Random();
            for (;;) {
                try {
                    Thread.sleep(random.nextInt(10));
                } catch (InterruptedException e) {}
                bowee.bow(bower);
            }
        }
    }

    public static void main(String[] args) {
        final Friend alphonse = new Friend("Alphonse");
        final Friend gaston = new Friend("Gaston");
        new Thread(new BowLoop(alphonse, gaston)).start();
        new Thread(new BowLoop(gaston, alphonse)).start();
    }
}

6.2. Executors
Executors separate thread management and creation from the rest of the application.

6.2.1. Executor Interfaces
- Executor: a simple interface supporting the launching of new tasks.
- ExecutorService: a subinterface of Executor adding features to manage the lifecycle of both individual tasks and the executor itself.
- ScheduledExecutorService: a subinterface of ExecutorService supporting future and/or periodic execution of tasks.

Executor: executor.execute(r) does the same thing as (new Thread(r)).start(), but is more likely to use an existing worker thread to run r, or to place r in a queue to wait for a worker thread to become available.

ExecutorService: uses the more versatile submit method, which accepts both Runnable and Callable objects; Callable allows the task to return a value through a Future object. Methods are also provided to manage shutdown of the executor, but tasks should be written to handle interrupts correctly.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableFutures {
    private static final int NTHREADS = 10;

    // MyCallable was not shown in the original notes; a minimal assumed
    // version is sketched here so the example compiles: each call
    // returns a small partial sum.
    static class MyCallable implements Callable<Long> {
        public Long call() {
            long sum = 0;
            for (long i = 0; i <= 100; i++) {
                sum += i;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
        List<Future<Long>> list = new ArrayList<Future<Long>>();
        for (int i = 0; i < 20000; i++) {
            Callable<Long> worker = new MyCallable();
            Future<Long> submit = executor.submit(worker);
            list.add(submit);
        }
        long sum = 0;
        System.out.println(list.size());
        // now retrieve the result
        for (Future<Long> future : list) {
            try {
                sum += future.get(); // blocks until this task completes
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        System.out.println(sum);
        executor.shutdown();
    }
}

Results can only be retrieved through Future.get once the computation is done; the call blocks where necessary.

ScheduledExecutorService supplements ExecutorService with a schedule method that executes a Runnable or Callable after a specified delay. Tasks can also be executed repeatedly at defined intervals with scheduleAtFixedRate or scheduleWithFixedDelay, as in the BeeperControl example below.
import static java.util.concurrent.TimeUnit.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;

class BeeperControl {
    private final ScheduledExecutorService scheduler =
        Executors.newScheduledThreadPool(1);

    public void beepForAnHour() {
        final Runnable beeper = new Runnable() {
            public void run() {
                System.out.println("beep");
            }
        };
        // Beep every 10 seconds, starting after an initial 10-second delay.
        final ScheduledFuture<?> beeperHandle =
            scheduler.scheduleAtFixedRate(beeper, 10, 10, SECONDS);
        // After an hour, cancel the repeating beep.
        scheduler.schedule(new Runnable() {
            public void run() {
                beeperHandle.cancel(true);
            }
        }, 60 * 60, SECONDS);
    }
}

ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit)

Creates and executes a one-shot action that becomes enabled after the given delay.

Parameters:
  command - the task to execute
  delay - the time from now to delay execution
  unit - the time unit of the delay parameter

Returns:
  a ScheduledFuture representing pending completion of the task, whose get() method will return null upon completion

6.2.2. Thread Pools
Most executor implementations use thread pools consisting of worker threads. A worker thread exists separately from the Runnable and Callable tasks it executes and is usually used to execute multiple tasks. Thread pools:
- Minimize the overhead of thread creation (allocating and deallocating many thread objects causes significant memory management overhead).
- Degrade gracefully, letting the system process requests according to how much it can handle rather than how fast they arrive.

Core Pool Size
A ThreadPoolExecutor automatically adjusts the pool size (see getPoolSize()) according to the bounds set by corePoolSize (see getCorePoolSize()) and maximumPoolSize (see getMaximumPoolSize()). When a new task is submitted via execute(java.lang.Runnable) and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle.

On-demand Construction
By default, even core threads are initially created and started only when new tasks arrive, but this can be overridden dynamically using method prestartCoreThread() or prestartAllCoreThreads(). You probably want to prestart threads if you construct the pool with a non-empty queue.

Creating New Threads


By supplying a different ThreadFactory, you can alter the thread's name, thread group, priority, daemon status, etc. If a ThreadFactory fails to create a thread when asked by returning null from newThread, the executor will continue, but might not be able to execute any tasks.

Keep-Alive Times
A means of reducing resource consumption when the pool is not being actively used: threads in excess of the core size that have been idle longer than the keep-alive time are terminated. If the pool becomes more active later, new threads are constructed.

Queuing
Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.

6.2.3. Fork/Join
The fork/join framework's ForkJoinPool is an extension of the AbstractExecutorService class; it implements a work-stealing algorithm, so idle worker threads can execute tasks that are queued to other, busy worker threads.

Example: blurring an image. The blur is performed by working through the source array one pixel at a time. Each pixel is averaged with its surrounding pixels (the red, green, and blue components are averaged), and the result is placed in the destination array. Since an image is a large array, this process can take a long time; on multiprocessor systems you can take advantage of concurrent processing by implementing the algorithm using the fork/join framework. Here is one possible implementation:
import java.util.concurrent.RecursiveAction;

public class ForkBlur extends RecursiveAction {
    private int[] mSource;
    private int mStart;
    private int mLength;
    private int[] mDestination;

    // Processing window size; should be odd.
    private int mBlurWidth = 15;

    public ForkBlur(int[] src, int start, int length, int[] dst) {
        mSource = src;
        mStart = start;
        mLength = length;
        mDestination = dst;
    }

    protected void computeDirectly() {
        int sidePixels = (mBlurWidth - 1) / 2;
        for (int index = mStart; index < mStart + mLength; index++) {
            // Calculate average.
            float rt = 0, gt = 0, bt = 0;
            for (int mi = -sidePixels; mi <= sidePixels; mi++) {
                int mindex = Math.min(Math.max(mi + index, 0),
                                      mSource.length - 1);
                int pixel = mSource[mindex];
                rt += (float) ((pixel & 0x00ff0000) >> 16) / mBlurWidth;
                gt += (float) ((pixel & 0x0000ff00) >> 8) / mBlurWidth;
                bt += (float) ((pixel & 0x000000ff) >> 0) / mBlurWidth;
            }
            // Reassemble destination pixel.
            int dpixel = (0xff000000)
                    | (((int) rt) << 16)
                    | (((int) gt) << 8)
                    | (((int) bt) << 0);
            mDestination[index] = dpixel;
        }
    }
    ...
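
The ellipsis above elides the dividing step. For completeness, here is the compute() method from the standard Oracle tutorial version of this example, which splits the work until it falls below a threshold (sThreshold) and closes out the class:

    protected static int sThreshold = 100000;

    // Split the work in half and fork until it is small enough
    // to compute directly.
    protected void compute() {
        if (mLength < sThreshold) {
            computeDirectly();
            return;
        }
        int split = mLength / 2;
        invokeAll(new ForkBlur(mSource, mStart, split, mDestination),
                  new ForkBlur(mSource, mStart + split, mLength - split,
                               mDestination));
    }
}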

If the previous methods are in a subclass of the RecursiveAction class, then setting up the task to run in a ForkJoinPool is straightforward and involves the following steps:

1. Create a task that represents all of the work to be done:

   // source image pixels are in src
   // destination image pixels are in dst
   ForkBlur fb = new ForkBlur(src, 0, src.length, dst);

2. Create the ForkJoinPool that will run the task:

   ForkJoinPool pool = new ForkJoinPool();

3. Run the task:

   pool.invoke(fb);

6.3. Concurrent Collections
These help avoid memory consistency errors by defining a happens-before relationship between an operation that adds an object to the collection and subsequent operations that access or remove that object.

- BlockingQueue defines a first-in-first-out data structure that blocks or times out when you attempt to add to a full queue, or retrieve from an empty queue.
- ConcurrentMap is a subinterface of java.util.Map that defines useful atomic operations, which remove or replace a key-value pair only if the key is present, or add a key-value pair only if the key is absent. Making these operations atomic helps avoid synchronization. The standard general-purpose implementation of ConcurrentMap is ConcurrentHashMap, a concurrent analog of HashMap.
- ConcurrentNavigableMap is a subinterface of ConcurrentMap that supports approximate matches. The standard general-purpose implementation is ConcurrentSkipListMap, a concurrent analog of TreeMap.
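
A minimal sketch of the BlockingQueue behavior (names and sizes are illustrative): a bounded ArrayBlockingQueue where put blocks when the queue is full and take blocks when it is empty.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumer {
        public static void main(String[] args) {
            // Bounded queue: put() blocks when full, take() when empty.
            final BlockingQueue<Integer> queue =
                new ArrayBlockingQueue<Integer>(10);

            new Thread(new Runnable() { // producer
                public void run() {
                    try {
                        for (int i = 0; i < 100; i++) {
                            queue.put(i); // blocks if the queue is full
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }).start();

            new Thread(new Runnable() { // consumer
                public void run() {
                    try {
                        for (int i = 0; i < 100; i++) {
                            System.out.println(queue.take()); // blocks if empty
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }).start();
        }
    }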

6.4. Atomic Variables
Classes in java.util.concurrent.atomic support atomic operations on single variables, with get and set methods that work like reads and writes on volatile variables: a set has a happens-before relationship with any subsequent get on the same variable.
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounter {
    private AtomicInteger c = new AtomicInteger(0);

    public void increment() {
        c.incrementAndGet();
    }

    public void decrement() {
        c.decrementAndGet();
    }

    public int value() {
        return c.get();
    }
}

6.5. Concurrent Random Numbers A convenience class for applications expecting to use random numbers from multiple threads or ForkJoinTasks
int r = ThreadLocalRandom.current().nextInt(4, 77);

Lesson Notes

import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RWDictionary {
    // Data is a placeholder value type from the original example.
    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    public Data get(String key) {
        r.lock();
        try {
            return m.get(key);
        } finally {
            r.unlock();
        }
    }

    public String[] allKeys() {
        r.lock();
        try {
            return m.keySet().toArray(new String[0]); // typed toArray so this compiles
        } finally {
            r.unlock();
        }
    }

    public Data put(String key, Data value) {
        w.lock();
        try {
            return m.put(key, value);
        } finally {
            w.unlock();
        }
    }

    public void clear() {
        w.lock();
        try {
            m.clear();
        } finally {
            w.unlock();
        }
    }
}

Lock Policies
Where multiple threads are waiting for a lock, the acquisition policy can have a significant impact on response times and efficiency.
- Non-fair (default): when continuously contended, it may indefinitely postpone one or more reader or writer threads, but it normally has higher throughput than a fair lock. Threads attempt to acquire the lock in no particular order and may jump the queue.
- Fair: a FIFO queue of requests is kept, with no queue jumping.

For reentrant read-write locks: when the currently held lock is released, either the longest-waiting single writer thread is assigned the write lock, or, if there is a group of reader threads waiting longer than all waiting writer threads, that group is assigned the read lock. A thread that tries to acquire a fair write lock (non-reentrantly) will block unless both the read lock and write lock are free (which implies there are no waiting threads). (Note that the non-blocking ReentrantReadWriteLock.ReadLock.tryLock() and ReentrantReadWriteLock.WriteLock.tryLock() methods do not honor this fair setting and will acquire the lock if possible, regardless of waiting threads.)

Write-Policy: writes have priority over reads (they are put in front of reads in the queue). Alternatively, think of two queues, with the write queue always served first.

Granularity: Row vs. Table


Row-level locking

Advantages:
- Fewer lock conflicts when different sessions access different rows
- Fewer changes for rollbacks

Disadvantages:
- Requires more memory than page-level or table-level locks
- Slower than page-level or table-level locks when used on a large part of the table, because many more locks must be acquired
- Possible to lock a single row for a long time
- Slower than other locks if you often do GROUP BY operations on a large part of the data, or if you must scan the entire table frequently

Table Locks
Suitable for the following cases:
- Most statements for the table are reads
- Statements for the table are a mix of reads and writes, where writes are updates or deletes for a single row that can be fetched with one key read:

  UPDATE tbl_name SET column=value WHERE unique_key_col=key_value;
  DELETE FROM tbl_name WHERE unique_key_col=key_value;

- SELECT combined with concurrent INSERT statements, and very few UPDATE or DELETE statements
- Many scans or GROUP BY operations on the entire table without any writers

Week 6- Transactions
Reading (1) Wikipedia ACID
- Atomicity: all or nothing. If part of the transaction fails, the entire transaction fails and the database is left unchanged.
- Consistency: any transaction brings the DB from one valid state to another, and programming errors do not violate any defined rules.
- Isolation: concurrent execution of transactions results in a system state that would be obtained if the transactions were executed serially.
- Durability: a committed transaction remains so even in the event of a crash, power loss, or errors.

ACID Failures
An integrity constraint requires that the value in A and the value in B must sum to 100. (Constraints ensuring accuracy and consistency in an RDBMS)

Atomicity Failure: after removing 10 from A, the transaction is unable to modify B. If the DB retained A's new value, atomicity (and the constraint) would be violated: a partial failure.

Consistency Failure: assume a transaction attempts to subtract 10 from A without touching B. If the DB recorded A+B=90, the constraint would be violated, so the entire transaction must be cancelled and rolled back to the previous state.

Isolation Failure:
Consider two transactions: T1 transfers 10 from A to B; T2 transfers 10 from B to A. Combined, there are four actions:
1. T1 subtracts 10 from A.
2. T1 adds 10 to B.
3. T2 subtracts 10 from B.
4. T2 adds 10 to A.

If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails halfway through: the database eliminates T1's effects, and T2 sees only valid data. By interleaving the transactions, however, the actual order of actions might be: T1 subtracts 10 from A; T2 subtracts 10 from B; T2 adds 10 to A; T1 adds 10 to B.

Again, consider what happens if T1 fails halfway through. By the time T1 fails, T2 has already modified A; A cannot be restored to the value it had before T1 without leaving an invalid database. This is known as a write-write failure, because two transactions attempted to write to the same data field. In a typical system, the problem would be resolved by reverting to the last known good state, cancelling the failed T1, and restarting the interrupted T2 from that state.

Durability Failure: a transaction completes at runtime and its changes are queued in the disk buffer, waiting to be written. Power fails and the changes are lost, although the user assumes the changes have been made.

Locking vs. Multi-versioning
Non-trivial transactions usually require a large number of locks, which cause substantial overhead and may block other transactions. If B attempts to modify what A is already working on, B must wait; this is two-phase locking, which guarantees full isolation.

2-Phase Locking (2PL)
- Expanding phase: locks are acquired and none released.
- Shrinking phase: locks are released and none acquired.
Only when a transaction has entered the ready state in all its processes is it ready to be committed.

Distributed Transactions
In a distributed DB, where no single node is responsible for all the data affecting a transaction, additional complications arise, such as network connection failures and node failures. The 2-phase commit protocol provides atomicity for distributed transactions.

2-Phase Commit (2PC)
A specialized type of consensus protocol that coordinates all processes participating in a distributed atomic transaction on whether to commit or abort the transaction (a minimal sketch of the decision logic follows the assumptions below):
- Commit-Request Phase: the coordinator prepares all participating processes, which vote to commit (if their local portion executed properly) or abort (if a problem was detected with the local portion).
- Commit Phase: the coordinator decides whether to commit (only when ALL have voted COMMIT) or abort, and notifies all cohorts of the result so they can apply the needed actions to their local transactional resources and their respective portions of the transaction's other output (where applicable).

Assumptions
1. One node is the master site, designated the coordinator; the others are cohorts.
2. The protocol assumes stable storage at each node with a write-ahead log (providing durability and atomicity by writing modifications, including redo and undo information, to a log).
3. No node crashes forever, and data in the log is never lost or corrupted in a crash.
4. Any two nodes can communicate with each other.

Disadvantage: 2PC is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort sends an agreement message, it blocks until a commit or rollback is received.
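
A minimal, illustrative sketch of the 2PC decision logic (the Cohort interface and class names are assumptions, not a real library API; real implementations add write-ahead logging, timeouts, and recovery):

    import java.util.List;

    interface Cohort {
        boolean prepare();  // vote: true = COMMIT, false = ABORT
        void commit();
        void rollback();
    }

    class TwoPhaseCommitCoordinator {
        public boolean execute(List<Cohort> cohorts) {
            // Phase 1: commit-request. Collect a vote from every cohort.
            boolean allVotedCommit = true;
            for (Cohort c : cohorts) {
                if (!c.prepare()) {
                    allVotedCommit = false;
                    break;
                }
            }
            // Phase 2: commit only if ALL voted COMMIT; otherwise abort.
            for (Cohort c : cohorts) {
                if (allVotedCommit) {
                    c.commit();
                } else {
                    c.rollback();
                }
            }
            return allVotedCommit;
        }
    }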

Reading (2) Isolation [Database Systems]


Concurrency control comprises the underlying mechanisms of a DBMS by which isolation is handled and the related correctness guaranteed. Constraining DB access operations reduces performance (rates of execution), so DBMSs attempt to provide the best performance possible under these constraints. The serializability property is often compromised for better performance, where this is possible without harming correctness.

1. Isolation Levels
Trade-offs here include the locking overhead of high isolation levels: relaxed isolation may introduce bugs that are difficult to find, while requiring higher isolation levels increases the possibility of deadlock.

1.1. Serializable
- Highest isolation level.
- Requires read and write locks to be held until the END of the transaction. Range locks are acquired when a SELECT query uses a ranged WHERE clause, to avoid phantom reads.
- For non-lock-based concurrency control, no locks are acquired, but if a write collision is detected among several concurrent transactions, only one of them is allowed to commit.
- Snapshot isolation: a guarantee that all reads see a consistent snapshot (normally the last committed values as of the start of the snapshot). The transaction commits only if none of its updates conflict with any concurrent updates made since that snapshot. It is implemented in MVCC, allows better performance than serializability, and avoids most of the associated anomalies. MVCC works by letting each connected user see a snapshot of the database; when updates are made, a new version marked as the latest is added elsewhere and the older ones are marked obsolete.

1.2. Repeatable Reads
Read and write locks are maintained, but not range locks; phantom reads can occur.

1.3. Read Committed
Write locks are kept on selected data until the end of the transaction, but read locks are released as soon as the SELECT is performed (non-repeatable reads can occur, along with phantom reads).

Any data read is committed at the moment it is read. The reader is restricted from seeing any intermediate, uncommitted (dirty) data, but the data is free to change after being read.

1.4. Read Uncommitted
- Lowest isolation level.
- Dirty reads are allowed, so one transaction may see not-yet-committed changes made by other transactions.

2. Read Phenomena
2.1. Dirty Read (Uncommitted Dependency)
Happens when a transaction is allowed to read a row modified by another running transaction whose changes have not yet been committed; similar to non-repeatable reads.

2.2. Non-Repeatable Reads
A row is retrieved twice and its values differ between reads. This happens when read locks are not acquired when performing a SELECT, or when they are released as soon as the SELECT completes. Under MVCC, non-repeatable reads may happen when a transaction affected by a commit conflict is not rolled back.

At the SERIALIZABLE and REPEATABLE READ levels, the DBMS returns the old value; at the READ COMMITTED and READ UNCOMMITTED levels, it returns the updated value.
- Serial scheduling: delay T2 until T1 is committed or rolled back.
- MVCC: T1 and T2 are both allowed to continue, but T1 works on an older snapshot that is later checked against the schedule; if a conflict is detected, T1 rolls back with a serialization failure.

At the REPEATABLE READ level, Query 2 would be blocked until Transaction 1 was committed or rolled back. Under MVCC:
- At the SERIALIZABLE level, both SELECT queries see a snapshot of the database taken at the start of T1 and hence return the same data; but if T1 also attempts an update, a serialization failure occurs.
- At READ COMMITTED, since there is no promise of serializability, each query sees the data as of its own start, so the transactions see different data and T1 is not retried.

2.3. Phantom Reads
Two identical queries are executed and the collection of rows returned by the second differs from the first. Occurs where range locks are not acquired.

In SERIALIZABLE mode, Query 1 locks all records in the range 10 to 30, so Query 2 remains blocked; at REPEATABLE READ and below, the range is not locked, new records can be inserted, and the second query returns the phantom row.

3. Isolation Levels, Read Phenomena and Locks


3.1. Isolation Levels vs. Read Phenomena

Isolation Level    | Dirty Reads | Non-Repeatable Reads | Phantom Reads
Read Uncommitted   | May occur   | May occur            | May occur
Read Committed     | -           | May occur            | May occur
Repeatable Read    | -           | -                    | May occur
Serializable       | -           | -                    | -

3.2. Isolation Levels vs. Lock Duration
C = locks are held until the transaction commits; S = locks are held only during the currently executing statement.

Isolation Level    | Write Locks | Read Locks | Range Locks
Read Uncommitted   | S           | S          | S
Read Committed     | C           | S          | S
Repeatable Read    | C           | C          | S
Serializable       | C           | C          | C

Reading (3) Introduction to Concurrency Control


1. Collisions
Collisions happen when two activities (which may or may not be full-fledged transactions) attempt to change the same entities within a system of record.
- Dirty Read: Activity 1 (A1) reads an entity from the system of record and then updates the system of record without committing the change (for example, the change hasn't been finalized). Activity 2 (A2) reads the entity, unknowingly making a copy of the uncommitted version. A1 rolls back (aborts) the changes, restoring the entity to the original state A1 found it in. A2 now has a version of the entity that was never committed and therefore is not considered to have actually existed.
- Non-Repeatable Read: A1 reads an entity from the system of record, making a copy of it. A2 deletes the entity from the system of record. A1 now has a copy of an entity that does not officially exist.
- Phantom Read: A1 retrieves a collection of entities from the system of record, making copies of them, based on some search criteria such as "all customers with first name Bill". A2 then creates new entities that would have met the search criteria (for example, inserts "Bill Klassen" into the database), saving them to the system of record. If A1 reapplies the search criteria, it gets a different result set.

2. Locking Strategies
2.1. Pessimistic Locking
The entity is locked for the whole time it is in application memory, preventing other users from working with it.
- Write lock: disables writing, reading, and deleting of the entity.
- Read lock: allows reads but not writes or deletes.

The scope of a lock can be the whole database, a table, a page, or a row.

2.2. Optimistic Locking
When collisions are infrequent, instead of preventing them we detect and resolve them:
1. A read lock is secured on the data.
2. The object is read into memory for manipulation and the lock is released.
3. The object is manipulated.
4. A write lock is obtained once the object is ready for updating.
5. The original source is re-read to check for a collision (update and unlock if no collision).
6. The collision is resolved if it occurred.

Strategies for Detecting a Collision
- Mark the source with a unique identifier: the source data is marked with a unique value each time it is updated; at the point of update, the mark is checked for changes. Candidate marks (a sketch using an incremental version counter follows below):
  - Datetime stamps (use the DB's timestamp, since not all machines are synchronized)
  - Incremental counters
  - UserIDs (only if every user has a unique ID, is logged into only one machine, and the applications ensure only one copy is in memory)
  - Values generated by a globally unique surrogate key generator

- Retain an original copy: the source data is retrieved again and compared with the retained copy to detect a collision; the data must contain enough substance for a meaningful comparison.
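
A minimal sketch of optimistic locking with an incremental version counter over JDBC (table and column names are assumptions for illustration): the UPDATE succeeds only if the row still carries the version originally read.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class OptimisticUpdater {
        // Returns true if the update succeeded, false if another activity
        // changed the row since we read it (a collision).
        boolean updateBalance(Connection conn, long id, long newBalance,
                              int versionWeRead) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "UPDATE account SET balance = ?, version = version + 1 " +
                "WHERE id = ? AND version = ?");
            try {
                ps.setLong(1, newBalance);
                ps.setLong(2, id);
                ps.setInt(3, versionWeRead);
                return ps.executeUpdate() == 1; // 0 rows means collision
            } finally {
                ps.close();
            }
        }
    }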

3. Collision Resolution Strategies


1. Give up
2. Display the problem and let the user decide
3. Merge the changes
4. Log the problem so others can make sense of it later
5. Ignore the collision and overwrite

4. A Locking Strategy
Table Type                             | Examples                                      | Suggested Locking Strategy
Live - High Volume                     | Account                                       | Optimistic (first choice), Pessimistic (second choice)
Live - Low Volume                      | Customer, Insurance Policy                    | Pessimistic (first choice), Optimistic (second choice)
Log (typically append only)            | AccessLog, AccountHistory, TransactionRecord  | Overly Optimistic
Lookup/Reference (typically read only) | State, PaymentType                            | Overly Optimistic

Reading (4) MySQL SET TRANSACTION


The isolation level can be set globally (affecting all subsequent sessions), per session (all subsequent transactions within the session), or for the next transaction only.

InnoDB Repeatable Read


The default transaction isolation level. All consistent reads within the same transaction read from the snapshot established by the first read (the key difference from READ COMMITTED). These non-locking SELECT statements are therefore consistent with respect to each other.

Locking for reads and writes depends on whether unique index with unique search criteria is applied. For a unique index with search conditions, InnoDB locks only the index record found, not the gap before it. For other conditions, InnoDB locks the range scanned with gap-locks or next-key locks to block insertions by other sessions into the gaps covered by the range

InnoDB Read Committed


Every consistent read, even within the same transaction, sets and reads its own fresh snapshot.

Consistent Reading
A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point in time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: if you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of other rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.
- Repeatable Read: all reads use the first snapshot established in the transaction. Get fresher snapshots by committing and reissuing new queries.
- Read Committed: each consistent read has its own snapshot, so use this mode to get the freshest data.
- Read Uncommitted: SELECT statements are non-locking, and a possibly earlier version of a row may be used; such reads are not consistent, leading to dirty reads.
- Serializable: InnoDB implicitly converts all SELECTs to SELECT ... LOCK IN SHARE MODE if autocommit is disabled. If autocommit is enabled, the SELECT is its own transaction; it is thus read-only and can be serialized if performed as a consistent non-locking read. (Disable autocommit to force a plain SELECT to block if other transactions have modified the rows it reads.)

Advantages & Disadvantages
It is important to remember that InnoDB actually locks index entries, not rows. During the execution of a statement InnoDB must lock every entry in the index that it traverses to find the rows it is modifying. It must do this to prevent deadlocks and maintain the isolation level.

- In Repeatable Read, every lock acquired during a transaction is held for the duration of the transaction.
- In Read Committed, the locks that do not match the scan are released after the statement completes.

InnoDB does not release the lock heap memory after releasing the locks, but the number of locks held is much lower. This means that in READ COMMITTED, other transactions are free to update rows they would not otherwise have been able to update once the UPDATE statement completes.

Consistent Read Views
- REPEATABLE READ mode: within a transaction, the same snapshot is used for its whole duration, until commit. An UPDATE in this mode also creates a gap lock that prevents rows from being inserted into the scanned range until the transaction commits or rolls back; there is no possibility of changing additional rows once the gap after 100 has been locked.
- READ COMMITTED mode (non-repeatable reads): a read view is created at the start of each STATEMENT, even within the same transaction, and lasts only as long as the statement executes. Consecutive executions of the same statement can therefore exhibit the phantom row problem. Gap locks are NEVER created in this mode, so SELECT ... FOR UPDATE will NEVER prevent insertions of new rows into the table by other transactions. This means you may end up updating more rows than you intended.

Reading (5) Oracle Multi Version Concurrency Control


Oracle automatically provides read consistency to a query, so that all the data the query sees comes from a single point in time (statement-level read consistency). Oracle can also provide read consistency to all of the queries in a transaction (transaction-level read consistency). As a query enters the execution stage, the current system change number (SCN) is determined (in the source's Figure 13-1, this SCN is 10023). As data blocks are read on behalf of the query, only blocks written as of the observed SCN are used; blocks with changed data (more recent SCNs) are reconstructed from data in the rollback segments, and the reconstructed data is returned for the query. Therefore, each query returns all committed data with respect to the SCN recorded at the time query execution began. Changes made by other transactions during a query's execution are not observed, guaranteeing that consistent data is returned for each query.

Statement Level Read Consistency


Every query sees data from a single point in time: the time the query began. Dirty data and changes made by transactions that commit during query execution are not seen. A consistent result set is provided for EVERY query, guaranteeing data consistency. SELECT, INSERT, UPDATE, and DELETE all return consistent data, since each contains a query.

Transaction Level Read Consistency


Serializable mode: all data accesses reflect the state of the DB at the time the transaction began, so all data is consistent to a single point in time. This consistently produces repeatable reads and does not expose a query to phantoms.

Deadlocks
A deadlock can occur when two or more users are waiting for data locked by each other. Deadlocks prevent some transactions from continuing to work. (Figure 13-3 in the source is a hypothetical illustration of two transactions in a deadlock.)

Deadlock Detection
Oracle automatically detects deadlock situations and resolves them by rolling back one of the statements involved in the deadlock, thereby releasing one set of the conflicting row locks. A corresponding message is returned to the transaction that undergoes statement-level rollback. The statement rolled back belongs to the transaction that detects the deadlock. Usually, the signalled transaction should be rolled back explicitly, but it can retry the rolled-back statement after waiting.

Note: in distributed transactions, local deadlocks are detected by analyzing wait data, and global deadlocks are detected by a timeout. Once detected, non-distributed and distributed deadlocks are handled by the database and application in the same way.

Deadlocks most often occur when transactions explicitly override the default locking of Oracle. Because Oracle itself does no lock escalation and does not use read locks for queries, but does use row-level locking (rather than page-level locking), deadlocks occur infrequently in Oracle. See also "Explicit (Manual) Data Locking" for more information about manually acquiring locks.

Avoid Deadlocks
Multitable deadlocks can usually be avoided if transactions accessing the same tables lock those tables in the same order, through either implicit or explicit locks. For example, all application developers might follow the rule that when both a master and a detail table are updated, the master table is locked first and then the detail table. If such rules are properly designed and followed in all applications, deadlocks are very unlikely to occur. When you know you will require a sequence of locks for one transaction, consider acquiring the most exclusive (least compatible) lock first.

Types of Locks
Oracle automatically uses different types of locks to control concurrent access to data and to prevent destructive interaction between users. Oracle automatically locks a resource on behalf of a transaction to prevent other transactions from doing something that also requires exclusive access to the same resource. The lock is released automatically when some event occurs such that the transaction no longer requires the resource. Throughout its operation, Oracle automatically acquires different types of locks at different levels of restrictiveness, depending on the resource being locked and the operation being performed. Oracle locks fall into one of three general categories:

Lock                         | Description
DML locks (data locks)       | Protect data. For example, table locks lock entire tables; row locks lock selected rows.
DDL locks (dictionary locks) | Protect the structure of schema objects, for example, the definitions of tables and views.
Internal locks and latches   | Protect internal database structures such as datafiles; entirely automatic.

The following sections discuss DML locks, DDL locks, and internal locks.

DML Locks
The purpose of a DML lock (data lock) is to guarantee the integrity of data being accessed concurrently by multiple users. DML locks prevent destructive interference from simultaneous conflicting DML or DDL operations. DML statements automatically acquire both table-level locks and row-level locks.

Note: the acronym in parentheses after each type of lock or lock mode is the abbreviation used in the Locks Monitor of Enterprise Manager. Enterprise Manager might display TM for any table lock, rather than indicate the mode of table lock (such as RS or SRX).

Row Locks (TX)
Row-level locks are primarily used to prevent two transactions from modifying the same row. When a transaction needs to modify a row, a row lock is acquired. There is no limit to the number of row locks held by a statement or transaction, and Oracle does not escalate locks from the row level to a coarser granularity. Row locking provides the finest-grained locking possible and so provides the best possible concurrency and throughput. The combination of multiversion concurrency control and row-level locking means that users contend for data only when accessing the same rows, specifically:

- Readers of data do not wait for writers of the same data rows.
- Writers of data do not wait for readers of the same data rows, unless SELECT ... FOR UPDATE is used, which specifically requests a lock for the reader.
- Writers only wait for other writers if they attempt to update the same rows at the same time.

Note: readers of data may have to wait for writers of the same data blocks in some very special cases of pending distributed transactions.

A transaction acquires an exclusive row lock for each individual row modified by one of the following statements: INSERT, UPDATE, DELETE, and SELECT with the FOR UPDATE clause. A modified row is always locked exclusively, so that other transactions cannot modify the row until the transaction holding the lock is committed or rolled back. (However, if the transaction dies due to instance failure, block-level recovery makes a row available before the entire transaction is recovered.) Row locks are always acquired automatically by Oracle as a result of the statements listed previously.

If a transaction obtains a row lock for a row, it also acquires a table lock for the corresponding table. The table lock prevents conflicting DDL operations that would override data changes in the current transaction. (See "DDL Locks".)

Table Locks (TM)
Table-level locks are primarily used for concurrency control with concurrent DDL operations, such as preventing a table from being dropped in the middle of a DML operation. When a DDL or DML statement operates on a table, a table lock is acquired. Table locks do not affect the concurrency of DML operations. For partitioned tables, table locks can be acquired at both the table and the subpartition level.

A transaction acquires a table lock when a table is modified by the following DML statements: INSERT, UPDATE, DELETE, SELECT with the FOR UPDATE clause, and LOCK TABLE. These DML operations require table locks for two purposes: to reserve DML access to the table on behalf of a transaction, and to prevent DDL operations that would conflict with the transaction. Any table lock prevents the acquisition of an exclusive DDL lock on the same table, and thereby prevents DDL operations that require such locks. For example, a table cannot be altered or dropped if an uncommitted transaction holds a table lock on it.

A table lock can be held in any of several modes: row share (RS), row exclusive (RX), share (S), share row exclusive (SRX), and exclusive (X). The restrictiveness of a table lock's mode determines the modes in which other table locks on the same table can be obtained and held.

Lesson Notes

Week 9- Fault Tolerance

Typical Failure Modes
1. Code: bugs
2. Data: consistency errors
3. Systems: hardware failures, software hangs
4. Environment: differing data from redundant input sources (common to embedded control systems)

Availability & Fault Tolerance
Availability is the ability to give service:
- Availability % = Uptime / Total Time
- Total Time = Uptime + Total Downtime
- Downtime = Time to Detect + Time to Restart
(For example, 99.9% availability permits roughly 8.8 hours of downtime per year.)
An open question is how to count scheduled maintenance.
Fault Tolerance = working correctly despite errors.

Mechanisms discussed:
- Client based failover
- Load-Balancer
- DNS (Domain Name System): maps meaningful URL names (Uniform Resource Locator) to I.P. addresses
- Virtual I.P. addresses
- Restart

Week 10- Replication


Reading (1) Replication in Computing
Replication is the sharing of information to ensure consistency between redundant resources, improving reliability, fault-tolerance, or accessibility.
- Data replication: the same data is stored on multiple storage devices.
- Computation replication: the same computing task is executed many times.
- Tasks are typically replicated in space (across devices) or in time (executed many times on one device), often linked via the scheduling algorithm.

- Active Replication: performed by processing the same request at every replica.
- Passive Replication: the request is processed at a single replica and the results are then transferred to all replicas.

- Master-Slave: a primary-backup scheme, predominant in high-availability clusters. A master replica is designated to process all requests; a single master makes it easier to achieve consistency within the group.
- Multi-Master: any replica may process a request and distribute the new state. Distributed concurrency control is needed, such as a distributed lock manager.

Load balancing differs: it focuses on distributing a load of different computations across machines and allows a single computation to be dropped in case of failure. Sometimes, however, data replication is needed to distribute the data across machines.

Backups vs. Replicas: backups save a copy of data for a long period of time, while replicas undergo frequent updates and quickly lose any historical state.

Replication Models in Distributed Systems


Transactional Replication: replication of a database combined with some transactional storage structure. One-copy serializability is employed, and ACID properties are guaranteed.

State Machine Replication: the replicated process is a deterministic automaton and atomic broadcast is assumed possible. Based on the distributed consensus model, it is similar to transactional replication. It is often implemented with a replicated log using the Paxos algorithm; used by Google's Chubby system and behind the Keyspace datastore.

Virtual Synchrony: used when a group of processes cooperate to replicate in-memory data and to coordinate actions. A process group is defined for this, and a joining process is provided a checkpoint containing the current state of the data. Processes then send multicasts, seen by all members, to communicate changes to the data.

(Comparing master-master pairs with a master-N-slaves setup:) It is simpler, especially if you just write to one node: fallback and recovery are rather easy, and even when everything is automated, simpler setups mean fewer software bugs.
- Handling write load: if your application is write-intensive, a master-N-slaves configuration saturates much faster because every node has to handle the full write load. Especially considering that MySQL replication is single-threaded, it might not be long before it is unable to keep up.

- Waste of cache memory: if you have the same data on all the slaves, you will likely have the same data cached in each of their database caches. You can partially improve this by load partitioning, but it will not be perfect: for example, all of the write load goes to all nodes, pulling the corresponding data into every cache. With 16GB boxes and, say, 12GB allocated to MySQL database caches, you get 12GB of effective cache in a master-N-slaves configuration, compared to 36GB of effective cache on 3 master-master pairs.
- Waste of disk: disk is cheap, but for IO-bound workloads you may need a fast disk array, which is not so cheap, so having less data per node becomes important.
- More time to clone: if replication breaks, re-cloning it (or restoring the database from backup) may take more time than with multiple master-master pairs.

Week 11- Big Distributed Data

Partitioning Problem
1. Google Big Table

CONSISTENT, AVAILABLE, PARTITION-TOLERANT

Notes:
- Locking granularity is small (a single row) and transactions are very fast (single-row read/write), so contention is likely to be low and blocking times short.
- Replication is fast partly because it is inherently conflict-free (no one else can edit the data at the same time), and partly because GFS is heavily optimized for it.
- BigTable uses Chubby, a distributed lock system, to ensure only one server is hosting each range of rows (called a tablet).

2. Google Megastore

Replicas need to track whether they are up to date. If they are, a read is done locally; if not, the read forces them to catch up. Writes always force a catch-up. There are three consistency modes for each read:
- Inconsistent: whatever is available locally now
- Snapshot: the most recently applied transaction
- Current: forces all commits to be applied and returns the most current data

* (file read/write, network communication, cluster distribution)
** The synchronous vs. asynchronous replication groups are one example. Another is that the API does not allow joins; the programmer must implement them, thus exposing when a difficult/expensive operation is entailed.

3. Amazon Dynamo

Local fault detection: if A can't replicate to B, then A considers B down and periodically retries. Membership and fault information is gossiped around the cluster.

* W = number of nodes a write MUST reach; R = number of nodes read from (if possible); N = number of nodes in the cluster.
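
A tiny illustration of the quorum arithmetic (an assumed reading of the W/R/N note above): requiring R + W > N guarantees that every read quorum overlaps every write quorum, so at least one node read holds the latest write.

    class QuorumCheck {
        static boolean readsOverlapWrites(int n, int w, int r) {
            return r + w > n; // overlap condition
        }

        public static void main(String[] args) {
            System.out.println(readsOverlapWrites(3, 2, 2)); // true: overlap
            System.out.println(readsOverlapWrites(3, 1, 1)); // false: a read may miss the latest write
        }
    }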

4. Facebook Cassandra

http://www.datastax.com/dev/blog/your-ideal-performance-consistency-tradeoff More on Cassandra http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance

Paxos
Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants; this problem becomes difficult when the participants or their communication medium may experience failures.[1] Consensus protocols are the basis for the state machine approach to distributed computing, as suggested by Leslie Lamport[2] and surveyed by Fred Schneider.[3] The state machine approach is a technique for converting an algorithm into a fault-tolerant, distributed implementation. Ad-hoc techniques may leave important cases of failures unresolved; the principled approach proposed by Lamport et al. ensures all cases are handled safely. The Paxos protocol was first published in 1989 and named after a fictional legislative consensus system used on the Paxos island in Greece.[4] It was later published as a journal article in 1998.[5] The Paxos family of protocols includes a spectrum of trade-offs between the number of processors, the number of message delays before learning the agreed value, the activity level of individual participants, the number of messages sent, and the types of failures. Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network (a result proved in a paper by Fischer, Lynch, and Paterson[6]), Paxos guarantees safety (consistency),[5][7][8][9][10] and the conditions that could prevent it from making progress are difficult to provoke.

Paxos is usually used where durability is required (for example, to replicate a file or a database), and where the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. There is also a mechanism to drop a permanently failed replica or to add a new replica.
