
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


Digital Learning
Part A: Course Design

Course Title Distributed Data Systems


Course No(s) SS ZG554
Credit Units 5
Credit Model
Content Authors Anil Kumar G

Course Objectives

No Course Objectives

CO1 This field covers all aspects of computing and information access across multiple processing elements connected by any form of communication network, either local area or wide area.

CO2 There has been a steady growth in the development of contemporary applications that demonstrate their efficacy by connecting millions of users/applications/machines across the globe without relying on a traditional client-server approach.

CO3 The general computing trend is to leverage shared resources and massive amounts of data over the Internet. This course aims to provide an understanding of the theory and systems aspects of distributed data.

Text Book(s)

T1 M. Tamer Özsu and Patrick Valduriez, Principles of Distributed Database Systems, Third Edition

T2 Pradeep K. Sinha, Distributed Operating Systems: Concepts and Design

Reference Book(s)

R1 Ulf Troppens, Wolfgang Müller-Friedt, Rainer Wolafka, Storage Networks Explained, IBM Storage Software Development, Germany. Publisher: Wiley

On-Line Resources

HBase
https://hbase.apache.org
http://www.tutorialspoint.com/hbase/

MapReduce
https://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/
http://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

SAN
http://searchstorage.techtarget.com/definition/storage-area-network-SAN
http://www.snia.org/education/storage_networking_primer/san/what_san

NAS
http://searchstorage.techtarget.com/definition/network-attached-storage
http://www.webopedia.com/TERM/N/network-attached_storage.html

Content Structure

1. Distributed Data Storage Technology


a. Server-centric IT architecture and its limitations
b. Storage-centric IT architecture and its advantages
c. Architecture of intelligent disk subsystems
d. Hard disks, internal I/O channels, and JBOD
e. Storage virtualization using RAID
f. Introduction to NAS, SAN and DAS
2. Distributed File Systems & Security
a. File Models & Accessing models
b. File sharing Semantics
c. File Caching
d. File Replication
e. Fault Tolerance
f. File System Security
3. Distributed Databases
a. Distributed DBMS
b. Architectural Models for DDBS
c. Distributed DBMS Architecture
d. Distributed Data Sources
4. Distributed Database Design Issues & Integration
a. Framework of Distribution
b. Distributed Design Issues
c. Top-Down Design Process
d. Fragmentation
e. Allocation
f. Bottom-Up Design Methodology
g. Schema Matching
h. Schema Integration
i. Schema Mapping
j. Data Cleaning
5. Data and Access Control
a. Database Security
b. Discretionary Access Control
c. Multilevel Access Control
d. Distributed Access Control
e. View Management
f. Views in Centralized DBMSs
g. Views in Distributed DBMSs
h. Maintenance of Materialized Views
6. Data Replication
a. Consistency of Replicated Databases
b. Update Management Strategies
c. Replication Protocols
d. Replication and failures
e. Replication Mediator Service
7. Parallel Database Systems
a. Parallel Database System Architectures
b. Parallel Data Placement
c. Load Balancing
d. Database Clusters
8. Web Data Management
a. Web Graph Management
b. Web Search
c. Web Crawling
d. Indexing
e. Ranking and Link Analysis
f. Keyword Search
g. Web Querying
h. Semi-structured Data Approach
i. Web Query Language Approach
j. Question Answering
k. Searching and Querying the Hidden Web
9. Hadoop & Big Data
a. Introduction
b. Hadoop Architecture
c. HDFS Operations
d. HDFS Commands
e. Big Data Overview
f. Multi Node Cluster
g. Map Reduce

Learning Outcomes:

No Learning Outcomes

LO1 Understanding of distributed structures

LO2 Understanding of distributed storage systems and the technologies used to implement them

LO3 Understanding of distributed database architecture

LO4 Understanding of parallel database architecture and systems

LO5 Understanding of the Hadoop environment and Big Data


Part B: Learning Plan

Academic Term SECOND SEMESTER 2017-2018

Course Title Distributed Data Systems

Course No SS ZG554

Lead Instructor Anil Kumar G

Glossary of Terms:

1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical
classroom or enabled through technology. In this model of instruction, instructor-led sessions will
be for 20 CH.
a. Pre CH = Self-learning done prior to a given contact hour
b. During CH = Content to be discussed during the contact hour by the course instructor
c. Post CH = Self-learning done after the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an
online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference
books. It could also include study of external resources.

Contact Hour 1: Distributed Data Storage Technology

Time | Type | Sequence | Reference
Pre CH | RL | RL 1.1, RL 1.2 | T1 – 1, T1 – 2, R1 – 1, R1 – 2
During CH | CH | SERVER-CENTRIC IT ARCHITECTURE AND ITS LIMITATIONS; STORAGE-CENTRIC IT ARCHITECTURE AND ITS ADVANTAGES; ARCHITECTURE OF INTELLIGENT DISK SUBSYSTEMS; HARD DISKS AND INTERNAL I/O CHANNELS | R1 – 1.1, R1 – 1.2, R1 – 2.1, R1 – 2.2
Post CH | SS | Case Study: Replacing a Server with Storage Networks | R1 – 1.3

Contact Hour 2: Distributed Data Storage Technology

Time | Type | Sequence | Reference
Pre CH | RL | RL 1.3, RL 1.4 | T1 – 2, R1 – 2
During CH | CH | JBOD: JUST A BUNCH OF DISKS; STORAGE VIRTUALISATION USING RAID; DIFFERENT RAID LEVELS; RAID 0: BLOCK-BY-BLOCK STRIPING; RAID 1: BLOCK-BY-BLOCK MIRRORING; RAID 0+1/RAID 10: STRIPING AND MIRRORING COMBINED; RAID 0+1: STRIPING AND MIRRORING COMBINED; RAID 10: STRIPING AND MIRRORING COMBINED | R1 – 2.3, R1 – 2.4, R1 – 2.5
Post CH | SS | R1: Pages 535 and 536 | R1

Contact Hour 3: Distributed Data Storage Technology

Time | Type | Sequence | Reference
Pre CH | RL | RL 1.5 | R1 – 2, RL – 1
During CH | CH | RAID 4 AND RAID 5; RAID 6: DOUBLE PARITY; RAID 2; RAID 3; COMPARISON OF THE RAID LEVELS; BASIC FORMS OF STORAGE; COMPARISON; Introduction to NAS, SAN and DAS | R1 – 2.5.4, R1 – 2.5.5, R1 – 2.5.6, R1 – 2.5.7
Post CH | SS | Availability of Disk Subsystems | R1 – 2

Contact Hour 4: Distributed File Systems & Security

Time | Type | Sequence | Reference
Pre CH | RL | Features of Distributed File System | T2 – 9.1, T2 – 9.2
During CH | CH | File Models & Accessing Models; File Sharing Semantics | T2 – 9.3, T2 – 9.4
Post CH | SS | Design Principles | T2 – 9.10

Contact Hour 5: Distributed File Systems & Security

Time | Type | Sequence | Reference
Pre CH | RL | - | -
During CH | CH | File Caching; File Replication; Fault Tolerance; File System Security | T2 – 9.5, T2 – 9.6, T2 – 9.7, T2 – 9.8
Post CH | SS | Case study | T2 – 9.11

Contact Hour 6: Distributed Databases

Time | Type | Sequence | Reference
Pre CH | RL | RL 2.1, RL 2.2 | T1 – 1, RL – 2
During CH | CH | Distributed DBMS Systems; Architectural Models for DDBS | T1 – 1.7.1, T1 – 1.7.2, T1 – 1.7.3, T1 – 1.7.4, T1 – 1.7.5, T1 – 1.7.6
Post CH | SS | - | -

Contact Hour 7: Distributed Databases

Time | Type | Sequence | Reference
Pre CH | RL | RL 2.3, RL 2.4, RL 2.5 | T1 – 1, RL – 2
During CH | CH | Distributed DBMS Architecture; Distributed Data Sources | T1 – 1.7.8, T1 – 1.7.9, T1 – 1.7.10
Post CH | SS | Distributed DBMS Architecture | Online References

Contact Hour 8: Distributed Database Design Issues & Integration

Time | Type | Sequence | Reference
Pre CH | RL | RL 3.1, RL 3.2, RL 3.3, RL 3.4, RL 3.5 | T1 – 3, RL – 3
During CH | CH | Framework of Distribution; Distributed Design Issues; Top-Down Design Process; Fragmentation; Allocation | T1 – 3.1, T1 – 3.2, T1 – 3.3, T1 – 3.4
Post CH | SS | Solve T1: Problems 3.1 and 3.2 | T1 – Page 126

Contact Hour 9: Distributed Database Design Issues & Integration

Time | Type | Sequence | Reference
Pre CH | RL | RL 3.6 | T1 – 4, RL – 3
During CH | CH | Bottom-Up Design Methodology; Schema Matching; Schema Integration; Schema Mapping; Data Cleaning | T1 – 4.1, T1 – 4.2, T1 – 4.3, T1 – 4.4, T1 – 4.5
Post CH | SS | Solve T1: Problem 4.4 | T1 – Page 161

Contact Hour 10: Data and Access Control

Time | Type | Sequence | Reference
Pre CH | RL | RL 4.1, RL 4.2, RL 4.3, RL 4.4 | T1 – 5, RL – 4
During CH | CH | Database Security; Discretionary Access Control; Multilevel Access Control; Distributed Access Control | T1 – 5.2
Post CH | SS | Case Studies | T1 – 5


Contact Hour 11: Data and Access Control

Time | Type | Sequence | Reference
Pre CH | RL | - | -
During CH | CH | View Management; Views in Centralized DBMSs; Views in Distributed DBMSs; Maintenance of Materialized Views | T1 – 5.1
Post CH | SS | Solve T1: Problem 5.1 | T1 – Page 202

Contact Hour 12: Mid-Semester Review

Time | Type | Sequence | Reference
Pre CH | RL | CH 1 to 11 | -
During CH | CH | Mid-Semester Review | CH 1 to 11
Post CH | SS | CH 1 to 11 | -

Contact Hour 13: Data Replication

Time | Type | Sequence | Reference
Pre CH | RL | RL 5.1, RL 5.2, RL 5.3, RL 5.4 | T1 – 13, RL – 5
During CH | CH | Consistency of Replicated Databases; Update Management Strategies | T1 – 13.1, T1 – 13.2
Post CH | SS | - | -

Contact Hour 14: Data Replication

Time | Type | Sequence | Reference
Pre CH | RL | RL 5.5, RL 5.6 | T1 – 13, RL – 5
During CH | CH | Replication Protocols; Replication and Failures; Replication Mediator Service | T1 – 13.3, T1 – 13.5, T1 – 13.6
Post CH | SS | Solve T1: Problem 13.2 | T1 – Page 493

Contact Hour 15: Parallel Database Systems

Time | Type | Sequence | Reference
Pre CH | RL | RL 6.1, RL 6.2, RL 6.3, RL 6.4, RL 6.5 | T1 – 14, RL – 6
During CH | CH | Parallel Database System Architectures | T1 – 14.1
Post CH | SS | Parallel Database Architectures | T1 – 14.1

Contact Hour 16: Parallel Database Systems

Time | Type | Sequence | Reference
Pre CH | RL | RL 6.6 | T1 – 14, RL – 6
During CH | CH | Parallel Data Placement; Load Balancing; Database Clusters | T1 – 14.2, T1 – 14.4, T1 – 14.5
Post CH | SS | Solve T1: Problem 14.15 | T1 – Page 550

Contact Hour 17: Web Data Management

Time | Type | Sequence | Reference
Pre CH | RL | RL 7.1, RL 7.2, RL 7.3, RL 7.4 | T1 – 17, RL – 7
During CH | CH | Web Graph Management; Web Search | T1 – 17.1
Post CH | SS | Understanding Web Search | Online References

Contact Hour 18: Web Data Management

Time | Type | Sequence | Reference
Pre CH | RL | RL 7.5, RL 7.6, RL 7.7 | T1 – 17, RL – 7
During CH | CH | Web Crawling; Indexing; Ranking and Link Analysis; Keyword Search | T1 – 17.2
Post CH | SS | Indexing and Ranking case studies | Online References

Contact Hour 19: Web Data Management

Time | Type | Sequence | Reference
Pre CH | RL | - | -
During CH | CH | Web Querying; Semi-structured Data Approach; Web Query Language Approach; Question Answering; Searching and Querying the Hidden Web | T1 – 17.3
Post CH | SS | Solve T1: Problem 17.1 | T1 – Page 719

Contact Hour 20: Hadoop & Big Data

Time | Type | Sequence | Reference
Pre CH | RL | RL 8.1, RL 8.2, RL 8.3 | Online References
During CH | CH | Hadoop & Big Data Introduction; Hadoop Architecture; HDFS Operations; HDFS Commands | Online References
Post CH | SS | HDFS Commands and Hadoop case studies | Online References

Contact Hour 21: Hadoop & Big Data

Time | Type | Sequence | Reference
Pre CH | RL | RL 8.4 | RL – 8
During CH | CH | Big Data Overview; Multi Node Cluster; Map Reduce | Online References
Post CH | SS | Big Data solutions | Online References

Contact Hour 22: Comprehensive Exam Review

Time | Type | Sequence | Reference
Pre CH | RL | CH 1 to 21 | -
During CH | CH | Comprehensive Exam Review | CH 1 to 21
Post CH | SS | CH 1 to 21 | -

Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I/Assignment-I | Online | - | 5% | February 1 to 10, 2018
EC-1 | Quiz-II | Online | - | 5% | March 1 to 10, 2018
EC-1 | Quiz-III/Assignment-II | Online | - | 5% | March 20 to 30, 2018
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 35% | 04/03/2018 (AN) 2 PM – 4 PM
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 22/04/2018 (AN) 2 PM – 5 PM

Note: If Assignments are used, kindly remove Quiz-I, II, and III.


Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos.
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements
and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course
pages on the Elearn portal. Announcements will be made on the portal, in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is
permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams.
Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should
follow the procedure to apply for the Make-Up Test/Exam which will be made available on the Elearn portal.
The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to maintain the self-study schedule given in
the course handout regularly, attend the online lectures, and take all the prescribed evaluation components such as
Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the
handout.
