Neutrino

Neutrino: Improved Scalability of Distributed System
Objective:
• Enabling mobile database access with good performance over low‐bandwidth networks.
• Maintaining consistency in mobile replicas.
• Employing partial template based replication in the mobile client.
Motivation:
The World Wide Web, not to mention has taken significant importance in almost all fields ranging from
docile to enterprise services and has also become their primary communication medium. Relational databases lie
at the core for many such enterprises. This drives the need for hosting architectures that support arbitrary levels of
load with acceptable performance. A common technique used to improve the performance of a database is
replication. Preserving consistency, along with replication, with acceptable performance under conditions of weak
connectivity is a difficult challenge. A trade‐off always exists fundamentally between consistency, performance and
poor network quality, leading to various architectures. Some of these might relax on consistency; however, failing
to maintain the same might undermine the very purpose of using databases for enterprise applications.
Absence of such a replication method and inconsistency challenges faced with mobile replicas motivated
towards this project.

Abstract:

Data replication brings with it an important issue of consistency. Lot of effort has been put into
developing different consistency models over the past decade. Though, strong consistency models seem like an
obvious choice for all replica consistency management, there is an argument that the weak consistency models are
pretty important and definitely have an edge over strong consistency models in mobile and WAN environments.
However, today's replication systems are not “mobile‐ready”. Instead of improving the mobile user's environment,
the replication system actually hinders mobility and complicates mobile operation. Designed for stationary
environments, the replication services do not and cannot provide mobile users with the capabilities they require.
Replication in mobile environments requires fundamentally different solutions than those previously proposed,
because mobility presents a fundamentally new and different computing paradigm.
Even with replication, replicating a whole database in all the replicas creates many problems. First, a huge
storage or memory cannot be expected in a mobile device. Second, whenever the replicas have to update it self
with master copy storing the whole database in clients will create a bandwidth bottleneck in the server. To avoid
such problems the mobile client shall partially replicate the whole database.
Generic database replication algorithms do not scale linearly in throughput as all update, deletion and
insertion (UDI) queries must be applied to every database replica. The throughput is therefore limited to the point
where the number of UDI queries alone is sufficient to overload one server. In such scenarios, partial replication of
a database can help, as UDI queries are executed only by a subset of all servers. This system employs partial
replication to improve database throughput. This exploits the fact that a Web application’s query workloads is
composed of a small set of read and write templates. Using knowledge of these templates and their respective
execution costs, this provides database table placements that produce significant improvements in database
throughput.
Replication:

Replication is one of the oldest and most important topics in the overall area of distributed systems.
Whether one replicates data or computation, the objective is to have some group of processes that handle
incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data,
1

reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault‐
tolerance. For example, a replicated service might be used to control a telephone switch, with the objective of
ensuring that even if the primary controller fails, the backup can take over its functions. But the underlying needs
are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in
consistent states and hence any replica can respond to queries.
Database replication can be used on many database management systems, usually with a master/slave
relationship between the original and the copies. The master logs the updates, which then ripple through to the
slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending
(and potentially re‐sending until successfully applied) of subsequent updates.
When these replicas are placed in mobile clients then there comes a problem of consistency. Any mobile
device is known to get outside the coverage area. In a wireless network the clients often suffer from poor signal
strength. In such situations the client cannot be expected to update its copy of replication from the master copy.
Partial Replication:
By replicating the whole database there is a reduced complexity in design of the distributed system. But
the size of the data present in the mobile device is high. In laptops and PDAs huge storage cannot be expected. To
avoid such storage problems partial replication scheme can be employed.
The partial replication is possible by partitioning the database logically. The partitioning can be done by
two different schemes. The schemes are:
1) Horizontal fragmentation
2) Vertical fragmentation
Sample Student Data:
Roll No. Name Mark1 Mark2 Total

1234 Xyz 98 78 176
1235 Yyy 89 78 167
1534 Abc 78 67 148
2342 Xxx 67 56 123

Horizontal Fragmentation:
Replication1:

1234 Xyz 98 78 176
1235 Yyy 89 78 167

Replication2:

1534 Abc 78 67 148
2342 Xxx 67 56 123
2

Vertical Fragmentation:
Roll No. Name Total Roll No. Mark1 Mark2

1234 Xyz 176 1234 98 78
1235 Yyy 167 1235 89 78
1534 Abc 148 1534 78 67
2342 Xxx 123 2342 67 56

Template based replication:
This is a database replication system that exploits the fact that the database queries issued by typical Web
applications belong to a relatively small number of query templates. A query template is a parameterized SQL
query those parameter values are passed to the system at runtime. Prior knowledge of these templates allows one
to select database table placements such that each query template can be treated locally by at least one server.
We demonstrate that careful table placements based on the data span and the relative execution costs of different
templates can provide major scalability gains in terms of sustained throughput. We further show that this
technique can easily be used in combination with any existing template‐based database query caching system,
Thereby obtaining reduced access latency and yet some more throughput scalability.

Poor consistency with mobile clients:

Weak consistency models are based on the fact that replicas can tolerate inconsistencies for at least some
time period. The replicas perform the process of reconciliation to make their copies consistent. Conflicts are
resolved as and when they arise. User is allowed to perform optimistic updates on the data. Conflict resolution is
however application dependent and some work are needed on part of the application developers to handle
conflicts. Such models scale very well as background update propagation demands less coordination among sites.
The internet has played a big role in the development and popularity of distributed applications. Some of
the most popular application like Napster, Gnutella, Freenet have created a tremendous interest in the user
community. Deploying these distributed/peer‐to‐peer applications over the internet has posed several challenges.
With the number of users being so large, distributed systems deployed today across the internet need to exhibit
several important properties like – high availability, scalability across millions of users, ability to tolerate network
failures, network partitioning, higher rates of packet loss and varied bandwidth connections to the internet.
Despite all the harsh conditions, the distributed application should be able to perform well enough to provide
satisfactory user response times. The number of mobile devices in use is exploding with advancement in mobile
computing. Mobile users are becoming more and more interconnected. Since mobile users want to increasingly
share data and demand access to collaborative applications, the weak consistency model has emerged as a natural
way to allow mobile users to share data among themselves. Since mobile client have limited processing power and
memory, and they often are in a disconnected state, maintaining strong consistency for mobile replicas is not a
feasible option. Hence, many weak consistency solutions for mobile applications have been suggested.

Scalability in Large Scale Systems:
Weakly consistent replication protocols normally disseminate information through methods like anti‐
entropy or gossip. These are examples of epidemic protocols where the updates are propagated lazily. These
protocols make probabilistic guarantees that there will be eventual consistency amongst all the replicas. This kind
of approach has been around for quite a while and has the advantage that in a large‐scale loosely coupled
environment it scales better and performs better than other approaches. Lazy replication is used in peer‐to‐peer
systems like Gnutella. The highly scalable storage system Oceanstore designed to scale across billions of users
provides API supporting weak consistency for increased availability and performance.

3

Wireless Wide Area Networks:

Millions of users today own personal computing devices that can access, query, and update data stores
and databases over wireless networks. The increasing availability of wide‐area connectivity options such as cellular
GPRS/EDGE, EVDO, and WiMax has encouraged the notion of access to data anywhere and anytime. However,
many of these wireless technologies exhibit high variability in peak theoretical throughput, as shown in Table 1.
End‐to‐end latency is also highly variable. To make matters worse, recent studies have shown that achievable
throughput is often only 25–65% of the theoretical maximum, and that large drops in throughput are seen during
peak usage periods. Wireless Wide‐Area Network (WWAN) technologies remain important for the foreseeable
future in spite of the growing popularity of WiFi (802.11) technology. First, WiFi coverage is limited to compact
areas where dense base station infrastructure can be economically sustained. In contrast, WWAN technologies
require much less dense infrastructure that can often be piggybacked on existing cell phone infrastructure. Hence,
they are economically sustainable over much larger geographic areas. Second, organizational and business
considerations may preclude use of WiFi. For example, a user may not subscribe to the wireless service provider of
a hotspot. As another example, corporate security guidelines may prohibit a salesman from using the WiFi
coverage available at a customer site; hence unplanned use of WiFi might not be possible. Third, existing WiFi
infrastructure may be damaged or unusable in situations such as disaster recovery and military operations. Rapid
setup of new wireless infrastructure is feasible only if it is sparse. This typically implies reliance on WWAN
technologies.

Content Addressable Storage (CAS):
Neutrino improves performance by discovering commonality across tentative and authoritative database
results. Since these results can be large, eliding commonality can lead to a win on slow networks. Based on its
success in networking, distributed file systems and enterprise‐scale storage systems we use Content
Addressable Storage (CAS) induced by cryptographic hashing to discover commonality. We have customized this
technique for improved performance in our context. Like previous CAS‐based efforts, we assume that real‐world
data is collision‐resistant with respect the cryptographic hash function being used. In other words, it is
computationally intractable to find two inputs that hash to the same output. Trusting in collision resistance,
CAS‐based systems treat the hash of a data item as its unique identifier or tag. Data then becomes content‐
addressable, with tags serving as code words for the much larger data items in network transmission. Although
concerns about the collision‐resistance assumption of CAS have been expressed is compelling. If Neutrino’s hash
function is broken, replacing it would be simple since Neutrino only uses hashing on volatile data and never on
permanent storage. While a much stronger function such as SHA‐256 would increase computational effort at the
client and server, the new hashes would still be much smaller than the data items they represent.

Database Access API:
Remote database access is widely supported today through Java Database Connectivity (JDBC) and its
antecedent, Open Database Connectivity (ODBC). JDBC defines a Java API that enables vendor independent access
to databases. In other words, an application written to that API can be confident of working with any database that
supports JDBC. Each vendor provides a client component called a JDBC driver that is typically implemented as a
dynamically linked library layered on top of the TCP socket interface. The wire protocol between the JDBC driver
and its database is vendor‐specific, and typically embodies proprietary optimizations for efficiency.
4

Basic Architecture of Neutrino:

Client Requests

Edge Server Edge Server Edge Server

Queue Queue Queue

Query Router

Application Application Application
Server Proxy

Neutrino JDBC Driver Neutrino JDBC Driver Neutrino JDBC Driver JDBC Driver
Network

Client Proxy Client Proxy Client Proxy Cache
JDBC Driver JDBC Driver JDBC Driver

Origin
Table 1, 2 Table 3, 1 Table 3, 2

Replica Replica Replica
Server 3 Server 3 Server 3 Master Copy

Architecture Explained:

Master Copy & Server Proxy:
The master copy of the database is present somewhere in the network. This copy of the database is
connected to server proxy through a JDBC driver. The master database is usually a sophisticated database
management system like oracle. The server proxy maintains the list of clients connected to the network. Whenever
a client contacts the server it has to contact the server proxy. This also maintains transparency by hiding the other
replicas from one replica. The server proxy also maintains the list of updations. Server proxy may be present in the
master itself or somewhere in the network.

Edge Servers:
These are endpoints for this system. Whenever a client has to get data from this content distribution
network they contact the edge servers. The edge servers in turn query the replicas for data. The requests to the
replicas are queued. The client contacting the edge servers are unaware of the technology that is being used to
distribute the data.

5

Query Router:
Query router is a major module of Neutrino which helps in achieving partial replication. The query router
can be thought of something very similar to a directory service which has the information about the placement of
replicas and their load. These routers then find a feasible replica which can meet the client request and serves the
content to the client.

Neutrino JDBC Driver:
This module is responsible for contacting the server proxy and checks for the signal strength regularly.
Whenever there is connectivity to the network the server proxy is contacted. The application program always
contacts the JDBC driver.

Design Challenges with Distributed Systems:

Heterogeneity:
In a distributed system there may systems with various architecture and database servers. Hence there
must a common communication medium between these systems. To defeat this challenge a common database
access API such as Java DataBase Connectivity (JDBC) or Open DataBase Connectivity (ODBC) can be used. Various
database service providers have theirs own drivers which can be plugged into these APIs.
Heterogeneity of architecture need not be worried, the communication is between the JDBC and user
defined drivers alone.

Consistency:
The consistency of the data in various replicas is essential in case of any distributed systems. This is to
ensure that the content delivered to the client is the present one. With Neutrino this is the greatest challenge
since it is for mobile clients. The mobile devices are often known to get out of coverage area. Even the weather
constitutes the loss of signal.
Neutrino manages this problem by the help of the Server Proxy and the Neutrino’s native driver. The
server proxy has the list of clients that are currently connected to the server and it also maintains the list of
updations that have taken place in a database.

Availability:
This distributed system is highly available due to the presence of number of replications for the same
database. There is also an assumption that at least one replica is under the coverage area always.

Scalability:
For improved scalability partial replication schemes are used. The query router routes the client to the
appropriate replica thereby the other replicas are not crowded for the data. By having more than one query router
the bottleneck with single query router is also avoided. The hardware requirements of the replica is also reduced
due to this mechanism because, only partial database is stored in a client.
The throughput is high since if a replica is crowded the request is redirected to another replica with same
data.

Fault Tolerance:
Toleration toward failures is more since the number of replicas are more. Whenever the master fails, the
master is updated with the content in replicas that is very recent. The failure is also notified to the clients so that
they can request the client for verification of data when the data is not recent.

Transparency:
The client is aware of the server proxy alone. They do not know where the replicas are present and how
the data are updated. The server proxy takes care of all these activities

6

Output Metrics:
The project is planned to be evaluated using:
• The bandwidth usage.
• Throughput
• Server load
• Bottleneck due to Neutrino driver.
• Ability of server proxy.
• Bottleneck due to consistency enforcement.
• Increase in server load for one client.
Benchmark:
The project is evaluated in two different environments like RUBBoS and TPC‐W.
References:
| GlobeTP : A template based replication service. Dr. Swaminathan Sivasubramanian, Tobias Groothuyse,
Guillaume Pierre Vrije Universitiet, The Netherlands.
| GlobeDB : Autonomic Data Replication for Web Applications. Dr. Swaminathan Sivasubramanian et al.
| Finger Printing Through Random Polynomials. Micheal O. Rabin, Dept. of Mathematics, The Hebrew
University of Jerusalem.
| Opportunistic Use of Content Addressable Storage for Distributed File Systems. Niraj Tolia et al. Intel
Research Pittsburg.
| Replication for web hosting systems. ACM Computing Surveys. S. Sivasubramanian, M. Szymaniak, G.
Pierre, and M. van Steen. A case for dynamic selection of replication and caching strategies. In
Proceedings of the Eighth International Workshop Web Content Caching and Distribution S.
Sivasubramanian, G. Pierre, and M. Van Steen
| Akamai EdgeSuite. http://www.akamai.com/en/html/services/edgesuite.html.
| DBProxy: A dynamic data cache for Web applications. In Proc. Intl. Conf. on Data Engineering, K. Amiri, S.
Park, R. Tewari, and S. Padmanabhan.
| Characterizing the scalability of a large web‐based shopping system. ACM Transactions on Internet
Technology, M. Arlitt, D. Krishnamurthy, and J. Rolia.
| Adaptive database caching with DBCache. Data Engineering, C. Bornh¨ovd, M. Altinel, C. Mohan, H.
Pirahesh, and B. Reinwald.
| Towards robust distributed systems. Proc. ACM Symp. on Principles of Distributed Computing, E. A.
Brewer.

7

Neutrino

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Neutrino

Uploaded by

Copyright:

Available Formats

Roll No. Name Mark1 Mark2 Total

Roll No. Name Mark1 Mark2 Total

Roll No. Name Mark1 Mark2 Total

Roll No. Name Total Roll No. Mark1 Mark2

You might also like