
Fourth Paradigm

CONCEPT
Scientific discovery has been practiced throughout history, but the way it is done
has changed dramatically. At first, scientific discovery was empirical, based on the
observation of natural phenomena. As societies and human knowledge developed, a more
theoretical approach emerged, in which scientists discover and analyze new information
using models and generalizations. Third, over the last few decades, thanks to
technological advances, scientific discovery has come to rely on the computational
power available to run simulations of natural environments, much faster and more
conveniently.
A new approach is emerging, and it is the one that encapsulates the Fourth Paradigm
ideas: it unifies theory, experiment, and computer science. Here, information is
extensively collected, analyzed, and transported through a long processing pipeline.
Data collection is not done manually by scientists, but gathered from tools,
simulations, and complex instruments, and then parsed and analyzed. Scientists often
only gain access to the data at the end of this process, when it is ready to be used
and drawn conclusions from.
The problem is that, as research work nowadays produces a massive collection of results,
all this data is hard to manage and understand. It is also not easy to share information
across research teams, or to store it and make it available in an efficient manner.
To address these problems, there is a growing demand for tools and technologies that
are more general and applicable to a wider range of fields, not only to specific
research areas (including the creation of Laboratory Information Management Systems).
Even when new developments are achieved (mostly in large projects with funding for
software development), some of them cannot be reused, as they are specific to the
project's scope. This widens the gap between large and small projects, and makes
standardizing tools and software much more difficult.
If all these contributions (from both larger and smaller projects) could be collected
and organized, all information could be shared, and insights and conclusions could be
drawn on much more solid ground (more information, more data reliability).
The main idea to keep in mind, and the one that summarizes the Fourth Paradigm
concept, is that we can leverage the technological innovations and research currently
available to create tools and mechanisms that help us manage, store, share, access,
and consume data in a much more efficient and effective way.

OPEN DATA
The management of information, and in particular of important data from varied sources
such as scientific discoveries, legal reports, and other contributions, is an
increasingly prominent theme today.
The Open Data vision, the idea that certain data must be properly stored and, most
importantly, freely available for public use, has gained ground thanks to the
increasing need and desire to access massive data sets in order to leverage the
relevant information they can provide.
One of the fields that can benefit most from this idea is scientific discovery
and research in general.
As said earlier, most information processed in scientific research nowadays comes from
advanced tools and machines that generate massive amounts of data, based on
simulations and models. It is then easy to understand the advantage of making all this
information (or even a more processed and parsed version of it) available to related
research efforts. Scientists around the world can benefit from each other's work and
share their conclusions, greatly speeding up the scientific discovery process.
On a wider scope, this vision (made possible by the ideas behind the Fourth
Paradigm) can benefit other fields, such as law, where past information (former
cases and decisions, for example) can be put to great use. If all this information
were available to everyone with a possible interest in it, great changes could occur
in how we use information and how we leverage it.
In this big vision of Open Data, the Fourth Paradigm ideas will certainly come into
play, as the generalized use of information is not possible without a correct and
efficient management of all that information.

TECHNIQUES
Efficient Index Construction
To ensure that all information available for scientific discovery is effectively
accessible (that is, can be reached in a fast and efficient manner), it must be
indexed.
To take full advantage of the possibilities, a strong and powerful indexing structure
must be built.
Inverted indexes have to be built, so that it is easy to find data relevant to a
particular field or research question, as in the sketch below. If the process of
accessing the data is not simple, interest in doing so will surely suffer and another
approach will have to be sought.
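As a minimal sketch of the idea (the text does not prescribe an implementation; the
build_inverted_index function and the sample data sets below are hypothetical), an
inverted index simply maps each term to the set of data items that contain it:

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical sample data sets, just to illustrate the lookup.
documents = {
    "dataset-1": "ocean temperature simulation results",
    "dataset-2": "genome sequencing simulation pipeline",
}
index = build_inverted_index(documents)
print(index["simulation"])  # {'dataset-1', 'dataset-2'}
```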
The indexing must, of course, be done in a distributed way, because data, even if not
globally shared at first (accessible to all interested parties), will be generated at
different places but should be available everywhere.
Specifically, the indexing work can be done locally, with the sharing facility
responsible for its processing (treated as a node on the network), as sketched below.
This distributed approach is much more efficient and effective because, as seen in the
Fourth Paradigm book, data sets are massive and relying on a single, very limited
computer cluster can be hard.
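One way to read this, as a sketch rather than a prescribed design, is that each site
builds its own local index and the sharing facility merely merges the partial indexes
it receives (the merge_indexes function below is hypothetical):

```python
from collections import defaultdict

def merge_indexes(local_indexes):
    """Merge per-site inverted indexes into one global index."""
    merged = defaultdict(set)
    for local in local_indexes:
        for term, doc_ids in local.items():
            merged[term] |= set(doc_ids)
    return merged

# Each node indexes only its own data; the sharing facility combines them.
site_a = {"simulation": {"a-1"}, "ocean": {"a-1"}}
site_b = {"simulation": {"b-7"}, "genome": {"b-7"}}
global_index = merge_indexes([site_a, site_b])
print(global_index["simulation"])  # {'a-1', 'b-7'}
```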
To achieve this, and to allow the ideas of the Fourth Paradigm to succeed, tools must
be provided that make this process as simple and invisible to the scientist/user as
possible.

Querying
Even if the information is available and relatively organized in each of the data
centers where it is hosted, no useful practical results can be obtained if there is no
way to retrieve it simply and efficiently.
It is impossible to simply browse all this information until something relevant is
found, so powerful querying techniques must be implemented to let users retrieve the
information they consider relevant. Since querying massive data sets can be unfeasible
and extremely heavy, the information needs to be structured (for example, metadata
must be attached to data sets for quick identification), as the sketch below
illustrates.
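A minimal sketch of what this could look like, assuming the inverted index from above
plus a hypothetical per-dataset metadata dictionary (the query function and the
"domain" field are illustrative, not from the source):

```python
def query(index, metadata, term, domain=None):
    """Look up a term, optionally narrowing results by a metadata field."""
    hits = index.get(term, set())
    if domain is not None:
        # Metadata allows quick filtering without scanning the data itself.
        hits = {d for d in hits if metadata.get(d, {}).get("domain") == domain}
    return hits

metadata = {
    "dataset-1": {"domain": "oceanography"},
    "dataset-2": {"domain": "genomics"},
}
index = {"simulation": {"dataset-1", "dataset-2"}}
print(query(index, metadata, "simulation", domain="genomics"))  # {'dataset-2'}
```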
