See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/297206976
All content following this page was uploaded by Shahzaib Tahir on 07 March 2016.
Abstract— Big Data is a term associated with large datasets that come into existence with the volume, velocity and variety of data. An ever-increasing human dependence on computers and automated systems has caused data to increase massively. The substantial collection of data is not only helpful for researchers but equally valuable to investigators who intend to carry out forensic analysis of data associated with criminal cases. The conventional methodologies of performing forensic analysis have changed with the emergence of big data, because big data forensics requires more sophisticated tools along with the deployment of efficient frameworks. Until now several techniques have been devised to help the forensic analysis of small datasets, but none of these techniques has been studied by coupling it with big data. Hence in this paper different techniques are studied by closely analyzing their feasibility in the extraction and forensic analysis of evidence from large amounts of data. We discuss various sources of data and how techniques such as the MapReduce framework and phylogenetic trees can help a forensic investigator visualize large data sets to conduct a forensic analysis. Since audio and video are an attractive source of forensic data, this paper also discusses the latest techniques that assist in the extraction of useful sound signals from noise-infested audio signals. Similar techniques for the forensic analysis of images are also presented. Based upon interviews conducted with forensic professionals, the factors affecting big data forensic techniques, along with their severity, have been identified so that a scenario-specific approach can be adopted based upon the available investigative resources.

Keywords— Big Data; Forensics; Phylogenetic Trees; Digital Forensics; MapReduce; Hadoop Distributed File System; Blind Source Separation; Image Culling.

I. INTRODUCTION

As humans become increasingly reliant on computers, incidents involving computing-based crime have also risen. Cyber crime is a term that refers to the use of computers to carry out a crime. Owing to its digital nature, cyber crime cannot be investigated using conventional investigative techniques and requires sophisticated software to conduct the investigation. Digital forensics can be defined in a number of ways, but a concise definition would describe it as the scientific methodologies or steps taken towards the collection, identification, analysis and documentation of evidence that is derived from a digital source and can be presented in court when required [1]. Hence digital forensics is the investigation done to establish whether a person is guilty or innocent of committing a cyber crime. Given this, the integrity of the digital data is highly critical during the investigation.

Advancements in the field of digital forensics have resulted in the development of methodologies that assist in carrying out forensic activities. The information age has resulted in the generation of huge amounts of data that can serve as evidence, a clue, a fact or an indication of what happened. Acknowledgement of the importance of data and the linkages between data has given rise to the term “Big Data”. Big data has forced the development of techniques and tools that are applicable to data sets so large and complex that conventional data processing techniques cannot be applied to them. The techniques should be user friendly, highly interactive and equally aesthetic to facilitate the process of investigation.

According to a recent survey conducted by the American Institute of CPAs, “Big Data is listed as the top issue facing forensic and valuation professionals in next two to five years” [2]. Over the past decade extensive research has been done to improve digital forensic techniques in order to make the task of the forensic investigator easier. The scope of investigations was limited to a workstation, office or organization, due to which the tools being developed were not applicable to large datasets. This era requires digital forensic investigations in the corporate sector, multinational companies and large-scale data centres. Currently, extensive research is being done to carry out big data analysis so that the relationships among the data can be uncovered in an effective and efficient manner. Many researchers have proposed different techniques that have their own advantages and disadvantages. Often the focus has remained on the use of trees that can help in visualizing large data sets and revealing the relationships among those datasets [3][4]. The Internet can be termed a sea of data, and this sea is becoming deeper with every passing second. Hence it is not wrong to associate the term big data with the data residing on the internet. For the analysis of data on the internet or an Apache server, the Hadoop Distributed File System is used together with the MapReduce framework [5]. In the world of digital forensics, sometimes the evidence can be audio. The extraction of desired signals from a large set of signals or mixture of signals requires a technique termed as