
Table of Contents

1. ABSTRACT
2. OVERVIEW
   2.1 Purpose of the Project
   2.2 Existing System
   2.3 Proposed System
3. REQUIREMENT SPECIFICATION
   3.1 Hardware Requirements
   3.2 Software Requirements
4. FEASIBILITY STUDY
   4.1 Technical Feasibility
   4.2 Operational Feasibility
   4.3 Economic Feasibility
5. LANGUAGE SPECIFICATION
   5.1 Introduction to Java
   5.2 JavaScript
   5.3 JSP
   5.4 Servlet
   5.5 MySQL Database
   5.6 NetBeans
   5.7 Apache Tomcat
   5.8 GlassFish
   5.9 Web Application
6. SYSTEM DESIGN
   6.1 System Architecture
   6.2 Data Flow Diagrams
   6.3 E-R Diagrams
   6.4 UML Diagrams
7. SYSTEM DESCRIPTION
8. CODING
9. SYSTEM TESTING
   9.1 Introduction to Testing
   9.2 Test Cases
10. OUTPUT SCREENS
11. CONCLUSION
12. BIBLIOGRAPHY

1. ABSTRACT

Detection of emerging topics is receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged in social-network posts includes not only text but also images, URLs, and videos. This project focuses on the emergence of topics signalled by the social aspects of these networks. Specifically, it focuses on the links between users that are generated dynamically through replies, mentions, and retweets. The project proposes a probability model of the mentioning behaviour of a social-network user and detects the emergence of a new topic from the anomalies measured through this model. By aggregating anomaly scores from hundreds of users, the system can detect emerging topics based only on the reply/mention relationships in social-network posts. The proposed mention-anomaly-based approach detects new topics at least as early as text-anomaly-based approaches, and in some cases much earlier, when the topic is poorly identified by the textual contents of the posts.

2. OVERVIEW

The overall description presents the background for the specific requirements. It explains the actors involved, the architecture diagram, and the assumptions and dependencies. It also covers the specific requirements, including the functional and supplementary requirements, beyond the actors already described.

2.1 PURPOSE OF THE PROJECT


Communication over social networks, such as Facebook and Twitter, is gaining importance in our daily life. Since the information exchanged over social networks is not only text but also URLs, images, and videos, these networks are challenging test beds for the study of data mining. In particular, we are interested in the problem of detecting emerging topics from social streams, which can be used to create automated breaking news, or to discover hidden market needs or underground political movements. Compared to conventional media, social media are able to capture the earliest, unedited voice of ordinary people. The challenge, therefore, is to detect the emergence of a topic as early as possible with a moderate number of false positives.

2.2 EXISTING SYSTEM


An emerging topic is something people feel like discussing, commenting on, or forwarding to their friends. Conventional approaches for topic detection have mainly been concerned with the frequencies of textual words.

DISADVANTAGES OF EXISTING SYSTEM:

- A term-frequency-based approach could suffer from the ambiguity caused by synonyms or homonyms.
- It may also require complicated pre-processing depending on the target language.
- Moreover, it cannot be applied when the contents of the messages are mostly non-textual information.
- On the other hand, the words formed by mentions are unique, require little pre-processing to obtain, and are available regardless of the nature of the contents.

2.3 PROPOSED SYSTEM

- The proposed system presents a new approach to detecting the emergence of topics in a social-network stream.
- The basic idea of this project is to focus on the social aspect of the posts, reflected in the mentioning behaviour of users, instead of the textual contents.
- A probability model captures both the number of mentions per post and the frequency of the mentionee.

ADVANTAGES OF PROPOSED SYSTEM:

- Because the proposed method does not rely on the textual contents of social-network posts, it is robust to rephrasing and can be applied to cases where topics concern information other than text, such as images, video, and audio.
- The link-anomaly-based methods performed even better than the keyword-based methods on the NASA and BBC data sets.

3. REQUIREMENT SPECIFICATION

3.1 HARDWARE REQUIREMENTS

The hardware used for the development of the project is:

System         : Pentium IV, 2.4 GHz
Hard Disk      : 40 GB
Floppy Drive   : 1.44 MB
Monitor        : 15" VGA Colour
Mouse          : Logitech
RAM            : 512 MB

3.2 SOFTWARE REQUIREMENTS

The software used for the development of the project is:

Operating System   : Windows XP/7
Language           : Java
Front End          : JSP, Servlet, JavaScript
IDE                : NetBeans 7.0
Application Server : Apache Tomcat 7.0 / GlassFish
Back End           : MySQL 5.5

4. FEASIBILITY STUDY
A feasibility study is a process that defines exactly what a project is and what strategic issues need to be considered to assess its feasibility, or likelihood of succeeding. Feasibility studies are useful both when starting a new business and when identifying a new opportunity for an existing business. Ideally, the feasibility study process involves making rational decisions about a number of enduring characteristics of a project, including:

- Technical feasibility: do we have the technology? If not, can we get it?
- Operational feasibility: do we have the resources to build the system? Will the system be acceptable? Will people use it?
- Economic feasibility: are the benefits greater than the costs?

4.1 TECHNICAL FEASIBILITY

Technical feasibility is concerned with the existing computer system (hardware, software, etc.) and the extent to which it can support the proposed addition. For example, if particular software will work only on a computer with a higher configuration, additional hardware is required. This involves financial considerations, and if the budget is a serious constraint, the proposal will be considered not feasible.

4.2 OPERATIONAL FEASIBILITY

Operational feasibility is a measure of how well a proposed system solves the problems and takes advantage of the opportunities identified during scope definition, and how well it satisfies the requirements identified in the requirements-analysis phase of system development.

4.3 ECONOMIC FEASIBILITY

Economic analysis is the most frequently used method for


evaluating the effectiveness of a candidate system. More commonly
known as cost/ benefit analysis, the procedure is to determine the
benefits and savings that are expected from a candidate system and
compare them with costs. If benefits outweigh costs, then the decision
is made to design and implement the system.

5. LANGUAGE SPECIFICATIONS
5.1 INTRODUCTION TO JAVA:

Java is a general-purpose computer programming language that is


concurrent, class-based, object-oriented, and specifically designed to
have as few implementation dependencies as possible. It is intended to
let application developers "write once, run anywhere", meaning that

code that runs on one platform does not need to be recompiled to run on
another. Java applications are typically compiled to byte code that can
run on any Java virtual machine (JVM) regardless of computer
architecture. Java is, as of 2014, one of the most popular programming
languages in use, particularly for client-server web applications, with a
reported 9 million developers. Java was originally developed by James
Gosling at Sun Microsystems and released in 1995 as a core component
of Sun Microsystems' Java platform. The language derives much of its
syntax from C and C++, but it has fewer low-level facilities than either
of them.
The original and reference implementation Java compilers, virtual
machines, and class libraries were originally released by Sun under
proprietary licences. As of May 2007, in compliance with the
specifications of the Java Community Process, Sun relicensed most of its
Java technologies under the GNU General Public License.

The Java compiler

When you program for the Java platform, you write source code
in .java files and then compile them. The compiler checks your code
against the language's syntax rules, then writes out byte codes in .class
files. Byte codes are standard instructions targeted to run on a Java
virtual machine. By adding this level of abstraction, the Java compiler
differs from other language compilers, which write out instructions
suitable for the CPU chipset the program will run on.
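As a minimal illustration of this compile-and-run cycle (the class and file names here are chosen only for the example), a single source file can be compiled with javac and then executed on the JVM with java:

    // HelloTopic.java -- minimal example of the javac/java workflow.
    // Compile:  javac HelloTopic.java   (produces HelloTopic.class byte code)
    // Run:      java HelloTopic         (the JVM executes the byte code)
    public class HelloTopic {
        public static void main(String[] args) {
            System.out.println("Hello from the Java platform");
        }
    }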

The JVM

At run time, the JVM reads and interprets .class files and executes
the program's instructions on the native hardware platform for which the
JVM was written. The JVM interprets the byte codes just as a CPU
would interpret assembly-language instructions. The difference is that
the JVM is a piece of software written specifically for a particular
platform. The JVM is the heart of the Java language's "write once, run anywhere" principle. Your code can run on any chipset for which a
suitable JVM implementation is available. JVMs are available for major
platforms like Linux and Windows, and subsets of the Java language
have been implemented in JVMs for mobile phones and hobbyist chips.
The Garbage Collector

Rather than forcing you to keep up with memory allocation the


Java platform provides memory management out of the box. When your
Java application creates an object instance at run time, the JVM
automatically allocates memory space for that object from the heap,
which is a pool of memory set aside for your program to use. The Java
garbage collector runs in the background, keeping track of which objects
the application no longer needs and reclaiming memory from them. This
approach to memory handling is called implicit memory management
because it doesn't require you to write any memory-handling code.
Garbage collection is one of the essential features of Java platform
performance.
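A minimal sketch of what implicit memory management means in practice (illustrative only): objects are simply created, and those that become unreachable are reclaimed by the garbage collector without any explicit free or delete in the code.

    public class GcSketch {
        public static void main(String[] args) {
            for (int i = 0; i < 1_000_000; i++) {
                // Each iteration allocates a new array on the heap; the previous
                // one becomes unreachable and is eligible for garbage collection.
                byte[] buffer = new byte[1024];
                buffer[0] = 1; // use the buffer so the allocation is meaningful
            }
            // No explicit memory-release code is required anywhere.
            System.out.println("Done; unreachable buffers were reclaimed by the GC.");
        }
    }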

The Java Development Kit

When you download a Java Development Kit (JDK), you get, in addition to the compiler and other tools, a complete class library of prebuilt utilities that help you accomplish just about any task common to
application development. The best way to get an idea of the scope of the
JDK packages and libraries is to check out the JDK API documentation.
The Java Runtime Environment

The Java Runtime Environment includes the JVM, code libraries,


and components that are necessary for running programs written in the
Java language. It is available for multiple platforms. You can freely
redistribute the JRE with your applications, according to the terms of the
JRE license, to give the application's users a platform on which to run
your software. The JRE is included in the JDK.
Features of the Java Language

Java has many features, which are described below:


Java is Simple

There are various features that make Java a simple language. Java is easy to learn and was developed by taking the best features from other languages, mainly C and C++. It is especially easy to learn for anyone who already has knowledge of object-oriented programming concepts. Java also provides a less error-prone development environment for the programmer, because it offers automatic memory management and eliminates pointers.
Java is Platform Independent

Java provides the facility to "write once, run anywhere". No language achieves this ideal completely, but Java comes close. Java provides cross-platform programs by compiling to an intermediate form known as byte code. This byte code can be interpreted on any system that has a Java Virtual Machine.
Java is Object-oriented

An object-oriented language must support the characteristics of OOP, and Java is a fully object-oriented language: it supports all the characteristics needed to be object-oriented. In Java, everything is treated as an object to which methods are applied. Languages like Objective-C and C++ fulfil these characteristics, yet they are not fully object-oriented because they are structured as well as object-oriented languages. Java, in contrast, is fully object-oriented because the object sits at the outermost level of the data structure; there are no stand-alone methods, constants, or variables in Java. Everything in Java is an object, and even the primitive data types can be converted into objects by using the wrapper classes.
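A brief sketch of the wrapper classes mentioned above (illustrative only):

    public class WrapperDemo {
        public static void main(String[] args) {
            int primitive = 42;
            Integer boxed = Integer.valueOf(primitive); // primitive wrapped in an object
            Integer autoBoxed = primitive;              // autoboxing does the same implicitly
            int unboxed = autoBoxed.intValue();         // back to a primitive
            System.out.println(boxed + " " + unboxed);
        }
    }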

Java is Distributed

Java provides extensive library support for widely used protocols such as HTTP and FTP. Internet programmers can call functions through these protocol libraries and access files on any remote machine on the Internet, rather than being limited to their local system.
Java is Secure

Java does not use memory pointers explicitly. All Java programs run inside an area known as the sandbox. A security manager determines the accessibility options of a class, such as reading and writing a file on the local disk. Java uses the public-key encryption system to allow Java applications to be transmitted over the Internet in secure, encrypted form. The bytecode verifier checks classes after loading. In summary:
1. No memory pointers
2. Programs run inside the virtual machine sandbox
3. Array index limit checking
Java is compiled and interpreted
Java code is compiled to byte codes that are interpreted by the Java Virtual Machine (JVM). This provides portability to any machine for which a virtual machine has been written. At run time the JVM reads the byte codes and translates them on the fly into computations on the host machine. The two steps of compilation and interpretation allow for extensive code checking and improved security.

Java is Robust

Java has strong memory allocation and an automatic garbage-collection mechanism. It carries out type checking at both compile time and run time, making sure that every data structure has been clearly defined and typed. The compiler checks the program for errors, and the interpreter checks for run-time errors. Java manages memory automatically by using an automatic garbage collector. All of the above features make the Java language robust.
Java is Portable

The "write once, run anywhere" feature makes Java portable. Many types of computers and operating systems are in use; by porting an interpreter for the Java Virtual Machine to any computer hardware/operating-system combination, one is assured that all code compiled for it will run on that system. This forms the basis of Java's portability.
5.2 JavaScript:

JavaScript is a dynamic computer programming language. It is most commonly used as part of web browsers, whose implementations allow client-side scripts to interact with the user, control the browser, communicate asynchronously, and alter the document content that is displayed. It is also used in server-side network programming with frameworks such as Node.js, in game development, and in the creation of desktop and mobile applications.

JavaScript is classified as a prototype-based scripting language with


dynamic typing and first-class functions. This mix of features makes it a
multi-paradigm language, supporting object-oriented, imperative, and
functional programming styles.
Despite some naming, syntactic, and standard library similarities,
JavaScript and Java are otherwise unrelated and have very different
semantics. The syntax of JavaScript is actually derived from C, while the
semantics and design are influenced by Self and Scheme programming
languages. JavaScript is also used in environments that aren't web-based,
such as PDF documents, site-specific browsers, and desktop widgets.
Newer and faster JavaScript virtual machines and platforms built upon
them have also increased the popularity of JavaScript for server-side
web applications. On the client side, JavaScript has been traditionally
implemented as an interpreted language, but more recent browsers
perform just-in-time compilation.
5.3 JSP

JavaServer Pages (JSP) is a server-side programming technology that enables the creation of dynamic, platform-independent methods for building Web-based applications. JSP has access to the entire family of Java APIs, including the JDBC API for accessing enterprise databases. JSP may be viewed as a high-level abstraction of Java servlets. JSPs are translated into servlets at runtime; each JSP servlet is cached and re-used until the original JSP is modified. JSP can be used independently or as the view component of a server-side model-view-controller design, normally with JavaBeans as the model and Java servlets as the controller; this is a type of Model 2 architecture. JSP allows Java code and certain pre-defined actions to be interleaved with static web markup content, with
the resulting page being compiled and executed on the server to deliver a
document. The compiled pages, as well as any dependent Java libraries,
use Java byte code rather than a native software format. Like any other
Java program, they must be executed within a Java virtual machine that
integrates with the server's host operating system to provide an abstract
platform-neutral environment.

JSPs are usually used to deliver HTML and XML documents, but
through the use of OutputStream, they can deliver other types of data as
well.
The Web container creates JSP implicit objects like pageContext,
servletContext, session, request & response.
A JavaServer Pages compiler is a program that parses JSPs, and
transforms them into executable Java Servlets. A program of this type is
usually embedded into the application server and run automatically the
first time a JSP is accessed, but pages may also be precompiled for better
performance, or compiled as a part of the build process to test for errors.

5.4 Servlet:

A Servlet is basically a Java Program that executes within a Web server


or an Application Server, acting as a middle layer between requests sent
from a web client and a database on the HTTP server. By using Servlets, you can dynamically generate web pages, obtain information from users through web forms, and display records from a database.

Servlets are most often used to:


1. Process or store data that was submitted from an HTML form.
2. Provide dynamic content such as the results of a database query
3. Manage state information that does not exist in the stateless HTTP
protocol, such as filling the articles into the shopping cart of the
appropriate customer.
With that in mind, a Servlet is a Java class that conforms to the Java Servlet API, the standard for executing Java classes that respond to requests. javax.servlet.http is a package that provides HTTP-specific subclasses for the communication between the Servlet and the Servlet

container. Therefore, you can use a Servlet to provide dynamic content from a web server through the Java platform. The dynamic content
generated is usually HTML but it may be in other forms such as XML.
Servlets can also be used to maintain state in session variable through
the use of HTTP cookies or URL rewriting. Servlets are usually
packaged in a WAR file.
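A minimal sketch of such a Servlet, using the standard javax.servlet.http API as supported by Tomcat 7 (the class name, URL mapping, and "user" parameter are hypothetical, chosen only for illustration):

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Responds to GET requests with a small dynamically generated HTML page.
    @WebServlet("/hello")
    public class HelloServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            String user = request.getParameter("user"); // query/form parameter, may be null
            out.println("<html><body>");
            out.println("<h1>Hello " + (user == null ? "guest" : user) + "</h1>");
            out.println("</body></html>");
        }
    }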

5.5 MySQL Database

MySQL is the most popular Open Source Relational SQL


database management system. MySQL is one of the best RDBMS being
used for developing web-based software applications.
A Relational Database Management System is software that:
Enables you to implement a database with tables, columns and
indexes.
Guarantees the Referential Integrity between rows of various
tables.
Updates the indexes automatically.
Interprets an SQL query and combines information from various
tables.

RDBMS Terminology:

Before we proceed to explain the MySQL database system, let us revise a few definitions related to databases.
Database: A database is a collection of tables, with related data.
Table: A table is a matrix with data. A table in a database looks like
a simple spreadsheet.
Column: One column (data element) contains data of one and the
same kind, for example the column postcode.
Row: A row (= tuple, entry or record) is a group of related data, for
example the data of one subscription.
Redundancy: Storing data twice, redundantly to make the system
faster.
Primary Key: A primary key is unique. A key value cannot occur
twice in one table. With a key, you can find at most one row.
Foreign Key: A foreign key is the linking pin between two tables.
Compound Key: A compound key (composite key) is a key that
consists of multiple columns, because one column is not
sufficiently unique.
Index: An index in a database resembles an index at the back of a
book.
Referential Integrity: Referential Integrity makes sure that a
foreign key value always points to an existing row.

MySQL is a fast, easy-to-use RDBMS being used for many small


and big businesses. MySQL is developed, marketed, and supported by
MySQL AB, which is a Swedish company. MySQL is becoming so
popular because of many good reasons:
MySQL is released under an open-source license. So you have
nothing to pay to use it.
MySQL is a very powerful program in its own right. It handles a
large subset of the functionality of the most expensive and
powerful database packages.
MySQL uses a standard form of the well-known SQL data
language.
MySQL works on many operating systems and with many
languages including JAVA, etc.
MySQL works very quickly and works well even with large data
sets.
MySQL is very friendly to PHP, the most appreciated language for
web development.
MySQL supports large databases, up to 50 million rows or more in
a table. The default file size limit for a table is 4GB, but you can
increase this.
MySQL is customizable. The open-source GPL license allows
programmers to modify the MySQL software to fit their own
specific environments.
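A minimal sketch of connecting to MySQL from Java through the JDBC API (the database name, table, and credentials below are hypothetical placeholders; the MySQL Connector/J JAR must be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class MySqlDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical database, table, and credentials; adjust for your environment.
            String url = "jdbc:mysql://localhost:3306/topicdb";
            try (Connection con = DriverManager.getConnection(url, "root", "password");
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT username FROM users WHERE username LIKE ?")) {
                ps.setString(1, "a%"); // bind the parameter of the prepared statement
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("username"));
                    }
                }
            }
        }
    }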

5.6 NetBeans:

NetBeans is an integrated development environment for developing primarily with Java, but also with other languages, in particular PHP, C/C++, and HTML5. It is also an application platform framework for Java desktop applications and others. The NetBeans IDE is written in Java and can run on Windows, OS X, Linux, Solaris and other platforms supporting a compatible JVM. The NetBeans Platform allows applications to be developed from a set of modular software components called modules. Applications based on the NetBeans Platform can be extended by third-party developers. The NetBeans team actively supports the product and seeks feature suggestions from the wider community.
5.7 Apache Tomcat:

Apache Tomcat is an open-source web server and servlet container developed by the Apache Software Foundation. Tomcat implements several Java EE specifications, including Java Servlet, JavaServer Pages (JSP), Java EL, and WebSocket, and provides a "pure Java" HTTP web server environment for Java code to run in.

5.8 GlassFish:

GlassFish is an open-source application server project started by


Sun Microsystems for the Java EE platform and now sponsored by
Oracle Corporation. The supported version is called Oracle GlassFish
Server.
GlassFish is the reference implementation of Java EE and as such
supports Enterprise JavaBeans, JPA, JavaServer Faces, JMS, RMI,
JavaServer Pages, servlets, etc. This allows developers to create
enterprise applications that are portable and scalable, and that integrate
with legacy technologies. Optional components can also be installed for
additional services.
5.9 Web Application:

Tomcat has also added user- as well as system-based web-application enhancements to support deployment across a variety of environments. It also tries to manage sessions as well as applications across the network.

A number of additional components may be used with Apache Tomcat. These components may be built by users, should they need them, or they can be downloaded from one of the mirrors.

6. SYSTEM DESIGN
6.1 SYSTEM ARCHITECTURE

6.2 DATA FLOW DIAGRAMS

(Data flow diagrams: Level 0, Level 1, and Level 2)

6.3 E-R DIAGRAM

(E-R diagram: Admin and User entities with attributes such as Username, Password, Email, Name, and Friend)

6.4 UML DIAGRAMS

Use case diagram: (actors Admin and User, with use cases such as Write Post)

Class diagram: (class diagram)

Sequence diagram: (sequence diagram)

Activity diagram: (activity diagram)

7. SYSTEM DESCRIPTION

1. Event Detection Streams
2. Event Description Module
3. User Profiling in Social Media
4. Kleinberg's Burst-Detection Method
5. Data Set

1. Event Detection Streams

Microblogs have become an important source for reporting real-world


events. A real-world occurrence reported in microblogs is also called a
social event. Social events may hold critical materials that describe the
situations during a crisis. In real applications, such as crisis management
and decision making, monitoring the critical events over social streams
will enable watch officers to analyze a whole situation that is a
composite event, and make the right decision based on the detailed
contexts such as what is happening, where an event is happening, and
who are involved. Although there has been significant research effort on
detecting a target event in social networks based on a single source, in a crisis we often want to analyze the composite events contributed by different social users. So far, the problem of integrating ambiguous views from different users has not been well investigated. To address this issue,
we propose a novel framework to detect composite social events over
streams, which fully exploits the information of social data over multiple

dimensions. Specifically, we first propose a graphical model called


location-time constrained topic (LTT) to capture the content, time, and
location of social messages. Using LTT, a social message is represented
as a probability distribution over a set of topics by inference, and the
similarity between two messages is measured by the distance between
their distributions. Then, the events are identified by conducting efficient
similarity joins over social media streams. To accelerate the similarity
join, we also propose a variable dimensional extendible hash over social
streams. We have conducted extensive experiments to prove the high
effectiveness and efficiency of the proposed approach.

2. Event description module

The rise of Social Media services in the last years has created huge
streams of information that can be very valuable in a variety of
scenarios. What precisely these scenarios are and how the data streams
can efficiently be analyzed for each scenario is still largely unclear at
this point in time and has therefore created significant interest in
industry and academia. In this paper, we describe a novel algorithm for
geo-spatial event detection on Social Media streams. We monitor all
posts on Twitter issued in a given geographic region and identify places
that show a high amount of activity. In a second processing step, we
analyze the resulting spatio-temporal clusters of posts with a

machine-learning component in order to detect whether they constitute


real-world events or not. We show that this can be done with high
precision and recall. The detected events are finally displayed to a user
on a map, at the location where they happen and while they happen.
3. User profiling in social media

A user profile is a visual display of personal data associated with a


specific user, or a customized desktop environment. A profile therefore refers to the explicit digital representation of a person's identity. A user profile can also be considered the computer representation of a user. A profile can be used to store a description of the characteristics of a person. This information can be exploited by systems that take the person's characteristics and preferences into account. Profiling is the process of constructing a profile via extraction from a set of data. User profiles can be found in operating systems, computer programs, recommender systems, and dynamic websites (such as online social networking sites or bulletin boards).
A social networking service is a platform to build social networks or social relations among people who share interests, activities, backgrounds, or real-life connections. A social network service consists of a representation of each user (often a profile), his or her social links, and a variety of additional services. Social networks are web-based services that allow individuals to create a public profile, create a list of users with whom to share connections, and view and traverse the connections within the system. Most social network services are web-based and provide means for users to interact over the Internet, such as e-mail and instant messaging. Social network sites are varied, and they incorporate new information and communication tools such as mobile connectivity, photo/video sharing, and blogging. Online community services are sometimes considered a social network service, though in a broader sense a social network service usually means an individual-centered service, whereas online community services are group-centered. Social networking sites allow users to share ideas, pictures, posts, activities, events, and interests with people in their network.

A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of the dyadic ties between these actors. The social network perspective provides a set of methods for analyzing the structure of whole social entities as well as a variety of theories explaining the patterns observed in these structures [1]. The study of these structures uses social network analysis to identify local and global patterns, locate influential entities, and examine network dynamics.
4. Kleinberg's Burst-Detection Method

In addition to the change-point detection based on SDNML followed by DTO described in the previous sections, we also test the combination of our method with Kleinberg's burst-detection method. More specifically, we implemented a two-state version of Kleinberg's burst-detection model, chosen to match the expectations of this experiment.

The proposed link-anomaly-based change-point detection is highly scalable. Every step described in the previous subsections requires only linear time in the length of the analyzed time period. The predictive distribution for the number of mentions can be computed in linear time in the number of mentions. The predictive distribution for the mention probability can be computed efficiently using a hash table. Aggregating the anomaly scores from different users takes linear time in the number of users, which could be a computational bottleneck but can easily be parallelized. SDNML-based change-point detection requires two sweeps over the analyzed time period. Kleinberg's burst-detection method can be efficiently implemented with dynamic programming.
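The following is a rough, illustrative Java sketch of the aggregation step described above: per-user anomaly scores are summed over a window of posts, and a simple threshold flags a possible emerging topic. The class names, the Post record, and the scoring formula are simplified assumptions for illustration, not the project's actual probability model.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch only: aggregate per-user mention-anomaly scores over a window.
    public class AnomalyAggregator {

        // Hypothetical record of one post: its author and how many users it mentions.
        static class Post {
            final String user;
            final int mentionCount;
            Post(String user, int mentionCount) {
                this.user = user;
                this.mentionCount = mentionCount;
            }
        }

        // Sum a simplified per-post anomaly score over all posts in the window.
        // Aggregation is a plain sum, i.e. linear in the number of posts/users.
        static double aggregateScore(List<Post> window) {
            Map<String, int[]> stats = new HashMap<>(); // per-user {posts, mentions}
            double total = 0.0;
            for (Post p : window) {
                int[] s = stats.computeIfAbsent(p.user, k -> new int[2]);
                s[0]++;                  // posts seen from this user so far
                s[1] += p.mentionCount;  // mentions produced by this user so far
                double expected = (double) s[1] / s[0]; // user's average mentions per post
                // Stand-in for the probability model: surprise of this post's
                // mention count relative to the user's own running average.
                total += Math.abs(p.mentionCount - expected);
            }
            return total;
        }

        public static void main(String[] args) {
            List<Post> window = Arrays.asList(
                    new Post("alice", 1), new Post("bob", 0),
                    new Post("alice", 5), new Post("carol", 4));
            double score = aggregateScore(window);
            System.out.println(score > 5.0 ? "possible emerging topic" : "normal activity");
        }
    }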
5. Data Set

One data set is related to the recent leakage of a confidential video by a Japan Coast Guard officer. The keyword used in the keyword-based methods was "Senkaku". For this data set we compare the results of link-anomaly-based change detection and burst detection with those of text-anomaly-based change detection and burst detection. Another data set is related to a controversial post by a famous person in Japan claiming that "the reason students are having difficulty finding jobs is because they are stupid", together with the various replies to that post. The keyword used in the keyword-based methods was "Job hunting". The four data sets we collected are called "Job hunting", "Youtube", "NASA", and "BBC", and each of them corresponds to a user-organized list on Togetter. For each list, we extracted the Twitter users that appeared in the list and collected Twitter posts from those users, recording the number of participants and the number of posts collected for each data set. Note that we collected Twitter posts up to 30 days before the time period of interest for each user; thus, the number of posts we analyzed was much larger than the number of posts listed on Togetter. A further data set is related to the discussion among Twitter users interested in astronomy that preceded NASA's press conference about the discovery of an arsenic-eating organism. The last data set is related to angry reactions among Japanese Twitter users against a BBC comedy show that asked who is the unluckiest person in the world (the answer being a Japanese man who was hit by nuclear bombs in both Hiroshima and Nagasaki but survived).
