
BLISS

IST-1999-14190

Blind Source Separation and Applications

The BLISS Soundbase


A Database of Audio Wave Samples
for Blind Source Separation
Deliverable D8

Report Version: Final


Report Preparation Date: 25th May 2001
Classification: Public
Contract Start Date: 1 June 2000
Duration: 36 months
Project Co-ordinator: INESC
Partners: HUT, INPG, GMD, McMaster University

Project funded by the European Community under the
“Information Society Technologies” Programme (1998-2002)

Introduction
Acoustic mixtures are an important integrating factor in the project, since they can be
used to benchmark the algorithms from workpackage 1. To provide such data in a
convenient form, we collected existing wave files from the World Wide Web and made
new recordings with two microphones in an office environment. All these data sets are
now available via the BLISS Soundbase.

What is the BLISS Soundbase?


The BLISS soundbase is a database that gives researchers working on blind source
separation easy access to a wide variety of sound files for testing their algorithms.
The database can be accessed online at
http://ida.first.gmd.de/~harmeli/BLISS. For each data set we store a
BibTeX entry that can easily be searched and processed automatically. For
example, a typical BibTeX entry looks like this:
@Misc{parra4,
author = {Lucas Parra},
homepage = {http://www.humanism.org/{\~}lucas/},
title = {Speaker with TV set},
year = {1998},
url = {http://www.sarnoff.com/career_move/tech_papers/BSS.html},
cached = {ftp://ftp.first.gmd.de/pub/ziehe/BLISSsoundbase/parra4},
abstract = {This is an example of a strongly
reverberating environment. The interfering source
(TV set) has little direct signal to the two microphones
and instead reflects off a wall of the room.},
postscript = {ftp://ftp.first.gmd.de/pub/ziehe/BLISSsoundbase/parra1/parra98convolutive.ps}
}

This entry collects the relevant information for one data set. It specifies who produced the
sound file, a title for identification, when it was produced and a URL from which the file
can be downloaded. This URL may also point to a web page where the user can obtain more
information about the data set. The abstract field is intended for further details, such as
the type of mixing, the sampling rate, the size of the files, the length of the recordings
and the number of sources. Furthermore, an entry can give the homepage of the author of the
data set and a link to a paper in which the data set has been used. For files that we store
locally, the entry can also keep a pointer to the cached copy.
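Because every data set is stored as a BibTeX entry, its fields can be extracted with a few lines of code. The following Python sketch is a hypothetical helper, not part of the soundbase tools: it parses one brace-delimited entry like the one above into its type, key and a dictionary of fields. It assumes all values are enclosed in braces and does not handle quoted values, @string macros or escaped braces.

```python
import re

FIELD_START = re.compile(r"\s*(\w+)\s*=\s*\{")  # e.g. "author = {"
FIELD_SEP = re.compile(r"\s*,")                  # optional comma after a value


def parse_bibtex_entry(text):
    """Split one brace-delimited BibTeX entry into (type, key, fields).

    Nested braces inside a value are kept verbatim. Quoted values,
    @string macros and escaped braces are not handled by this sketch.
    """
    header = re.match(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,", text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    entry_type, key = header.group(1).lower(), header.group(2)
    fields = {}
    pos = header.end()
    while True:
        field = FIELD_START.match(text, pos)
        if field is None:
            break  # no more "name = {" pairs: only the closing "}" is left
        name = field.group(1).lower()
        depth, i = 1, field.end()
        while depth > 0:  # scan to the brace that closes this value
            if i >= len(text):
                raise ValueError(f"unbalanced braces in field {name!r}")
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        value = text[field.end():i - 1]          # drop the outer braces
        fields[name] = " ".join(value.split())   # join wrapped lines
        sep = FIELD_SEP.match(text, i)
        pos = sep.end() if sep else i
    return entry_type, key, fields


entry = r"""@Misc{parra4,
author = {Lucas Parra},
homepage = {http://www.humanism.org/{\~}lucas/},
title = {Speaker with TV set},
year = {1998},
abstract = {This is an example of a strongly
reverberating environment.}
}"""

kind, key, fields = parse_bibtex_entry(entry)
print(kind, key, fields["title"], fields["year"])  # misc parra4 Speaker with TV set 1998
```

Counting brace depth, rather than stopping at the next `}`, is what keeps a nested group such as `{\~}` inside the homepage value.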

Searching the database


Using the search form, the database can be searched from any web browser. The search
returns every entry that contains the chosen keywords. The results can be shown in three
different formats. First, there is a short version:

• Lucas Parra, Speaker with TV set, 1998. (PostScript) (Cached)

Secondly, there is a longer version that includes the abstract:

• Lucas Parra, Speaker with TV set, 1998. (PostScript) (Cached)

This is an example of a strongly reverberating environment. The interfering
source (TV set) has little direct signal to the two microphones and
instead reflects off a wall of the room.

Note that in both the short format and the format with the abstract, the name of the author
links to the author's homepage (if it is given in the BibTeX entry), the name of the data set
links to the URL, the word "PostScript" links to the related paper (if given) and the word
"Cached" links to the cached copy of the data (again, if given). As a third option, and for
transparency, the user can view the entries in BibTeX format; an example is shown above.
The search string may contain the wildcards "*" and "?". As usual, the star ("*") matches any
string and the question mark ("?") matches any single character; e.g. "convolu*" matches
"convolutive", "convolutively", "convolution", etc.
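The wildcard semantics described above can be sketched in Python as follows. This is a minimal illustration under assumptions of our own: each keyword must match a whole word of an entry, matching is case-insensitive, and an entry is returned only if it matches all keywords. The soundbase answers queries with biblook, whose exact matching rules may differ; the names `wildcard_to_regex` and `search`, and the toy entries, are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a search term in which '*' matches any string (including the
    empty one) and '?' matches exactly one character; all other characters
    are matched literally and case-insensitively."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(entries, keywords):
    """Return the keys of entries whose text contains, for every keyword,
    at least one whole word matching it (AND semantics, assumed)."""
    patterns = [wildcard_to_regex(k) for k in keywords]
    hits = []
    for key, text in entries.items():
        words = re.findall(r"[\w-]+", text)
        if all(any(p.fullmatch(w) for w in words) for p in patterns):
            hits.append(key)
    return hits


# Toy entries for illustration only (not real soundbase records).
entries = {
    "parra4": "Speaker with TV set. A strongly reverberating environment.",
    "toy1": "Convolutive mixture of two speakers.",
    "toy2": "Instantaneous mixture recorded in an office.",
}
print(search(entries, ["convolu*"]))            # ['toy1']
print(search(entries, ["mixture", "o??ice"]))   # ['toy2']
```

Translating the two wildcards by hand, rather than using Python's `fnmatch`, avoids treating `[` and `]` as character classes, which the search syntax described above does not mention.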

Adding new entries


The success of this database depends on the participation of other researchers working on
blind source separation, so that it becomes a place to exchange data files for testing
algorithms in a reproducible manner. There are two ways to submit new entries over the internet:

• The standard form lets the user add new entries simply by typing the relevant
information into the fields of an HTML form. The page gives a short explanation
of what is expected in each field.

• Users familiar with the BibTeX format can submit one or more BibTeX entries at a
time using the expert form. Note, however, that new entries must comply with the
format given above.
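The soundbase processes form submissions with Perl CGI scripts (see Implementation). The Python sketch below only illustrates that step under stated assumptions: it takes the submitted form fields as a dictionary and assembles a @Misc entry in the format shown above, rejecting submissions that lack the fields we assume to be required (author, title, year, url) or whose values contain unbalanced braces, which would corrupt the BibTeX file. The field lists and the names `form_to_bibtex` and `braces_balanced` are hypothetical.

```python
import re

# Assumed field lists, modelled on the example entry above; the real form
# may ask for different fields.
REQUIRED = ("author", "title", "year", "url")
OPTIONAL = ("homepage", "abstract", "postscript")


def braces_balanced(value):
    """True if every '{' in value is closed by a later '}'."""
    depth = 0
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def form_to_bibtex(key, form):
    """Turn the fields of a submitted HTML form (a dict of strings) into a
    @Misc entry, rejecting submissions that would break the BibTeX file."""
    if not re.fullmatch(r"[A-Za-z0-9_:-]+", key):
        raise ValueError(f"invalid citation key {key!r}")
    missing = [name for name in REQUIRED if not form.get(name, "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    lines = [f"@Misc{{{key},"]
    for name in REQUIRED + OPTIONAL:
        value = " ".join(form.get(name, "").split())  # normalise whitespace
        if not value:
            continue  # optional field left empty
        if not braces_balanced(value):
            raise ValueError(f"unbalanced braces in field {name!r}")
        lines.append(f"{name} = {{{value}}},")
    lines[-1] = lines[-1].rstrip(",")  # no comma after the last field
    lines.append("}")
    return "\n".join(lines)


form = {
    "author": "Lucas Parra",
    "title": "Speaker with TV set",
    "year": "1998",
    "url": "http://www.sarnoff.com/career_move/tech_papers/BSS.html",
}
print(form_to_bibtex("parra4", form))
```

A submission through the expert form could be checked the same way after parsing, so that both routes produce entries in one consistent format.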

After submission, it may take some time (a day or two) until the newly added entries
appear in the list.

Implementation
We implemented the BLISS soundbase using freely available standard tools, which is why
we chose the BibTeX format to store the entries. This lets us use programs such as
bib2html and biblook to process search queries and to generate HTML code for display
automatically. In addition, we wrote several Perl scripts that process the data sent by
the HTML forms through the Common Gateway Interface (CGI).
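The soundbase generates its HTML with bib2html; the Python sketch below is not that tool, only an illustration of the link rules stated for the short format under our own assumptions: the author name links to the homepage, the title to the URL, "PostScript" to the related paper and "Cached" to the cached copy, each only if the field is given. It assumes the entry has already been parsed into a dictionary with BibTeX escapes resolved (a plain "~" instead of `{\~}`), and that each result is emitted as an HTML list item. The names `link` and `format_entry` are hypothetical.

```python
from html import escape


def link(text, href):
    """Return text as an HTML link to href, or as plain escaped text if href is empty."""
    if not href:
        return escape(text)
    return f'<a href="{escape(href)}">{escape(text)}</a>'


def format_entry(fields, with_abstract=False):
    """Render one parsed entry in the short format, optionally followed by its abstract."""
    line = "{}, {}, {}.".format(
        link(fields["author"], fields.get("homepage")),
        link(fields["title"], fields.get("url")),
        escape(fields["year"]),
    )
    if fields.get("postscript"):
        line += " (" + link("PostScript", fields["postscript"]) + ")"
    if fields.get("cached"):
        line += " (" + link("Cached", fields["cached"]) + ")"
    if with_abstract and fields.get("abstract"):
        line += "<br>\n" + escape(fields["abstract"])
    return "<li>" + line + "</li>"


entry = {
    "author": "Lucas Parra",
    "homepage": "http://www.humanism.org/~lucas/",
    "title": "Speaker with TV set",
    "year": "1998",
    "url": "http://www.sarnoff.com/career_move/tech_papers/BSS.html",
    "cached": "ftp://ftp.first.gmd.de/pub/ziehe/BLISSsoundbase/parra4",
    "postscript": "ftp://ftp.first.gmd.de/pub/ziehe/BLISSsoundbase/parra1/parra98convolutive.ps",
}
print(format_entry(entry))
```

Escaping every field value keeps characters such as "&" or "<" in titles and abstracts from breaking the generated page.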

Summary
To conclude, we list the main features of the BLISS soundbase:

• By making it possible to share data sets, the BLISS soundbase provides a platform
for reproducible research.

• Most of the data sets are kept locally as a cached copy. This gives users access
to the data even when the original site is temporarily down.

• Entries can include a link to the homepage of the author. This facilitates
communication between researchers.

• Entries can also point to a relevant paper in which the data set has been
used.

• The user interface is fast and easy to use since it is based on BibTeX and standard
CGI technologies.
