
Interacting with Big Data

The Sloan Digital Sky Survey public SkyServer


Advanced Topics in Astrophysics: Modern Galactic Astronomy
Professor Beth Willman, Haverford College

Read:
http://www.economist.com/node/15557443
http://en.wikipedia.org/wiki/Big_data (a decent overview)

Background

As science in the 21st century advances, massive scientific datasets are being accumulated and stored at a rapidly increasing rate. Projects like the Large Hadron Collider (LHC), Kepler, and GAIA are now technically limited by the amount of data they can store and analyze. For example, the LHC generates hundreds of millions of collisions per second, but only a tiny fraction of them can be stored and analyzed. When it comes online, the Large Synoptic Survey Telescope (LSST) will produce 20 TB of data per night, with 50 PB of data saved by the end of the survey.

The Sloan Digital Sky Survey (SDSS) project has paved the way for how surveys such as LSST can organize and serve massive amounts of data to the public. Although the 140 TB of public SDSS data may seem paltry by tomorrow's standards, the first few weeks of SDSS survey operations collected more data than the entire prior history of astronomy (according to the Economist article above).

Big datasets are increasingly common in both science and industry, and a common way to manage and access them is SQL (Structured Query Language). In this exercise, each of you will interact with the SDSS dataset, both through web forms and by running a simple SQL query. You will download three star catalogs to be used in your first Project Set.

SDSS public SkyServer

The tenth data release (DR10) of the Sloan Digital Sky Survey is served to the public through:
http://skyserver.sdss3.org/dr10/en/home.aspx

Several types of data can be obtained through this website, including images, photometric catalog data, and spectroscopic catalog data. These data can be accessed through web forms, through SQL queries, or by downloading files from the Science Archive Server.

Accessing data with web forms

Under Data Access on the website above, the Navigate, Finding Chart, and Search tools provide web interfaces to SDSS data. The first two provide interactive access to images; the third lets you download a catalog of data.

1. Play around with the Navigate and Finding Chart tools, and look up a couple of interesting astronomical objects. Then experiment with the Search tool to see what sorts of data are available through the web form.

Accessing data via SQL query

Using a SQL query to download a catalog of SDSS data gives you more options than a web interface alone. A basic SQL query has three pieces: a SELECT clause (which columns to return), a FROM clause (which table or view to read), and a WHERE clause (which rows to keep). Despite this simple structure, SQL can express quite sophisticated queries of well-structured databases.

The schema describing the SDSS catalog that is accessible via SQL is located here:
http://skyserver.sdss3.org/dr10/en/help/browser/browser.aspx

Learning from the schema which parameters the database contains, and in which tables they live, takes a little practice. The tutorial below provides some practice and guidance. For the purposes of this course (both the Project Sets and research projects), Star (a View), SpecObjAll (a Table), and ProperMotions (a Table) are the only database elements you should need to query.

2. Follow the SDSS SQL Tutorial here (skip sections 6, 7, 10, and 11):
http://skyserver.sdss3.org/dr10/en/help/howto/search/introduction.aspx

3. Use a SQL search tool (for example, SkyServer's SQL Search page or CasJobs) to download catalogs of stars in three different 0.5 x 0.5 degree Milky Way fields. Include the PSF magnitudes in g and r with their measurement uncertainties, and the reddening in g and r. Use these field centers:

Field   RA (deg)   Dec (deg)
1       234        58
2       184        32
3       318        0.5
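The SELECT/FROM/WHERE structure maps directly onto step 3. As a minimal sketch (not part of the original assignment), the Python helper below builds one query per field center, which you could then paste into the SQL search tool. It assumes the column names of the DR10 Star view (psfMag_g, psfMagErr_g, extinction_g, and so on) and, by default, treats each field as a box 0.5 degrees wide in the RA and Dec coordinates themselves, like the BETWEEN box in the proper-motion example query later in this handout. The hypothetical correct_ra option instead widens the RA range by 1/cos(Dec) so the box spans roughly 0.5 degrees on the sky; which interpretation your instructor intends is an assumption. The names field_box and field_query are illustrative only.

```python
import math

# Field centers from step 3: (field number, RA in deg, Dec in deg).
FIELDS = [(1, 234.0, 58.0), (2, 184.0, 32.0), (3, 318.0, 0.5)]

# Columns requested in step 3, named as assumed in the DR10 Star view:
# PSF magnitudes and errors in g and r, plus the extinction (reddening) in g and r.
COLUMNS = [
    "objID", "ra", "dec",
    "psfMag_g", "psfMagErr_g",
    "psfMag_r", "psfMagErr_r",
    "extinction_g", "extinction_r",
]


def field_box(ra, dec, size=0.5, correct_ra=False):
    """Return (ra_min, ra_max, dec_min, dec_max) for a square field of `size` deg.

    correct_ra=False: the box is `size` deg wide in the RA coordinate itself.
    correct_ra=True (an assumption): the RA half-width is divided by cos(dec),
    so the box spans roughly `size` deg on the sky in the RA direction.
    """
    half = size / 2.0
    half_ra = half / math.cos(math.radians(dec)) if correct_ra else half
    return ra - half_ra, ra + half_ra, dec - half, dec + half


def field_query(ra, dec, size=0.5, correct_ra=False):
    """Build a SELECT/FROM/WHERE query for the stars in one field."""
    ra_min, ra_max, dec_min, dec_max = field_box(ra, dec, size, correct_ra)
    lines = [
        "SELECT " + ", ".join(COLUMNS),                               # which columns
        "FROM Star",                                                  # which view
        f"WHERE ra BETWEEN {ra_min:.4f} AND {ra_max:.4f}",            # which rows
        f"  AND dec BETWEEN {dec_min:.4f} AND {dec_max:.4f}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for number, ra, dec in FIELDS:
        print(f"-- Field {number}")
        print(field_query(ra, dec))
        print()
```

None of the three fields straddles RA = 0/360 degrees, so the sketch does not handle RA wraparound.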

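Examples of more advanced queries

Later in the course, you may need to run somewhat more sophisticated queries of the SDSS public data. Here are two example queries run in CasJobs. The first joins the photometric catalog (the Star view) with the proper-motion table; the second joins the spectroscopic table (SpecObjAll) with the photometric catalog (PhotoObj). In both, the INTO mydb.<table> clause saves the result as a table in your personal CasJobs database (MyDB) rather than returning it directly, and expressions such as psfMag_g - extinction_g give extinction-corrected (dereddened) magnitudes: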

SELECT s.ra, s.dec, s.raErr, s.decErr, s.mjd,
       s.psfMag_g - s.extinction_g AS g,
       s.psfMag_r - s.extinction_r AS r,
       s.psfMag_i - s.extinction_i AS i,
       s.psfMagErr_g, s.psfMagErr_r, s.psfMagErr_i,
       m.pmRa, m.pmDec, m.pmRaErr, m.pmDecErr,
       m.match, m.dist22, m.nFit, m.O, m.J
INTO mydb.PMcatalog
FROM Star AS s
JOIN ProperMotions AS m ON s.objID = m.objID
WHERE s.clean = 1
  AND s.ra BETWEEN 204.0 AND 219.0
  AND s.dec BETWEEN 21.0 AND 36.0

SELECT s.ra, s.dec, s.elodieLogG, s.elodieFeH, s.elodieObject,
       s.class, s.subclass, s.elodieTEff, s.elodieZ, s.elodieZErr,
       p.psfMag_g - p.extinction_g AS g,
       p.psfMag_r - p.extinction_r AS r,
       p.psfMag_i - p.extinction_i AS i,
       p.psfMagErr_g, p.psfMagErr_r, p.psfMagErr_i, p.type
INTO mydb.spectrotest
FROM SpecObjAll AS s
JOIN PhotoObj AS p ON s.bestObjID = p.objID
WHERE s.zWarning = 0
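To see what the JOIN in the proper-motion query does, here is a minimal sketch using Python's built-in sqlite3 module, with two tiny in-memory tables standing in for Star and ProperMotions. The rows and values are invented, only a handful of columns are included, and SQLite's dialect differs from the SQL Server dialect behind SkyServer and CasJobs, so the CasJobs-specific INTO mydb clause is left out.

```python
import sqlite3

# Toy stand-ins for the SkyServer Star view and ProperMotions table.
# All rows and values are made up purely to illustrate the JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Star (objID INTEGER, ra REAL, dec REAL,
                   psfMag_g REAL, extinction_g REAL, clean INTEGER);
CREATE TABLE ProperMotions (objID INTEGER, pmRa REAL, pmDec REAL);
INSERT INTO Star VALUES
    (1, 210.0, 30.0, 18.50, 0.10, 1),  -- inside the box, clean
    (2, 210.5, 30.5, 19.20, 0.12, 0),  -- inside the box, not clean
    (3, 250.0, 30.0, 17.80, 0.08, 1),  -- outside the RA range
    (4, 215.0, 25.0, 20.00, 0.05, 1);  -- inside the box, no proper-motion row
INSERT INTO ProperMotions VALUES
    (1, -3.2, 1.5),
    (2, 4.0, -2.1),
    (3, 0.5, 0.7);
""")

# Same shape as the proper-motion example: the inner JOIN on objID keeps only
# stars with a matching proper-motion row, and psfMag_g - extinction_g gives
# an extinction-corrected g magnitude.
query = """
SELECT s.objID, s.psfMag_g - s.extinction_g AS g, m.pmRa, m.pmDec
FROM Star AS s
JOIN ProperMotions AS m ON s.objID = m.objID
WHERE s.clean = 1
  AND s.ra BETWEEN 204.0 AND 219.0
  AND s.dec BETWEEN 21.0 AND 36.0
"""
rows = conn.execute(query).fetchall()
for obj_id, g, pm_ra, pm_dec in rows:
    print(obj_id, round(g, 2), pm_ra, pm_dec)
```

Only star 1 survives: star 2 fails s.clean = 1, star 3 lies outside the RA range, and star 4 has no matching proper-motion row, so the inner JOIN drops it. A LEFT JOIN would keep star 4, with NULL proper motions.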
