
RDF APIs using .NET Framework
SemWeb & dotNetRDF

Mazilu Liviu-Andrei Pintilie Radu-Stefan


ISS2

In the following we compare two .NET Framework APIs for working with RDF: SemWeb and dotNetRDF. We compare them on the following criteria: IDE integration, triple storage, SPARQL query support, performance, level of documentation, and licensing.

SemWeb
URI: http://razor.occams.info/code/semweb/
Author: Joshua Tauberer
Current version: 1.064 (11/05/2009)
License: GNU GPL v2

dotNetRDF
URI: http://www.dotnetrdf.org/
Author: Rob Vesse
Current version: 0.1.2 Alpha (27/11/2009)
License: GNU GPL

Why SemWeb and dotNetRDF?


These two libraries were chosen because both are easy to integrate into a .NET Framework project. Adding a reference to the dynamic libraries (DLLs) provided in the download package gives users full access to the APIs.

Triple Storage
SemWeb and dotNetRDF store triples in similar ways. SemWeb's type for an RDF triple is Statement, while dotNetRDF uses a Triple type. Both APIs define each node in a statement as either a literal or an entity, although the types used differ between the two implementations.

SemWeb
Entity computer = new Entity("http://example.org/computer");
Entity description = new Entity("http://example.org/description");
Entity says = "http://example.org/says";
Entity wants = "http://example.org/wants";
Statement assertion = new Statement(computer, says, new Literal("Hello world!"));
Taken from [1]

dotNetRDF
URINode dotNetRDF = CreateURINode(new Uri("http://www.dotnetrdf.org"));
URINode says = CreateURINode(new Uri("http://example.org/says"));
LiteralNode helloWorld = CreateLiteralNode("Hello World");
LiteralNode bonjourMonde = CreateLiteralNode("Bonjour tout le Monde", "fr");
new Triple(dotNetRDF, says, helloWorld);
new Triple(dotNetRDF, says, bonjourMonde);
Taken from [2]

To store these triples, SemWeb uses what it calls a MemoryStore, while dotNetRDF uses a Graph. The dotNetRDF name therefore more closely resembles RDF's own terminology, in which a set of triples is a graph.

SemWeb
// `desire` and `RDF` are not declared in this excerpt; see the full example at [1].
MemoryStore store = new MemoryStore();
store.Add(new Statement(computer, says, (Literal)"Hello world!"));
store.Add(new Statement(computer, wants, desire));
store.Add(new Statement(desire, description, (Literal)"to be human"));
store.Add(new Statement(desire, RDF+"type", (Entity)"http://example.org/Desire"));
Taken from [1]

dotNetRDF
Graph g = new Graph();
g.Assert(new Triple(dotNetRDF, says, helloWorld));
g.Assert(new Triple(dotNetRDF, says, bonjourMonde));
foreach (Triple t in g.Triples)
{
    Console.WriteLine(t.ToString());
}
Taken from [2]
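The model shared by both libraries (nodes that are either entities or literals, triples made of three nodes, and a container that holds the triples) can be sketched without either library. The following is only an illustration: the Node, Triple, and TinyStore types are hypothetical and are not part of SemWeb or dotNetRDF. It assumes set semantics, so asserting the same triple twice stores it once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration types; neither SemWeb nor dotNetRDF defines them.
enum NodeKind { Entity, Literal }

sealed class Node : IEquatable<Node>
{
    public NodeKind Kind { get; }
    public string Value { get; }

    public Node(NodeKind kind, string value)
    {
        Kind = kind;
        Value = value;
    }

    public bool Equals(Node other) =>
        other != null && Kind == other.Kind && Value == other.Value;

    public override bool Equals(object obj) => Equals(obj as Node);
    public override int GetHashCode() => (Kind, Value).GetHashCode();

    // Entities print as <uri>, literals as "text".
    public override string ToString() =>
        Kind == NodeKind.Entity ? "<" + Value + ">" : "\"" + Value + "\"";
}

sealed class Triple : IEquatable<Triple>
{
    public Node Subject { get; }
    public Node Predicate { get; }
    public Node Obj { get; }

    public Triple(Node subject, Node predicate, Node obj)
    {
        Subject = subject;
        Predicate = predicate;
        Obj = obj;
    }

    public bool Equals(Triple other) =>
        other != null && Subject.Equals(other.Subject)
        && Predicate.Equals(other.Predicate) && Obj.Equals(other.Obj);

    public override bool Equals(object obj) => Equals(obj as Triple);
    public override int GetHashCode() => (Subject, Predicate, Obj).GetHashCode();
    public override string ToString() => Subject + " " + Predicate + " " + Obj + " .";
}

// A set of triples: adding a duplicate leaves the store unchanged.
sealed class TinyStore
{
    private readonly HashSet<Triple> triples = new HashSet<Triple>();

    public bool Add(Triple t) => triples.Add(t); // false if already present
    public int Count => triples.Count;
    public IEnumerable<Triple> Triples => triples;
}

static class TripleModelDemo
{
    static void Main()
    {
        var computer = new Node(NodeKind.Entity, "http://example.org/computer");
        var says = new Node(NodeKind.Entity, "http://example.org/says");
        var hello = new Node(NodeKind.Literal, "Hello world!");

        var store = new TinyStore();
        store.Add(new Triple(computer, says, hello));
        store.Add(new Triple(computer, says, hello)); // duplicate, ignored

        foreach (Triple t in store.Triples)
        {
            Console.WriteLine(t);
        }
    }
}
```

Value-based equality on Node and Triple is what lets the HashSet detect duplicates; with reference equality the second Add would store a second copy.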

Both libraries provide various methods to read RDF from files and URIs. The main difference is that with SemWeb the user must select the parser used for reading a given file (e.g. RDF/XML, N-Triples, Turtle, or Notation 3), while dotNetRDF tries to choose the appropriate parser if none is specified manually.

To test the performance of the parsers that the APIs provide, we parsed a set of large files. We found some large RDF files at http://chefmoz.org/rdf.html . The licensing of these files allowed us to modify them, which we did in order to obtain 3 large RDF files (10MB, 50MB, 100MB). We then used these files to benchmark the RDF/XML parsers provided by SemWeb and dotNetRDF. Tests were run on an Intel Core 2 Duo T7300 CPU @ 2.00 GHz with 2 GB of RAM. Three tests were run on each API, one with each of the three files. The results are shown in Table 1 and Figure 1. (Results are displayed in hours:minutes:seconds.fraction format, as output by the Stopwatch we used.)
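The report does not include the timing code itself. A minimal sketch of such a measurement, assuming the standard System.Diagnostics.Stopwatch class was used, could look like the following. The ParseBenchmark class and the placeholder workload are hypothetical; the actual call to the SemWeb or dotNetRDF parser would go where the Thread.Sleep stands.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ParseBenchmark
{
    // Runs `work` once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        // Placeholder workload (assumption): replace with the parser call under test.
        Action parse = () => Thread.Sleep(50);

        TimeSpan elapsed = Time(parse);

        // The default TimeSpan format is hh:mm:ss.fffffff,
        // which appears to be the format used in Table 1.
        Console.WriteLine(elapsed);
    }
}
```

A single run per file, as described above, also measures one-time costs such as JIT compilation and disk caching, so repeating each run and reporting the median would give more stable numbers.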

The benchmark makes it clear that SemWeb has a much better implementation of the RDF parser, storage, and memory management. When parsing large RDF files, SemWeb is up to about ten times faster than dotNetRDF (roughly eight times on the 10MB file and ten times on the 50MB file). We think this happens because the MemoryStore it uses is a type of Sink. We can therefore state that SemWeb performs better than dotNetRDF.

            10MB               50MB               100MB
SemWeb      00:00:00.8418498   00:00:04.5484593   00:00:10.7560606
dotNetRDF   00:00:06.6143484   00:00:47.5463143   FATAL ERROR

Table 1. Parsing times

Figure 1. Parsing performance (seconds): SemWeb vs. dotNetRDF on the 10MB, 50MB, and 100MB files
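The "ten times faster" claim can be checked directly against the Table 1 timings. The sketch below assumes that the first pair of times in the table belongs to SemWeb and the second pair to dotNetRDF; the Speedup class is a hypothetical helper written for this illustration.

```csharp
using System;
using System.Globalization;

static class Speedup
{
    // Ratio of two durations given in hh:mm:ss.fffffff form.
    public static double Ratio(string slower, string faster)
    {
        TimeSpan s = TimeSpan.Parse(slower, CultureInfo.InvariantCulture);
        TimeSpan f = TimeSpan.Parse(faster, CultureInfo.InvariantCulture);
        return (double)s.Ticks / f.Ticks;
    }

    static void Main()
    {
        // dotNetRDF time divided by SemWeb time, per file size (values from Table 1).
        double r10 = Ratio("00:00:06.6143484", "00:00:00.8418498");
        double r50 = Ratio("00:00:47.5463143", "00:00:04.5484593");

        Console.WriteLine("10MB: {0}x", r10.ToString("F1", CultureInfo.InvariantCulture)); // 10MB: 7.9x
        Console.WriteLine("50MB: {0}x", r50.ToString("F1", CultureInfo.InvariantCulture)); // 50MB: 10.5x
    }
}
```

No ratio can be computed for the 100MB file, because the dotNetRDF run ended in a fatal error instead of a time.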

We also need to mention the surprise we had when we ran the 100MB file test on the dotNetRDF API. Given the previous results we expected it to take a long time, but we did not expect a fatal error: Unhandled Exception: OutOfMemoryException. This occurred when the application filled all of the 1.5 GB of memory that had been free (see Picture 1).

Picture 1. dotNetRDF API is a memory hog.

Besides their own implementations for storing and parsing RDF data, both libraries can also use external storage. With SemWeb you can back your RDF data with MySQL, SQL Server, SQLite, or PostgreSQL [3]. dotNetRDF provides integration with the Talis Platform and Virtuoso Universal Server, both of which provide native means of storing RDF data. More information can be found at [5] and [6].

SPARQL support
Both libraries provide full SPARQL support. SemWeb uses a fork of Ryan Levering's Java SPARQL implementation, converted to .NET [3]. This means it has full SPARQL support, with the option to translate SPARQL into SQL whenever this is available. dotNetRDF has its own SPARQL implementation for querying local data. To query remote data it uses SPARQL endpoints or other SPARQL implementations [4].

Code samples are provided by both authors and can be found at [3] and [4]. In our attempt to benchmark SPARQL query performance on both APIs, we were not able to query the 10MB file used earlier with the dotNetRDF API, even with a simple "SELECT * WHERE {?s ?p ?o}". We do not know whether the cause was the nature of the RDF file, our implementation (we had to load the Graph obtained from parsing the RDF file into a TripleStore, which was very slow and took about as long as parsing the file), or the SPARQL implementation itself. The queries do seem to work fine on much smaller chunks of RDF data.
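The pattern {?s ?p ?o} matches every triple, so the query above returns one result row per triple in the data. The following library-independent sketch shows that behaviour on two triples taken from the earlier examples. The MatchAll class and its SelectAll method are hypothetical and do not reflect how either library evaluates SPARQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MatchAll
{
    // Binds ?s, ?p and ?o for every triple, as the pattern {?s ?p ?o} does.
    public static List<Dictionary<string, string>> SelectAll(
        IEnumerable<(string S, string P, string O)> triples)
    {
        return triples
            .Select(t => new Dictionary<string, string>
            {
                ["s"] = t.S,
                ["p"] = t.P,
                ["o"] = t.O
            })
            .ToList();
    }

    static void Main()
    {
        var data = new List<(string S, string P, string O)>
        {
            ("http://example.org/computer", "http://example.org/says", "Hello world!"),
            ("http://www.dotnetrdf.org", "http://example.org/says", "Hello World")
        };

        // One row per triple: the result grows linearly with the data set.
        foreach (var row in SelectAll(data))
        {
            Console.WriteLine("{0} {1} {2}", row["s"], row["p"], row["o"]);
        }
    }
}
```

Because the result is as large as the data itself, this query is a simple smoke test on small inputs but materializes every triple on large ones.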

Level of documentation
Both libraries are very well documented. Both homepages contain links to demos, tutorials, hello world examples, and implementation issues, although dotNetRDF has a small edge over SemWeb in how the information is organized. A drawback of dotNetRDF is that it does not provide any source code, so we cannot gain insight into the implementation.

Conclusion
Both SemWeb and dotNetRDF provide good support for working with RDF data. Still, if you have to choose between the two, SemWeb is the way to go. It is a more mature and complete implementation, and it provides better support for both triple storage and SPARQL queries. This is to be expected, as SemWeb has over 4 years of development behind it while dotNetRDF has had only 3 months since its first release. Given that short history, we think dotNetRDF has great potential to become a reliable option for working with RDF under the .NET Framework.

References
[1] http://razor.occams.info/code/semweb/semweb-current/doc/helloworld.html
[2] http://www.dotnetrdf.org/content.asp?pageID=Hello%20World
[3] http://razor.occams.info/code/semweb/
[4] http://www.dotnetrdf.org/content.asp?pageID=Querying%20with%20SPARQL
[5] http://www.dotnetrdf.org/content.asp?pageID=Using%20the%20Talis%20Platform
[6] http://www.dotnetrdf.org/content.asp?pageID=Using%20Virtuoso%20Universal%20Server
