RDFQuery RDFa scraping for noscript

Below is a short tutorial on RDFQuery, focused on scraping RDFa from HTML.

The following code is a toy example showing basic triple extraction from RDFa annotations in HTML, based on the tutorial at the RDFQuery Wiki.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html lang="en">
<head>
	<title>Index</title>
	<script src="/js/jquery.js"></script>
	<script src="/js/jquery.rdfquery.core.js"></script>
	<script src="/js/jquery.rdfquery.rdfa.js"></script>
	<script type="text/javascript">
		$(document).ready(function() {
			var rdf = $("#content").rdf().databank.dump();
		});
	</script>
</head>
<body>

<div id="content">This is a test for <span xmlns:dbpedia-owl="http://dbpedia.org/ontology/" property="dbpedia-owl:label">RDFQuery</span>
</div>

</body>
</html>

The RDFQuery JavaScript files we need are the core and rdfa files (jquery.rdfquery.core.js and jquery.rdfquery.rdfa.js). jQuery is also required and must be loaded before them.

$("#content").rdf() extracts triples from the RDFa annotations inside the #content element and returns an object that contains the databank. The databank's dump() function converts these triples into a JSON-like object that the rest of the script can use.
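To make the "JSON-like object" more concrete, here is a minimal sketch of how such a dump could be flattened into a plain list of triples. It assumes the dump follows the common RDF/JSON nesting (subject, then predicate, then an array of object descriptors); that shape, the sampleDump data, the example.org subject URI, and the flattenDump helper are all illustrative assumptions, not part of RDFQuery. Inspect the real dump() output in the browser console before relying on it.

```javascript
// Hypothetical sample, shaped like an RDF/JSON dump:
// subject URI -> predicate URI -> array of object descriptors.
// The subject URI below is a placeholder, not real output.
var sampleDump = {
  "http://example.org/index.html": {
    "http://dbpedia.org/ontology/label": [
      { type: "literal", value: "RDFQuery" }
    ]
  }
};

// flattenDump is an illustrative helper (not an RDFQuery function).
// It walks the nested object and returns one entry per triple.
function flattenDump(dump) {
  var triples = [];
  Object.keys(dump).forEach(function (subject) {
    Object.keys(dump[subject]).forEach(function (predicate) {
      dump[subject][predicate].forEach(function (object) {
        triples.push({
          subject: subject,
          predicate: predicate,
          object: object.value
        });
      });
    });
  });
  return triples;
}

var triples = flattenDump(sampleDump);
console.log(triples);
```

In the page above, the same helper could be called on the rdf variable inside the $(document).ready callback, provided the real dump has the assumed shape.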

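Extracting triples from RDFa annotations in this way is one possible solution for the DBpedia Viewer: the annotated HTML can be shown as-is to noscript clients, while the AngularJS code reads its data from that same markup instead of querying for it a second time.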
