Two-step loading and infinite scrolling for DBPV

Triples describing a DBpedia entity can be divided into two groups: (outgoing) properties of the entity and (incoming) reverse properties. Reverse properties are properties of some other entity that has this entity as their value.
So a DBpedia entity has a fan-in and a fan-out, given by the number of reverse properties and (forward) properties of the entity, respectively. Some entities on DBpedia have a very large fan-in (for example, ~50000 other entities link to dbpedia:United_States).

To get the triples of some entity, the following SPARQL query could be used:

SELECT ?hasprop ?v ?isprop WHERE { {<http://dbpedia.org/resource/[Entity]> ?hasprop ?v} UNION {?v ?isprop <http://dbpedia.org/resource/[Entity]>} }

Two-step loading

However, this query retrieves both outgoing and reverse properties. For entities with a large fan-in this would mean long loading times and a lot of processing at once. For this reason, a two-step loading method is used. In this method there are two queries: one to select the outgoing properties (the forward query) and another to select the incoming ones (the reverse query). Both are executed using AJAX, so they run in parallel. Depending on the fan-in and fan-out of the entity (and various other, less controllable factors), the responses may arrive in a different order. When the fan-in is large, however, the forward query response is likely to arrive first.

Also, the fan-out of DBpedia entities varies less than the fan-in and seems to be relatively low, so processing of the forward query response is most likely to be near-instant (which can’t be said of the tens of thousands of triples that may be returned by the reverse query).
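
As a rough sketch, the two parallel requests could be fired with Angular’s $http service roughly as follows. The entity URI, the scope variable names, and the use of Virtuoso’s form-encoded POST interface are illustrative assumptions here, not the actual DBPV code; this would live inside an Angular controller or service where $http and $scope are injected.

// Helper that POSTs a SPARQL query to the endpoint; form-encoded POST also
// keeps us clear of URL length limits.
var endpoint = 'http://dbpedia.org/sparql';
var entityUri = 'http://dbpedia.org/resource/United_States';

function sparql(query) {
  return $http.post(endpoint,
    'query=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent('application/sparql-results+json'),
    { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } });
}

// Forward query: outgoing properties of the entity (usually few).
var forwardQuery = 'SELECT ?p ?o WHERE { <' + entityUri + '> ?p ?o }';
// Reverse query: incoming links, possibly tens of thousands of bindings.
var reverseQuery = 'SELECT ?s ?p WHERE { ?s ?p <' + entityUri + '> }';

// Both requests are sent immediately; the callbacks fire in whatever
// order the responses arrive.
sparql(forwardQuery).then(function (response) {
  $scope.forwardTriples = response.data.results.bindings;   // shown right away
});
sparql(reverseQuery).then(function (response) {
  $scope.reverseTriples = response.data.results.bindings;   // kept hidden, see next section
});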

(In)finite scrolling

Because showing everything at once would impose a heavy load on the browser for entities with a high fan-in, infinite scrolling was implemented. There is a catch, however, as will be explained shortly.

With infinite scrolling, normally we would load only a limited set of items to show and, when the user reaches the end of the list, query the server for more items. This is what Facebook and a lot of other sites with endless feeds do.

The catch with the infinite scrolling currently done for DBPV is that we load everything but don’t show it all. The loading part seems to have a very limited impact on the responsiveness of the page (even for dbpedia:United_States). The showing part, however (where the view is updated from the model after thousands of triples are added at once), seems to have the biggest impact on page responsiveness. With these observations in mind, the following infinite scrolling solution was implemented:

The incoming (reverse) properties of some entity are requested with a reverse query (as usual in two-step loading), but they are stored in an Angular scope variable that isn’t shown. Then, when the user scrolls to (or almost to) the bottom, a small batch of triples from this hidden list is transferred to the list that is shown on the page.
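
A minimal sketch of this hidden-list approach, assuming hypothetical variable, constant, and function names rather than the actual DBPV code:

// Inside the controller that received the reverse query response.
$scope.hiddenReverseTriples = [];   // filled from the reverse query response
$scope.shownReverseTriples  = [];   // bound to the template via ng-repeat

var BATCH_SIZE = 50;                // triples revealed per scroll step (arbitrary choice)

// Called when the user scrolls to (almost) the bottom of the page,
// e.g. from a scroll-event directive.
$scope.showMore = function () {
  var batch = $scope.hiddenReverseTriples.splice(0, BATCH_SIZE);
  $scope.shownReverseTriples = $scope.shownReverseTriples.concat(batch);
};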

This solution is not how true infinite scrolling should work, but it probably provides a better user experience than showing everything at once or limiting the number of triples in the query response. Also, with proper backend support, true infinite scrolling can be implemented later.

Adding extra folders to distribute in the DBpedia plugin

In order to make extra folders accessible via HTTP, the following files need to be adapted:

  • the makefiles
  • dbpedia_init.sql
  • dbpedia_local.sql

Suppose you add a sibling directory of the “statics” directory and want it to have the same behavior (publicly accessible via HTTP); then you need to

  • add it to the list of EXTRA_DIST in the makefiles
  • create the folders in make_vad.sh:
directory_init() {
...
mkdir vad/vsp/dbpedia/js
...
}
  • add the path to vhost in dbpedia_local.sql, in the same way as “statics” is done. It looks something like:
DB.DBA.VHOST_REMOVE (lpath=>'/js');
DB.DBA.VHOST_DEFINE (lpath=>'/js', ppath=>registry_get('_dbpedia_path_')||'js/',
    is_dav=>atoi (registry_get('_dbpedia_dav_')));
  • add the path to vhost in dbpedia_init.sql, in the same way as “statics” is added. It looks something like:
DB.DBA.VHOST_REMOVE ( lhost=>registry_get ('dbp_lhost'), vhost=>registry_get ('dbp_vhost'), lpath=>'/js');
...
...
DB.DBA.VHOST_DEFINE ( lhost=>registry_get ('dbp_lhost'), vhost=>registry_get ('dbp_vhost'), lpath=>'/js',
     ppath=>'/DAV/VAD/dbpedia/js/',
     is_dav=>1
);

RDFQuery RDFa scraping for noscript

Below is a short tutorial on RDFQuery, focused on scraping RDFa from HTML.

The following code is a toy example showing basic triple extraction from RDFa annotations in HTML, based on the tutorial at the RDFQuery Wiki.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
	<title>Index</title>
	<script src="/js/jquery.js"></script>
	<script src="/js/jquery.rdfquery.core.js"></script>
	<script src="/js/jquery.rdfquery.rdfa.js"></script>
	<script type="text/javascript">
		$(document).ready(function() {
			// Parse the RDFa inside #content and dump the resulting databank
			// as a JSON-like object (see the explanation below).
			var rdf = $("#content").rdf().databank.dump();
		});
	</script>
</head>
<body>

<div id="content">This is a test for <span xmlns:dbpedia-owl="http://dbpedia.org/ontology/" property="dbpedia-owl:label">RDFQuery</span>
</div>

</body>
</html>

The RDFQuery JavaScript files we need are the core and RDFa files. We also need jQuery.

$("#content").rdf() extracts triples from the RDFa annotations in the HTML, returning an object that contains the databank. The dump() function of RDFQuery transforms this into a JSON-like object that can be used further by our scripts.

This extraction of triples from RDFa annotations in HTML is a possible way to avoid double querying while supporting noscript in AngularJS for the DBpedia Viewer.
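
For illustration, a small hypothetical follow-up: instead of dumping the databank, the same rdfQuery object can be queried for all of its triples and turned into a plain array that, for example, an Angular controller could put on its scope. The query pattern and the object shape are assumptions for this sketch.

// Query the scraped RDFa for every triple and collect plain string terms.
var rdf = $("#content").rdf();
var triples = [];
rdf.where('?s ?p ?o').each(function () {
  // Inside each(), rdfQuery exposes the variable bindings on `this`;
  // toString() gives a Turtle-like representation of each term.
  triples.push({
    subject:   this.s.toString(),
    predicate: this.p.toString(),
    object:    this.o.toString()
  });
});
// `triples` can now be handed to whatever code renders the page.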

Building the DBpedia Virtuoso plugin

The latest official version of the source code for the DBpedia plugin is located at https://github.com/dbpedia/dbpedia-vad-i18n

Currently, there is no standalone script for building the DBpedia VAD for Virtuoso, so we have to compile the whole Virtuoso Open-Source edition with the code for our plugin placed in its source tree.

The recipe:

  1. get the source code of the DBpedia plugin from its GitHub repository: https://github.com/dbpedia/dbpedia-vad-i18n
  2. unpack it somewhere
  3. get the source code of a 6.x version of Virtuoso Open-Source edition from its GitHub repository
  4. unpack the code and navigate to the root of the unpacked folder
  5. set the compiler flags as described at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMake
  6. BEFORE running ./configure (but after running ./autogen.sh), copy the contents of the dbpedia folder of the DBpedia plugin code downloaded in step 1 into the dbpedia folder inside the binsrc folder of the Virtuoso source code downloaded in step 3
  7. then run ./configure and make
  8. now in the binsrc/dbpedia folder of the build there should be a file called dbpedia_dav.vad, which is our VAD plugin waiting to be plugged into a (running) VOS instance

If you already have VOS running on localhost (default database port 1111), make sure the port used during the build is not the same (1111), as the running instance will otherwise interfere with building the VOS source.

Installing DBpedia Virtuoso plugin

Continuing from the previous post on DBpedia mirroring, here is a very short guide to installing the DBpedia Virtuoso plugin.

Assuming

  • you have installed the OpenLink Virtuoso server
  • imported some DBpedia data

you can install the DBpedia Virtuoso plugin (the dbpedia_dav.vad) using the following simple steps:

  1. download the .vad file (you can get the latest version of the “dbpedia_dav.vad” file from the GitHub project’s downloads)
  2. go to your Virtuoso Conductor and login
  3. go to System Admin -> Packages
  4. scroll down until you see “Install Package”, then click “Browse”
  5. select the dbpedia_dav.vad you downloaded in step 1 and confirm. Click “Proceed” when Conductor asks.
  6. DONE

No-JS and AngularJS

How to support No-JavaScript users while using AngularJS?

Currently, DBpedia generates all of the HTML describing a resource at the server, with minimal added JavaScript. Visitors that don’t run JavaScript (security-conscious users and bots) currently have access to all of the HTML shown to other users. However, the whole point of an SPA and AngularJS is to use JavaScript to update the page, which results in a clear problem here. On the one hand, we want to make a beautiful, dynamic, SPARQLing SPA; on the other hand, we need to support the visitors that don’t understand JavaScript.

Solving this would require adjustments on both client- and server-side.

Server-side

When the user lands on a DBpedia entity page coming from some other page, the browser requests the page from the server. The VSP code of the VAD plugin generates the HTML and sends it back to the user.

To enable AngularJS, we just insert our JavaScript files and alter some of the HTML at the server.

Client-side

Now, let’s talk about the client side (and AngularJS). In the previous post, in the toy setup, we were getting the triples describing some entity through an Angular service that sends a SPARQL query to the DBpedia SPARQL endpoint.

When AngularJS is inserted into the current server HTML response, Angular takes control. It will send a SPARQL query to DBpedia to get the description, putting extra load on the server and the network. We are doing double work here (a solution to this is discussed in the next paragraphs). If the Angular routes are configured properly and the URLs are correct, clicking on a link to another DBpedia entity should result in Angular doing its thing (SPARQLing DBpedia and displaying the new entity information) instead of requesting all-new HTML from the server.
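
For illustration, a hedged sketch of what such a route configuration could look like; the module name, route pattern, template path, and controller name are assumptions, not the actual DBPV setup.

// ngRoute is a separate module in newer AngularJS versions; in older
// versions $routeProvider lives in the core ng module.
angular.module('dbpvApp', ['ngRoute'])
  .config(function ($routeProvider, $locationProvider) {
    $locationProvider.html5Mode(true);       // keep the real /page/... URLs, no hashbang
    $routeProvider
      .when('/page/:entityName', {
        templateUrl: '/statics/partials/entity.html',   // hypothetical template location
        controller: 'EntityCtrl'                        // hypothetical controller name
      })
      .otherwise({ redirectTo: '/page/DBpedia' });
  });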

We need to avoid this double work. The server has already done a lot of computing to generate the HTML when the visitor lands on a DBpedia entity page, which involved querying the triple store. Then Angular sends a SPARQL query to the endpoint, which again results in (the same) querying of the triple store. We distinguish two solutions to this problem:

  1. use RDFa scraper on the HTML
  2. include JSON at the server

RDFQuery (http://code.google.com/p/rdfquery/) could be used for the first option. It should be called from the Angular service generating the triples (which currently only SPARQLs to DBpedia).

For the second option, we would need to make a few more alterations to the server code. However, there is probably much less load on the client compared to the first option. In my opinion this option is better, but it depends. It was implemented in the toy Angular setup as follows: the Angular service that currently just SPARQLs to get triples about some entity now first looks at the page and searches for a special hidden element in the page HTML containing JSON. It then parses the JSON (jQuery.parseJSON), uses the resulting object for display, and removes the element containing the JSON. It doesn’t query DBpedia at all in that case. However, this implementation can be considered a jQuery hack; it is not a very Angular way of doing things.
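
A sketch of the service logic just described; the element id, the $q usage, and the sparqlForTriples fallback name are assumptions for illustration.

// Inside the Angular service (with $q injected and a SPARQL fallback available).
function getTriples(entityUri) {
  var holder = $('#entity-json');        // hidden element emitted by the server
  if (holder.length > 0) {
    var triples = jQuery.parseJSON(holder.text());
    holder.remove();                     // use the embedded data only once
    return $q.when(triples);             // wrap in a promise for a uniform interface
  }
  // No embedded JSON (e.g. in-app navigation): fall back to SPARQLing DBpedia.
  return sparqlForTriples(entityUri);
}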

Server-side (again)

When going with the second option (HTML + JSON), we need to adjust some server-side code to put the JSON in the HTML. This probably results in just a small computational overhead.

AngularJS for DBPV

To make an awesome DBpedia Viewer, we need some JavaScript MV* framework. AngularJS is used for this purpose as it actually has controllers.

Using AngularJS, we can build the DBpedia Viewer as an SPA. Below follows a short description of a toy AngularJS implementation.

(For more information about AngularJS, please visit http://docs.angularjs.org/tutorial)

First, we set up our HTML. Angular allows us to define templates that describe how model objects will be displayed. The view-model binding is defined in a declarative way.

The views are under the control of controllers, but you don’t generate any HTML in your controllers. Instead, a controller should generate the objects that will be displayed in the corresponding templates. In our case, the controller performs a SPARQL query against DBpedia to get the triples of some entity (using Angular’s $http service). To avoid errors due to the 1900-character limit of Virtuoso URLs, the SPARQL query should be sent in an HTTP POST request ($http.post()).

However, it is better to query DBpedia from an Angular service. All of the SPARQL querying logic is therefore moved to an Angular service, which is used by the controller. The advantage is that we could generalize the service to let it be used by different controllers in the future.
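
A rough sketch of this service/controller split, assuming hypothetical names (dbpvApp, TripleService, TriplesCtrl) and the public DBpedia endpoint; not the actual DBPV code.

var dbpvApp = angular.module('dbpvApp', []);

dbpvApp.factory('TripleService', function ($http) {
  var endpoint = 'http://dbpedia.org/sparql';
  return {
    // Returns a promise for all triples (incoming and outgoing) of entityUri.
    getTriples: function (entityUri) {
      var query =
        'SELECT ?hasprop ?v ?isprop WHERE {' +
        ' { <' + entityUri + '> ?hasprop ?v } UNION { ?v ?isprop <' + entityUri + '> } }';
      // POST keeps us clear of the URL length limit mentioned above.
      return $http.post(endpoint,
        'query=' + encodeURIComponent(query) +
        '&format=' + encodeURIComponent('application/sparql-results+json'),
        { headers: { 'Content-Type': 'application/x-www-form-urlencoded' } });
    }
  };
});

dbpvApp.controller('TriplesCtrl', function ($scope, TripleService) {
  TripleService.getTriples('http://dbpedia.org/resource/AngularJS')
    .then(function (response) {
      $scope.triples = response.data.results.bindings;
    });
});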

So, at this point we have:

  • one template for a list of triples in our HTML
  • a controller associated with that one list of triples
  • a service getting all triples (incoming and outgoing) about whatever entity the controller sends to it

This is an introduction to the use of AngularJS for DBPV. In the following posts, I will talk about supporting no-JavaScript pages with AngularJS and about separating the querying of incoming links from that of outgoing links.