Virtuoso VSP, HTTP Headers and Cookies

HTTP Response Headers

This post brings together several related pieces of Virtuoso Server Pages (VSP) documentation.

First, we’ll handle HTTP Headers. To set a header in VSP, one can use

http_header (string)

However, contrary to intuition, calling this function a second time doesn’t set a second header entry, but replaces the previous one. To “accumulate” custom header entries, one must use

http_header ( concat (http_header_get (), string));

Please note that string must end in “\r\n”. http_header_get () returns the currently built-up header entries, so the line above appends a new entry to the accumulated header entries.

An example from description.vsp of the DBpedia plugin:

  http_header ( concat (http_header_get (), sprintf ('Expires: %s \r\n', date_rfc1123 (dateadd ('day', 7, now ())))));
  http_header ( concat (http_header_get (), 'Set-Cookie: dbpv_has_js=0\r\n'));
This code sets the “Expires” and “Set-Cookie” headers. Notice the “\r\n” at the end of both entries.

Accessing HTTP Request Header (for cookies)

To access the HTTP request header in VSP, one can use:

http_request_header ()

To access some entry (“Cookie” for example) in the request header, the following line should be used:

http_request_header (http_request_header (), 'Cookie', null, '');

This line returns all cookie information included in the request header. To get a specific value from the cookie string, one can use the following:

get_keyword ('cookie_value', split_and_decode (cookie, 0, ';='), '');
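
For illustration, the two calls above can be combined in a VSP page roughly like this (a sketch only; the cookie name dbpv_has_js is borrowed from the Set-Cookie example earlier, and error handling is omitted):

  <?vsp
    declare cookie, has_js varchar;
    -- all cookie information sent by the browser
    cookie := http_request_header (http_request_header (), 'Cookie', null, '');
    -- value of a single cookie (dbpv_has_js, as set in the Set-Cookie example above)
    has_js := get_keyword ('dbpv_has_js', split_and_decode (cookie, 0, ';='), '');
    http (sprintf ('dbpv_has_js = %s', has_js));
  ?>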

DBPV Prototype

In the following video, you can see the prototype version of the DBpedia Viewer:

To test it out, please take a look at the github repositories of the project:

Please note that the real version may lag behind the toy version during development as changes are propagated from the toy to the real thing.

Two-step loading and infinite scrolling for DBPV

Triples describing a DBpedia entity can be divided into two groups: (outgoing) properties of the entity and (incoming) reverse properties. Reverse properties are properties of some other entity that has this entity as the property value.
So a DBpedia entity has a fan-in and a fan-out, described by the number of reverse properties and (forward) properties of the entity, respectively. Some entities on DBpedia have a large fan-in (for example, ~50000 other entities link to dbpedia:United_States).

To get the triples of some entity, the following SPARQL query could be used:

SELECT ?hasprop ?v ?isprop WHERE { { <http://dbpedia.org/resource/[Entity]> ?hasprop ?v } UNION { ?v ?isprop <http://dbpedia.org/resource/[Entity]> } }

Two-step loading

However, this query retrieves both outgoing and reverse properties. For entities with a large fan-in this would mean long loading times and a lot of processing at once. For this reason, a two-step loading method is used. In the two-step loading method there are two queries: one to select the outgoing properties (forward query) and another to select the incoming ones (reverse query); both are executed using AJAX, so they run in parallel. Depending on the fan-in and fan-out of the entity (and various other, less controllable factors), the responses may arrive in a different order. When the fan-in is large, however, the forward query response is likely to arrive first.

Also, the fan-out of DBpedia entities varies less than the fan-in and seems to be relatively low, so processing of the forward query response is likely to be near-instant (which can’t be said of the tens of thousands of triples that may be returned by the reverse query).
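
For illustration, the UNION query shown earlier can be split into the two queries used in two-step loading (a sketch; [Entity] is a placeholder for the resource name, and the exact queries used by DBPV may differ):

  # forward query: outgoing properties of the entity
  SELECT ?hasprop ?v WHERE { <http://dbpedia.org/resource/[Entity]> ?hasprop ?v }

  # reverse query: incoming (reverse) properties, i.e. other entities pointing at this one
  SELECT ?v ?isprop WHERE { ?v ?isprop <http://dbpedia.org/resource/[Entity]> }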

(In)finite scrolling

Because showing everything at once would impose a heavy load on the browser for entities with high fan-in, infinite scrolling was implemented, albeit with a catch, as will be explained shortly.

With infinite scrolling, normally we would load only a limited set of the list items to show, and when the user reaches the end of the list, we would query the server for more items. This is what Facebook and many other sites with endless feeds do.

The catch with the infinite scrolling currently done for DBPV is that we load everything but don’t show it. The loading part seems to have a very limited impact on the responsiveness of the page (even for dbpedia:United_States). The showing part, however (where the view is updated from the model as thousands of triples are added at once), seems to have the highest impact on page responsiveness. With these observations in mind, the following infinite scrolling solution was implemented:

The incoming (reverse) properties of an entity are requested with a reverse query (as usual in two-step loading), but they are stored in an Angular scope variable that isn’t shown. Then, when the user scrolls to the bottom (or almost to the bottom), a small batch of triples from this hidden list is transferred to the list that is shown on the page.

This solution is not really how true infinite scrolling should work, but it probably provides a better user experience than showing everything at once or limiting the number of triples in the query response. Also, with proper backend support, true infinite scrolling can be implemented later.
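
A minimal AngularJS sketch of this buffering approach (the module, controller, and variable names are hypothetical, not the actual DBPV code):

  // sketch: keep the reverse-query results in a hidden buffer and reveal them in chunks
  var app = angular.module('dbpvSketch', []);

  app.controller('EntityCtrl', function ($scope) {
    $scope.visibleTriples = [];   // rendered in the template via ng-repeat
    $scope.hiddenTriples  = [];   // filled from the reverse-query response, never rendered

    var CHUNK = 50;               // how many triples to reveal per scroll step

    // called when the user scrolls to (almost) the bottom of the list
    $scope.loadMore = function () {
      var next = $scope.hiddenTriples.splice(0, CHUNK);
      $scope.visibleTriples = $scope.visibleTriples.concat(next);
    };
  });

In the template, loadMore() would be wired to a scroll event (for example through an infinite-scroll directive), so each trigger moves at most CHUNK triples from the hidden buffer into the rendered list.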

Adding extra folders to distribute in the DBpedia plugin

In order to make extra folders accessible via http, the following files need to be adapted:

  • the makefiles
  • dbpedia_init.sql
  • dbpedia_local.sql

Suppose you add an extra sibling directory of the “statics” directory and want it to have the same behavior (publicly accessible via HTTP). You then need to:

  • add it to the list of EXTRA_DIST in the makefiles
  • create the folders in make_vad.sh:
directory_init() {
...
mkdir vad/vsp/dbpedia/js
...
}
  • add the path to vhost in dbpedia_local.sql, in the same way as “statics” is done. It looks something like:
DB.DBA.VHOST_REMOVE (lpath=>'/js');
DB.DBA.VHOST_DEFINE (lpath=>'/js', ppath=>registry_get('_dbpedia_path_')||'js/',
    is_dav=>atoi (registry_get('_dbpedia_dav_')));
  • add the path to vhost in dbpedia_init.sql, in the same way as “statics” is added. It looks something like:
DB.DBA.VHOST_REMOVE ( lhost=>registry_get ('dbp_lhost'), vhost=>registry_get ('dbp_vhost'), lpath=>'/js');
...
...
DB.DBA.VHOST_DEFINE ( lhost=>registry_get ('dbp_lhost'), vhost=>registry_get ('dbp_vhost'), lpath=>'/js',
     ppath=>'/DAV/VAD/dbpedia/js/',
     is_dav=>1
);

RDFQuery RDFa scraping for noscript

Below is a short tutorial of RDFQuery focused on scraping RDFa from HTML.

The following code is a toy example showing basic triple extraction from RDFa annotations in HTML, based on the tutorial at the RDFQuery Wiki.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
	<title>Index</title>
	<script src="/js/jquery.js"></script>
	<script src="/js/jquery.rdfquery.core.js"></script>
	<script src="/js/jquery.rdfquery.rdfa.js"></script>
	<script type="text/javascript">
		$(document).ready(function() {
			var rdf = $("#content").rdf().databank.dump();
		});
	</script>
</head>
<body>

<div id="content">This is a test for <span xmlns:dbpedia-owl="http://dbpedia.org/ontology/" property="dbpedia-owl:label">RDFQuery</span>
</div>

</body>
</html>

The RDFQuery JavaScript files we need are the core and RDFa files. We also need jQuery.

$("#content").rdf() extracts triples from the RDFa annotations in the HTML and returns an object that contains the databank. The dump() function of RDFQuery transforms this into a JSON-ish object that can then be used by other scripts.
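
For the toy page above, the dumped object is roughly of the following form (the subject key is the URL of the page itself, shown here with a placeholder; the exact shape depends on the RDFQuery version and the dump options):

  {
    "http://example.org/index.html": {
      "http://dbpedia.org/ontology/label": [
        { "type": "literal", "value": "RDFQuery" }
      ]
    }
  }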

This extraction of triples from RDFa annotations in HTML is a possible solution to avoid double querying while supporting noscript in AngularJS for the DBpedia Viewer.

Building the DBpedia Virtuoso plugin

The latest official version of the source code for the DBpedia plugin is located at https://github.com/dbpedia/dbpedia-vad-i18n

Currently, there is no standalone script for building the DBpedia VAD for Virtuoso, so we have to compile the whole Virtuoso Open-Source edition with the code for our plugin placed in its source tree.

The recipe (a command-line sketch of these steps follows below):

  1. get the source code of the DBpedia plugin from its GitHub repository: https://github.com/dbpedia/dbpedia-vad-i18n
  2. unpack it somewhere
  3. get the source code of a 6.x version of the Virtuoso Open-Source edition from GitHub
  4. unpack the code and navigate to the root of the unpacked folder
  5. set the compiler flags as described in: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMake
  6. BEFORE running ./configure (after running ./autogen.sh), copy the contents of the dbpedia folder of the DBpedia plugin code downloaded in step 1 into the dbpedia folder inside the binsrc folder of the Virtuoso source code from step 3
  7. then run ./configure and make
  8. the binsrc/dbpedia folder of the build should now contain a file called dbpedia_dav.vad, which is our VAD package, ready to be installed into a (running) VOS instance

If you already have VOS running on localhost (default database port 1111), make sure the port used during the build is not the same (1111), as the running instance would otherwise interfere with building the VOS source.
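
Putting the recipe together, a rough command-line sketch (archive names, paths, and the Virtuoso version are placeholders; the compiler flags should be taken from the VOSMake page linked in step 5):

  # a sketch only: exact archive name, paths and flags are placeholders
  tar xzf virtuoso-opensource-6.x.tar.gz
  cd virtuoso-opensource-6.x
  ./autogen.sh
  # copy the plugin sources over the bundled dbpedia folder BEFORE configure
  cp -r /path/to/dbpedia-vad-i18n/dbpedia/* binsrc/dbpedia/
  ./configure                 # plus the compiler flags from the VOSMake page
  make
  ls binsrc/dbpedia/dbpedia_dav.vad   # the resulting VAD package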

Installing DBpedia Virtuoso plugin

Continuing from the previous post on DBpedia mirroring, here is a very short guide to installing the DBpedia Virtuoso plugin.

Assuming

  • you have installed the OpenLink Virtuoso server
  • imported some DBpedia data

you can install the DBpedia Virtuoso plugin (the dbpedia-dav.vad) using the following simple steps:

  1. download the .vad file (you can get the latest version from the GitHub project; download the “dbpedia-dav.vad” file)
  2. go to your Virtuoso Conductor and login
  3. go to System Admin -> Packages
  4. scroll down until you see “install package”, then click “Browse”
  5. select the dbpedia-dav.vad you downloaded in step 1 and confirm. Click “Proceed” when Conductor asks.
  6. DONE
