No-JS and AngularJS

How to support No-JavaScript users while using AngularJS?

Currently, DBpedia generates all of the HTML describing a resource on the server, with minimal added JavaScript. Visitors that don’t run JavaScript (security-conscious users and bots) therefore get the same HTML as everyone else. However, the whole point of an SPA built with AngularJS is to use JavaScript to update the page, which creates a clear problem. On the one hand, we want to make a beautiful, dynamic, SPARQLing SPA; on the other hand, we need to support the visitors that don’t run JavaScript.

Solving this requires adjustments on both the client side and the server side.

Server-side

When the user lands on a DBpedia entity page from some other page, we query the server. The VSP code of the VAD plugin will generate the HTML and send it back to the user.

To enable AngularJS, we just insert our scripts and alter some of the HTML at the server.

Client-side

Now, let’s talk about the client side (and AngularJS). In the previous post, in the toy setup, we were getting the triples describing some entity through an Angular service that SPARQLs to the DBpedia SPARQL endpoint.

When AngularJS is inserted into the current server HTML response, Angular takes control. It will SPARQL to DBpedia to get the description, putting extra load on the server and the network. We are doing double work here (solutions are discussed below). If the Angular routes are configured properly and the URLs are correct, clicking on a link to another DBpedia entity should result in Angular doing its thing (SPARQLing DBpedia and displaying the new entity information) instead of requesting all-new HTML from the server.

We need to avoid double work. The server has already done a lot of computing to generate the HTML when the visitor lands on a DBpedia entity, which involved querying the triple store. Angular then sends a SPARQL query to the endpoint, which results in the same querying of the triple store all over again. We distinguish two solutions to this problem:

  1. use an RDFa scraper on the HTML
  2. include JSON in the HTML at the server

For the first option, rdfQuery (http://code.google.com/p/rdfquery/) could be used. It should be called from the Angular service that provides the triples (and that currently only SPARQLs to DBpedia).

For the second option, we would need to make a few more alterations to the server code. However, it probably puts much less load on the client than the first option. In my opinion, this option is better, but it depends on the situation. It was implemented in the toy Angular setup as follows: the Angular service that previously just SPARQLed to get the triples about some entity now first searches the page HTML for a special hidden element containing JSON. If it finds one, it parses the JSON (jQuery.parseJSON), uses the resulting object for display, and then removes the element; it doesn’t query DBpedia. This implementation can be considered a jQuery hack, though: it is not a very Angular way of doing things.
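The lookup described above can be sketched as a plain function. This is only an illustration under assumptions, not the code from the toy setup: the element id `dbpv-preload` and the function name are hypothetical, `doc` stands for anything that offers the DOM's `getElementById`, and the native `JSON.parse` is used in place of `jQuery.parseJSON`.

```javascript
// Hypothetical id of the hidden element the server fills with JSON.
const PRELOAD_ID = 'dbpv-preload';

// Returns the object embedded in the page, or null when nothing usable
// is there, in which case the caller should fall back to SPARQL.
function readPreloadedTriples(doc) {
  const el = doc.getElementById(PRELOAD_ID);
  if (!el) {
    return null;
  }
  let data = null;
  try {
    data = JSON.parse(el.textContent);
  } catch (e) {
    data = null; // malformed JSON: treat the page as not preloaded
  }
  // Remove the element so the same data is not picked up a second time.
  if (el.parentNode) {
    el.parentNode.removeChild(el);
  }
  return data;
}
```

Inside an Angular service, this could be called once with the real `document`; a null result would mean "query the SPARQL endpoint as before".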

Server-side (again)

When going with the second option (HTML+JSON), we need to adjust some server-side code to put the JSON in the HTML. This probably adds only a small computational overhead.
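The real server code is VSP, not JavaScript, so the following is an illustration of the embedding step only. The `<script type="application/json">` carrier, the `dbpv-preload` id, and both function names are assumptions. The one detail worth carrying over to any server language is the escaping of `<`, so that a literal in the data cannot close the element early.

```javascript
// Serialize data as JSON that is safe to place inside a <script> element:
// "<" becomes \u003c, so a value containing "</script>" cannot end the
// element early. JSON.parse reads \u003c back as "<".
function toEmbeddableJson(data) {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// Insert the JSON carrier just before </body> (or append it when the
// page has no </body>). The element id is hypothetical.
function embedPreload(html, data) {
  const tag = '<script type="application/json" id="dbpv-preload">' +
    toEmbeddableJson(data) + '</script>';
  const at = html.lastIndexOf('</body>');
  if (at === -1) {
    return html + tag;
  }
  return html.slice(0, at) + tag + html.slice(at);
}
```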


AngularJS for DBPV

To make an awesome DBpedia Viewer, we need some JavaScript MV* framework. AngularJS is used for this purpose as it actually has controllers.

Using AngularJS, we can make DBpedia Viewer as an SPA. Below follows a short description of a toy AngularJS implementation.

(For more information about AngularJS, please visit http://docs.angularjs.org/tutorial)

First, we set up our HTML. Angular lets us define templates that describe how model objects will be displayed. The view-model binding is defined declaratively.

The views are under the control of controllers. But you don’t generate any HTML in your controllers. Instead, the controller should produce the objects that will be displayed in the corresponding templates. In our case, the controller performs a SPARQL query against DBpedia to get the triples of some entity (using Angular’s $http service). To avoid errors due to the 1900-character limit on Virtuoso URLs, the SPARQL query should be sent in an HTTP POST request ($http.post()).
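A minimal sketch of such a POST request, assuming the public endpoint at http://dbpedia.org/sparql and a form-encoded `query` parameter as defined by the SPARQL protocol. The function name and the shape of the returned object are hypothetical. The function only builds the request, so it can be tried without Angular.

```javascript
const ENDPOINT = 'http://dbpedia.org/sparql';

// Build a form-encoded SPARQL request for the outgoing triples of an entity.
function buildSparqlPost(entityUri) {
  // Refuse characters that are not allowed inside <...> in SPARQL, so the
  // URI cannot break out of the IRI reference.
  if (/[<>"{}|\\^`\s]/.test(entityUri)) {
    throw new Error('not a valid IRI: ' + entityUri);
  }
  const query = 'SELECT ?p ?o WHERE { <' + entityUri + '> ?p ?o }';
  return {
    url: ENDPOINT,
    // The query travels in the body, so the URL length limit is not hit.
    body: 'query=' + encodeURIComponent(query),
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Accept': 'application/sparql-results+json'
    }
  };
}

// In an AngularJS controller or service this might be sent as:
//   $http.post(req.url, req.body, { headers: req.headers })
```

Passing the body as a string matters here: given an object, `$http.post` would serialize it as JSON by default, which is not the form encoding the endpoint expects.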

However, it is better to query DBpedia from an Angular service. All of the SPARQL querying logic is thus moved to an Angular service, which the controller uses. The advantage is that we can generalize the service later so that different controllers can share it.

So, at this point we have:

  • one template for a list of triples in our HTML
  • a controller associated with that one list of triples
  • a service getting all triples (incoming and outgoing) about whatever entity the controller will send to it
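To make the last item more concrete, here is a sketch of the two pure pieces such a service would need: a query covering both directions, and a conversion from the standard SPARQL JSON results format into a flat list that a template can repeat over. The function names, the `direction` field, and the exact query are assumptions, not the toy implementation.

```javascript
// Query for both directions: ?s is bound only for incoming triples and
// ?o only for outgoing ones. entityUri is assumed to be a valid IRI.
function buildEntityQuery(entityUri) {
  return 'SELECT ?s ?p ?o WHERE { { <' + entityUri + '> ?p ?o } ' +
    'UNION { ?s ?p <' + entityUri + '> } }';
}

// Turn a SPARQL JSON result into a flat list of triples. The position
// left unbound by the query is filled in with the entity itself.
function bindingsToTriples(entityUri, sparqlJson) {
  const self = { type: 'uri', value: entityUri };
  return sparqlJson.results.bindings.map(function (b) {
    return {
      s: b.s || self,
      p: b.p,
      o: b.o || self,
      direction: b.s ? 'incoming' : 'outgoing'
    };
  });
}
```

A controller would then only assign the returned list to its scope, and the template would iterate over it.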

This is an introduction to the use of AngularJS for DBPV. In the following posts, I will talk about support for no-JavaScript pages with AngularJS and about a separation of querying for incoming links vs outgoing links.

Importing DBpedia 3.8

Importing progress:

  • June 12, 01:06
    • Virtuoso server up and running
    • test import worked, SPARQL on localhost Virtuoso server works
    • starting proper import, small datasets first (<20MB, gzipped)
  • June 12, 18:11
    • It’s taking forever to import the big datasets
    • probably, it’ll take a few more days to load everything (except pagelinks)
  • June 12, 21:01
    • Having an SSD drive helps BIG time
      • database seems to grow 1GB/5min on average
      • putting the Virtuoso database file (temporarily) on an SSD while populating it seems to speed things up a lot (labels_en.nt.gz loaded in 10 minutes)

Setting Up a DBpedia mirror

How to set up a DBpedia mirror on Virtuoso?

Deployment system specifications (used here):

  • Ubuntu 12.04
  • Quad-Core @ 3.4 GHz
  • 8 GB RAM
  • X disk space

Step 1: Installing Virtuoso

You can go to the OpenLink Virtuoso website and download or buy the Virtuoso server to install it on your machine. On Ubuntu 12.04, however, we can install the (open-source) Virtuoso server through the package manager. Command-line:

~$ sudo apt-get install virtuoso-server

Step 2: Download DBpedia data

On downloads.dbpedia.org, data dumps of different versions of DBpedia can be downloaded for different languages. Here, version 3.8 (most recent at the time) and English are chosen.

  1. Go to the download page for your version and language (in our case, downloads.dbpedia.org/3.8/en/)
  2. Download all archives ending with “nt.bz2” into one folder on your machine. Let’s call this folder dumpfolder.
    • this doesn’t have to be done manually; on Linux you can also run the following command from dumpfolder:
    • ~$ wget -r -np -nd -nc -A'*.nt.bz2' http://downloads.dbpedia.org/3.8/en/
  3. Download the DBpedia Ontology

Step 3: Prepare for importing DBpedia dumps

  1. transform b-zipped dumps to gzip (saves space):
    • ~$ for i in *.bz2 ; do bzcat $i | gzip --fast > ${i%.bz2}.gz && rm $i ; done &
  2. clean DBpedia dumps:
    • ~$ for i in external_links_en.nt.gz page_links_en.nt.gz infobox_properties_en.nt.gz ; do
        echo -n "cleaning $i..."
        zcat $i | grep -v -E '^<.+> <.+> <.{1025,}> \.$' | gzip --fast > ${i%.nt.gz}_cleaned.nt.gz &&
        mv ${i%.nt.gz}_cleaned.nt.gz $i
        echo "done."
      done
  3. import loading scripts
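To make the cleaning step in item 2 easier to read, here is the same filter as a small JavaScript predicate. It is an illustration only, not a replacement for the shell pipeline, and the names are hypothetical. The pattern matches N-Triples lines whose object looks like an IRI with more than 1024 characters between the angle brackets; `grep -v` drops exactly those lines and keeps the rest.

```javascript
// Same pattern as the grep call: subject IRI, predicate IRI, then an
// object in angle brackets that is at least 1025 characters long.
const OVERLONG_OBJECT = /^<.+> <.+> <.{1025,}> \.$/;

// True when the line survives cleaning (grep -v keeps non-matching lines).
function keepTriple(line) {
  return !OVERLONG_OBJECT.test(line);
}
```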

Step 4: import data

This is the longest step. It may take hours or even days, depending on how much you import.

isql-vt
ld_dir_all(<folder with dumps>, '*.*', 'http://dbpedia.org/');
SELECT * FROM DB.DBA.LOAD_LIST;
EXIT;

Run the loader:

rdf_loader_run();
checkpoint;
commit WORK;
checkpoint;
exit;
