SPARUL Update Queries Via SPARQLWrapper And A Virtuoso Server

If you ever had a burning desire to use SPARUL updates (aka SPARQL 1.1 Update) on an Ubuntu machine via SPARQLWrapper, you have probably seen the following error message:

Traceback (most recent call last):
File "./wquery", line 93, in test
ret = m_sparql.query()
File "/usr/local/lib/python2.7/dist-packages/SPARQLWrapper/", line 390, in query
return QueryResult(self._query())
File "/usr/local/lib/python2.7/dist-packages/SPARQLWrapper/", line 369, in _query
raise e
urllib2.HTTPError: HTTP Error 406: Unacceptable

It turns out that this is resolved in SPARQLWrapper 1.5.3; however, at the time of writing pip only carries SPARQLWrapper 1.5.2. To get the fresh version one must pull it from SVN:

svn checkout svn:// sparql-wrapper-code

And here is the test code to write an RDF record to the quad-store:

from SPARQLWrapper import SPARQLWrapper

iquery = 'INSERT IN GRAPH <nepomuk:/> {<> <> <>}'
m_sparql = SPARQLWrapper(endpoint = "http://localhost:8893/sparql/", updateEndpoint = "http://localhost:8893/sparql-auth/")
m_sparql.addDefaultGraph ("nepomuk:/")
m_sparql.setCredentials(user = "user", passwd = "password")
m_sparql.setQuery(iquery)  # the query must be set before calling query()
ret = m_sparql.query()

This now ends with an HTTP 401 error:

mcradle@carver:~/workdir/remember/nepomuk/nepomuk$ ./wquery "" "" ""
INSERT IN GRAPH <nepomuk:/> {<> <> <> }
send: 'GET /sparql-auth?output=xml&format=xml&results=xml& HTTP/1.1\r\nAccept-Encoding: identity\r\nAccept: */*\r\nHost: localhost:8891\r\nConnection: close\r\nAuthorization: Digest cmVtZW1iZXI6cmVtZW1iZXI=\n\r\nUser-Agent: sparqlwrapper 1.5.3 (\r\n\r\n'
reply: 'HTTP/1.1 401 Unauthorized\r\n'
header: Server: Virtuoso/06.01.3127 (Linux) i686-pc-linux-gnu
header: Connection: close
header: Content-Type: text/html; charset=UTF-8
header: Date: Sat, 01 Jun 2013 16:40:16 GMT
header: Accept-Ranges: bytes
header: Content-Length: 0
Traceback (most recent call last):
 File "./wquery", line 118, in <module>
 registerStatement2 (t_subject, t_predicate, t_object)
 File "./wquery", line 107, in registerStatement2
 ret = m_sparql.query()
 File "/home/mcradle/workdir/remember/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/", line 391, in query
 return QueryResult(self._query())
 File "/home/mcradle/workdir/remember/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/", line 370, in _query
 raise e
urllib2.HTTPError: HTTP Error 401: Unauthorized
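A detail worth noticing in the request dump above: the Authorization header is labeled Digest, yet its payload looks like the base64 user:password pair that the Basic scheme uses, followed by a stray newline. A real Digest header carries fields such as a nonce and a response hash instead. Here is a minimal sketch to check this, using only the header value copied from the dump (Python 3 standard library; the variable names are mine):

```python
import base64

# Header value copied verbatim from the request dump above.
header_value = "Digest cmVtZW1iZXI6cmVtZW1iZXI=\n"

scheme, _, payload = header_value.partition(" ")
decoded = base64.b64decode(payload.strip()).decode("ascii")

print(scheme)                   # the scheme name the client claimed
print(decoded.count(":"))       # a Basic payload has the form user:password
print(payload.endswith("\n"))   # True: the payload carries a trailing newline
```

The trailing newline is consistent with base64.encodestring, which appends one to its output.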

The 401 is there because SPARQLWrapper does not support digest auth, so I monkey patched my 1.5.3 SPARQLWrapper to see if it helps:

mcradle@carver:~/workdir/remember/sparqlwrapper/sparql-wrapper-code$ diff -u /mcradle/temp/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/ ./src/SPARQLWrapper/                                                
--- /tmp/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/        2013-05-19 00:34:20.597501904 +0300                                                                                                             
+++ ./src/SPARQLWrapper/      2013-06-01 19:53:38.808090704 +0300                                                                                                                                                             
@@ -343,8 +344,15 @@                                                                                                                                                                                                                    
         request.add_header("User-Agent", self.agent)                                                                                                                                                                                   
         request.add_header("Accept", acceptHeader)                                                                                                                                                                                     
         if (self.user and self.passwd):                                                                                                                                                                                                
             request.add_header("Authorization", "Basic " + base64.encodestring("%s:%s" % (self.user,self.passwd)))                                                                                                                     
+            passman = urllib2.HTTPPasswordMgrWithDefaultRealm()                                                                                                                                                                        
+            passman.add_password(None, self.updateEndpoint, self.user, self.passwd)                                                                                                                                                    
+            auth_handler = urllib2.HTTPDigestAuthHandler(passman)                                                                                                                                                                      
+            opener = urllib2.build_opener(auth_handler)                                                                                                                                                                                
+            urllib2.install_opener(opener)                                                                                                                                                                                             
         return request                                                                                                                                                                                                                 
     def _query(self):                                     
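For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched outside of SPARQLWrapper. This is only an illustration of the digest handler setup used in the patch, not SPARQLWrapper code; build_digest_opener is a hypothetical helper name, and the endpoint and credentials are the placeholders from the test code above:

```python
import urllib.request


def build_digest_opener(endpoint, user, passwd):
    """Return (opener, password manager) answering Digest challenges for endpoint."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm at this URL.
    passman.add_password(None, endpoint, user, passwd)
    handler = urllib.request.HTTPDigestAuthHandler(passman)
    return urllib.request.build_opener(handler), passman


opener, passman = build_digest_opener(
    "http://localhost:8893/sparql-auth/", "user", "password")
```

Requests would then go through opener.open(...). Unlike the patch, this sketch does not call install_opener, so it leaves the process-wide default opener untouched.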

It further turns out that virtuoso-opensource 6.1.4+dfsg1-0ubuntu1 does not accept the new update= parameter which, according to the SPARQL 1.1 spec anyway, replaces query= for SPARQL 1.1 update operations.

So one needs to override the query type that SPARQLWrapper detects whenever a SPARQL 1.1 update is sent through it, so that the statement still travels as query=:

sparql.queryType= SELECT
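The two notations differ only in the name of the form parameter that carries the statement. A small sketch of that difference using the Python 3 standard library; encode_sparql_request is a hypothetical helper and the urn:ex: IRIs are made-up placeholders, not the data from my store:

```python
from urllib.parse import urlencode, parse_qs


def encode_sparql_request(statement, use_update_param):
    """URL-encode a SPARQL statement under the 'update' or 'query' parameter."""
    key = "update" if use_update_param else "query"
    return urlencode({key: statement})


# Placeholder statement; the IRIs are illustrative only.
stmt = "INSERT IN GRAPH <nepomuk:/> { <urn:ex:s> <urn:ex:p> <urn:ex:o> }"

print(encode_sparql_request(stmt, use_update_param=True))   # SPARQL 1.1 style
print(encode_sparql_request(stmt, use_update_param=False))  # what this Virtuoso build accepts
```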

With that, I can finally update my Virtuoso database using SPARQLWrapper!


Compiling zbar on OSX

If you’re trying to compile zbar, the open source bar code reader, on OS X with JPEG support using the MacPorts jpeg lib, and you are getting this error:

checking for jpeg_read_header in -ljpeg... no
configure: error: in `/skroner/temp/zbar/zbar-0.10':
configure: error: unable to find libjpeg! ensure CFLAGS/LDFLAGS are set appropriately or configure --without-jpeg        

you can use the following command line to get it working:

CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib" ./configure --disable-video --without-qt --without-python --without-gtk --with-libiconv-prefix=/opt/local --with-jpeg=yes
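Before re-running configure it can help to confirm that the header and library really live under the prefix passed in CPPFLAGS and LDFLAGS. A rough sketch, assuming the usual include/ and lib/ layout under the prefix; find_jpeg is a hypothetical helper, and jpeglib.h is the header libjpeg installs:

```python
import os


def find_jpeg(prefix):
    """Return (header path or None, list of libjpeg files) found under prefix."""
    header = os.path.join(prefix, "include", "jpeglib.h")
    libdir = os.path.join(prefix, "lib")
    libs = []
    if os.path.isdir(libdir):
        libs = sorted(
            os.path.join(libdir, name)
            for name in os.listdir(libdir)
            if name.startswith("libjpeg.")
        )
    return (header if os.path.isfile(header) else None), libs


# /opt/local is the MacPorts prefix used in the configure line above.
print(find_jpeg("/opt/local"))
```

If the header comes back as None or the library list is empty, the flags point at the wrong prefix or the jpeg port is not installed.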

it’s a cool program!

Posted in "software engineering", foss, ocr, open source, OSX, workaround

Mount Storage As a WebDAV Share On OS X 10.5.8

Update April 21st: I have managed to mount and to write files; however, I can only copy one file at a time, and after each file the wdfs process crashes. At this point I’m not investigating further.

I don’t know whether the problem lies with my reluctance to upgrade my OS X 10.5.8 to a newer cat version, but I just couldn’t mount the share directly using Finder. I mean, it technically did mount, but it was impossible to access any files, neither for reading nor for writing.

Searching the net I did come up with an alternative tool to mount WebDAV resources, called wdfs. In spite of the fact that it hasn’t listed any updates since 2007(!), I’ve decided to give it a try.

After an easy configure-make-make install I issued:

wdfs  /Volumes/box/ -o debug

and I could read my WebDAV share, which was a major step forward.
Writing to it, however, seems to be a different story. I could upload a file using the web interface and then append to it.

echo "1234" >> ./foo 

yielded the expected results as long as foo existed.
If the file didn’t exist I got the following errors on stderr (thanks to the -o debug flag, of course!):

unique: 2, opcode: OPEN (14), nodeid: 11, insize: 48
## GET error: Could not read chunk size: connection was closed by server
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 3, opcode: LOOKUP (1), nodeid: 8, insize: 44
Could not read chunk size: connection was closed by server

Looking for the error in the wdfs source code, and after some further debugging, it turns out that it comes from the wdfs_open() function.
Below is a version abbreviated for clarity:

static int wdfs_open(const char *localpath, struct fuse_file_info *fi)
{
	/* ... */
	struct open_file *file = g_new0(struct open_file, 1);
	file->modified = false;

	file->fh = get_filehandle();
	remotepath = get_remotepath(localpath);

	/* GET the data to the filehandle even if the file is opened O_WRONLY,
	 * because the opening application could use pwrite() or use O_APPEND
	 * and than the data needs to be present. */
	if (ne_get(session, remotepath, file->fh)) {
		fprintf(stderr, "## GET error: %s\n", ne_get_error(session));
		return -ENOENT;
	}
	/* ... */
}


It turns out that wdfs insists on issuing an HTTP GET even if we’re in the midst of creating the file.
I’m not sure why the server replies with

"Could not read chunk size: connection was closed by server"

but one thing is for sure: if I just ignore the error as follows:

	if (ne_get(session, remotepath, file->fh)) {
		fprintf(stderr, "## GET (wdfs_open) error: %s\n", ne_get_error(session));

		/* Mcradle -> Patch: don't return an error
		return -ENOENT; */
	}

the file is created just fine, and I can issue a cp command and verify that the data made it.
This brings a whole new meaning to the term monkey patching 🙂

Please be warned: your mileage may vary, as I’m not sure how legit it is to ignore this error, or why it is there to begin with.

Posted in "software engineering", open source, OSX, workaround

Dump, Merge And Import Graphs From a Virtuoso Database

I have figured out how to extract my RDF data from a nepomuk-soprano-virtuoso database. I must say that I have crafted my own tools to create the data on top of the soprano model, so I can control which graph all my data belongs to.
It wasn’t always like that, but I’ll get to that pretty soon.

As detailed in my previous post, rebuilding the virtuoso database left me with a file way too bloated for my taste to start backing up.

So I’ve decided to try the strategy of dumping the data, filtering it, and importing it back into a clean database.

Running isql and virtuoso as I described in my previous post, I used the following SQL procedures in isql:

CREATE PROCEDURE dump_graphs 
  ( IN  dir               VARCHAR  :=  '/Users/mcradle/Library/Preferences/KDE/share/apps/nepomuk/repository/main/data/virtuosobackend/dumps'   , 
    IN  file_length_limit INTEGER  :=  1000000000
  )
  {
    DECLARE inx INT;
    inx := 1;
    SET ISOLATION = 'uncommitted';
    FOR ( SELECT * 
            FROM ( SPARQL DEFINE input:storage "" 
                   SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o } . 
                                        FILTER ( ?g != virtrdf: ) 
                                      } 
                 ) AS sub OPTION ( LOOP )) DO
      {
        dbg_printf ('about to dump %s', "g");
        dump_one_graph ( "g", 
                         sprintf ('%s/graph%06d_', dir, inx), 
                         file_length_limit
                       );
        dbg_printf ('dump done %s', "g");
        inx := inx + 1;
      }
  }
;

CREATE PROCEDURE dump_one_graph 
  ( IN  srcgraph           VARCHAR  , 
    IN  out_file           VARCHAR  , 
    IN  file_length_limit  INTEGER  := 1000000000
  )
  {
    DECLARE  file_name  varchar;
    DECLARE  env, ses      any;
    DECLARE  ses_len, 
             max_ses_len, 
             file_len, 
             file_idx      integer;
    SET ISOLATION = 'uncommitted';
    max_ses_len := 10000000;
    file_len := 0;
    file_idx := 1;
    file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
    string_to_file ( file_name || '.graph', 
                     srcgraph, 
                     -2
                   );
    string_to_file ( file_name, 
                     sprintf ( '# Dump of graph <%s>, as of %s\n', 
                               srcgraph, 
                               CAST (NOW() AS VARCHAR)
                             ), 
                     -2
                   );
    env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
    ses := string_output ();
    FOR (SELECT * FROM ( SPARQL DEFINE input:storage "" 
                         SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } } 
                       ) AS sub OPTION (LOOP)) DO
      {
        http_ttl_triple (env, "s", "p", "o", ses);
        ses_len := length (ses);
        IF (ses_len > max_ses_len)
          {
            file_len := file_len + ses_len;
            IF (file_len > file_length_limit)
              {
                http (' .\n', ses);
                string_to_file (file_name, ses, -1);
                file_len := 0;
                file_idx := file_idx + 1;
                file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
                string_to_file ( file_name, 
                                 sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n', 
                                           srcgraph, 
                                           CAST (NOW() AS VARCHAR), 
                                           file_idx), 
                                 -2
                               );
                env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
              }
            ELSE
              string_to_file (file_name, ses, -1);
            ses := string_output ();
          }
      }
    IF (LENGTH (ses))
      {
        http (' .\n', ses);
        string_to_file (file_name, ses, -1);
      }
  }
;

The directory into which the graphs are dumped must be listed in the DirsAllowed parameter within the [Parameters] section of virtuoso.ini,
for example:

DirsAllowed= /Users/mcradle/Library/Preferences/KDE/share/apps/nepomuk/repository/main/data/virtuosobackend/dumps/,./dumps,dumps
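A quick way to double-check what the server will actually see is to read the setting back from the ini file. This is a minimal sketch, assuming the ini file is plain enough for Python's configparser to read; dirs_allowed is a hypothetical helper and /tmp/dumps/ in the sample is a made-up path:

```python
import configparser


def dirs_allowed(ini_text):
    """Return the DirsAllowed entries of the [Parameters] section as a list."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases by default
    parser.read_string(ini_text)
    raw = parser.get("Parameters", "DirsAllowed", fallback="")
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


# Made-up sample; a real virtuoso.ini has many more settings.
sample = """\
[Parameters]
DirsAllowed = /tmp/dumps/, ./dumps, dumps
"""

print(dirs_allowed(sample))
```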

Then call the dump procedure, dump_graphs(), on the isql command line.

This should create a bunch of files in your dump directory, similar to the following:

bash-3.2$ ls -lhtra | head
total 8856
-rw-r--r--    1 mcradle  staff    36B Feb 19 00:14 graph000002_000001.ttl.graph
-rw-r--r--    1 mcradle  staff    43B Feb 19 00:14 graph000001_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   2.9K Feb 19 00:14 graph000001_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000090_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   1.1K Feb 19 00:16 graph000090_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000089_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   947B Feb 19 00:16 graph000089_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000088_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   791B Feb 19 00:16 graph000088_000001.ttl
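To see which graph each dump file belongs to, the .ttl files can be paired with their .ttl.graph companions. A small sketch under the assumption that the directory looks like the listing above; dumped_graphs is a hypothetical helper:

```python
import os


def dumped_graphs(dump_dir):
    """Map each .ttl dump file to the graph name stored in its .ttl.graph file."""
    mapping = {}
    for name in sorted(os.listdir(dump_dir)):
        if not name.endswith(".ttl"):
            continue
        sidecar = os.path.join(dump_dir, name + ".graph")
        if os.path.isfile(sidecar):
            with open(sidecar, encoding="utf-8") as fh:
                mapping[name] = fh.read()
        else:
            # No companion file found for this dump file.
            mapping[name] = None
    return mapping
```

Calling dumped_graphs on the dump directory returns a dictionary keyed by file name, which makes it easy to pick the graphs worth keeping.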

As it turns out, my RDF data was spread across many graphs. I think nepomuk does that by default; I don’t even claim to understand why and how.
I wanted to unify them into one graph to make the data export/import easier in the future.

Python and rdflib to the rescue. It turns out that this is not that hard; following is the Python snippet to merge a few Turtle files, each containing a graph (exactly the form created by the dump_graphs() procedure above):

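import rdflib

# Only the first dump file is shown; list the remaining .ttl files here as well.
files_to_be_merged = ("./graph000030_000001.ttl",)

store = rdflib.Graph()

for fil in files_to_be_merged:
    # Turtle is essentially a subset of N3, so the n3 parser reads the dump files.
    store.parse(fil, format='n3')

# Let rdflib write the file itself; this works whether serialize() returns str or bytes.
store.serialize(destination='/Users/mcradle/.../dumps/unified-graph.ttl', format='turtle')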

Note that since I’ve created my own Turtle file, I need to create my own .ttl.graph file as well.
It turns out that it’s a file containing the graph name, for example:

bash-3.2$ cat graph000341_000001.ttl.graph 

So I’ve created my own unified-graph.ttl.graph containing my own graph name.
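Creating that companion file can be sketched as follows. The import procedure further down reads the whole file with file_to_string and uses it as the graph name, so I assume a trailing newline would end up inside the name; the sketch therefore writes the name without one. write_graph_sidecar is a hypothetical helper and urn:example:unified is a placeholder graph name:

```python
def write_graph_sidecar(ttl_path, graph_name):
    """Write <ttl_path>.graph containing only the graph name, with no newline."""
    sidecar = ttl_path + ".graph"
    with open(sidecar, "w", encoding="utf-8", newline="") as fh:
        fh.write(graph_name)
    return sidecar


# Example call (placeholder graph name):
# write_graph_sidecar("unified-graph.ttl", "urn:example:unified")
```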

I then deleted the following virtuoso database files:


and launched nepomuk to have the database recreated. The newly created database was way smaller, around 10MB: I’ve just shrunk my database by a factor of almost 200!
Next I was left with importing unified-graph.ttl back into the database. For that I killed the nepomuk server and relaunched the standalone virtuoso.
It’s important to make sure that the dump directory is still listed in DirsAllowed, as it was when we dumped the graphs.

To import the graph with my data back, I pasted the following at the ‘isql’ prompt:

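CREATE PROCEDURE load_graphs 
  ( IN  dir  VARCHAR := 'dumps' )
{
  DECLARE arr ANY;
  DECLARE g VARCHAR;

  arr := sys_dirlist (dir, 1);
  log_enable (2, 1);
  FOREACH (VARCHAR f IN arr) DO
    {
      IF (f LIKE '*.ttl')
        {
          DECLARE CONTINUE HANDLER FOR SQLSTATE '*'
            { log_message (sprintf ('Error in %s', f)); };
          g := file_to_string (dir || '/' || f || '.graph');
          dbg_printf ('g is %s', "g");
          DB.DBA.TTLP_MT (file_open (dir || '/' || f), g, g, 255);
        }
    }
}
;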

and ran the procedure to start the import process

load_graphs ();

Compacting Nepomuk’s Virtuoso database for fun and (little) profit

I wanted to back up the nepomuk-generated RDF data sitting in my virtuoso database. At first I thought I could just copy my virtuosobackend directory onto dropbox, symlink it back to /Users/mcradle/…/KDE/../nepomuk/../data/ and get it over with.
This was before I discovered that my soprano-virtuoso.db is over 2.5 GB. That’s right, 2.5 gigs.

Soon enough I found out that quite a bit of the data in the database was auto-generated by strigi. Here is the SPARQL query to spit out the strigi-generated records:

nepomukcmd --foo query "select distinct ?g where { ?g  ?r . }"

more information about nepomukcmd is here

and here is the nepomuk line to delete the strigi data:

for a in `nepomukcmd --foo query "select distinct ?g where { \
?g ?r . }"`;
do nepomukcmd rmgraph "$a"; done

As this did not reduce the database file size at all, I moved on to reading virtuoso’s generous documentation and discovered that “Virtuoso does not relinquish space in the DB file back to the file system as records are removed”.

Next in my quest to reclaim my disk space I tried to rebuild the database. The steps are listed in virtuoso’s documentation; the details listed below may help when it comes to operating on a nepomuk (or soprano, really) flavored virtuoso database.

spoiler alert

I’m listing the gory details of how to compact a soprano virtuoso database, but please do note (before you follow my instructions) that it has not reduced the file size to my liking!

how to compact a soprano virtuoso database by rebuilding it

This required me to connect to virtuoso using the isql command line interface. To achieve this one needs to:

1. Shut down the nepomuk-server. I’m using qdbus; this is the command line I use:

qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

2. Obtain the config file that nepomuk is using to launch virtuoso. This was trickier than I expected: I had to look in the nepomuk server’s log to discover where nepomuk auto-generates the virtuoso config file and which options it uses to kick virtuoso into life.

Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")

3. I created a copy of virtuoso.ini and made the following changes to it.
Under the [Parameters] section I disabled LiteMode by changing it to


Also under the [Parameters] section I have changed the server port to 1112 (for some reason port 1111 was taken) as follows:


my [Database] section was already set by nepomuk to point at the right files


4. It’s time to kick virtuoso into life with the new virtuoso.ini file so the new settings come into effect. Here is the command line I used:

/opt/local/bin/virtuoso-t -df +foreground +debug +configfile /mcradle/temp/virtuoso.ini

5. Following the instructions in the virtuoso backup guide, I needed to issue a SHUTDOWN; command to the virtuoso server so that a checkpoint is created (and the server is shut down…). In order to do that one needs to start the isql utility that comes with virtuoso. Here is my command line to do that:

isql -S 1112

Once it’s up I just type in the SHUTDOWN; command.

6. Once the server has exited, I relaunched it in “backup dump mode” by adding the -b switch as follows:

/opt/local/bin/virtuoso-t -df -b +foreground +debug +configfile /mcradle/temp/virtuoso.ini

7. I then backed up the database as follows:
mv ./soprano-virtuoso.db ./backup_before_crash_restore-soprano-virtuoso-db

8. The next step is to start the server in restore mode; again, here is the command I used:

/opt/local/bin/virtuoso-t -df +restore-crash-dump +foreground +debug +configfile /mcradle/temp/virtuoso.ini
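Step 2 above boils down to pulling the +configfile argument out of the server log. A small sketch that does this for a log line of the shape shown in step 2; the function names are mine, and the parsing assumes every argument is wrapped in double quotes as in that line:

```python
import re


def virtuoso_launch_args(log_line):
    """Return (binary, [args]) parsed from a 'Starting Virtuoso server' log line."""
    quoted = re.findall(r'"([^"]*)"', log_line)
    if not quoted:
        raise ValueError("no quoted fields in log line")
    return quoted[0], quoted[1:]


def config_file(log_line):
    """Return the path that follows the +configfile option."""
    _binary, args = virtuoso_launch_args(log_line)
    return args[args.index("+configfile") + 1]


# The log line from step 2, split across string literals for readability.
line = ('Starting Virtuoso server: "/opt/local/bin/virtuoso-t" '
        '("+foreground", "+configfile", '
        '"/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", '
        '"+wait")')

print(config_file(line))
```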

As mentioned in my earlier spoiler, the database rebuild still left me with a file bigger than 1GB. For reasons (and calculations) I won’t get into at the moment, I know that the data within the database shouldn’t take more than a megabyte, at most.

So I’ve tried to figure out how many records I have in the quad-store by running the following SQL statement in the isql interface:

select count(*) from "DB"."DBA"."RDF_QUAD";

This returned 21,009, which is way more than I expected to have there. So it’s not surprising that an explicit compact also didn’t help much (here is the command I ran from the isql interface):


Again, no effect on the file size whatsoever.

And the quest for a smaller virtuoso database, as it seems, has just begun.

Posted in cli, Command Line, nepomuk, open source, Uncategorized

Terminate nepomukstrigiservice but keep running Nepomuk on OSX

I have a sneaking suspicion that I’m the only guy on earth that is interested in this setup, but who knows?

I’m running KDE 4.6.5 on Mac OS X 10.5.8 via MacPorts.

bash-3.2$ port info kdebase4
kdebase4 @4.6.5, Revision 1 (kde, kde4)
Replaced by:          kde4-baseapps
Variants:             debug, docs, universal

Description:          Core desktop applications and libraries for the KDE4 desktop. This port installs the file manager dolphin file manager.

Build Dependencies:   cmake, pkgconfig, automoc
Library Dependencies: qt4-mac, phonon
Platforms:            darwin
License:              unknown

Running KDE on OS X could seem like an odd choice, I know. The reason I do it is to gain access to Nepomuk, the KDE semantic desktop project.
I’m planning more posts that will elaborate on why, and on what it is that I’m doing with nepomuk, but this post is much more specific and narrow in scope.

Having started to use nepomuk to tag my various files, I have been constantly bothered by a process seriously hammering my CPU (and the hard drive).
The process name was nepomukservicestub, but I had a few instances of this process, distinguished by a string parameter I could see with ps:

bash-3.2$ ps -ef | grep -i nepomukservicestub

.../MacOS/nepomukservicestub nepomukstorage                           
.../MacOS/nepomukservicestub nepomukqueryservice                      
.../MacOS/nepomukservicestub nepomukremovablestorageservice           
.../MacOS/nepomukservicestub nepomukbackupsync
.../MacOS/nepomukservicestub nepomukstrigiservice                                                                                          
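The service name is simply the argument that follows the stub binary on each line. A short sketch that extracts it from ps output shaped like the lines above; stub_services is a hypothetical helper and the sample is abbreviated the same way as the listing:

```python
def stub_services(ps_output, stub="nepomukservicestub"):
    """Return the service name passed to each nepomukservicestub process."""
    services = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the last field so a bare 'grep -i nepomukservicestub' is ignored.
        for i, field in enumerate(fields[:-1]):
            if field.endswith(stub):
                services.append(fields[i + 1])
                break
    return services


sample = """\
.../MacOS/nepomukservicestub nepomukstorage
.../MacOS/nepomukservicestub nepomukqueryservice
.../MacOS/nepomukservicestub nepomukstrigiservice
grep -i nepomukservicestub
"""

print(stub_services(sample))
```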

The one instance eating up my CPU was “nepomukstrigiservice”. Now, strigi is a well known(tm) file indexing service used by nepomuk to populate parts of its own database. For reasons I’ll go into in a future post, I have no immediate use for the data generated by strigi,
so I decided to disable strigi rather than try to fix the obvious problem with it (I had enough distractions as it is, anyway).

strigi does come with a control app called “strigiclient”; however, clicking the stop button on it had no effect on nepomukstrigiservice, which kept consuming above 90% of my poor MacBook Pro’s CPU cycles.

What I ended up doing is using the dbus interface to nepomuk itself, via dbus-tools from here, checked out from their SVN repo.

I wanted to share the command that took the load off my poor CPU. Let me stress: this is useful only under specific terms, since it leaves nepomuk functional but disables the automatic indexing. For a lot of users (I’m tempted to say most) this will leave the system in a rather pointless state, as the text (and possibly other properties) of new files will not be indexed!

with that warning in mind here is the command I used to get rid of this process:

dbus org.kde.NepomukServer /servicemanager stopService %'"nepomukstrigiservice"'

where dbus is the dbus-tools executable.

Posted in "software engineering", cli, Command Line, foss, KDE, nepomuk, open source, OSX, productivity tools, Technology, workaround

Compiling SVN Tesseract on OSX

Should the need to compile Tesseract from SVN ever arise (version v3.01 at the time of writing), please note:

In order to fetch the source issue:

bash-3.2$ svn checkout tesseract-ocr-read-only

You have to install Leptonica beforehand (from source, or via MacPorts like me):

bash-3.2$ sudo port install leptonica

If you want to use autotools and libtool from MacPorts (again, like me), you’ll have to hack runautoconf in the tesseract source directory (‘tesseract-ocr-read-only’) prior to running it so that it calls glibtoolize instead of libtoolize. Rumor has it that libtoolize has been renamed to glibtoolize by the MacPorts maintainers to avoid eclipsing Apple’s /usr/bin/libtoolize (which, conveniently enough, is not compatible with its GNU counterpart). Following is the modified line in runautoconf:

echo "Running libtoolize"
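Which name to call depends on what is on the PATH. The choice can be sketched like this; pick_libtoolize is a hypothetical helper and not part of runautoconf, and it only assumes that MacPorts installs the GNU tool as glibtoolize:

```python
import shutil


def pick_libtoolize(which=shutil.which):
    """Prefer the MacPorts name glibtoolize, else fall back to libtoolize."""
    for name in ("glibtoolize", "libtoolize"):
        if which(name):
            return name
    return None


print(pick_libtoolize())
```

The which parameter exists only so the lookup can be exercised without touching the real PATH.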

The next step is to run the modified runautoconf:

bash-3.2$ ./runautoconf

Next you’ll have to hack the tesseract ./configure script to include the directory where MacPorts installs Leptonica (which is /opt/local/include/leptonica):

if test "$LIBLEPT_HEADERSDIR" = "" ; then
  LIBLEPT_HEADERSDIR="/usr/local/include /usr/include /opt/local/include/leptonica"

If you skip or mess up the previous step you’ll see the following error when running ./configure:

bash-3.2$ ./configure
checking build system type... i686-apple-darwin9.8.0
checking for Leffler libtiff library... checking linking with -ltiff... ok
setting LIBTIFF_LIBS=-ltiff
checking for leptonica... configure: error: leptonica not found

It’s easy to forget to run the runautoconf script before running ./configure.
And … of course, call 'make'.

Note that it is also important to call

sudo make install

in order for the language files to be copied to the location Tesseract expects to find them at.

Posted in foss, ocr, OSX