Compacting Nepomuk’s Virtuoso database for fun and (little) profit

I wanted to back up the Nepomuk-generated RDF data sitting in my Virtuoso database. At first I thought I could just copy my virtuosobackend directory onto Dropbox, symlink it back to /Users/mcradle/…/KDE/../nepomuk/../data/ and get it over with.
That was before I discovered that my soprano-virtuoso.db is over 2.5 GB. That’s right, 2.5 gigs.

Soon enough I found out that quite a bit of the data in the database was auto-generated by Strigi. Here is the SPARQL query that lists the Strigi-generated graphs:

nepomukcmd --foo query "select distinct ?g where { ?g  ?r . }"

more information about nepomukcmd is here

And here is the shell loop that deletes the Strigi data, one graph at a time:

for a in `nepomukcmd --foo query "select distinct ?g where { \
?g ?r . }"`;
do nepomukcmd rmgraph "$a"; done

This did not reduce the database file size at all, so I moved on to reading Virtuoso’s generous documentation and discovered that “Virtuoso does not relinquish space in the DB file back to the file system as records are removed”.
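A quick way to confirm that observation is to record the file size before and after the deletion. The following is only a minimal sketch: the helper name size_bytes and the DB_FILE variable are made up for illustration, and the path is assumed to point at your own soprano-virtuoso.db.

```shell
# Hypothetical helper: print the size of a file in bytes (portable across macOS and Linux).
size_bytes() { wc -c < "$1" | tr -d ' '; }

# DB_FILE is an assumed variable; point it at your own database file.
DB_FILE=${DB_FILE:-soprano-virtuoso.db}

if [ -f "$DB_FILE" ]; then
    before=$(size_bytes "$DB_FILE")
    echo "size before: $before bytes"
    # ... run the graph deletion here, then measure again ...
    after=$(size_bytes "$DB_FILE")
    echo "reclaimed: $((before - after)) bytes"
else
    echo "no such file: $DB_FILE" >&2
fi
```

If the documentation quote holds, the reclaimed figure stays at 0 no matter how many graphs are removed.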

Next in my quest to reclaim my disk space, I tried to rebuild the database. The steps are listed in Virtuoso’s documentation. The details below may help when it comes to operating on a Nepomuk (or Soprano, really) flavored Virtuoso database.

spoiler alert

I’m listing the gory details of how to compact a Soprano Virtuoso database, but please do note (before you follow my instructions) that it has not reduced the file size to my liking!

how to compact a soprano virtuoso database by rebuilding it

This required me to connect to Virtuoso using the isql command-line interface. To achieve this one needs to:

1. Shut down the nepomuk-server. I’m using qdbus for that; this is the command line I use:

qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

2. Obtain the config file that Nepomuk uses to launch Virtuoso. This was trickier than I expected: I had to look in the Nepomuk server’s log to discover where Nepomuk auto-generates the Virtuoso config file and which options it uses to kick Virtuoso into life.

Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")
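Picking the config path out of that log line by hand is easy to get wrong, so here is a small sketch that extracts it with sed. The log line is the one quoted above; the variable names are mine, and the pattern assumes the path is always the quoted argument right after "+configfile".

```shell
# The log line quoted above, stored verbatim.
log_line='Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")'

# Print the quoted argument that follows "+configfile".
# In a basic regular expression the "+" is a literal character, so no escaping is needed.
config_path=$(printf '%s\n' "$log_line" | sed -n 's/.*"+configfile", "\([^"]*\)".*/\1/p')

echo "$config_path"
```

The resulting path can then be handed to cp to make the working copy used in the next step.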

3. I created a copy of virtuoso.ini and made the following changes to it.
Under the [Parameters] section I disabled LiteMode.


Also under the [Parameters] section I changed the server port to 1112 (for some reason port 1111 was taken).


my [Database] section was already set by nepomuk to point at the right files
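The two [Parameters] changes described in this step could also be scripted. This is a sketch under assumptions: the sample file below is a hypothetical stand-in rather than the config Nepomuk actually generated, and I am assuming the keys are spelled LiteMode and ServerPort as in a stock virtuoso.ini, with 0 meaning LiteMode is off.

```shell
# Hypothetical stand-in for the copied virtuoso.ini; a real file has many more keys.
ini=$(mktemp)
cat > "$ini" <<'EOF'
[Parameters]
LiteMode=1
ServerPort=1111
EOF

# Turn LiteMode off and move the listener to port 1112, writing the result to a new file.
# Caveat: sed is not section-aware, so a ServerPort key in another section
# (for example an HTTP server section) would be rewritten as well.
edited="$ini.edited"
sed -e 's/^LiteMode[[:space:]]*=.*/LiteMode=0/' \
    -e 's/^ServerPort[[:space:]]*=.*/ServerPort=1112/' "$ini" > "$edited"

cat "$edited"
```

Writing to a separate file keeps the original copy around for comparison with diff.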


4. It’s time to kick Virtuoso into life with the new virtuoso.ini file so the new settings take effect. Here is the command line I used:

/opt/local/bin/virtuoso-t -df +foreground +debug +configfile /mcradle/temp/virtuoso.ini

5. Following the instructions in the Virtuoso backup guide, I needed to issue a SHUTDOWN; command to the Virtuoso server so that a checkpoint is created (and the server is shut down…). To do that one needs to start the isql utility that comes with Virtuoso. Here is my command line to do that:

isql -S 1112

Once it’s up I just type in

SHUTDOWN;

6. Once the server had exited, I relaunched it in “backup dump mode” by adding the -b switch as follows:

/opt/local/bin/virtuoso-t -df -b +foreground +debug +configfile /mcradle/temp/virtuoso.ini

7. I then backed up the database by moving the old file out of the way, as follows:

mv ./soprano-virtuoso.db ./backup_before_crash_restore-soprano-virtuoso-db

8. The next step is to start the server in restore mode. Again, the command I used:

/opt/local/bin/virtuoso-t -df +restore-crash-dump +foreground +debug +configfile /mcradle/temp/virtuoso.ini
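Steps 6 to 8 can be strung together in a small script. This is only a sketch: the run helper and the DRY_RUN variable are my own additions, the paths and switches are the ones quoted in the steps above, and by default the script only prints the commands rather than touching the database. It also assumes, as happened for me, that the server exits by itself once the dump and the restore finish.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Paths taken from the steps above; adjust them to your own setup.
VIRTUOSO=/opt/local/bin/virtuoso-t
CONFIG=/mcradle/temp/virtuoso.ini
DB=./soprano-virtuoso.db
BACKUP=./backup_before_crash_restore-soprano-virtuoso-db

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 4 and 5 (starting the server and typing SHUTDOWN; into isql) are
# interactive and are expected to have been done already.
# The && chain stops at the first failure, so the database file is not moved
# away unless the dump step reported success.
run "$VIRTUOSO" -df -b +foreground +debug +configfile "$CONFIG" &&
run mv "$DB" "$BACKUP" &&
run "$VIRTUOSO" -df +restore-crash-dump +foreground +debug +configfile "$CONFIG"
```

Reading the printed commands before setting DRY_RUN=0 is a cheap safeguard, since step 7 moves the only copy of the database file.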

As mentioned in my spoiler above, the database rebuild still left me with a file bigger than 1 GB. For reasons (and calculations) I won’t get into at the moment, I know that the data within the database shouldn’t take more than a megabyte, at most.

So I tried to figure out how many records I have in the quad store by running the following SQL statement in the isql interface:

select count(*) from "DB"."DBA"."RDF_QUAD";

This returned 21,009, which is way more than I expected to have there, so it’s not surprising that an explicit compact also didn’t help much (here is the command I ran from the isql interface)


Again, no effect on the file size whatsoever.
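To put the two numbers above side by side, here is a back-of-the-envelope calculation. It is a rough sketch that treats the rebuilt file as exactly 1 GiB, which is a lower bound since the file was bigger than that, and divides it by the 21,009 quads the count returned.

```shell
# Lower bound for the rebuilt file: 1 GiB, in bytes.
db_bytes=$((1024 * 1024 * 1024))

# Row count returned by the RDF_QUAD query.
quads=21009

# Integer division: bytes of database file per stored quad.
bytes_per_quad=$((db_bytes / quads))

echo "at least $bytes_per_quad bytes of file per quad"
```

That works out to at least 51108 bytes, roughly 50 KB of file for every quad, which suggests that most of the space is not taken up by the quads themselves.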

And the quest for a smaller Virtuoso database, it seems, has just begun.


1 Response to Compacting Nepomuk’s Virtuoso database for fun and (little) profit

  1. JJones says:

    I’ve just discovered a similarly sized soprano-virtuoso.db ( almost 3GB ) on my new 1-week old install.
    I also had a massive .cache/upstart/startkde.log file ( over 2GB ) which was filled with video related errors. I had unplugged a usb webcam I was testing using guvcview – unaware of the consequences to my file system until I did my backup.
    I’ve stopped nepomuk and am just deleting the two huge files.. for better or worse.
