Compacting Nepomuk’s Virtuoso database for fun and (little) profit

I wanted to backup my nepomuk generated RDF data sat in my virtuoso database, at first I thought I could just copy my virtuosobackend directory onto dropbox, sym link it back to /Users/mcradle/…/KDE/../nepomuk/../data/ and get it over with.
this was before I discovered that my soprano-virtuoso.db is over 2.5 GB. that’s right 2.5 gigs.

Soon enough I found out that quite a bit of the data in the database was auto-generated by strigi, here is the SPARQL query to spit the strigi generated records:

nepomukcmd --foo query "select distinct ?g where { ?g  ?r . }"

more information about nepomukcmd is here

and here is the nepomuk line to delete the strigi data:

for a in `nepomukcmd --foo query "select distinct ?g where { \
?g ?r . }"`;
do nepomukcmd rmgraph "$a"; done

As this did not reduce the database file size at all, I have moved on to reading virtuoso’s generous documentation and discovered that: “Virtuoso does not relinquish space in the DB file back to the file system as records are removed” .

Next in my quest to reclaim my disk space back I have tried to rebuild the database. The steps are listed in virtuoso’s documentation. there are details listed below that may help when it comes to operating on a nepomuk (or soprano, really) flavored virtuoso database.

spoiler alert

I’m listing the gory details of how to compact a soprano virtuoso database , but please do note (before you follow my instructions) that it has not reduce the file size to my liking!.

how to compact a soprano virtuoso database by rebuilding it

this required me to connect to the virtuoso using the isql command line interface, to achieve this one needs to

1. shutdown the nepomuk-server: I’m using the qdbus this is the command-line I use:

qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

2. obtain the config file that nepomuk is using to launch virtuoso, this was trickier than I have expected, I had to look in nepomuk server’s log to discover where nepomuk auto-generates the virtuoso config file it’s using and what option it is using to kick virtuoso into life.

Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")

3. I created a copy of virtuoso.ini and made the following changes to it
under the [Parameters] section I have
disabled LiteMode by changing to

LiteMode=0

Also under the [Parameters] section I have changed the server port to 1112 (for some reason port 1111 was taken) as follows:

ServerPort=1112

my [Database] section was already set by nepomuk to point at the right files

[Database]
DatabaseFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.db
ErrorLogFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.log
TransactionFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.trx
xa_persistent_file=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.pxa

4. it’s time to kick virtuoso to life with the new virtuoso.ini file so the new settings will come into effect, here is the command-line I used

/opt/local/bin/virtuoso-t -df +foreground +debug +configfile /mcradle/temp/virtuoso.ini

5. following the instructions in the virtuoso backup guide I needed to issue a SHUTDOWN; command to the virtuoso server so a checkpoint is created (and the server is shut down…). in order to do that one needs to start the isql utility that comes with virtuoso, here is my command-line to do that:

isql -S 1112

once it’s up I just type in
SHUTDOWN;

6. once the server has exit I have relaunched the server in “backup dump mode” by adding the -b switch as follows:

/opt/local/bin/virtuoso-t -df -b +foreground +debug +configfile /mcradle/temp/virtuoso.ini

7. I then backed up the database as follows:
mv ./soprano-virtuoso.db ./backup_before_crash_restore-soprano-virtuoso-db

8. the next step is to start the server in restore mode, and again the command I used:

/opt/local/bin/virtuoso-t -df +restore-crash-dump +foreground +debug +configfile /mcradle/temp/virtuoso.ini

as mentioned in my earlier spoiler above the database rebuild still left me with a file bigger than 1GB. for reasons (and calculations) I won’t get into at the moment I know that the data within the database shouldn’t take more than a megabyte, at most.

so I’ve tried to figure out how many records do I have in the quad-store by running the following SQL statement in the isql interface

select count(*) from "DB"."DBA"."RDF_QUAD";

this returned 21,009, which is way more than I have expected to have there. so it’s not surprising that an explicit compact also didn’t help much (here is the command I ran from the isql interface)

DB..VACUUM ('DB.DBA.RDF_QUAD');

again no effect on the file size, what so ever.

And the quest for a smaller virtuoso database, as it seems, has just begun.

About these ads
This entry was posted in cli, Command Line, nepomuk, open source, Uncategorized and tagged , , , , , , , . Bookmark the permalink.

One Response to Compacting Nepomuk’s Virtuoso database for fun and (little) profit

  1. JJones says:

    I’ve just discovered a similarly sized soprano-virtuoso.db ( almost 3GB ) on my new 1-week old install.
    I also had a massive .cache/upstart/startkde.log file ( over 2GB ) which was filled with video related errors. I had unplugged a usb webcam I was testing using guvcview – unaware of the consequences to my file system until I did my backup.
    I’ve stopped nepomuk and am just deleting the two huge files.. for better or worse.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s