I wanted to back up the Nepomuk-generated RDF data sitting in my Virtuoso database. At first I thought I could just copy my virtuosobackend directory onto Dropbox, symlink it back to /Users/mcradle/…/KDE/../nepomuk/../data/ and be done with it.
That was before I discovered that my soprano-virtuoso.db is over 2.5 GB. That’s right, 2.5 gigs.
Soon enough I found out that quite a bit of the data in the database was auto-generated by Strigi. Here is the SPARQL query to list the Strigi-generated records:
nepomukcmd --foo query "select distinct ?g where { ?g ?r . }"
More information about nepomukcmd is here.
And here is the command line to delete the Strigi data:
for a in `nepomukcmd --foo query "select distinct ?g where { \
?g ?r . }"`;
do nepomukcmd rmgraph "$a"; done
As this did not reduce the database file size at all, I moved on to reading Virtuoso’s generous documentation and discovered that “Virtuoso does not relinquish space in the DB file back to the file system as records are removed”.
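If you want to see this behaviour for yourself, measure the file before and after deleting the graphs. Here is a minimal sketch; size_mb is a helper name I made up for illustration, and the path in the usage comment is the elided one from this post, so substitute your own.

```shell
# size_mb is a hypothetical helper: print a file's size in whole megabytes.
# wc -c is used because it behaves the same on macOS and Linux.
size_mb() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1024 / 1024 ))
}

# Usage (path is a placeholder, fill in your own):
#   before=$(size_mb /Users/.../virtuosobackend/soprano-virtuoso.db)
#   ... delete the Strigi graphs ...
#   after=$(size_mb /Users/.../virtuosobackend/soprano-virtuoso.db)
#   echo "before: ${before} MB, after: ${after} MB"
```

If the two numbers match, the deleted records freed pages inside the file without shrinking the file itself.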
Next in my quest to reclaim my disk space I tried to rebuild the database. The steps are listed in Virtuoso’s documentation. The details below may help when it comes to operating on a Nepomuk (or Soprano, really) flavored Virtuoso database.
Spoiler alert
I’m listing the gory details of how to compact a Soprano Virtuoso database, but please note before you follow my instructions that it did not reduce the file size to my liking!
How to compact a Soprano Virtuoso database by rebuilding it
This required me to connect to Virtuoso using the isql command-line interface. To achieve this one needs to:
1. Shut down the nepomuk-server. I’m using qdbus; this is the command line I use:
qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit
2. Obtain the config file that Nepomuk uses to launch Virtuoso. This was trickier than I expected: I had to look in the Nepomuk server’s log to discover where Nepomuk auto-generates the Virtuoso config file and which options it uses to kick Virtuoso into life.
Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")
3. I created a copy of that config file, named virtuoso.ini, and made the following changes to it.
Under the [Parameters] section I disabled LiteMode by changing it to
LiteMode=0
Also under the [Parameters] section I changed the server port to 1112 (for some reason port 1111 was taken) as follows:
ServerPort=1112
My [Database] section was already set by Nepomuk to point at the right files:
[Database]
DatabaseFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.db
ErrorLogFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.log
TransactionFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.trx
xa_persistent_file=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.pxa
4. It’s time to kick Virtuoso to life with the new virtuoso.ini file so the new settings come into effect. Here is the command line I used:
/opt/local/bin/virtuoso-t -df +foreground +debug +configfile /mcradle/temp/virtuoso.ini
5. Following the instructions in the Virtuoso backup guide, I needed to issue a SHUTDOWN; command to the Virtuoso server so that a checkpoint is created (and the server is shut down…). To do that one needs to start the isql utility that comes with Virtuoso. Here is my command line to do that:
isql -S 1112
Once it’s up I just type in
SHUTDOWN;
6. Once the server had exited I relaunched it in “backup dump mode” by adding the -b switch as follows:
/opt/local/bin/virtuoso-t -df -b +foreground +debug +configfile /mcradle/temp/virtuoso.ini
7. I then backed up the database by moving the original file aside:
mv ./soprano-virtuoso.db ./backup_before_crash_restore-soprano-virtuoso-db
8. The next step is to start the server in restore mode. Again, the command I used:
/opt/local/bin/virtuoso-t -df +restore-crash-dump +foreground +debug +configfile /mcradle/temp/virtuoso.ini
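Steps 2 and 3 above are easy to fumble by hand, so here is a minimal shell sketch of them. The helper names extract_configfile and patch_ini are made up for illustration. The sketch assumes the log line has the same shape as the one quoted in step 2, and that LiteMode and ServerPort already appear under [Parameters] in the file Nepomuk generated.

```shell
# extract_configfile (hypothetical helper): read Nepomuk server log lines on
# stdin and print the path that follows "+configfile" in the launch line.
extract_configfile() {
    sed -n 's/.*"+configfile", "\([^"]*\)".*/\1/p'
}

# patch_ini (hypothetical helper): copy the ini file $1 to $2, setting
# LiteMode=0 and ServerPort=1112 inside the [Parameters] section only.
# Other sections may have their own ServerPort, so those are left untouched.
patch_ini() {
    awk '
        /^\[/ { in_params = ($0 == "[Parameters]") }
        in_params && /^LiteMode[ \t]*=/   { print "LiteMode=0"; next }
        in_params && /^ServerPort[ \t]*=/ { print "ServerPort=1112"; next }
        { print }
    ' "$1" > "$2"
}

# Usage (the log file name is a placeholder, use wherever your log lives):
#   ini=$(extract_configfile < nepomuk-server.log)
#   patch_ini "$ini" /mcradle/temp/virtuoso.ini
```

The section tracking in awk matters because a plain search and replace on ServerPort would also rewrite any port setting that lives in another section of the file.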
As mentioned in my earlier spoiler, the database rebuild still left me with a file bigger than 1 GB. For reasons (and calculations) I won’t get into at the moment, I know that the data within the database shouldn’t take more than a megabyte, at most.
So I tried to figure out how many records I have in the quad store by running the following SQL statement in the isql interface:
select count(*) from "DB"."DBA"."RDF_QUAD";
This returned 21,009, which is way more than I expected to have there, so it’s not surprising that an explicit compact also didn’t help much. Here is the command I ran from the isql interface:
DB..VACUUM ('DB.DBA.RDF_QUAD');
Again, no effect on the file size whatsoever.
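To put the two numbers above side by side, here is a back-of-the-envelope calculation of how much file each quad would account for. It uses only figures from this post; treating 1 GB as 2^30 bytes is my assumption, and since the file was bigger than that, the result is a lower bound.

```shell
# Figures from this post; 1 GB taken as 2^30 bytes (an assumption).
db_bytes=$(( 1024 * 1024 * 1024 ))   # lower bound on the rebuilt file size
quads=21009                          # result of count(*) on RDF_QUAD
bytes_per_quad=$(( db_bytes / quads ))
echo "at least ${bytes_per_quad} bytes (~$(( bytes_per_quad / 1024 )) KB) per quad"
```

That works out to roughly 50 KB of file per quad. The file presumably also holds indexes and system tables, so this is not a true per-record cost, but it does suggest the quads alone are not what is filling the file.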
And the quest for a smaller Virtuoso database, it seems, has just begun.