SPARUL Update Queries Via SPARQLWrapper And A Virtuoso Server

If you ever had a burning desire to use SPARUL updates (aka SPARQL 1.1 Update) on an Ubuntu machine via SPARQLWrapper, you have probably seen the following error message:

Traceback (most recent call last):
  File "./wquery", line 93, in test
    ret = m_sparql.query()
  File "/usr/local/lib/python2.7/dist-packages/SPARQLWrapper/Wrapper.py", line 390, in query
    return QueryResult(self._query())
  File "/usr/local/lib/python2.7/dist-packages/SPARQLWrapper/Wrapper.py", line 369, in _query
    raise e
urllib2.HTTPError: HTTP Error 406: Unacceptable

It turns out that this is fixed in 1.5.3; however, at the time of writing, pip only carries SPARQLWrapper 1.5.2. To get the fresh version, one must check it out from SVN:

svn checkout svn://svn.code.sf.net/p/sparql-wrapper/code/trunk sparql-wrapper-code
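If you are not sure which SPARQLWrapper actually ended up on your Python path, a quick version check can save some head scratching. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and that the checkout was installed under the distribution name SPARQLWrapper; the helper names are my own.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.5.3' (or '1.5.3.dev0') into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def has_update_fix(installed, required=(1, 5, 3)):
    """True when the installed version is at least the one that fixed the 406."""
    return parse_version(installed) >= required


def installed_sparqlwrapper_version():
    """Return the installed SPARQLWrapper version string, or None if it is absent."""
    try:
        return metadata.version("SPARQLWrapper")
    except metadata.PackageNotFoundError:
        return None


version = installed_sparqlwrapper_version()
print(version, version is not None and has_update_fix(version))
```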

And here is the test code to write an RDF record to the quad store:

 from SPARQLWrapper import SPARQLWrapper, JSON

 iquery = 'INSERT IN GRAPH <nepomuk:/sourceforge.net/users/mcradle/resources/ontologies/magic-bucket> {<http://www.example.org/subject> <http://example.org/predicate> <http://example.org/ob>}'
 m_sparql = SPARQLWrapper(endpoint = "http://localhost:8893/sparql/", updateEndpoint = "http://localhost:8893/sparql-auth/")
 m_sparql.addDefaultGraph ("nepomuk:/sourceforge.net/users/mcradle/resources/ontologies/magic-bucket")
 m_sparql.setCredentials(user = "user", passwd = "password")
 m_sparql.setQuery(iquery)
 m_sparql.setReturnFormat(JSON)
 ret = m_sparql.query()

This now fails with HTTP error 401 (Unauthorized):

mcradle@carver:~/workdir/remember/nepomuk/nepomuk$ ./wquery "http://www.example2.org/subject" "http://www.example2.org/predicate13" "http://www.example2.org/object"
INSERT IN GRAPH <nepomuk:/sourceforge.net/users/mcradle/resources/ontologies/magic-bucket> {<http://www.example2.org/subject> <http://www.example2.org/predicate13> <http://www.example2.org/object> }
send: 'GET /sparql-auth?output=xml&format=xml&results=xml&update=INSERT+IN+GRAPH+%3Cnepomuk%3A%2Fsourceforge.net%2Fusers%2Fmcradle%2Fresources%2Fontologies%2Fmagic-bucket%3E+%7B%3Chttp%3A%2F%2Fwww.example2.org%2Fsubject%3E+%3Chttp%3A%2F%2Fwww.example2.org%2Fpredicate13%3E+%3Chttp%3A%2F%2Fwww.example2.org%2Fobject%3E+%7D HTTP/1.1\r\nAccept-Encoding: identity\r\nAccept: */*\r\nHost: localhost:8891\r\nConnection: close\r\nAuthorization: Digest cmVtZW1iZXI6cmVtZW1iZXI=\n\r\nUser-Agent: sparqlwrapper 1.5.3 (http://sparql-wrapper.sourceforge.net/)\r\n\r\n'
reply: 'HTTP/1.1 401 Unauthorized\r\n'
header: Server: Virtuoso/06.01.3127 (Linux) i686-pc-linux-gnu
header: Connection: close
header: Content-Type: text/html; charset=UTF-8
header: Date: Sat, 01 Jun 2013 16:40:16 GMT
header: Accept-Ranges: bytes
header: Content-Length: 0
Traceback (most recent call last):
 File "./wquery", line 118, in <module>
 registerStatement2 (t_subject, t_predicate, t_object)
 File "./wquery", line 107, in registerStatement2
 ret = m_sparql.query()
 File "/home/mcradle/workdir/remember/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/Wrapper.py", line 391, in query
 return QueryResult(self._query())
 File "/home/mcradle/workdir/remember/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/Wrapper.py", line 370, in _query
 raise e
urllib2.HTTPError: HTTP Error 401: Unauthorized

This is because SPARQLWrapper does not support digest authentication, so I monkey-patched my 1.5.3 SPARQLWrapper to see if that helps:

mcradle@carver:~/workdir/remember/sparqlwrapper/sparql-wrapper-code$ diff -u /mcradle/temp/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/Wrapper.py ./src/SPARQLWrapper/Wrapper.py
--- /tmp/sparqlwrapper/sparql-wrapper-code/src/SPARQLWrapper/Wrapper.py        2013-05-19 00:34:20.597501904 +0300
+++ ./src/SPARQLWrapper/Wrapper.py      2013-06-01 19:53:38.808090704 +0300
@@ -343,8 +344,15 @@
 
         request.add_header("User-Agent", self.agent)
         request.add_header("Accept", acceptHeader)
         if (self.user and self.passwd):
             request.add_header("Authorization", "Basic " + base64.encodestring("%s:%s" % (self.user,self.passwd)))
+            passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
+            passman.add_password(None, self.updateEndpoint, self.user, self.passwd)
+            auth_handler = urllib2.HTTPDigestAuthHandler(passman)
+            opener = urllib2.build_opener(auth_handler)
+            urllib2.install_opener(opener)
+
         return request
 
     def _query(self):
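For reference, here is roughly what the patch does, rewritten against Python 3's urllib.request (the urllib2 names map one to one). It is a sketch, not SPARQLWrapper code: the function names are mine and the endpoint is the one from my test script. It also shows a side issue: base64.encodestring (encodebytes in Python 3) appends a newline, which is presumably where the stray \n after the Authorization value in the request log above comes from.

```python
import base64
import urllib.request


def legacy_basic_value(user, passwd):
    # What the unpatched code builds: encodebytes (encodestring in Python 2)
    # appends a trailing newline to its output.
    return "Basic " + base64.encodebytes(f"{user}:{passwd}".encode()).decode()


def basic_value(user, passwd):
    # b64encode adds no newline, so the header value stays clean.
    return "Basic " + base64.b64encode(f"{user}:{passwd}".encode()).decode()


def digest_opener(update_endpoint, user, passwd):
    # Register the credentials for the update endpoint under any realm and let
    # HTTPDigestAuthHandler answer the server's 401 challenge with a Digest header.
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, update_endpoint, user, passwd)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passman))


opener = digest_opener("http://localhost:8893/sparql-auth/", "user", "password")
# urllib.request.install_opener(opener) would make urlopen() use it globally,
# which is what the patch does with urllib2.install_opener().
```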

It further turns out that virtuoso-opensource 6.1.4+dfsg1-0ubuntu1 does not support the update= parameter that, according to the SPARQL 1.1 Protocol spec, replaces query= for SPARQL 1.1 update operations.

So one needs to override SPARQLWrapper's query type, so that the update is sent with the query= parameter instead:

m_sparql.queryType = SELECT  # SELECT is imported from SPARQLWrapper; assign it after setQuery()
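To make the difference concrete: under the SPARQL 1.1 Protocol a query travels as a query= parameter, while an update is POSTed with a form-encoded update= parameter. Forcing queryType to SELECT makes SPARQLWrapper put the INSERT under query=, which this Virtuoso build understands. The helper below only builds the form payload (it sends nothing); its name, the legacy_virtuoso flag and the urn:example IRIs are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode


def sparql_form_payload(text, is_update, legacy_virtuoso=False):
    """Form-encode a SPARQL request the way the SPARQL 1.1 Protocol expects.

    Queries go under 'query='; updates go under 'update=' (sent with POST).
    With legacy_virtuoso=True an update is placed under 'query=' instead,
    which is the effect of forcing queryType = SELECT in SPARQLWrapper.
    """
    key = "update" if is_update and not legacy_virtuoso else "query"
    return urlencode({key: text})


insert = "INSERT IN GRAPH <urn:example:g> { <urn:example:s> <urn:example:p> <urn:example:o> }"
print(sparql_form_payload(insert, is_update=True))
print(sparql_form_payload(insert, is_update=True, legacy_virtuoso=True))
```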

I can finally update my virtuoso database by using SPARQLWrapper!

Posted in Uncategorized

Compiling zbar on OSX

If you're trying to compile zbar, the open source bar code reader, on OS X with JPEG support using the MacPorts jpeg library, and you're getting this error:

checking for jpeg_read_header in -ljpeg... no
configure: error: in `/skroner/temp/zbar/zbar-0.10':
configure: error: unable to find libjpeg! ensure CFLAGS/LDFLAGS are set appropriately or configure --without-jpeg        

You can use the following command line to get it working:

CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib" ./configure --disable-video --without-qt --without-python --without-gtk --with-libiconv-prefix=/opt/local --with-jpeg=yes

It's a cool program!

Posted in "software enginerring", foss, ocr, open source, OSX, workaround

Mount box.com Storage As a WebDAV On OS X 10.5.8

Update April 21st: I have managed to get box.com to mount and to write files; however, I can only copy one file at a time, as the wdfs process crashes after each file. At this point I'm not investigating further.

I don't know whether the problem lies with my reluctance to upgrade my OS X 10.5.8 to a newer cat version, but I just couldn't mount box.com directly using Finder. Technically it did mount, but it was impossible to access any files, neither for reading nor for writing.

Searching the net, I came up with an alternative tool to mount WebDAV resources called wdfs. In spite of the fact that it didn't list any updates since 2007(!), I decided to give it a try.

After an easy configure, make, make install, I issued:

wdfs https://www.box.com/dav  /Volumes/box/ -o debug

and I could read my WebDAV share, which was a major step forward.
Writing to it, however, seems to be a different story. I could upload a file using the web interface and then append to it,
so

echo "1234" >> ./foo 

yielded the expected results as long as foo existed.
If the file didn't exist, I got the following errors on stderr (thanks to the -o debug flag, of course!):

unique: 2, opcode: OPEN (14), nodeid: 11, insize: 48
## GET error: Could not read chunk size: connection was closed by server
   unique: 2, error: -2 (No such file or directory), outsize: 16
unique: 3, opcode: LOOKUP (1), nodeid: 8, insize: 44
Could not read chunk size: connection was closed by server

Looking for the error in the wdfs source code, plus some further debugging, showed that it comes from the wdfs_open() function.
Below is an abbreviated version for clarity:

static int wdfs_open(const char *localpath, struct fuse_file_info *fi)
{
	/* ... */

	struct open_file *file = g_new0(struct open_file, 1);
	file->modified = false;

	file->fh = get_filehandle();
	remotepath = get_remotepath(localpath);

	/* ... */

	/* GET the data to the filehandle even if the file is opened O_WRONLY,
	 * because the opening application could use pwrite() or use O_APPEND
	 * and than the data needs to be present. */
	if (ne_get(session, remotepath, file->fh)) {
		fprintf(stderr, "## GET error: %s\n", ne_get_error(session));
		FREE(remotepath);
		return -ENOENT;
	}

	/* ... */
}

It turns out that wdfs insists on issuing an HTTP GET even if we're in the midst of creating the file.
I'm not sure why the server replies with a

"Could not read chunk size: connection was closed by server"

but one thing is for sure: if I just ignore the error as follows:

	if (ne_get(session, remotepath, file->fh)) {
		fprintf(stderr, "## GET (wdfs_open) error: %s\n", ne_get_error(session));

		/* Mcradle -> Patch don't return an error 
		FREE(remotepath);
		return -ENOENT;
		*/
	}

The file is created just fine, and I can issue a cp command and verify that the data made it.
This brings a whole new meaning to the term monkey patching 🙂

Please be warned: your mileage may vary, as I'm not sure how legitimate it is to ignore this error, or why it is there to begin with.

Posted in "software enginerring", open source, OSX, workaround

Dump, Merge And Import Graphs From a Virtuoso Database

I have figured out how to extract my RDF data from a nepomuk-soprano-virtuoso database. I should mention that I have crafted my own tools to create the data on top of the Soprano model, so I can control which graph all my data belongs to.
It wasn't always like that, but I'll get to that pretty soon.

As detailed in my previous post, rebuilding the Virtuoso database left me with a file far too bloated for my taste to start backing up.

So I decided to try the strategy of dumping the data, filtering it, and importing it back into a clean database.

Running isql and Virtuoso as described in my previous post, I used the following SQL procedures in isql:

CREATE PROCEDURE dump_graphs 
  ( IN  dir               VARCHAR  :=  '/Users/mcradle/Library/Preferences/KDE/share/apps/nepomuk/repository/main/data/virtuosobackend/dumps'   , 
    IN  file_length_limit INTEGER  :=  1000000000
  )
  {
    DECLARE inx INT;
    inx := 1;
    SET ISOLATION = 'uncommitted';
    FOR ( SELECT * 
            FROM ( SPARQL DEFINE input:storage "" 
                   SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o } . 
                                        FILTER ( ?g != virtrdf: ) 
                                      } 
                 ) AS sub OPTION ( LOOP )) DO
      {
        dbg_printf ('about to dump %s', "g");
        dump_one_graph ( "g",
                         sprintf ('%s/graph%06d_', dir, inx),
                         file_length_limit
                       );
        dbg_printf ('dump done %s', "g");
        inx := inx + 1;
      }
  }
;
CREATE PROCEDURE dump_one_graph 
  ( IN  srcgraph           VARCHAR  , 
    IN  out_file           VARCHAR  , 
    IN  file_length_limit  INTEGER  := 1000000000
  )
  {
    DECLARE  file_name  varchar;
    DECLARE  env, ses      any;
    DECLARE  ses_len, 
             max_ses_len, 
             file_len, 
             file_idx      integer;
    SET ISOLATION = 'uncommitted';
    max_ses_len := 10000000;
    file_len := 0;
    file_idx := 1;
    file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
    string_to_file ( file_name || '.graph', 
                     srcgraph, 
                     -2
                   );
    string_to_file ( file_name, 
                     sprintf ( '# Dump of graph <%s>, as of %s\n', 
                               srcgraph, 
                               CAST (NOW() AS VARCHAR)
                             ), 
                     -2
                   );
    env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
    ses := string_output ();
    FOR (SELECT * FROM ( SPARQL DEFINE input:storage "" 
                         SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } } 
                       ) AS sub OPTION (LOOP)) DO
      {
        http_ttl_triple (env, "s", "p", "o", ses);
        ses_len := length (ses);
        IF (ses_len > max_ses_len)
          {
            file_len := file_len + ses_len;
            IF (file_len > file_length_limit)
              {
                http (' .\n', ses);
                string_to_file (file_name, ses, -1);
                file_len := 0;
                file_idx := file_idx + 1;
                file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
                string_to_file ( file_name, 
                                 sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n', 
                                           srcgraph, 
                                           CAST (NOW() AS VARCHAR), 
                                           file_idx), 
                                 -2
                               );
                 env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0);
              }
            ELSE
              string_to_file (file_name, ses, -1);
            ses := string_output ();
          }
      }
    IF (LENGTH (ses))
      {
        http (' .\n', ses);
        string_to_file (file_name, ses, -1);
      }
  }
;
 

The directory the graphs are dumped into must be listed in the DirsAllowed parameter of the [Parameters] section of virtuoso.ini,
for example:

[Parameters]
DirsAllowed= /Users/mcradle/Library/Preferences/KDE/share/apps/nepomuk/repository/main/data/virtuosobackend/dumps/,./dumps,dumps

Then issue

dump_graphs();

on the isql command line.

This should create a bunch of files in your dump directory, similar to the following:

bash-3.2$ ls -lhtra | head
total 8856
-rw-r--r--    1 mcradle  staff    36B Feb 19 00:14 graph000002_000001.ttl.graph
-rw-r--r--    1 mcradle  staff    43B Feb 19 00:14 graph000001_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   2.9K Feb 19 00:14 graph000001_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000090_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   1.1K Feb 19 00:16 graph000090_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000089_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   947B Feb 19 00:16 graph000089_000001.ttl
-rw-r--r--    1 mcradle  staff    49B Feb 19 00:16 graph000088_000001.ttl.graph
-rw-r--r--    1 mcradle  staff   791B Feb 19 00:16 graph000088_000001.ttl
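The names follow directly from the two procedures: dump_graphs passes sprintf('%s/graph%06d_', dir, inx) as the prefix, and dump_one_graph appends a %06d part counter plus .ttl, writing the graph IRI into a .graph file next to the first part. The Python sketch below only reproduces that naming scheme (it does not talk to Virtuoso), which helps when working out which file belongs to which graph; dump_file_names is a made-up helper name.

```python
def dump_file_names(dump_dir, graph_index, part_count=1):
    """Reproduce the file names written by dump_graphs/dump_one_graph.

    dump_graphs builds the prefix with sprintf('%s/graph%06d_', dir, inx);
    dump_one_graph appends sprintf('%06d.ttl', file_idx) for each part and
    writes the graph IRI to '<first part>.graph' only once.
    """
    prefix = "%s/graph%06d_" % (dump_dir, graph_index)
    parts = ["%s%06d.ttl" % (prefix, idx) for idx in range(1, part_count + 1)]
    return parts, parts[0] + ".graph"


parts, sidecar = dump_file_names("dumps", 90)
print(parts)    # ['dumps/graph000090_000001.ttl']
print(sidecar)  # dumps/graph000090_000001.ttl.graph
```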

As it turns out, my RDF data was spread across many graphs. I think nepomuk does that by default; I don't even claim to understand why and how.
I wanted to unify them into one graph to make exporting and importing the data easier in the future.

Python and rdflib to the rescue: it turns out that this is not that hard. Here is the Python snippet to merge a few Turtle files, each containing one graph (exactly in the form created by the dump_graphs() procedure above):

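import rdflib

files_to_be_merged = ("./graph000030_000001.ttl",
                      "./graph000031_000001.ttl",
                      "./graph000032_000001.ttl",
                      "./graph000033_000001.ttl",
                      "./graph000034_000001.ttl",
                      "./graph000100_000001.ttl",
                      "./graph000013_000001.ttl")

store = rdflib.Graph()

# Parse every dumped graph into one in-memory graph (Turtle is a subset of N3)
for fil in files_to_be_merged:
    store.parse(fil, format='n3')

# Let rdflib open and close the output file itself; serialize() returns bytes in
# older rdflib releases and str in newer ones, so writing its return value by hand is fragile
store.serialize(destination='/Users/mcradle/.../dumps/unified-graph.ttl', format='turtle')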

Note that since I created my own Turtle file, I also need to create my own .ttl.graph file.
It turns out that it's simply a file containing the graph name, for example:

bash-3.2$ cat graph000341_000001.ttl.graph 
nepomuk:/ctx/f665ed7d-ec19-4454-ae4c-70635f1b442f

So I created my own unified-graph.ttl.graph with my own graph name.
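Creating the sidecar takes a couple of lines of Python. One detail worth being careful about (my reading of the procedures, not something I ran into): load_graphs passes the whole content of the .graph file to TTLP_MT as the graph IRI, so the file should not end with a newline, which an editor or echo would happily add. The helper name is hypothetical, and the IRI in the comment is a placeholder, not my real graph name.

```python
from pathlib import Path


def write_graph_sidecar(ttl_path, graph_iri):
    """Write '<ttl_path>.graph' holding just the graph IRI, with no trailing newline."""
    sidecar = Path(str(ttl_path) + ".graph")
    sidecar.write_text(graph_iri.strip(), encoding="utf-8")
    return sidecar


# Example (placeholder IRI):
# write_graph_sidecar("dumps/unified-graph.ttl", "urn:example:unified-graph")
```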

I then deleted the following virtuoso database files:

soprano-virtuoso-temp.db
soprano-virtuoso.db 
soprano-virtuoso.lck
soprano-virtuoso.loc
soprano-virtuoso.log
soprano-virtuoso.pxa
soprano-virtuoso.trx

and launched nepomuk to have the database recreated. The newly created database was way smaller, around 10MB: I had just shrunk my database by a factor of almost 200!
Next, I was left with importing unified-graph.ttl back into the database. For that, I killed the nepomuk server and relaunched the standalone Virtuoso.
It's important to make sure that the dump directory is still listed in DirsAllowed, as it was when we dumped the graphs.

To import the graph with my data back, I pasted the following at the isql prompt:

CREATE PROCEDURE load_graphs 
  ( IN  dir  VARCHAR := 'dumps' )
{
  DECLARE arr ANY;
  DECLARE g VARCHAR;

  arr := sys_dirlist (dir, 1);
  log_enable (2, 1);
  FOREACH (VARCHAR f IN arr) DO
    {
      IF (f LIKE '*.ttl')
        {
          DECLARE CONTINUE HANDLER FOR SQLSTATE '*'
            {
              log_message (sprintf ('Error in %s', f));
            };
          g := file_to_string (dir || '/' || f || '.graph');
          dbg_printf ('g is %s', "g");
          DB.DBA.TTLP_MT (file_open (dir || '/' || f), g, g, 255);
        }
    }
  EXEC ('CHECKPOINT');
}
;

Then I ran the procedure to start the import process:

load_graphs ();
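Since load_graphs picks up every *.ttl file in the directory and reads the graph name from the matching .ttl.graph file, a missing sidecar presumably only shows up as an "Error in ..." line in the Virtuoso log. A pre-flight check before running it can list the Turtle files without a sidecar. Note that, going by dump_one_graph, only the first part of a multi-part dump (the _000001.ttl file) gets a .graph file, so any further parts would be flagged too. The helper name is hypothetical.

```python
from pathlib import Path


def ttl_files_missing_sidecar(dump_dir):
    """Return the *.ttl files in dump_dir that have no matching '.ttl.graph' file."""
    missing = []
    for ttl in sorted(Path(dump_dir).iterdir()):
        if ttl.is_file() and ttl.name.endswith(".ttl"):
            if not ttl.with_name(ttl.name + ".graph").is_file():
                missing.append(ttl.name)
    return missing


# Example: print(ttl_files_missing_sidecar("dumps"))
```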
Posted in Uncategorized

Compacting Nepomuk’s Virtuoso database for fun and (little) profit

I wanted to back up my nepomuk-generated RDF data sitting in my Virtuoso database. At first I thought I could just copy my virtuosobackend directory onto Dropbox, symlink it back to /Users/mcradle/…/KDE/../nepomuk/../data/ and get it over with.
This was before I discovered that my soprano-virtuoso.db is over 2.5 GB. That's right: 2.5 gigs.

Soon enough I found out that quite a bit of the data in the database was auto-generated by strigi. Here is the SPARQL query to list the graphs holding the strigi-generated records:

nepomukcmd --foo query "select distinct ?g where { ?g  ?r . }"

More information about nepomukcmd is here.

And here is the command line to delete the strigi data:

for a in `nepomukcmd --foo query "select distinct ?g where { \
?g ?r . }"`;
do nepomukcmd rmgraph "$a"; done

As this did not reduce the database file size at all, I moved on to reading Virtuoso's generous documentation and discovered that “Virtuoso does not relinquish space in the DB file back to the file system as records are removed”.

Next, in my quest to reclaim my disk space, I tried to rebuild the database. The steps are listed in Virtuoso's documentation; the details below may help when it comes to operating on a nepomuk (or Soprano, really) flavored Virtuoso database.

spoiler alert

I'm listing the gory details of how to compact a Soprano Virtuoso database, but please do note (before you follow my instructions) that it did not reduce the file size to my liking!

how to compact a soprano virtuoso database by rebuilding it

This required me to connect to Virtuoso using the isql command line interface. To achieve this, one needs to:

1. Shut down the nepomuk server. I'm using qdbus; this is the command line I use:

qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.quit

2. Obtain the config file that nepomuk uses to launch Virtuoso. This was trickier than I had expected: I had to look in the nepomuk server's log to discover where nepomuk auto-generates the Virtuoso config file it uses, and which options it passes to kick Virtuoso into life:

Starting Virtuoso server: "/opt/local/bin/virtuoso-t" ("+foreground", "+configfile", "/var/folders/tM/tM6xx7GQHa0fSuQgvBAM7k+++TI/-Tmp-/virtuoso_ME9685.ini", "+wait")

3. I created a copy of virtuoso.ini and made the following changes to it. Under the [Parameters] section I disabled LiteMode by changing it to:

LiteMode=0

Also under the [Parameters] section, I changed the server port to 1112 (for some reason port 1111 was taken) as follows:

ServerPort=1112

My [Database] section was already set by nepomuk to point at the right files:

[Database]
DatabaseFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.db
ErrorLogFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.log
TransactionFile=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.trx
xa_persistent_file=/Users/.../KDE/.../nepomuk/.../virtuosobackend/soprano-virtuoso.pxa

4. It's time to kick Virtuoso into life with the new virtuoso.ini file so the new settings take effect. Here is the command line I used:

/opt/local/bin/virtuoso-t -df +foreground +debug +configfile /mcradle/temp/virtuoso.ini

5. Following the instructions in the Virtuoso backup guide, I needed to issue a SHUTDOWN; command to the Virtuoso server so that a checkpoint is created (and the server is shut down…). To do that, one needs to start the isql utility that comes with Virtuoso; here is my command line:

isql -S 1112

Once it's up, I just type in:

SHUTDOWN;

6. Once the server has exited, I relaunched it in “backup dump mode” by adding the -b switch as follows:

/opt/local/bin/virtuoso-t -df -b +foreground +debug +configfile /mcradle/temp/virtuoso.ini

7. I then backed up the database as follows:

mv ./soprano-virtuoso.db ./backup_before_crash_restore-soprano-virtuoso-db

8. The next step is to start the server in restore mode; again, here is the command I used:

/opt/local/bin/virtuoso-t -df +restore-crash-dump +foreground +debug +configfile /mcradle/temp/virtuoso.ini
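Steps 2 and 3 are easy to script. The sketch below pulls the +configfile argument out of the nepomuk log line quoted in step 2 and writes a copy of that file with LiteMode=0 and ServerPort=1112 under [Parameters]. It is my own helper, not part of nepomuk or Virtuoso, and it assumes the log line keeps exactly the quoting shown above. One caveat: configparser does not preserve comments, so diff the copy against the original before using it.

```python
import configparser
import re

CONFIGFILE_RE = re.compile(r'"\+configfile",\s*"([^"]+)"')


def config_path_from_log(line):
    """Extract the ini path from nepomuk's 'Starting Virtuoso server' log line."""
    match = CONFIGFILE_RE.search(line)
    return match.group(1) if match else None


def rebuild_config(src_path, dst_path, port="1112"):
    """Copy a Virtuoso ini with LiteMode off and a different ServerPort."""
    config = configparser.ConfigParser(interpolation=None, strict=False, delimiters=("=",))
    config.optionxform = str  # keep Virtuoso's CamelCase keys as they are
    config.read(src_path)
    if not config.has_section("Parameters"):
        config.add_section("Parameters")
    config["Parameters"]["LiteMode"] = "0"
    config["Parameters"]["ServerPort"] = port
    with open(dst_path, "w") as out:
        config.write(out, space_around_delimiters=False)
```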

As mentioned in my spoiler above, the database rebuild still left me with a file bigger than 1GB. For reasons (and calculations) I won't get into at the moment, I know that the data within the database shouldn't take more than a megabyte, at most.

So I tried to figure out how many records I have in the quad store by running the following SQL statement in the isql interface:

select count(*) from "DB"."DBA"."RDF_QUAD";

This returned 21,009, which is way more than I had expected to find there, so it's not surprising that an explicit compaction also didn't help much (here is the command I ran from the isql interface):

DB..VACUUM ('DB.DBA.RDF_QUAD');

Again, no effect on the file size whatsoever.

And the quest for a smaller virtuoso database, as it seems, has just begun.

Posted in cli, Command Line, nepomuk, open source, Uncategorized

Terminate nepomukstrigiservice but keep running Nepomuk on OSX

I have a sneaking suspicion that I’m the only guy on earth that is interested in this setup, but who knows?

I'm running KDE 4.6.5 on Mac OS X 10.5.8 via MacPorts.

bash-3.2$ port info kdebase4
kdebase4 @4.6.5, Revision 1 (kde, kde4)
Replaced by:          kde4-baseapps
Variants:             debug, docs, universal

Description:          Core desktop applications and libraries for the KDE4 desktop. This port installs the file manager dolphin file manager.
Homepage:             http://www.kde.org

Build Dependencies:   cmake, pkgconfig, automoc
Library Dependencies: qt4-mac, phonon
Platforms:            darwin
License:              unknown
Maintainers:          snc@macports.org, sharky@macports.org

It could seem like an odd choice to run KDE on OS X, I know; the reason I do it is to gain access to Nepomuk, the KDE semantic desktop project.
I'm planning more posts that will elaborate on why, and what it is that I'm doing with Nepomuk, but this post is much more specific and narrow in scope.

Having started to use Nepomuk to tag my various files, I have been constantly bothered by a process seriously hammering my CPU (and the hard drive).
The process name was nepomukservicestub, but I had a few instances of this process, distinguished by a string parameter I could see with ps:

bash-3.2$ ps -ef | grep -i nepomukservicestub

.../MacOS/nepomukservicestub nepomukstorage                           
.../MacOS/nepomukservicestub nepomukqueryservice                      
.../MacOS/nepomukservicestub nepomukremovablestorageservice           
.../MacOS/nepomukservicestub nepomukbackupsync
.../MacOS/nepomukservicestub nepomukstrigiservice                                                                                          
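To see at a glance which service each nepomukservicestub process hosts, it is enough to take the argument that follows the executable path. A small Python sketch over ps -ef style lines; the parsing assumes the executable path contains no spaces (true for the paths above, but worth checking on your install), and the helper name is hypothetical.

```python
def stub_services(ps_lines):
    """Return the service names passed to nepomukservicestub processes."""
    services = []
    for line in ps_lines:
        tokens = line.split()
        for i, token in enumerate(tokens[:-1]):
            # Match the executable path itself, not e.g. 'grep -i nepomukservicestub'
            if token.endswith("/nepomukservicestub"):
                services.append(tokens[i + 1])
                break
    return services


# Example:
# import subprocess
# lines = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout.splitlines()
# print(stub_services(lines))
```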

The one instance eating up my CPU was “nepomukstrigiservice”. Now, strigi is a well known(tm) file indexing service used by Nepomuk to populate parts of its own database. For reasons I'll go into in a future post, I have no immediate use for the data generated by strigi.
So I decided to disable strigi rather than try to fix the obvious problem with it (I had enough distractions as it was, anyway).

Strigi does come with a control app called “strigiclient”; however, clicking its stop button had no effect on the nepomukstrigiservice, which kept consuming above 90% of my poor MacBook Pro's CPU cycles.

What I ended up doing was using the D-Bus interface of Nepomuk itself, via dbus-tools, which I got from here by checking out their SVN repo.

I wanted to share the command that took the load off my poor CPU. Let me stress: this is only useful under specific circumstances, since it leaves Nepomuk functional but disables automatic indexing. For a lot of users (I'm tempted to say most) this will leave the system in a rather pointless state, as the text (and possibly other properties) of new files will not be indexed!

With that warning in mind, here is the command I used to get rid of this process:

dbus org.kde.NepomukServer /servicemanager stopService %'"nepomukstrigiservice"'

where dbus is the dbus-tools executable.

Posted in "software enginerring", cli, Command Line, foss, KDE, nepomuk, open source, OSX, productivity tools, Technology, workaround

Compiling SVN Tesseract on OSX

Should the need to compile Tesseract from SVN ever arise (v3.01 at the time of writing), please note the following.

To fetch the source, issue:

bash-3.2$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only

You have to install Leptonica beforehand (from source, or via MacPorts like me):

bash-3.2$ sudo port install leptonica

If you want to use autotools and libtool from MacPorts (again, like me), you'll have to hack the runautoconf script in the Tesseract source directory (‘tesseract-ocr-read-only/’) before running it, so that it calls glibtoolize instead of libtoolize. Rumor has it that the MacPorts maintainers renamed libtoolize to glibtoolize to avoid eclipsing Apple's /usr/bin/libtoolize (which, conveniently enough, is not compatible with its GNU counterpart). Following is the modified line in runautoconf:

.
.
.
echo "Running libtoolize"
glibtoolize
.
.
.
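If you'd rather not edit runautoconf by hand, the substitution can be scripted. This is a minimal sketch that assumes the command sits at the start of its own line (as in the snippet above), so the echo line is left alone; patch_libtoolize and patch_file are made-up names. Running it twice is harmless.

```python
import re
from pathlib import Path

# A 'libtoolize' command at the start of a line (optionally indented, with arguments);
# text such as: echo "Running libtoolize" does not match.
LIBTOOLIZE_CMD = re.compile(r"^([ \t]*)libtoolize\b", re.MULTILINE)


def patch_libtoolize(script_text):
    """Make the script call MacPorts' glibtoolize instead of libtoolize."""
    return LIBTOOLIZE_CMD.sub(r"\1glibtoolize", script_text)


def patch_file(path):
    """Rewrite the script in place, e.g. patch_file('runautoconf')."""
    p = Path(path)
    p.write_text(patch_libtoolize(p.read_text()))
```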

The next step is to run the modified runautoconf:

bash-3.2$ ./runautoconf

Next, you'll have to hack the Tesseract ./configure script to include the directory where MacPorts installs Leptonica (/opt/local/include/leptonica):

.
.
.
have_lept=no
if test "$LIBLEPT_HEADERSDIR" = "" ; then
  LIBLEPT_HEADERSDIR="/usr/local/include /usr/include /opt/local/include/leptonica"
fi
.
.
.

If you skip or mess up the previous step, you'll see the following error when running ./configure:

bash-3.2$ ./configure
checking build system type... i686-apple-darwin9.8.0
.
.
checking for Leffler libtiff library... checking linking with -ltiff... ok
setting LIBTIFF_CFLAGS=
setting LIBTIFF_LIBS=-ltiff
checking for leptonica... configure: error: leptonica not found
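Judging by the configure snippet above, its built-in directory list is only used when LIBLEPT_HEADERSDIR is empty, so presetting that variable in the environment may be an alternative to editing the generated configure script, and it would survive rerunning runautoconf. I haven't tested this; the helper names are hypothetical, and the directory list is the one from my hacked line.

```python
import os
import subprocess

# Same search list as the hacked configure line, including MacPorts' leptonica headers
LEPTONICA_HEADER_DIRS = "/usr/local/include /usr/include /opt/local/include/leptonica"


def configure_env(base_env=None):
    """Environment for ./configure with LIBLEPT_HEADERSDIR preset."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBLEPT_HEADERSDIR"] = LEPTONICA_HEADER_DIRS
    return env


def run_configure(source_dir):
    """Run ./configure in source_dir; raises CalledProcessError if it fails."""
    subprocess.run(["./configure"], cwd=source_dir, env=configure_env(), check=True)
```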

It's easy to forget to run the runautoconf script before running ./configure.
And, of course, call 'make'.

Note that it is also important to call

sudo make install

in order for the language files to be copied to the location where Tesseract expects to find them.

Posted in foss, ocr, OSX

The Buffer Kool Aid

I'd already drunk the Emacs Kool-Aid, but today the penny finally dropped: I now understand what Steve Yegge means in his famous “Effective Emacs” essay:

There is nothing else quite like the Emacs buffer experience in all of application-dom. Once you realize how consistent and powerful this model is, all other applications become slightly distasteful, because you know their UI is just getting in your way.

This little (and, IMO, important) realization came to me when I was trying to display the output of a ClearCase man command and realized I can issue
M-x shell-command, feed it cleartool man rmview, and get a standard Emacs buffer in return, which in turn means all my default key bindings and buffer tricks will just work.

This is important. In the same way that ‘everything is a file’ makes a powerful abstraction that lets the tools that operate on files work on a wide variety of objects, ‘everything is a buffer’ offers the same powerful abstraction to UIs.
Too bad modern UIs turned away from this metaphor twenty years ago.

Posted in emacs, foss

OSX and Windows XP: The Quest For Synergy

Part I – The Synergetic Way

Synergy is high on the list of software I don't know how I could be productive without. There is one huge, obvious elephant in the room when it comes to Synergy, though: the security model, or really the lack of one. Not only does every key press travel in clear text over the network, Synergy does not even authenticate clients or servers.

Bottom line: there is no security to talk about; Synergy is one big flying security vulnerability. Lucky me: there are ways to add after-thought security to a deployed solution.

But first, allow me to spell out my motivation: as usual, I'm trying to make the technology work just the way I want it to. You see, I carry work home, and I carry it in the form of a laptop running Windows. I also have a MacBook Pro as my primary system at home, connected via DVI to a 24″ Dell UltraSharp 24FPW. I'd like my Windows laptop to be connected to the VGA port on the same monitor so I can use the 24″ screen real estate to do some work. I'd like to do that such that all I need to do is dock my Windows XP laptop and use my existing mouse, keyboard and, obviously, the monitor. When I disconnect the XP laptop, I would like to just undock it, shove it into my bag and go without hassle, leaving my MacBook fully connected. The XP laptop connects via my home wireless network, so some encryption/authentication would not be a bad idea. I could have spared the effort and fallen back to using Cord, but the VPN software I use just refuses to authenticate when I remote in; besides, why push all those pixels over the network when I can be connected directly via VGA?

So I’ve decided to use Synergy, with my Mac acting as the Synergy server that controls the XP laptop’s mouse and keyboard, and to tunnel the connection through stunnel. Now all I have to do to use the XP laptop is dock it, press the power button (to wake it up) and switch inputs with the button on the front of the monitor.
A couple of gotchas I hit even before getting to the stunnel part:

When testing Synergy I hit a problem: if the machine auto-locked, I couldn’t enter a password to log back in. Ctrl-Alt-Del had no effect.

Examining the logs, I noticed the following error:
DEBUG: emulating ctrl+alt+del press
DEBUG: can't open Winlogon desk: 5

To make a long story short, it turns out that under Windows NT and its descendants the screen saver and the login window each run on a separate desktop (the trailing 5 in the log is most likely the Win32 error code 5, ERROR_ACCESS_DENIED).
Here’s what MSDN has to say:

By default, there are three desktops in the interactive window station: Default, ScreenSaver, and Winlogon.

The only way Synergy can interact with the Winlogon desktop is if it runs as a system service. This means that on Windows, when configuring Synergy’s Auto Start, one has to choose to have “synergy start automatically when the computer does”. If the Synergy server restarts for whatever reason, the client loses access to the Winlogon desktop.
So there shouldn’t be any problem logging in as long as Synergy runs as a system service and the Synergy client is not restarted; the Ctrl-Alt-Del combination should just work.
However, if the screensaver kicks in I’m unable to dismiss it with Synergy-driven input devices, and I have not yet found a direct solution to this.
I did, however, find a workaround.

Later I also discovered that the Synergy server running on OS X must run as root.

Another issue I ran into is what I’ve come to call ‘alt-key at half-duplex’.

I tried to solicit some help on Super User; here is an extract of the longish description:

The alt key does not function as a modifier at all. By that I mean, using Notepad as an example:

If I press alt and release it, the first menu item gets highlighted and I can then press ‘f’ to get the File menu (this is normal behavior). However, if I hold down alt and simultaneously press ‘f’, I only get the first menu item highlighted. That is not the normal effect of this key combination: with a properly functioning alt key, the File menu would have been displayed.

It turns out this is easily solved by upgrading to the beta version (1.5) of the Synergy server for Mac OS X.

Here is the synergy.conf I’m using on the server:

    section: screens
        mcradles-macbook-pro.local:
        mcradleXP:
    end

    section: aliases
        mcradles-macbook-pro.local:
            computer.home
            192.168.6.2
    end

    section: links
        mcradles-macbook-pro.local:
            right = mcradleXP
        mcradleXP:
            left = mcradles-macbook-pro.local
    end

In the next post I’ll start covering the security aspect and how to tunnel synergy via stunnel.

Posted in foss, open source, OSX, productivity tools, remote, sync, Technology

CoRD And Swapped Mouse Buttons: The Freedom To Be Particular

A few years ago I swapped the right and left buttons of the mouse on my XP laptop because of a sports injury. I have long since recovered, but I kept the mouse buttons swapped.
Even when setting up a new system much more recently, I still set it up with the buttons swapped:
partly because the mouse was already to the left of the keyboard (also known as sheer laziness), partly because I enjoy the feeling of getting accustomed to a change,
and partly because I think it makes me use the mouse less and so forces me to learn more keyboard shortcuts (not that my extensive and expanding Emacs usage isn’t driving me in that direction anyway).

At home, on the other hand, my MacBook’s mouse buttons are not reversed.

It’s all nice and cozy until I use CoRD to connect remotely from my MacBook to my XP laptop. Then I have to remember to right-click when I want to left-click and vice versa, which is very confusing.
I can operate a mouse with swapped buttons (obviously: I do it every weekday), but it only feels natural to me when the mouse is on the left of the keyboard.

This brought me to a Stallman-esque moment: I had a piece of software that was generally functional, but I needed to extend it in a quirky way that is unique to me (and presumably of no interest to the software’s owners).
Luckily, CoRD is open source software, and it took me under 30 minutes to hack it so that it sends a left-button indication instead of a right-button one, and vice versa.

Technically, I just had to:

1. get the source

svn co https://cord.svn.sourceforge.net/svnroot/cord/trunk CoRD

piece of cake.

2. find where the buttons are handled.

Sort of a piece of cake.
Since this fix is so specific to me and my bizarre preferences, I don’t expect it to be useful enough to be worth even sending a patch back.
If anything, it would clutter up the code or, even worse, the GUI (not that I expect anybody to actually accept such a fix, let alone expose it in the GUI).
Given all that, it was clear to me that it wasn’t worthwhile looking for a solution that fits elegantly into CoRD as a whole, so I allowed myself to hack one (in the original sense of the word ‘hacking’).
I just swapped the values of button 1 and button 2 in “constant.h” as follows:

/* mcradle hack: swap buttons to support reversed buttons on xp server

#define MOUSE_FLAG_BUTTON1 0x1000
#define MOUSE_FLAG_BUTTON2 0x2000
*/
#define MOUSE_FLAG_BUTTON1 0x2000
#define MOUSE_FLAG_BUTTON2 0x1000

3. compile and serve over a bed of lettuce

It took me longer to write this post describing what I did than it took me to actually make the change.
It made me think about our collective use of technology: as more and more of it is driven by software, we are able to tweak technology to
better fit our needs rather than getting used to the way it was originally designed.
For programmers, it makes sense to go out of their way to choose open source software over proprietary software every time, because they *have the ability* to tweak things (and even learn a trick or two along the way).
I’m still debating my (philosophical, if you will) position on software and freedom, but from a purely practical standpoint, had I used a Microsoft closed-source client I would simply have had to live with the way it works, like it or not.

Posted in "software enginerring", CoRD, open source, OSX, productivity tools, remote