Boost::python str to std::string

As with everything in Boost::Python, it might not always be obvious how best to extract certain types to C++.

One thing to remember is that boost::python::extract supports most common C++ types, std::string included.

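So, to extract a std::string object from Python, it’s best to use the extract function again:

// Assumes main_namespace holds the __main__ dict after running test.py (e.g. via boost::python::exec_file)
std::string tmp = boost::python::extract<std::string>(main_namespace["test"]);
std::cout << tmp << std::endl;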

Assuming the python file:

# test.py
test = "This is a test script"

The C++ code above should print This is a test script

flash shape to svg

Intro

I have made a script to convert a flash shape to svg (the easy way).

I hope it converts all Flash shapes, but I could only test it on the shape I needed it for, which contained only “moveTo”, “splineTo” and “lineTo” functions.

Requirement

This script requires a dump from swfdump, which is part of swftools:

swfdump -D test.swf > test.dump

The script

# Copyright 2011 Mathias Teugels. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification, are
# permitted provided that the following conditions are met:
#
#   1. Redistributions of source code must retain the above copyright notice, this list of
#      conditions and the following disclaimer.
#
#   2. Redistributions in binary form must reproduce the above copyright notice, this list
#      of conditions and the following disclaimer in the documentation and/or other materials
#      provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY MATHIAS TEUGELS ``AS IS'' AND ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL  OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# The views and conclusions contained in the software and documentation are those of the
# authors and should not be interpreted as representing official policies, either expressed
# or implied, of Mathias Teugels.

import sys

f = open(sys.argv[1])
l = f.readlines()
f.close()

res = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" version="1.1" id="svg2">"""

res += '<path d="'

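for line in l:
    # Strip the newline and the parentheses around coordinates
    line = line.replace('\n', '').replace(')', '').replace('(', '')

    # A splineTo line ends in 5 tokens: the command plus 4 numbers
    analyse = line.split(' ')[-5:]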

    if analyse[0] == 'splineTo':
        res += "Q%s,%s %s,%s " % (analyse[1],analyse[2],analyse[3],analyse[4])
    else:
        analyse = analyse[2:]

        if analyse[0] == 'moveTo':
            res += "M%s,%s " % (analyse[1], analyse[2])

        elif analyse[0] == 'lineTo':
            res += "L%s,%s " % (analyse[1],analyse[2])

res += 'Z" style="fill:#000000"/>'
res += "</svg>"
print(res)

Usage

python shape2svg.py /path/to/your/dump > /path/to/your/svg

The svg will contain one element with a path that should create the same shape.
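
For reuse or testing, the same conversion can be wrapped in a function. The following is a minimal sketch, not the script above: dump_to_path and COMMANDS are names made up for this example, and the sample lines are hypothetical (real swfdump output may be laid out differently). It only assumes that each drawing command is followed by its coordinates on the same line.

```python
# Map each Flash drawing command to its SVG path letter and argument count.
COMMANDS = {'moveTo': ('M', 2), 'lineTo': ('L', 2), 'splineTo': ('Q', 4)}


def dump_to_path(lines):
    """Turn swfdump-style drawing lines into SVG path data (closed with Z)."""
    parts = []
    for line in lines:
        # Parentheses only group coordinates, so treat them as whitespace.
        tokens = line.replace('(', ' ').replace(')', ' ').split()
        for i, token in enumerate(tokens):
            if token in COMMANDS:
                letter, argc = COMMANDS[token]
                args = tokens[i + 1:i + 1 + argc]
                if len(args) == argc:
                    # Join coordinates pairwise as "x,y".
                    pairs = [','.join(args[j:j + 2]) for j in range(0, argc, 2)]
                    parts.append(letter + ' '.join(pairs))
                break
    return ' '.join(parts + ['Z'])


# Hypothetical sample lines, for illustration only.
sample = [
    '| moveTo 10.00 20.00',
    '| lineTo 30.00 20.00',
    '| splineTo (40.00 30.00) 30.00 40.00',
    'DEFINESHAPE defines id 0001',
]
path_data = dump_to_path(sample)
print(path_data)  # M10.00,20.00 L30.00,20.00 Q40.00,30.00 30.00,40.00 Z
```

Lines without a known command are skipped instead of raising an error, which makes the function safer on dumps that contain more than just the shape records.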

Feedback

If any shapes do not convert properly, I’d love to hear about it: Talk to me at identi.ca/cpf

boost::python iterating over dictionaries

When working with boost::python, the lack of documentation on embedding Python in a C++ application quickly becomes noticeable. Still, boost::python is a lot easier to use than the plain Python/C API, so I recommend it to anyone who considers using Python in a C++ application.

Tricky

The downside is the lack of examples, and some things not working as expected (perhaps they work as documented, but I couldn’t find decent documentation on Python-in-C++).

That lack of examples makes the whole ordeal tedious and error-prone, as with the implementation I’m about to show.

The expected

When accessing Python variables from C++, one has several boost wrappers available that represent most Python variable types, one of which is the dictionary (boost::python::dict).

test.py:
d = {
  'key': 'value',
  'another': 'yeah'
}

That’s the easy stuff, now, accessing it from C++:

test.cpp:
#include <boost/python.hpp>
#include <iostream>

int main()
{
  // Initialize Python
  Py_Initialize();

  // Get the main namespace/module up-and-running
  boost::python::object main_module = boost::python::import("__main__");
  boost::python::object main_namespace = main_module.attr("__dict__");

  // Run the Python script, in this case "test.py"
  boost::python::exec_file(boost::python::str("test.py"), main_namespace);

  // Get the 'd' variable
  // This method could be shortened by using boost::python::object d = main_namespace["d"]
  // However, using extract is more general, and could be used in templated functions (hint hint)
  boost::python::object d = boost::python::extract<boost::python::object>(main_namespace["d"]);

  // Get the dictionary, because just the "object" isn't enough to access its items
  // Note that the previous step can actually be skipped, but it's included for the sake of clarity
  boost::python::dict d_dict = boost::python::extract<boost::python::dict>(d);

  // Now, we're going to try and "print" everything involved
  boost::python::list iterkeys = (boost::python::list)d_dict.iterkeys();
  for (int i = 0; i < boost::python::len(iterkeys); i++)
  {
    // Because we know they're strings, we can do this
    std::string key = boost::python::extract<std::string>(iterkeys[i]);
    std::string value = boost::python::extract<std::string>(d_dict[iterkeys[i]]);

    std::cout << "Key: " << key << std::endl;
    std::cout << "Value: " << value << std::endl << std::endl;
  }
}

Compile using g++ test.cpp -I/usr/include/python2.6 -lboost_python -lpython2.6

Executing this should give:

12216 cpf@core ~/junk/cpp % ./a.out
Key: another
Value: yeah

Key: key
Value: value

The unexpected

Now, change test.py to represent:

test.py:
d = {
 'key': {
 'another': 'yeah'
 }
}

Execute the program again and notice the error (terminate called after throwing an instance of ‘boost::python::error_already_set’). This is the one exception boost::python throws for every Python-side error, so on its own it says nothing about what went wrong.

To figure out what went wrong, change test.cpp like this:

// Because we know they're strings, we can do this
std::string key = "";
std::string value = "";
try
{
  key = boost::python::extract<std::string>(iterkeys[i]);
  value = boost::python::extract<std::string>(d_dict[iterkeys[i]]);
} catch (boost::python::error_already_set const &)
{
  PyErr_Print();
}

It should give you the error: TypeError: No registered converter was able to produce a C++ rvalue of type std::string from this Python object of type dict

Extracting the value as a boost::python::dict instead of a std::string fixes this.
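
The underlying point is that a value has to be checked before it is converted. As a plain-Python illustration of the logic the C++ side has to mirror (this is not boost::python code, and walk is a name made up for this sketch), a nested dictionary can be walked like this:

```python
def walk(d, prefix=''):
    """Yield (dotted_key, value) for every non-dict leaf in a nested dict."""
    for key in sorted(d):
        value = d[key]
        if isinstance(value, dict):
            # A dict value cannot be treated as a string: descend instead.
            for item in walk(value, prefix + key + '.'):
                yield item
        else:
            yield prefix + key, value


d = {'key': {'another': 'yeah'}, 'plain': 'value'}
flat = list(walk(d))
print(flat)  # [('key.another', 'yeah'), ('plain', 'value')]
```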

Another word

While testing a program that embeds Python and reads a lot of variables, I ran into a weird case: a dict had only 1 key/value pair, and using iterkeys[i] crashed with TypeError: No to_python (by-value) converter found for C++ type: boost::python::api::proxy<boost::python::api::item_policies>

The solution is to use iterkeys.pop(0) instead of iterkeys[i]

HOWEVER: BE CAREFUL DOING THIS, THIS WILL CHANGE THE ITERKEYS ON-THE-FLY!
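
Why this matters can be shown with an ordinary Python list, which is what boost::python::list wraps. The sketch below is plain Python, not boost code, and both function names are made up for the illustration: combining an index-based loop with pop(0) silently skips half of the keys, while looping until the list is empty does not.

```python
def drain_by_index(keys):
    """Buggy pattern: index-based loop combined with pop(0)."""
    seen = []
    i = 0
    while i < len(keys):  # same condition as i < boost::python::len(iterkeys)
        seen.append(keys.pop(0))  # the list shrinks while i grows
        i += 1
    return seen


def drain_until_empty(keys):
    """Safe pattern: keep popping until nothing is left."""
    seen = []
    while keys:
        seen.append(keys.pop(0))
    return seen


skipped = drain_by_index(['a', 'b', 'c', 'd'])
complete = drain_until_empty(['a', 'b', 'c', 'd'])
print(skipped)   # ['a', 'b']
print(complete)  # ['a', 'b', 'c', 'd']
```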

Solr: Indexing speed 1111 docs/sec

By popular demand, I’m going to try to describe what I did to get the 1111 docs/sec indexing speed on a 1.200.000-document index. Please don’t be surprised at how little I have to do with the entire story, and how much is thanks to Solr’s great code.

The Machine

First and foremost, a small part about the machine I ran the tests on, hostnamed ‘searcht’ (for convenience). These values are what I could figure out through ssh; I suppose they’re all that’s relevant:

CPU: GenuineIntel E7330 @ 2.40GHz
RAM: 4GB (Don’t know how to figure out DDR2/3, memory speed, etc…)
Running Redhat linux 5

The script

For indexing I used the post.sh script that ships with Solr 1.4, slightly modified:

FILES=$*
URL=http://localhost:8999/solr/update

for f in $FILES; do
 echo Posting file $f to $URL `date`
 curl $URL -F stream.file=$f #--data-binary @$f -H 'Content-type:text/xml; charset=utf-8' 
 echo
done

#send the commit command to make sure all the changes are flushed and visible
curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
echo

The numbers

Indexing a 1.205.164-document index on searcht (values are the standard output of the time command, plus the QTime in milliseconds as returned by Solr).

Each test was followed by a manual optimize:

curl http://localhost:8999/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml;charset=utf-8'

Solr running on 1024M RAM
time ./post.sh /vol/indexes/testing/rug01.20100324.harvest.proc
 Real  18m26.515s
 User   0m 0.015s
 Sys    0m 0.045s
 QTime: 1 105 253 (ADD)
              830 (COMMIT)
           47 787 (OPTIMIZE)

Solr running on 1024M RAM
time ./post.sh /vol/indexes/testing/rug01.20100324.harvest.proc
 Real  18m53.599s
 User   0m 0.022s
 Sys    0m 0.037s
 QTime: 1 132 498 (ADD)
              752 (COMMIT)
           48 646 (OPTIMIZE)

Solr running on 2048M RAM
time ./post.sh /vol/indexes/testing/rug01.20100324.harvest.proc
 Real  18m44.525s
 User   0m 0.013s
 Sys    0m 0.015s
 QTime: 1 123 135 (ADD)
              849 (COMMIT)
           48 493 (OPTIMIZE)

I repeated the test with 2048M of RAM to try to speed up the indexing, but the difference is very small: the 2048M run falls between the two 1024M runs. The HDD speed is probably more of a bottleneck than memory.
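
To put the wall-clock times in perspective, throughput can be worked out from the numbers above. This is a small sketch that uses only figures from this post (the document count and the Real times); it ignores the separate optimize step, and the names in it are made up for the example.

```python
DOCS = 1205164  # documents in the index, as stated above


def seconds(minutes, secs):
    """Convert a 'time' reading such as 18m26.515s to seconds."""
    return minutes * 60 + secs


# (label, wall-clock time of post.sh) for the three runs listed above.
runs = [
    ('1024M, run 1', seconds(18, 26.515)),
    ('1024M, run 2', seconds(18, 53.599)),
    ('2048M', seconds(18, 44.525)),
]

rates = [(label, DOCS / elapsed) for label, elapsed in runs]
for label, rate in rates:
    print('%-13s %.0f docs/sec' % (label, rate))
```

All three runs land between roughly 1060 and 1090 docs/sec, with the 2048M run in the middle, which again suggests the extra RAM made no real difference.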

I hope this describes what people wished to know. Any questions will be answered as soon as possible, and development continues (the next stop will probably be using EmbeddedSolrServer for indexing & complete parsing).

Solr: Finding out all values in a field

Solr and Lucene are truly amazing things, capable of fast indexing and querying vast amounts of data.

However, when coming from a conventional database, it’s quite hard to get into the way of thinking Lucene uses compared to, say, SQL.

SQL: select fieldname from database

The equivalent of select fieldname from database as known in SQL databases, is one of those fun ones. One would think it would translate simply to something like:

q=fieldname%3A*&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

Basically you’re querying for any value in fieldname. However, Lucene/Solr doesn’t support this! Instead, query for everything and facet on the field:

q=*%3A*&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&facet=true&facet.field=fieldname

Yeah, just faceting on the field actually gives you all the values that occur in it, plus the count of documents for each value.
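
The facet query can also be built in code. The sketch below makes a few assumptions that are not in the query above: the base URL (host and port borrowed from my indexing post, with the standard select handler), rows=0 to skip the documents themselves, facet.limit=-1 to lift Solr's default cap on the number of facet values, and wt=json for an easy-to-parse response. facet_url and pair_up are names made up for this example, and the sample counts are invented.

```python
from urllib.parse import urlencode


def facet_url(base, field):
    """Build a Solr query that returns only the facet counts for one field."""
    params = [
        ('q', '*:*'),            # match every document
        ('rows', 0),             # assumption: we only want the facets
        ('facet', 'true'),
        ('facet.field', field),
        ('facet.limit', -1),     # assumption: no cap on returned values
        ('wt', 'json'),          # assumption: JSON response writer
    ]
    return base + '?' + urlencode(params)


def pair_up(flat):
    """Solr's JSON facet lists alternate value, count, value, count, ..."""
    return list(zip(flat[0::2], flat[1::2]))


url = facet_url('http://localhost:8999/solr/select', 'fieldname')
print(url)

# Made-up sample of what facet_counts.facet_fields.fieldname could hold.
counts = pair_up(['books', 12, 'journals', 3])
print(counts)  # [('books', 12), ('journals', 3)]
```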

Howto: Hide flash

When using Flash together with a lightbox (or a lookalike), it’s a common issue that the Flash movie which should sit under the lightbox shows through it (often in weird places and only partially).

So, an elegant way to solve this in JavaScript is to hide the Flash movie or move it out of the way.

Hide it (display none)

Using simple JavaScript one can hide the Flash movie (or the div it resides in) and re-display it when necessary.

document.getElementById('div_flash').style.display = 'none' // hide
document.getElementById('div_flash').style.display = 'block' // show

And the same but in jQuery:

$('#div_flash').css('display', 'none') // hide
$('#div_flash').css('display', 'block') // show

However, with certain movies this will reload the movie (as I encountered with the amCharts flash movie). In that case, this method of hiding / showing the movie is not preferred.

Move it (margin-top or similar)

The other method, which prevents reloading the flash movie for certain cases, would be to move the flash movie.

It’s best to take into account whether or not it’s OK to have a scrollbar appear. Mostly this is not wanted, so I’ll just use negative margins, which seem to prevent the scrollbar from appearing (at least in Firefox):

document.getElementById('div_flash').style.marginTop = '-99999px' // hide
document.getElementById('div_flash').style.marginTop = '0' // show

And the jQuery equivalent:

$('#div_flash').css('margin-top', '-99999px') // hide
$('#div_flash').css('margin-top', '0') // show

It’s not required to work with margin-top, but it’s best to work with either margin-top or margin-left, because margin-bottom and margin-right (with negative values) will probably generate a scrollbar.

Internship: An introduction

My internship: Global search @ ugent.be

Lagging a bit behind, I’m going to describe my internship.

In a nutshell, my internship is about search, a whole lot of search. Since the portal site of the university at Ghent (http://www.ugent.be) is moving to and running Plone as its main CMS and a whole lot of data is going into this portal site, people need to be able to search all that data.

So, the challenge is to implement a search system taking care of as much as possible, preferably in an open source environment! The idea is to use Apache Solr, which builds on Lucene, as the main search engine. The fun part is that we stay within the field of open source and still provide a strong search engine.

That being said, I’m currently working towards understanding Solr. Since I have no experience whatsoever with Solr (and only very little with Plone), this is going to challenge me, and I like challenges!

So, now you know what the assignment is, let’s break it down into steps. Because we’re in a western world and we enjoy the clear path of steps. Obviously that’s needed, since just starting it head-on will end in failure.

First: Learn Solr

First I’m going to discover as much as possible about Solr. It so happens that the university’s library has an implementation of Solr running for their search. I’ve also been given the book Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh to study. Using that book I hope to get a clear vision of Solr.

Next: Case study

As mentioned in the first step, the university’s library is running a Solr implementation. It just so happens that this implementation is currently barely documented. We proposed to document it for them, since they’ve been (and probably still will be) a great help in getting us up to speed with Solr.

I’m basically going to put their techniques in documentation, case study style. That way I have seen a Solr implementation running (all parts of it) and I’ll have a better vision on how to implement it on the scale we want.

After that: Plone + Solr

When that’s done, the goal is to have a Plone plugin that integrates Solr as easily as possible into any Plone setup. Since there is some evidence that people in the Plone community have already started such an effort (or once started it), we may work with them to finish the module.

Extra extra

Every internship has its main goals and some extras, in case the main goals are achieved early (or just in time to do more work). I have some extras too.

The extra is the expansion of the global search. Since the portal site of UGent does not hold all the content hosted by the university, and we want as much as possible to be searchable, we may look for ways to make the external UGent websites searchable too, possibly with a crawler (Nutch maybe).

Internship category

Since it’s best to blog my progress (and findings, if any) and possibly help people along the way, I created this small subsection on my site. For people interested in Solr and / or Plone (possibly the improvement or development of a Plone module for Solr, although it seems such an effort is already under way) I hope to help as much as possible!

[howto] Install bazaar explorer in debian

Since I’m quite the Python fan, and also indirectly a fan of Bazaar, I’m really liking the Bazaar Explorer, especially on Windows. Since I wanted the same application on Linux, I tried to install the explorer there.

Somehow, somewhere, something went wrong. After installing the Bazaar Explorer (first by downloading the tarball here, then by executing “bzr branch lp:bzr-explorer explorer” in the ~/.bazaar/plugins folder) I stumbled upon an error:

Unable to load plugin 'explorer' from '/usr/lib/python2.5/site-packages/bzrlib/plugins'

and later the same from my ~/.bazaar/plugins folder…

The solution

The solution is actually deceptively simple. The source installation page of Bazaar Explorer says:

If this fails to start, ensure that you have compatible versions of dependent products installed, namely: QBzr, PyQt, Qt and bzr.

Making sure you’ve got these requirements fixes the whole ordeal.

Qbzr on debian

My solution was simply to install qbzr on debian, since I already had bzr, qt and pyqt installed.

To install qbzr on debian, one needs to add the following to /etc/apt/sources.list (following these guidelines):

deb http://ppa.launchpad.net/qbzr-dev/ppa/ubuntu jaunty main

Updating apt and installing qbzr (as root of course, or sudo):

# aptitude update
# aptitude install qbzr

Now, enjoy your bazaar explorer!
PS: if you get an error about the key, here’s an explanation on how to add it to apt

Haiku OS: Alpha 1 released


The Haiku OS project site has published the first ISO, so I immediately went to the download page, since I’m a big fan of the Haiku idea. Downloading the ISO (from a SourceForge mirror) only took 10 minutes, but it isn’t nice of them to zip it (as if an ISO in itself isn’t good enough).

I prepared VirtualBox with pretty much everything on the defaults (for an OS not known by VirtualBox, that is).

The ISO mounted in VirtualBox booted nicely fast. The installer could use some work, especially since exiting the installer / setup (square box, top left) leaves the whole system doing nothing (it should reboot, or fall back to the desktop imho). Total setup time came down to 10 minutes, extremely fast!

Still, large credits are due for the installer being so straightforward: if one reads the instructions as they pop up during the process, one can install with ease. Most files seemed to be HTML files, which is quite funny ^^

Also, having an OS with 16835 files is seriously nice. Don’t really know if I should say this, but at the moment, it’s the smallest OS I’ve got in my entire Virtualbox setup. And I will most certainly play around with this one a _lot_ considering I hope it will become what I always wished for in the Desktop Linux experience.

What is also quite amazing imho is that Python, Perl and more are installed by default. Because I believe scripting will become more and more common in the real world, this is heading the right way!

Having played around with it some, taking some (loaded) screenshots, etc… I think Haiku OS has come a long way, but still has a long way to go. Positive points (for now):

  • Extremely fast boot-time, and a no-nonsense approach to the whole deal
  • Having the BeOS ideas and not trying to re-invent things seems to have effect

Negative points (points to work on):

  • alt-tab feature, or something more obvious than the Deskbar to switch between applications

These points and conclusions are first impressions, and I like what I see at the moment! Although the project still has a long way to go, it seems to gain momentum every time I try it, and I hope to start developing for it myself soon (although I prefer developing for servers :P )


Share this post, help Haiku gain even more momentum, try things out and comment here (or directly on the haiku site)