<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Lorenzo Bolla</title>
    <link href="http://lbolla.info/blog/feed" rel="self" />
    <link href="http://lbolla.info/blog/" />
    <updated>2013-03-21T15:57:12+00:00</updated>
    <id>http://lbolla.info/blog/</id>
    <entry>
        <title type="html"><![CDATA[Simple chat with Postgres LISTEN/NOTIFY and Tornado&#39;s IOLoop]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2013/03/21/chat-postgres-ioloop"/>
        <published>2013-03-21T16:00:00+00:00</published>
        <updated>2013-03-21T15:57:12+00:00</updated>
        <id>http://lbolla.info/blog/2013/03/21/chat-postgres-ioloop</id>
        <category scheme="http://lbolla.info/blog/tag/#tornado" term="tornado" label="tornado" />
        <category scheme="http://lbolla.info/blog/tag/#postgres" term="postgres" label="postgres" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Postgres supports a very interesting feature:
<a href="http://www.postgresql.org/docs/9.2/static/sql-listen.html">LISTEN</a>/<a href="http://www.postgresql.org/docs/9.2/static/sql-notify.html">NOTIFY</a> allows sending asynchronous messages to clients
connected to the database.</p>
<p>As in a normal &ldquo;chat&rdquo;, a client &ldquo;subscribed&rdquo; (<code>LISTEN</code>) to a channel receives
all the messages that other clients &ldquo;sent&rdquo; (<code>NOTIFY</code>) on that channel.</p>
<p>Since version 9.0, a notification message can have a payload string as long as
8000 bytes.</p>
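<p>The channel semantics can be sketched without a database. The toy broker below is only an in-memory illustration and assumes nothing about Postgres internals: <code>ToyBroker</code> and its methods are hypothetical names, and no SQL is executed.</p>

<pre><code>from collections import defaultdict


class ToyBroker(object):
    '''In-memory stand-in for LISTEN/NOTIFY; it never talks to Postgres.'''

    def __init__(self):
        # Channel name mapped to the callbacks of its listeners
        self.listeners = defaultdict(list)

    def listen(self, channel, callback):
        # Like LISTEN: subscribe a client to a channel
        self.listeners[channel].append(callback)

    def notify(self, channel, payload=''):
        # Like NOTIFY: hand the payload to every listener of that channel
        for callback in list(self.listeners[channel]):
            callback(channel, payload)


broker = ToyBroker()
alice_inbox, bob_inbox = [], []
broker.listen('chess', lambda channel, payload: alice_inbox.append(payload))
broker.listen('python', lambda channel, payload: bob_inbox.append(payload))
broker.notify('chess', 'e4!')</code></pre>
<p>After the last line, <code>alice_inbox</code> holds the message and <code>bob_inbox</code> is still empty, because only listeners of the &ldquo;chess&rdquo; channel are notified.</p>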
<p>In order to experiment with this feature, I&#39;ve implemented a simple chat based
on Tornado&#39;s <a href="http://www.tornadoweb.org/en/stable/ioloop.html">IOLoop</a>. Each client subscribes to a channel (or &ldquo;room&rdquo; in
chat jargon) and listens to it by <a href="http://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop.add_handler">adding a callback</a> to react to a new
notification. In the meantime, in another thread, the client is free to write
and submit messages to the &ldquo;room&rdquo;. Here is a screenshot of the chat in action:</p>
<p><img src="/blog/img/chat.png" alt="Chatting at London Chess Candidates 2013" title="Chatting example"/></p>
<p>This is the code, available also on <a href="https://gist.github.com/lbolla/5213919">gist</a>:</p>

<script src="https://gist.github.com/lbolla/5213919.js"></script>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Use Postgres advanced types in Python]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2013/03/06/custom-types-postgres-in-python"/>
        <published>2013-03-06T12:00:00+00:00</published>
        <updated>2013-03-06T12:36:03+00:00</updated>
        <id>http://lbolla.info/blog/2013/03/06/custom-types-postgres-in-python</id>
        <category scheme="http://lbolla.info/blog/tag/#postgres" term="postgres" label="postgres" />
        <category scheme="http://lbolla.info/blog/tag/#sqlalchemy" term="sqlalchemy" label="sqlalchemy" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <category scheme="http://lbolla.info/blog/tag/#psycopg2" term="psycopg2" label="psycopg2" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Postgres has a lot of useful <a href="http://www.postgresql.org/docs/9.2/static/datatype.html">builtin data types</a>, but only some of them are
mapped to Python types when accessing the DB using <a href="http://initd.org/psycopg/">psycopg2</a>.</p>
<p>Extending the support to other types is not straightforward, and involves the
following steps:</p>

<ul>
<li>Create a Python class to store the data, e.g. <code>class Point</code></li>
<li>Write a function to convert a <code>Point</code> to its SQL string representation,
 e.g. <code>adapt_point</code></li>
<li>Write the inverse function to parse the SQL string representation of a
 <code>Point</code> and return an instance of a <code>Point</code>, e.g. <code>cast_point</code></li>
<li>Finally bind all these functions and types, see <code>register_point_type</code></li>
</ul>
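<p>The first three steps can be sketched in plain Python. This is a minimal sketch, not the code from the gist: it assumes the textual form <code>(x,y)</code> that Postgres uses for its <code>point</code> type, and it leaves out the binding step, which with psycopg2 would go through <code>psycopg2.extensions.register_adapter</code>, <code>new_type</code> and <code>register_type</code>. A real psycopg2 adapter would also have to return a quoted SQL literal wrapped in an object such as <code>AsIs</code>.</p>

<pre><code>import re


class Point(object):
    '''Plain container for a 2D point.'''

    def __init__(self, x, y):
        self.x = x
        self.y = y


def adapt_point(point):
    # Python to SQL text, assuming the (x,y) form of a Postgres point
    return '({0!r},{1!r})'.format(point.x, point.y)


def cast_point(value, cur=None):
    # SQL text to Python; a NULL value comes through as None
    if value is None:
        return None
    match = re.match(r'\(([^,]+),([^)]+)\)$', value.strip())
    if match is None:
        raise ValueError('bad point representation: %r' % value)
    return Point(float(match.group(1)), float(match.group(2)))


text = adapt_point(Point(1.5, -2.0))
point = cast_point(text)</code></pre>
<p>The round trip gives back a <code>Point</code> with the same coordinates, which is a cheap sanity check to run before wiring the functions into the driver.</p>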
<p>The complete code is as follows, also available <a href="https://gist.github.com/lbolla/5098907">as a gist</a>:</p>

<script src="https://gist.github.com/lbolla/5098907.js"></script>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Tornado Redis chat]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2013/02/12/tornado-redis-chat"/>
        <published>2013-02-12T11:00:00+00:00</published>
        <updated>2013-03-06T12:31:31+00:00</updated>
        <id>http://lbolla.info/blog/2013/02/12/tornado-redis-chat</id>
        <category scheme="http://lbolla.info/blog/tag/#tornado" term="tornado" label="tornado" />
        <category scheme="http://lbolla.info/blog/tag/#redis" term="redis" label="redis" />
        <category scheme="http://lbolla.info/blog/tag/#websocket" term="websocket" label="websocket" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p><a href="http://redis.io/">redis</a> is often described as an &ldquo;in-memory persistent key-value store&rdquo;, but
<a href="http://openmymind.net/2012/1/23/The-Little-Redis-Book/">it&#39;s much more than that</a>. One of its nicest features is its support for
the <a href="http://redis.io/topics/pubsub"><code>Publish/Subscribe messaging paradigm</code></a>, which makes it easy to
implement, for example, a chat server.</p>
<p>In order to learn how to use it, I decided to implement a chat server using
Redis and Tornado. This is a classic exercise, and <a href="https://gist.github.com/pelletier/532067">others</a> have done the
same, but their solution has some pitfalls that I tried to fix.</p>
<p>The code is forked from <a href="https://gist.github.com/pelletier/532067">pelletier</a>&#39;s, with some improvements:</p>

<ul>
<li>Support for the latest Python client for Redis, <a href="http://redis-py.readthedocs.org/en/latest/">redis-py</a>, version 2.6.9</li>
<li>Thread-safety: callbacks are scheduled with <code>add_callback</code>, the only method in Tornado&#39;s IOLoop that is
<a href="http://www.tornadoweb.org/documentation/ioloop.html?highlight=add_callback#tornado.ioloop.IOLoop.add_callback">thread-safe</a></li>
<li>Tested with Python 3.3</li>
</ul>
<p>This is the code, available also on <a href="https://gist.github.com/lbolla/4754600">gist</a>:</p>

<script src="https://gist.github.com/lbolla/4754600.js"></script>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Blocking tasks in Tornado]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2013/01/22/blocking-tornado"/>
        <published>2013-01-22T14:33:56+00:00</published>
        <updated>2013-01-24T09:33:50+00:00</updated>
        <id>http://lbolla.info/blog/2013/01/22/blocking-tornado</id>
        <category scheme="http://lbolla.info/blog/tag/#tornado" term="tornado" label="tornado" />
        <category scheme="http://lbolla.info/blog/tag/#futures" term="futures" label="futures" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Every now and then a <a href="https://groups.google.com/d/topic/python-tornado/NVA5sTFIlPo/discussion">new discussion is raised on Tornado&#39;s mailing list about the best way to execute blocking tasks</a>. It turns out that there are 3 feasible options, in order of increasing complexity:</p>

<ul>
<li><em>Optimize blocking calls</em>. Often, a slow DB query or an overly complicated template is the bottleneck. Rather than complicating the webserver, the first thing to try is to speed them up. This is sufficient 99% of the time.</li>
<li><em>Execute the slow task in a separate thread or process</em>. This means off-loading the task to a thread (or process) other than the one running the <code>IOLoop</code>, which is then free to accept other requests.</li>
<li><em>Use an asynchronous driver/library to run the task</em>. For example, something like <a href="http://www.gevent.org/">gevent</a>, <a href="http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/">motor</a> and the like.</li>
</ul>
<p>This blog post is about the second option, in particular using Python&#39;s <a href="http://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures"><code>concurrent.futures</code></a> package.</p>
<p>For example, consider this simple web server, with a blocking &ldquo;SleepHandler&rdquo; handler:</p>

<pre><code>import time

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.write(&quot;Hello, world %s&quot; % time.time())


class SleepHandler(tornado.web.RequestHandler):

    def get(self, n):
        time.sleep(float(n))
        self.write(&quot;Awake! %s&quot; % time.time())


application = tornado.web.Application([
    (r&quot;/&quot;, MainHandler),
    (r&quot;/sleep/(\d+)&quot;, SleepHandler),
])


if __name__ == &quot;__main__&quot;:
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()</code></pre>
<p>Try to visit <code>http://localhost:8888/sleep/10</code> in one tab and <code>http://localhost:8888/</code> in another: you&#39;ll see that &ldquo;Hello, world&rdquo; is not printed in the second tab until the first one has finished, after 10 seconds. Effectively, the first call is blocking the IOLoop, which cannot serve the second tab.</p>
<p>You can make the &ldquo;SleepHandler&rdquo; Tornado-friendly by executing it in another thread. Below is a decorator that can be used to &ldquo;unblock&rdquo; it: </p>

<pre><code>from concurrent.futures import ThreadPoolExecutor
from functools import partial, wraps
import time

import tornado.ioloop
import tornado.web


EXECUTOR = ThreadPoolExecutor(max_workers=4)


def unblock(f):

    @tornado.web.asynchronous
    @wraps(f)
    def wrapper(*args, **kwargs):
        self = args[0]

        def callback(future):
            self.write(future.result())
            self.finish()

        EXECUTOR.submit(
            partial(f, *args, **kwargs)
        ).add_done_callback(
            lambda future: tornado.ioloop.IOLoop.instance().add_callback(
                partial(callback, future)))

    return wrapper


class SleepHandler(tornado.web.RequestHandler):

    @unblock
    def get(self, n):
        time.sleep(float(n))
        return &quot;Awake! %s&quot; % time.time()</code></pre>
<p>Very simply, the <code>unblock</code> decorator submits the decorated function to the thread pool, which returns a future; a callback is added to this future to return control to the IOLoop, by calling <code>add_callback</code>, which eventually will call <code>self.finish</code> and conclude the request.</p>
<p>Note that the wrapper must itself be decorated with <code>tornado.web.asynchronous</code>, so that <code>self.finish</code> is not called too soon! Moreover, <code>self.write</code> is not thread-safe (thanks mrjoes!), therefore it must be called in the main thread with the future&#39;s result as parameter.</p>
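<p>The reason for the hand-off through <code>add_callback</code> can be seen with the standard library alone. In this sketch (no Tornado involved; <code>slow_task</code> and <code>on_done</code> are illustrative names), the done-callback records the thread it runs on. When the task is still running at the time the callback is attached, that is the worker thread and not the main one, so the callback should only be used to schedule work back on the IOLoop.</p>

<pre><code>from concurrent.futures import ThreadPoolExecutor
import threading

executor = ThreadPoolExecutor(max_workers=1)
seen = []


def slow_task():
    # Stands in for the blocking work; runs on a worker thread
    return threading.current_thread().name


def on_done(future):
    # Record which thread computed the result and which one ran the callback
    seen.append((future.result(), threading.current_thread().name))


future = executor.submit(slow_task)
future.add_done_callback(on_done)
executor.shutdown(wait=True)</code></pre>
<p>If the future has already completed when <code>add_done_callback</code> is called, the callback runs immediately in the calling thread instead, so code in it cannot rely on either case.</p>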
<p>Full code is below, available on <a href="https://gist.github.com/4594879">gist</a>.</p>

<script src="https://gist.github.com/4594879.js"></script>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[UnshareMe]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/12/12/unshareme"/>
        <published>2012-12-12T11:17:03+00:00</published>
        <updated>2012-12-12T11:49:04+00:00</updated>
        <id>http://lbolla.info/blog/2012/12/12/unshareme</id>
        <category scheme="http://lbolla.info/blog/tag/#go" term="go" label="go" />
        <category scheme="http://lbolla.info/blog/tag/#web" term="web" label="web" />
        <category scheme="http://lbolla.info/blog/tag/#hash" term="hash" label="hash" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>My last weekend project was to write something similar to <a href="http://wehaslinks.com/">WeHasLinks</a>. In
fact, <a href="http://wehaslinks.com/">WeHasLinks</a> is a file sharing website, but I misread it as
&ldquo;We-Hash-Links&rdquo; and the funny thing is that they <em>indeed</em> hash their links (for
obvious reasons&hellip;). Anyway, <a href="http://wehaslinks.com/">WeHasLinks</a>&#39;s links are hashed so that only
the user who visited the page is allowed to follow them.</p>
<p>I liked the idea very much, and I decided to implement it in <a href="http://golang.org">go</a>, as an
exercise!  You can find the code <a href="https://github.com/lbolla/unshareme">on github</a>. A demo is available at
<a href="http://unshareme.lbolla.info/">unshareme.lbolla.info</a>.</p>
<p>The links are encrypted using <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES-256</a> and validated using <a href="http://en.wikipedia.org/wiki/Hash-based_message_authentication_code">HMAC</a>, which
is the standard way to encrypt secure cookies in web apps. In fact,
<a href="http://www.gorillatoolkit.org/pkg/securecookie">gorilla</a> provides a library to do just that. The code looks pretty much
like this:</p>

<pre><code>var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = &quot;encodeName&quot;
var sc = securecookie.New(hashKey, blockKey)
...

func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    ...</code></pre>
<p>&ldquo;Personalization&rdquo; of links is done by coupling each link with the remote IP
visiting the page.</p>

<pre><code>// Store URI and IP together
type PersonalURL struct {
    URI string
    IP string
}</code></pre>
<p>When a link is visited, the web app decodes it, verifies that the remote IP
visiting it is the same as the IP that requested the link in the first place
and redirects to the <em>real</em> URL. Otherwise, a 400 (Bad Request) is returned.</p>
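<p>The same check can be sketched in Python with the standard library. This is not a port of the securecookie-based code: it only signs the message with HMAC-SHA256 and does not encrypt it, so the URI and IP stay readable inside the token. <code>SECRET</code>, <code>encode</code> and <code>decode</code> are hypothetical names chosen for the illustration.</p>

<pre><code>import base64
import hashlib
import hmac
import json
import os

SECRET = os.urandom(32)  # hypothetical key, regenerated on every start


def encode(uri, ip, key=SECRET):
    # Serialize URI and IP together, then prepend an HMAC-SHA256 tag
    body = json.dumps({'uri': uri, 'ip': ip}).encode('utf-8')
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + body).decode('ascii')


def decode(token, ip, key=SECRET):
    # Return the URI, or None if the token is forged or the IP differs
    try:
        raw = base64.urlsafe_b64decode(token.encode('ascii'))
    except (ValueError, TypeError):
        return None
    tag, body = raw[:32], raw[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    msg = json.loads(body.decode('utf-8'))
    return msg['uri'] if msg['ip'] == ip else None


token = encode('http://example.com/file', '10.0.0.1')</code></pre>
<p>Calling <code>decode(token, ip)</code> with the IP that requested the link returns the URI; any other IP, or a token signed with a different key, yields <code>None</code>, which the web app would turn into the 400 response.</p>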
<p><em>Per se</em>, the app is very simple but I learnt a lot about <a href="http://golang.org">go</a> while
implementing it: in particular, that in terms of speed of development it&#39;s very
close to a scripting language, that <a href="http://golang.org">go</a>&#39;s standard library is amazing and that
<a href="http://www.gorillatoolkit.org/">gorilla</a> is a very nice complement for web apps.</p>
<p>One thing I didn&#39;t like is how <a href="http://golang.org/pkg/html/template/">templates</a> are handled: it&#39;s overly
complicated to specify a relative path for the templates directory and
templates are not compiled into the binary automatically. The easiest
solution I found was to specify the path on the command line. In this case,
yesod has a better solution.</p>
<p>Full code, for reference:</p>

<pre><code>package main

import (
    &quot;encoding/base64&quot;
    &quot;flag&quot;
    &quot;fmt&quot;
    &quot;github.com/gorilla/securecookie&quot;
    &quot;github.com/gorilla/mux&quot;
    &quot;html/template&quot;
    &quot;log&quot;
    &quot;net/http&quot;
    &quot;net/url&quot;
    &quot;path/filepath&quot;
    &quot;strings&quot;
)

// Random stuff for encoding
var hashKey = securecookie.GenerateRandomKey(32)
var blockKey = securecookie.GenerateRandomKey(32)
var encodeName = &quot;encodeName&quot;
var sc = securecookie.New(hashKey, blockKey)

// Router for handlers
var router = mux.NewRouter()

// Store URI and IP together
type PersonalURL struct {
    URI string
    IP string
}

// Flags
var templates_path = flag.String(&quot;t&quot;, &quot;src/unshareme/tmpl/&quot;, &quot;Path to the templates&quot;)
var templates = template.New(&quot;&quot;)

func encode(msg PersonalURL) (string, error) {
    enc, err := sc.Encode(encodeName, msg)
    if err != nil {
        return &quot;&quot;, err
    }

    b64enc := base64.URLEncoding.EncodeToString([]byte(enc))

    return b64enc, nil
}

func decode(enc string) (msg PersonalURL, err error) {
    b64enc, err := base64.URLEncoding.DecodeString(enc)
    if err != nil {
        return
    }

    err = sc.Decode(encodeName, string(b64enc), &amp;msg)
    if err != nil {
        return
    }

    return
}

// Only works for IPv4, like 127.0.0.1:12345, not IPv6 like [::1]:12345
func remoteIP(r *http.Request) string {
    // Get it from headers, as set by nginx
    ip := r.Header.Get(&quot;X-Real-IP&quot;)
    if ip == &quot;&quot; {
        // Strips port number
        ip = strings.Split(r.RemoteAddr, &quot;:&quot;)[0]
    }
//         log.Print(&quot;IP:&quot;, ip)
    return ip
}

func MainHandler(w http.ResponseWriter, r *http.Request) {
    err := templates.ExecuteTemplate(w, &quot;index.html&quot;, nil)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

func EncodeHandler(w http.ResponseWriter, r *http.Request) {
    u, err := url.Parse(r.URL.Query().Get(&quot;u&quot;))
    if err != nil {
        log.Print(err.Error())
        http.Error(w, &quot;&quot;, http.StatusBadRequest)
        return
    }

    if u.Scheme == &quot;&quot; {
        http.Error(w, &quot;Invalid scheme&quot;, http.StatusBadRequest)
        return
    }

    msg := PersonalURL{URI: u.String(), IP: remoteIP(r)}
    enc, err := encode(msg)
    if err != nil {
        log.Print(err.Error())
        http.Error(w, &quot;&quot;, http.StatusBadRequest)
        return
    }

    link, _ := router.Get(&quot;Decode&quot;).URL(&quot;enc&quot;, enc)
    fmt.Fprint(w, link.String())
}

func DecodeHandler(w http.ResponseWriter, r *http.Request) {
    vars := mux.Vars(r)
    dec, err := decode(vars[&quot;enc&quot;])
    if err != nil {
        log.Print(err.Error())
        http.Error(w, &quot;&quot;, http.StatusBadRequest)
        return
    }

    if rip := remoteIP(r); dec.IP != rip {
        log.Print(dec.IP, rip)
        http.Error(w, &quot;&quot;, http.StatusBadRequest)
        return
    }

    http.Redirect(w, r, dec.URI, http.StatusFound)
    return
}

func main() {
    flag.Parse()
    templates = template.Must(template.ParseFiles(filepath.Join(*templates_path, &quot;index.html&quot;)))
    router.Handle(&quot;/favicon.ico&quot;, http.NotFoundHandler())
    router.HandleFunc(&quot;/&quot;, MainHandler).Methods(&quot;GET&quot;)
    router.HandleFunc(&quot;/enc&quot;, EncodeHandler).Methods(&quot;GET&quot;)
    router.HandleFunc(&quot;/dec/{enc}&quot;, DecodeHandler).Methods(&quot;GET&quot;).Name(&quot;Decode&quot;)
    http.Handle(&quot;/&quot;, router)
    log.Fatal(http.ListenAndServe(&quot;:7001&quot;, nil))
}</code></pre>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Useful scripts - Watch]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/30/useful-scripts-watch"/>
        <published>2012-11-30T16:20:00+00:00</published>
        <updated>2012-11-30T14:07:20+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/30/useful-scripts-watch</id>
        <category scheme="http://lbolla.info/blog/tag/#acme" term="acme" label="acme" />
        <category scheme="http://lbolla.info/blog/tag/#useful-scripts" term="useful-scripts" label="useful-scripts" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is the fifth post of a <a href="/blog/tag/useful-scripts/">series</a> describing simple scripts that I wrote
to ease my life as a programmer.</p>
<p>They are <a href="https://github.com/lbolla/cmd">available on github</a>: fork &amp; hack at will!</p>
<p><a href="https://github.com/lbolla/cmd/blob/master/Watch"><code>Watch</code></a> reacts to changes in a directory by executing a command provided by the
user. It can be used, for example, to monitor a directory and run some
unit tests as soon as files in it change. This is exactly how I am using <code>Watch</code>
in <a href="http://acme.cat-v.org/"><code>acme</code></a>.</p>
<p><img src="/blog/img/watch_acme.png" alt="Watch in acme"/></p>
<p><code>Watch</code> is based on the <a href="https://github.com/seb-m/pyinotify"><code>pyinotify</code></a> library, a very slim, one-file library
that I included in <a href="https://github.com/lbolla/cmd">my repo</a> for simplicity. Basically, <code>pyinotify</code> relies on
<code>inotify</code>, an event-driven notifier that has been part of the Linux kernel since version
2.6.13: given a directory to watch, it raises events that users can process
by defining handlers in a subclass of <code>ProcessEvent</code>.</p>
<p>Note that <code>Watch</code> refuses to run its command more often than once every
3 seconds. This prevents a burst of events raised on the same directory
from queuing up too many processes.</p>
<p>Here is the code:</p>

<pre><code>#!/usr/bin/env python

# Watch for modified files in localdir (.) and react.
# ./Watch &lt;cmd&gt;
# e.g.: ./Watch flake8 .

from pylib.pyinotify import WatchManager, EventsCodes, ProcessEvent, Notifier
from subprocess import call
import sys
import time


class ProcessManager(ProcessEvent):

    LAST_TIME = None

    def __init__(self, cmds):
        super(ProcessManager, self).__init__()
        self.cmds = cmds

    def is_too_soon(self):
        return self.LAST_TIME and time.time() - self.LAST_TIME &lt; 3

    def process_IN_CLOSE_WRITE(self, event):
        # For some reason, this event is triggered twice
        if not self.is_too_soon():
            call(self.cmds)
            self.LAST_TIME = time.time()


def main():

    dir = '.'
    cmds = sys.argv[1:]

    wm = WatchManager()

    mask = EventsCodes.ALL_FLAGS['IN_CLOSE_WRITE']

    notifier = Notifier(wm, ProcessManager(cmds))
    wm.add_watch(dir, mask, rec=True)

    while True:
        try:
            notifier.process_events()
            if notifier.check_events():
                notifier.read_events()
        except KeyboardInterrupt:
            notifier.stop()
            break


if __name__ == '__main__':
    main()</code></pre>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Useful scripts - htmlind and xmlind]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/23/useful-scripts-htmlind-and-xmlind"/>
        <published>2012-11-23T15:02:00+00:00</published>
        <updated>2012-11-23T15:03:26+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/23/useful-scripts-htmlind-and-xmlind</id>
        <category scheme="http://lbolla.info/blog/tag/#useful-scripts" term="useful-scripts" label="useful-scripts" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <category scheme="http://lbolla.info/blog/tag/#acme" term="acme" label="acme" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is the fourth post of a <a href="/blog/tag/useful-scripts/">series</a> describing simple scripts that
I wrote to ease my life as a programmer.</p>
<p>In this post I&#39;ll describe 2 simple scripts to nicely indent <code>HTML</code>
and <code>XML</code> files. I use them primarily with <a href="http://acme.cat-v.org/"><code>acme</code></a>, to pipe
selected text and get back nicely formatted output.</p>
<p>Code is available here: <a href="https://github.com/lbolla/cmd/blob/master/htmlind"><code>htmlind</code></a> and <a href="https://github.com/lbolla/cmd/blob/master/xmlind"><code>xmlind</code></a>. Both programs are written in Python and make use of specialized libraries freely available online. In particular, <code>xmlind</code> uses <a href="http://docs.python.org/2/library/xml.dom.minidom.html"><code>xml.dom.minidom</code></a>, included in Python&#39;s standard library, and <code>htmlind</code> uses a modified version of <a href="https://github.com/lbolla/cmd/blob/master/pylib/BeautifulSoup.py"><code>BeautifulSoup</code></a>.</p>
<p>The most interesting part of these scripts is the modification to <code>BeautifulSoup</code>, which adds support for a variable <code>tabstop</code> width in pretty printing. The patch is <a href="https://github.com/lbolla/cmd/commit/0079356bab483b5739748e170f4c6bedef0e5b84">here</a>: it basically allows a user to set the <code>tabstop</code> width through an environment variable (<code>$tabstop</code>), which defaults to &ldquo;4&rdquo;.</p>
<p>For example:</p>

<pre><code>% echo '&lt;a&gt;&lt;b&gt;text text&lt;/b&gt;&lt;c&gt;more text&lt;/c&gt;&lt;/a&gt;' | htmlind
&lt;a&gt;
    &lt;b&gt;
        text text
    &lt;/b&gt;
    &lt;c&gt;
        more text
    &lt;/c&gt;
&lt;/a&gt;

% echo '&lt;a&gt;&lt;b&gt;text text&lt;/b&gt;&lt;c&gt;more text&lt;/c&gt;&lt;/a&gt;' | tabstop=1 htmlind
&lt;a&gt;
 &lt;b&gt;
  text text
 &lt;/b&gt;
 &lt;c&gt;
  more text
 &lt;/c&gt;
&lt;/a&gt;</code></pre>
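<p>The <code>$tabstop</code> lookup itself is tiny. The sketch below is not the <code>BeautifulSoup</code> patch: it assumes the same convention (an environment variable named <code>tabstop</code>, defaulting to 4) and applies it to <code>xml.dom.minidom</code>, whose pretty printer formats text slightly differently. <code>indent_xml</code> is a hypothetical name.</p>

<pre><code>import os
import xml.dom.minidom


def indent_xml(text, environ=os.environ):
    # Read the indentation width from $tabstop, falling back to 4
    width = int(environ.get('tabstop', '4'))
    dom = xml.dom.minidom.parseString(text)
    # minidom indents each nesting level with the given string
    return dom.toprettyxml(indent=' ' * width)</code></pre>
<p>Passing the environment as a parameter keeps the function easy to exercise with different widths without touching the real <code>os.environ</code>.</p>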
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Useful scripts - csvfmt, xmlfmt, jsonfmt]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/16/useful-scripts-csvfmt-jsonfmt-htmlfmt"/>
        <published>2012-11-16T13:45:52+00:00</published>
        <updated>2012-11-16T13:46:59+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/16/useful-scripts-csvfmt-jsonfmt-htmlfmt</id>
        <category scheme="http://lbolla.info/blog/tag/#useful-scripts" term="useful-scripts" label="useful-scripts" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is the third post of a <a href="/blog/tag/useful-scripts/">series</a> describing simple scripts that
I wrote to ease my life as a programmer.</p>
<p>In this post, I&#39;ll describe 3 scripts to &ldquo;pretty print&rdquo; some common
file types, to improve readability: <a href="https://github.com/lbolla/cmd/blob/master/csvfmt"><code>csvfmt</code></a>, <a href="https://github.com/lbolla/cmd/blob/master/xmlfmt"><code>xmlfmt</code></a> and
<a href="https://github.com/lbolla/cmd/blob/master/jsonfmt"><code>jsonfmt</code></a>.</p>
<p><a href="https://github.com/lbolla/cmd/blob/master/csvfmt"><code>csvfmt</code></a> takes a <code>CSV</code> (&ldquo;Comma Separated Values&rdquo;) file from
<code>stdin</code>, parses it and pretty-prints each record as a Python
dictionary.</p>

<pre><code>#!/usr/bin/env python

import csv
import sys
import pprint

for row in csv.DictReader(sys.stdin):
    pprint.pprint(row)</code></pre>
<p>Output looks like this:</p>

<pre><code>% echo 'a,b,c
1,2,3
4,5,6
' | csvfmt
{'a': '1', 'b': '2', 'c': '3'}
{'a': '4', 'b': '5', 'c': '6'}</code></pre>
<p><a href="https://github.com/lbolla/cmd/blob/master/xmlfmt"><code>xmlfmt</code></a> takes an <code>XML</code> file from either <code>stdin</code> or a file
(specified on the command line) and extracts all the text from it. This
script is meant for reading the text embedded in <code>XML</code> tags,
and it&#39;s analogous to <a href="http://man.cat-v.org/plan_9/1/fmt"><code>htmlfmt</code></a>. If you want to format an <code>XML</code>
file, maintaining the <code>XML</code> tags, use <a href="http://xmlsoft.org/xmllint.html"><code>xmllint -format</code></a>, or my
<a href="/blog/2012/11/23/useful-scripts-htmlind-and-xmlind"><code>xmlind</code></a>, described in another blog post of this series.</p>

<pre><code>#!/usr/bin/env python

import xml.dom.minidom
from pylib.xmlutil import getText, getInput

dom = xml.dom.minidom.parse(getInput())
print(getText(dom))</code></pre>
<p>For example:</p>

<pre><code>% echo '&lt;a&gt;a text&lt;b&gt;b text&lt;/b&gt;more a text&lt;/a&gt;' | xmlfmt
a textb textmore a text</code></pre>
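<p>The helper <code>getText</code> lives in <code>pylib.xmlutil</code> and is not shown here. A hypothetical stand-in, called <code>get_text</code> to avoid confusion with the real one, could walk the DOM recursively like this:</p>

<pre><code>import xml.dom.minidom


def get_text(node):
    # Text and CDATA nodes carry the characters we are after
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Any other node contributes the text of its children, in document order
    return ''.join(get_text(child) for child in node.childNodes)</code></pre>
<p>Applied to the document parsed from the example above with <code>xml.dom.minidom.parseString</code>, it concatenates the text nodes in document order, without adding separators between them.</p>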
<p><a href="https://github.com/lbolla/cmd/blob/master/jsonfmt"><code>jsonfmt</code></a> takes a <code>JSON</code> file from <code>stdin</code> and pretty prints it as
a Python object.</p>

<pre><code>#!/usr/bin/env python

import json
import sys
import pprint

pprint.pprint(json.load(sys.stdin))</code></pre>
<p>Try it out:</p>

<pre><code>$&gt; curl 'http://search.twitter.com/search.json?q=lorenzo' | jsonfmt
{u'completed_in': 0.035,
 u'max_id': 267982040698351617L,
 u'max_id_str': u'267982040698351617',
 u'next_page': u'?page=2&amp;max_id=267982040698351617&amp;q=lorenzo',
 u'page': 1,
 u'query': u'lorenzo',
 u'refresh_url': u'?since_id=267982040698351617&amp;q=lorenzo',
 u'results': [{u'created_at': u'Mon, 12 Nov 2012 13:27:52 +0000',
               u'from_user': u'michael_174',
               u'from_user_id': 234373960,
               u'from_user_id_str': u'234373960',
               u'from_user_name': u'Michael Adhiyatama',
               u'geo': None,
               u'id': 267982040698351617L,
               u'id_str': u'267982040698351617',
               u'iso_language_code': u'in',
 etc. etc.</code></pre>
<p>All three scripts are written in Python and available <a href="https://github.com/lbolla/cmd">here</a>.</p>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Wordpress to markdown]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/12/wordpress-to-markdown"/>
        <published>2012-11-12T15:00:00+00:00</published>
        <updated>2012-11-12T15:33:59+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/12/wordpress-to-markdown</id>
        <category scheme="http://lbolla.info/blog/tag/#markdown" term="markdown" label="markdown" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <category scheme="http://lbolla.info/blog/tag/#wordpress" term="wordpress" label="wordpress" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Recently, I moved away from <a href="http://wordpress.com/">Wordpress</a>. I did it primarily because
<a href="http://john.onolan.org/ghost/">Wordpress is so much more than <strong>just</strong> a blogging platform</a> and
what I needed was just a simple way of publishing posts with embedded
code, links and images. Moreover, writing blogs using Wordpress&#39;s web
editor is less than ideal&hellip;</p>
<p>The biggest problem to solve when moving away from Wordpress is how to
not lose all your posts. Luckily, Wordpress allows you to <a href="http://codex.wordpress.org/Tools_Export_Screen">export all
your stuff in XML</a>, but you also need a way to import it into
whatever other blogging platform you are going to use.</p>
<p>After some research, I decided to choose a <a href="http://mickgardner.com/2011/04/27/An-Introduction-To-Static-Site-Generators.html">static site generator</a>.
Out of all the <a href="http://siliconangle.com/blog/2012/03/20/5-minimalist-static-html-blog-generators-to-check-out/">available alternatives</a>, I picked <a href="http://liquidluck.readthedocs.org/en/latest/">Felix
Felicis</a> (aka &ldquo;liquidluck&rdquo;): it&#39;s written in Python, very simple to
customize and extend, and with some pleasing themes. Other solutions,
like <a href="https://github.com/mojombo/jekyll">jekyll</a>, <a href="http://publicstatic.org/">public-static</a>, etc. are way too &ldquo;powerful&rdquo;
(read &ldquo;complicated&rdquo;) for my taste.</p>
<p>Unfortunately, unlike other more popular alternatives, <a href="http://liquidluck.readthedocs.org/en/latest/">Felix
Felicis</a> does not come with an &ldquo;importer&rdquo; of Wordpress&#39;s XML file.
So, I decided to <a href="https://github.com/lbolla/wp2md/tree/liquidluck">fork one of the existing solutions and adapt it to
my needs</a>.</p>
<p>I also forked the <a href="https://github.com/lepture/liquidluck-theme-moment">liquid luck&#39;s default theme</a> and <a href="https://github.com/lbolla/liquidluck-theme-moment">created my
own</a>.</p>
<p>If you want to do as I did (migrate away from Wordpress and use Felix
Felicis as your static site generator), follow these steps:</p>

<ol>
<li>Export your posts from Wordpress in an XML file</li>
<li><code>git clone</code> my fork of <code>wp2md</code> and run it over the XML file</li>
<li>Manually check that all your links and posts have been properly
exported: mine needed almost zero editing!</li>
</ol>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Useful scripts - c+/c-]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/09/useful-scripts-cc"/>
        <published>2012-11-09T16:20:00+00:00</published>
        <updated>2012-11-30T14:07:20+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/09/useful-scripts-cc</id>
        <category scheme="http://lbolla.info/blog/tag/#rc" term="rc" label="rc" />
        <category scheme="http://lbolla.info/blog/tag/#acme" term="acme" label="acme" />
        <category scheme="http://lbolla.info/blog/tag/#useful-scripts" term="useful-scripts" label="useful-scripts" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is the second post of a <a href="/blog/tag/useful-scripts/">series</a> describing simple scripts that I wrote
to ease my life as a programmer.</p>
<p>They are <a href="https://github.com/lbolla/cmd">available on github</a>: fork &amp; hack at will!</p>

<h2 id="toc_0">c+/c-</h2>
<p>In this post I&#39;ll describe a very simple script, <a href="https://github.com/lbolla/cmd/blob/master/c%2B"><code>c+</code></a>, and its counterpart
<a href="https://github.com/lbolla/cmd/blob/master/c-"><code>c-</code></a>.</p>
<p><code>c+</code> prepends <code>#</code> to every line of <code>stdin</code>. <code>c-</code> strips <code>#</code> from the
beginning of each line of <code>stdin</code>. I use these scripts to comment/uncomment
lines in Python scripts when using <a href="http://acme.cat-v.org/"><code>acme</code></a>.</p>
<p><img src="/blog/img/cc_acme.png" alt="c+/c- in acme"/></p>
<p>Here is the code:</p>

<h3 id="toc_1">c+</h3>

<pre><code>#!/usr/bin/env rc

sed 's/^/#/'</code></pre>

<h3 id="toc_2">c-</h3>

<pre><code>#!/usr/bin/env rc

sed 's/^#//'</code></pre>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Useful scripts - a+/a-]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/11/02/useful-scripts-aa"/>
        <published>2012-11-02T15:42:02+00:00</published>
        <updated>2012-11-12T14:07:01+00:00</updated>
        <id>http://lbolla.info/blog/2012/11/02/useful-scripts-aa</id>
        <category scheme="http://lbolla.info/blog/tag/#rc" term="rc" label="rc" />
        <category scheme="http://lbolla.info/blog/tag/#acme" term="acme" label="acme" />
        <category scheme="http://lbolla.info/blog/tag/#useful-scripts" term="useful-scripts" label="useful-scripts" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is the first post of a <a href="/blog/tag/useful-scripts/">series</a> describing simple scripts that I wrote
to ease my life as a programmer.</p>
<p>They are implemented in various languages (<code>python</code>, <code>bash</code>, <code>go</code>) and meant
to be used on <code>Linux</code>. Some of them are &ldquo;general purpose&rdquo;, while others are
specifically designed to interface with other tools I use (for example,
<a href="http://acme.cat-v.org/"><code>acme</code></a>).</p>
<p>All of them tend to have the following properties:</p>

<ul>
<li>  Input from <code>stdin</code>, output to <code>stdout</code>, errors to <code>stderr</code></li>
<li>  Return zero on success, non-zero on failure</li>
<li>  Do one thing only</li>
<li>  Not very customizable</li>
</ul>
<p>These properties allow the scripts to remain very simple, be composable and
easy to remember.</p>
<p>They are <a href="https://github.com/lbolla/cmd">available on github</a>: fork &amp; hack at will!</p>

<h2 id="toc_0">a+/a-</h2>
<p>In this post I&#39;ll describe a very simple script, <a href="https://github.com/lbolla/cmd/blob/master/a%2B"><code>a+</code></a>, and its counterpart
<a href="https://github.com/lbolla/cmd/blob/master/a-"><code>a-</code></a>. They are the first I wrote when I started using <a href="http://acme.cat-v.org/"><code>acme</code></a>.</p>
<p><code>a+</code> indents every line of <code>stdin</code> by 4 spaces. <code>a-</code> &ldquo;de-indents&rdquo; it by the
same amount. The amount of spaces (4) is fixed (to resist the temptation to
change it), and indentation is done with <a href="http://www.python.org/dev/peps/pep-0008/#tabs-or-spaces">spaces and not tabs</a>.</p>
<p>The code is trivial: it uses <a href="http://swtch.com/plan9port/man/man1/sed.html"><code>sed</code></a> and <a href="http://swtch.com/plan9port/man/man1/rc.html"><code>rc</code></a>, <a href="http://swtch.com/plan9port/">Plan 9&#39;s shell
ported to *nix</a> (although, in this case, any shell would do). Here it is:</p>

<h3 id="toc_1">a+</h3>

<pre><code>#!/usr/bin/env rc

sed 's/^/    /'</code></pre>

<h3 id="toc_2">a-</h3>

<pre><code>#!/usr/bin/env rc

sed 's/^    //'</code></pre>
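<p>For readers who prefer Python to <code>sed</code>, here is a minimal sketch of the same two filters. It is not part of the <code>cmd</code> repository: the names <code>indent</code> and <code>dedent</code> are made up for illustration, and the sketch assumes that lines with fewer than 4 leading spaces are left untouched when de-indenting.</p>

```python
INDENT = " " * 4  # fixed on purpose, like in the scripts above


def indent(text):
    """Prefix every line (empty ones included) with four spaces."""
    return "".join(INDENT + line for line in text.splitlines(keepends=True))


def dedent(text):
    """Strip four leading spaces from every line that starts with them."""
    return "".join(
        line[len(INDENT):] if line.startswith(INDENT) else line
        for line in text.splitlines(keepends=True)
    )


print(repr(indent("def f():\n    pass\n")))
print(repr(dedent("    x = 1\n  y = 2\n")))
```

<p>To use it as a filter in the spirit of the originals, pass <code>sys.stdin.read()</code> to one of the two functions and write the result to <code>sys.stdout</code>.</p>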
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[URL Counter in GO]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/10/10/url-counter-in-go"/>
        <published>2012-10-10T14:33:56+00:00</published>
        <updated>2012-11-06T14:01:39+00:00</updated>
        <id>http://lbolla.info/blog/2012/10/10/url-counter-in-go</id>
        <category scheme="http://lbolla.info/blog/tag/#go" term="go" label="go" />
        <category scheme="http://lbolla.info/blog/tag/#google" term="google" label="google" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>After having <a href="/blog/tag/#haskell">tinkered with Haskell for quite a bit</a>, I decided that I
needed some rest from theory and esoteric concepts, and a more pragmatic
programming language to explore.</p>
<p>I&#39;ve spent the last few days refreshing my memories on <a href="http://golang.org/">Go</a>: I hadn&#39;t
touched it for almost 2 years and I must say that I find it changed: for the
better.</p>
<p>Here is a short tutorial on how to write a simple web application in Go, and
publish it on <a href="https://developers.google.com/appengine/">Google App Engine</a>. The application is not a mere exercise,
but scratches an itch I recently had: it counts how many times each of its
handlers is hit. So, for example, visiting:
<a href="http://go-count-urls.appspot.com/hello">go-count-urls.appspot.com/hello</a> returns how many times the <code>/hello</code>
handler has been visited. You can use it as a trivial real-time tracker.</p>
<p>For example, I used it to verify that an email I sent to someone was actually
opened (and presumably read). I just picked a random URL path (like
<a href="http://go-count-urls.appspot.com/random-string-here">go-count-urls.appspot.com/random-string-here</a>) and created an html
email with an empty <code>img</code> tag pointing to it: <code>&lt;img
src=&quot;http://go-count-urls.appspot.com/random-string-here&quot; width=0 height=0 /&gt;</code>.
Every time the email client opens the email, it requests that URL and the hit
is recorded. I admit that this use is pretty lame, and that there are <a href="http://www.spypig.com/">other
services</a> doing this, but I needed a real-world problem to work on!</p>
<p>So here we go! </p>

<h2 id="toc_0">Set up your development environment</h2>
<p>First of all, download and install the <a href="https://developers.google.com/appengine/docs/go/gettingstarted/devenvironment">App Engine Go software development kit</a>. Then create the following directory structure: </p>

<pre><code>go-count-urls/
    app.yaml
    app/
        counter.go</code></pre>

<h2 id="toc_1">Show me the code!</h2>
<p>The whole application is made of just one file, <a href="https://github.com/lbolla/go-count-urls/blob/master/app/counter.go"><code>counter.go</code></a>. Here it is, comments inline:</p>

<pre><code>package counter
import (
    &quot;appengine&quot;
    &quot;appengine/datastore&quot;
    &quot;fmt&quot;
    &quot;net/http&quot;
    &quot;time&quot;
)

// Object to store in Google's Datastore. Keeps track of how many times a
// URL was hit and when.
type Counter struct {
    Path      string
    Count     int
    Timestamp time.Time
}

// Return a brand new Counter
func getEmptyCounter(path string) Counter {
    return Counter{Path: path, Count: 0, Timestamp: time.Now()}
}

// Increment the counter for a URL. If it's the first time this URL is
// visited, create a brand new Counter before incrementing it.
// On error, return an empty counter and an error.
func inc(c appengine.Context, key *datastore.Key, path string) (Counter, error) {
    var x Counter

    if err := datastore.Get(c, key, &amp;x); err != nil &amp;&amp; err != datastore.ErrNoSuchEntity {
        return getEmptyCounter(path), err
    }

    // Increment it, and update the last modified time
    x.Path = path
    x.Count++
    x.Timestamp = time.Now()

    // Save the counter
    if _, err := datastore.Put(c, key, &amp;x); err != nil {
        return getEmptyCounter(path), err
    }

    return x, nil
}

// This is the only handler. It just picks the path, removes the leading
// slash and stores it in the Datastore. As a key in the Datastore, the URL
// itself is used.
func handle(w http.ResponseWriter, r *http.Request) {

    key := r.URL.Path[1:]
    if key == &quot;&quot; {
        // Return 404 on the root handler (we might want a splash page here...)
        http.NotFound(w, r)
        return
    } else if key == &quot;favicon.ico&quot; {
        // We are not interested in tracking favicon.ico
        w.WriteHeader(http.StatusNoContent)
        return
    }

    c := appengine.NewContext(r)

    // For how to use the Datastore see
    // https://developers.google.com/appengine/docs/go/datastore/overview
    count, err := inc(c, datastore.NewKey(c, key, &quot;singleton&quot;, 0, nil),
r.URL.Path)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    // Write something
    w.Header().Set(&quot;Content-Type&quot;, &quot;text/plain; charset=utf-8&quot;)
    fmt.Fprintf(w, &quot;Path=%s, Count=%d, When=%s&quot;, count.Path, count.Count,
count.Timestamp)
}

// Initialize the application, binding URLS to handlers.
func init() {
    http.HandleFunc(&quot;/&quot;, handle)
}</code></pre>

<h2 id="toc_2">Try it out!</h2>
<p>Launch the application using the SDK; from go-count-urls directory type: </p>

<pre><code>$&gt; $GAE_PATH/dev_appserver.py .</code></pre>
<p>Now visit <a href="http://localhost:8080/hello">localhost:8080/hello</a>. Refresh. Refresh again. And again&hellip; </p>

<h2 id="toc_3">Publish</h2>
<p>Publishing the application on Google infrastructure is a matter of seconds: </p>

<pre><code>$&gt; $GAE_PATH/appcfg.py update .</code></pre>
<p>You can visit it at: <a href="http://go-count-urls.appspot.com/hello">go-count-urls.appspot.com/hello</a>. The code is available here: <a href="https://github.com/lbolla/go-count-urls">github.com/lbolla/go-count-urls</a>.</p>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Asynchronous programming with Tornado]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/10/03/asynchronous-programming-with-tornado"/>
        <published>2012-10-03T12:33:49+00:00</published>
        <updated>2012-12-01T15:43:59+00:00</updated>
        <id>http://lbolla.info/blog/2012/10/03/asynchronous-programming-with-tornado</id>
        <category scheme="http://lbolla.info/blog/tag/#tornado" term="tornado" label="tornado" />
        <category scheme="http://lbolla.info/blog/tag/#web" term="web" label="web" />
        <category scheme="http://lbolla.info/blog/tag/#asynchronous" term="asynchronous" label="asynchronous" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Asynchronous programming can be tricky for beginners, therefore I think it&#39;s
useful to iron out some basic concepts to avoid common pitfalls. For an explanation
of asynchronous programming in general, I recommend one of the <a href="http://en.wikipedia.org/wiki/Asynchrony">many</a>
<a href="http://www.cs.brown.edu/courses/cs196-5/f12/handouts/async.pdf">resources</a> <a href="http://krondo.com/?page_id=1327">online</a>. I will focus solely on asynchronous programming in
<a href="http://www.tornadoweb.org/documentation/index.html">Tornado</a>.</p>
<p>From Tornado&#39;s homepage: </p>

<blockquote>
<p>FriendFeed&#39;s web server is a relatively simple, non-blocking web server
written in Python. The FriendFeed application is written using a web
framework that looks a bit like web.py or Google&#39;s webapp, but with
additional tools and optimizations to take advantage of the non-blocking web
server and tools. Tornado is an open source version of this web server and
some of the tools we use most often at FriendFeed. The framework is distinct
from most mainstream web server frameworks (and certainly most Python
frameworks) because it is non-blocking and reasonably fast. Because it is
non-blocking and uses epoll or kqueue, it can handle thousands of
simultaneous standing connections, which means the framework is ideal for
real-time web services. We built the web server specifically to handle
FriendFeed&#39;s real-time features every active user of FriendFeed maintains an
open connection to the FriendFeed servers. (For more information on scaling
servers to support thousands of clients, see The C10K problem.)</p>
</blockquote>
<p>The first step as a beginner is to figure out if you <em>really</em> need to go
asynchronous. Asynchronous programming is more complicated than synchronous
programming because, as someone put it, it does not fit the human brain nicely.</p>
<p>You should use asynchronous programming when your application needs to monitor
some resources and react to changes in their state. For example, a web server
sitting idle until a request arrives through a socket is an ideal candidate. Or
an application that has to execute tasks periodically or delay their execution
for some time. The alternative is to use multiple threads (or processes) to
control multiple tasks, and this model quickly becomes complicated.</p>
<p>The second step is to figure out if you <em>can</em> go asynchronous. Unfortunately in
Tornado, not all the tasks can be executed asynchronously.</p>
<p>Tornado is single threaded (in its common usage, although it supports multiple
threads in advanced configurations), therefore any &ldquo;blocking&rdquo; task will block
the whole server.  This means that a blocking task will not allow the framework
to pick the next task waiting to be processed. The selection of tasks is done
by the <a href="http://www.tornadoweb.org/documentation/ioloop.html?highlight=ioloop#tornado.ioloop.IOLoop"><code>IOLoop</code></a>, which, as everything else, runs in the only available
thread.</p>
<p>For example, this is a <em>wrong</em> way of using <code>IOLoop</code>:</p>

<script src="https://gist.github.com/3826189.js?file=async_generic.py"></script>
<p>Note that <code>blocking_call</code> is called correctly, but, being
blocking (<code>time.sleep</code> blocks!), it will prevent the execution of the following
task (the second call to the same function). Only when the first call ends will
the second be called by <code>IOLoop</code>. Therefore, the console output is
sequential (&ldquo;sleeping&rdquo;, &ldquo;awake!&rdquo;, &ldquo;sleeping&rdquo;, &ldquo;awake!&rdquo;).</p>
<p>Compare the same
&ldquo;algorithm&rdquo;, but using an &ldquo;asynchronous version&rdquo; of <code>time.sleep</code>, i.e.
<code>add_timeout</code>:</p>

<script src="https://gist.github.com/3826189.js?file=async_sleep_1.py"></script>
<p>In this case, the first
task will be called, it will print &ldquo;sleeping&rdquo; and then it will ask <code>IOLoop</code> to
schedule the execution of the rest of the routine after 1 second. <code>IOLoop</code>,
having control again, will fire the second call to the function, which will
print &ldquo;sleeping&rdquo; again and return control to <code>IOLoop</code>. After 1 second <code>IOLoop</code>
will carry on where it left off with the first function and &ldquo;awake&rdquo; will be
printed. Finally, the second &ldquo;awake&rdquo; will be printed, too. So, the sequence of
prints will be: &ldquo;sleeping&rdquo;, &ldquo;sleeping&rdquo;, &ldquo;awake!&rdquo;, &ldquo;awake!&rdquo;. The two function
calls have been executed concurrently (<a href="http://stackoverflow.com/questions/1897993/difference-between-concurrent-programming-and-parallel-programming">not in parallel</a>, though!).</p>
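<p>Here is a minimal, self-contained sketch of the same comparison. Note the assumptions: it uses the standard library's <code>asyncio</code> instead of Tornado's <code>IOLoop</code>, with <code>asyncio.sleep</code> playing the role of <code>add_timeout</code>; the names <code>run_twice</code>, <code>blocking_task</code> and <code>non_blocking_task</code> are made up for illustration, and the delay is shortened to 0.05 seconds.</p>

```python
import asyncio
import time

DELAY = 0.05  # seconds; the text above uses 1 second


async def blocking_task(log):
    log.append("sleeping")
    time.sleep(DELAY)  # blocks the only thread, so the loop cannot switch tasks
    log.append("awake!")


async def non_blocking_task(log):
    log.append("sleeping")
    await asyncio.sleep(DELAY)  # hands control back to the loop while waiting
    log.append("awake!")


def run_twice(task):
    """Schedule two calls of `task` on one event loop and return what they logged."""
    log = []

    async def main():
        await asyncio.gather(task(log), task(log))

    asyncio.run(main())
    return log


print(run_twice(blocking_task))      # ['sleeping', 'awake!', 'sleeping', 'awake!']
print(run_twice(non_blocking_task))  # ['sleeping', 'sleeping', 'awake!', 'awake!']
```

<p>The two coroutines differ by a single line, which is the whole point: the scheduling is identical, only the kind of sleep changes.</p>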
<p>So, I hear you asking, &ldquo;how do I create functions that can be executed
asynchronously&rdquo;? In Tornado, every function that has a &ldquo;callback&rdquo; argument can
be used with <code>gen.Task</code> (inside a function decorated with <code>gen.engine</code>). <em>Beware though</em>: being able to use <code>Task</code> does
not make the execution asynchronous! There is no magic going on: the function
is simply scheduled for execution, executed, and whatever is passed to <code>callback</code>
will become the return value of <code>Task</code>. See below:</p>

<script src="https://gist.github.com/3826189.js?file=async_generic.py"></script>
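<p>To see why there is no magic, the mechanism can be imitated in a few lines of plain Python. This is only a sketch under stated assumptions: the <code>Task</code> class and the <code>engine</code> decorator below are hypothetical stand-ins written for this example, not Tornado's implementation, and <code>add</code> is an invented function that merely accepts a <code>callback</code> argument.</p>

```python
class Task:
    """Hypothetical stand-in for gen.Task: remembers a function and its arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def engine(gen_func):
    """Hypothetical stand-in for gen.engine: drives a generator that yields Tasks."""

    def wrapper(*args):
        gen = gen_func(*args)

        def step(value=None):
            try:
                task = gen.send(value)  # run until the next `yield Task(...)`
            except StopIteration:
                return
            # Whatever the function passes to `callback` resumes the generator
            # and becomes the value of the `yield` expression.
            task.func(*task.args, callback=step)

        step()

    return wrapper


def add(a, b, callback):
    # Runs synchronously: accepting a callback does not make it asynchronous.
    callback(a + b)


results = []


@engine
def main():
    total = yield Task(add, 1, 2)
    results.append(total)


main()
print(results)  # [3]
```

<p>Here <code>add</code> runs to completion before anything else can happen: if it blocked, everything would block with it.</p>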
<p>Most beginners expect to be able to just write: <code>Task(my_func)</code>, and
automagically execute <code>my_func</code> asynchronously. This is not how Tornado works.
This is how <a href="http://golang.org/">Go</a> works! And this is my last remark:</p>

<blockquote>
<p>In a function that is going to be used &ldquo;asynchronously&rdquo;, only asynchronous
libraries should be used.</p>
</blockquote>
<p>By this, I mean that blocking calls like <code>time.sleep</code> or
<code>urllib2.urlopen</code> or <code>db.query</code> will need to be substituted by their equivalent
asynchronous version. For example, <code>IOLoop.add_timeout</code> instead of
<code>time.sleep</code>, <code>AsyncHTTPClient.fetch</code> instead of <code>urllib2.urlopen</code> etc. For DB
queries, the situation is more complicated and specific asynchronous drivers to
talk to the DB are needed. For example: <a href="http://blog.mongodb.org/post/30927719826/motor-asynchronous-driver-for-mongodb-and-python">Motor</a> for MongoDB. </p>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Experiments with ExtJS]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/09/20/experiments-with-extjs"/>
        <published>2012-09-20T17:35:21+00:00</published>
        <updated>2012-11-04T17:32:53+00:00</updated>
        <id>http://lbolla.info/blog/2012/09/20/experiments-with-extjs</id>
        <category scheme="http://lbolla.info/blog/tag/#extjs" term="extjs" label="extjs" />
        <category scheme="http://lbolla.info/blog/tag/#javascript" term="javascript" label="javascript" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>For a non-designer, <a href="http://www.sencha.com/products/extjs/" title="Ext JS">Ext JS</a> is kind-of a blessing. It is a self-contained
fully-fledged Javascript framework, with loads of fancy re-usable browser-compatible
professional-looking widgets. It&#39;s only lacking in documentation: finding
your way through the <a href="http://docs.sencha.com/ext-js/4-1/">API documentation</a> is daunting at best.</p>
<p>So, I bought <a href="http://www.amazon.co.uk/Ext-Web-Application-Development-Cookbook/dp/1849516863/ref=sr_1_1?s=books&amp;ie=UTF8&amp;qid=1348157999&amp;sr=1-1">this</a>, and while working my way through it, I decided to share
some experiments. You can find them <a href="http://lbolla.github.com/extjs-experiments/">here</a>. These are the ones I prefer:</p>

<ul>
<li><a href="http://lbolla.github.com/extjs-experiments/cookbook/p107a/">lbolla.github.com/extjs-experiments/cookbook/p107a/</a></li>
<li><a href="http://lbolla.github.com/extjs-experiments/cookbook/p120/">lbolla.github.com/extjs-experiments/cookbook/p120/</a></li>
</ul>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Errors printing PDFs with CUPS]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/09/13/errors-printing-pdfs-with-cups"/>
        <published>2012-09-13T09:38:57+00:00</published>
        <updated>2012-11-04T17:32:53+00:00</updated>
        <id>http://lbolla.info/blog/2012/09/13/errors-printing-pdfs-with-cups</id>
        <category scheme="http://lbolla.info/blog/tag/#cups" term="cups" label="cups" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is another of those posts <em>to not forget</em>. If printing a PDF file with
<code>lp</code> prints a blank page with error messages like: </p>

<blockquote>
<p>ERROR: configurationerror OFFENDING COMMAND: setpagedevice STACK: &ndash;nostringval&ndash; &hellip;</p>
</blockquote>
<p>the problem is probably that your PDF has a certain page size (let&#39;s say
<em>letter</em>) but your printer expects another (let&#39;s say <em>A4</em>).</p>
<p>Check your printer
settings and your PDF (with <code>pdfinfo pdffile</code>) to verify. If this is the case,
print with this command instead:</p>

<pre><code>lp -o fit-to-page pdffile</code></pre>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Tornado vs. Warp vs. Yesod]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/09/02/tornado-vs-warp-vs-yesod"/>
        <published>2012-09-02T17:07:02+00:00</published>
        <updated>2012-12-01T15:43:59+00:00</updated>
        <id>http://lbolla.info/blog/2012/09/02/tornado-vs-warp-vs-yesod</id>
        <category scheme="http://lbolla.info/blog/tag/#benchmark" term="benchmark" label="benchmark" />
        <category scheme="http://lbolla.info/blog/tag/#haskell" term="haskell" label="haskell" />
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <category scheme="http://lbolla.info/blog/tag/#tornado" term="tornado" label="tornado" />
        <category scheme="http://lbolla.info/blog/tag/#warp" term="warp" label="warp" />
        <category scheme="http://lbolla.info/blog/tag/#yesod" term="yesod" label="yesod" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Today I tried to benchmark 3 web servers that I&#39;ve used recently: </p>

<ol>
<li><a href="http://www.tornadoweb.org/" title="Tornado">Tornado</a></li>
<li><a href="http://hackage.haskell.org/package/warp/" title="Warp">Warp</a></li>
<li><a href="http://www.yesodweb.com/" title="Yesod">Yesod</a></li>
</ol>
<p>In fact, these are only <strong>2</strong> web servers, because <code>Yesod</code> runs on top of Warp
and it&#39;s a fully fledged web framework, rather than a web server: but this was
also the intent of the benchmark, i.e. to measure how much slower all its goodies
made Yesod with respect to Warp.</p>
<p>Tornado and Warp are obviously very different
web servers (async vs. threaded, interpreted vs. compiled, etc.) but, <a href="http://ziutek.github.com/web_bench/">who
cares</a>?</p>
<p>The benchmark is very simple: a single handler returning &ldquo;Hello
World&rdquo; (very original). Obviously, this is hardly a real-world example, but it
can give an indication, even if only to within an order of magnitude.</p>
<p>Nonetheless, the results were very interesting. First of all, here is the code.</p>

<h3 id="toc_0">Tornado</h3>

<script src="https://gist.github.com/3567006.js?file=tornadoweb.py"></script>

<h3 id="toc_1">Warp</h3>

<script src="https://gist.github.com/3567006.js?file=warp.hs"></script>

<h3 id="toc_2">Yesod</h3>

<script src="https://gist.github.com/3567006.js?file=yesod"></script>
<p>And the results, obtained using httperf:</p>

<pre><code>$&gt; httperf --hog --client=0/1 --server=localhost --port=8080 --uri=/ --rate=1000 --send-buffer=4096 --recv-buffer=16384 --num-conns=100 --num-calls=100 --burst-length=20</code></pre>

<table>
<tr>
<th>Tornado</th>
<td>518 req/s</td>
</tr>
<tr>
<th>Warp</th>
<td>10079 req/s</td>
</tr>
<tr>
<th>Yesod</th>
<td>929 req/s</td>
</tr>
<tr>
<th>Yesod w/o session management</th>
<td>7924 req/s</td>
</tr>
</table>
<p>Wait! <em>What?!</em> Yesod is 10 times slower than Warp!?</p>
<p>I asked the Yesod developers for an explanation and <a href="https://github.com/yesodweb/yesod/issues/415">they tracked down the
issue</a>: the work of these guys is an example worth studying of how to
benchmark and debug code! Anyway, it looks like the issue is that serializing
timestamps is incredibly inefficient: I hope a patch will be ready soon! In the
meantime, I strongly suggest disabling session management in Yesod if
you want high performance. <em>(In the code shown, I&#39;ve also disabled Hamlet,
Yesod&#39;s templating system, but it turned out that it didn&#39;t make much
difference: <a href="https://gist.github.com/3567006/76f6245c21adc5576f201bcf83437269e8f56d93">code using Hamlet is in gist</a>.)</em></p>
<p>Overall, though, even on my crappy single-core old laptop, the result is
amazing: Warp/Yesod is ~20 times faster than one of the fastest Python web
servers.</p>
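<p>As a quick sanity check of the ratios quoted above, here is the arithmetic spelled out. The numbers are copied from the table; the helper name <code>speedup</code> is made up for this sketch.</p>

```python
# Requests per second, copied from the table above.
results = {
    "Tornado": 518,
    "Warp": 10079,
    "Yesod": 929,
    "Yesod w/o session management": 7924,
}


def speedup(fast, slow):
    """How many times more requests per second `fast` serves than `slow`."""
    return round(results[fast] / results[slow], 1)


print(speedup("Warp", "Tornado"))                          # 19.5
print(speedup("Warp", "Yesod"))                            # 10.8
print(speedup("Yesod w/o session management", "Tornado"))  # 15.3
```

<p>So &ldquo;~20 times&rdquo; holds for bare Warp, while Yesod without session management comes in at roughly 15 times Tornado.</p>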
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Auto add reviewers in Gerrit]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/08/01/auto-add-reviewers-in-gerrit"/>
        <published>2012-08-01T11:04:21+00:00</published>
        <updated>2012-11-04T17:32:53+00:00</updated>
        <id>http://lbolla.info/blog/2012/08/01/auto-add-reviewers-in-gerrit</id>
        <category scheme="http://lbolla.info/blog/tag/#gerrit" term="gerrit" label="gerrit" />
        <category scheme="http://lbolla.info/blog/tag/#javascript" term="javascript" label="javascript" />
        <category scheme="http://lbolla.info/blog/tag/#git" term="git" label="git" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>If you are using <a href="http://code.google.com/p/gerrit/" title="Gerrit">Gerrit</a> for code review and project management of
git-based projects, you might find yourself manually adding the same bunch of
reviewers to your patches every single time.</p>
<p>In the past, I alleviated the
problem with a simple Javascript bookmarklet: add it to your browser and
click it while watching the patch in Gerrit.</p>

<script src="https://gist.github.com/1303423.js?file=add_reviewer_bookmarklet.js"></script>
<p>But there&#39;s a better method: do it
from the command line, when pushing your local commits to Gerrit. Just add these
lines to the <code>[remote &quot;review&quot;]</code> section of your <code>.git/config</code>:</p>

<pre><code>pushurl = ssh://user@gerrit:29418/project
push = HEAD:refs/for/master
receivepack = git receive-pack --reviewer reviewer1 --reviewer reviewer2</code></pre>
<p>Now, when you want to push a review, just do: <code>git push review</code> and &ldquo;reviewer1&rdquo;
and &ldquo;reviewer2&rdquo; will be added to your patchset.</p>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Broadcom 43xx driver install in Arch Linux]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/07/31/broadcom-43xx-driver-install-in-arch-linux"/>
        <published>2012-07-31T10:13:28+00:00</published>
        <updated>2012-11-04T17:32:53+00:00</updated>
        <id>http://lbolla.info/blog/2012/07/31/broadcom-43xx-driver-install-in-arch-linux</id>
        <category scheme="http://lbolla.info/blog/tag/#linux" term="linux" label="linux" />
        <category scheme="http://lbolla.info/blog/tag/#wireless" term="wireless" label="wireless" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>This is a vintage post to remind me how to install <code>b43</code> drivers on Arch Linux for my &ldquo;shiny&rdquo; Belkin PCMCIA card (dated 2002&hellip;).</p>

<ul>
<li>First install the firmware extractor: <code>$&gt; pacman -S b43-fwcutter</code>.</li>
<li>Then install the firmware itself: <code>$&gt; yaourt -S b43-firmware</code>.</li>
</ul>
<p>Check <code>dmesg</code>. You should see something like: </p>

<blockquote>
<p>Broadcom 43xx driver loaded [ Features: PMNLS ]</p>
</blockquote>
<p>and <code>lsmod | grep b43</code>: </p>

<blockquote>
<p>b43 330774 0<br/>
bcma 19281 1 b43<br/>
mac80211 341044 1 b43<br/>
cfg80211 147429 2 b43,mac80211<br/>
ssb 42167 2 b43,b44<br/>
pcmcia 31182 2 b43,ssb<br/>
mmc_core 72742 2 b43,ssb</p>
</blockquote>
<p>Finally, try to connect:</p>

<pre><code>$&gt; pacman -S wifi-select
$&gt; wifi-select</code></pre>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[EMpy has moved!]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/06/25/empy-has-moved"/>
        <published>2012-06-25T14:22:21+00:00</published>
        <updated>2012-12-01T15:43:59+00:00</updated>
        <id>http://lbolla.info/blog/2012/06/25/empy-has-moved</id>
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <category scheme="http://lbolla.info/blog/tag/#electromagnetism" term="electromagnetism" label="electromagnetism" />
        <category scheme="http://lbolla.info/blog/tag/#empy" term="empy" label="empy" />
        <category scheme="http://lbolla.info/blog/tag/#numerical" term="numerical" label="numerical" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>EMpy has <a href="http://lbolla.github.com/EMpy/">moved to Github</a>!</p>
]]>
        </content>
    </entry><entry>
        <title type="html"><![CDATA[Pipelines in Python]]></title>
        <author><name>lbolla</name></author>
        <link href="http://lbolla.info/blog/2012/04/20/pipelines-in-python"/>
        <published>2012-04-20T14:07:10+00:00</published>
        <updated>2012-11-05T12:26:45+00:00</updated>
        <id>http://lbolla.info/blog/2012/04/20/pipelines-in-python</id>
        <category scheme="http://lbolla.info/blog/tag/#python" term="python" label="python" />
        <content type="html" xml:base="http://lbolla.info/" xml:lang="en">
            <![CDATA[ <p>Generators (<a href="http://www.python.org/dev/peps/pep-0255/">PEP 255 &ldquo;Simple Generators&rdquo;</a>) and Coroutines (<a href="http://www.python.org/dev/peps/pep-0342/">PEP 342
&ldquo;Coroutines via Enhanced Generators&rdquo;</a>) are the cleanest way I&#39;ve come across
so far to implement the concept of a &ldquo;pipeline&rdquo; in Python. </p>

<h2 id="toc_0">First approximation</h2>
<p>A pipeline is made of: </p>

<ul>
<li>a <em>Producer</em>, that generates data;</li>
<li>many <em>Stage</em>s, that receive data from the previous stage and send it to the next;</li>
<li>a <em>Consumer</em>, that receives data from the last stage.</li>
</ul>
<p>The <em>Producer</em> is a coroutine that only <em>send</em>s data, generated internally from
some initial state. <em>Stage</em>s are coroutines that both receive and send
messages. The <em>Consumer</em> only receives data. Chaining is done in the function
<em>pipeline</em>: each argument but the last is instantiated with an instance of the
next stage. The full pipeline is <em>started</em> by issuing a <em>next</em> (or
<em>send(None)</em>) to the <em>Producer</em>.</p>
<p>In the following example, a stream of integers is produced and pushed down the
pipeline: each stage adds 1 and finally the result is printed in the consumer.</p>

<script src="https://gist.github.com/2428213.js?file=pipeline_1.py"></script>
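<p>For reference, the idea can be sketched in a few self-contained lines. This is not the gist's code but a simplified illustration under stated assumptions: the helper <code>primed</code> and the exact signatures are invented here, the producer is a plain function instead of a coroutine, and the consumer collects values in a list instead of printing them.</p>

```python
def primed(coroutine):
    """Advance a new coroutine to its first `yield` so it can accept `send`."""
    next(coroutine)
    return coroutine


def consumer(results):
    """Last element: only receives data."""
    while True:
        value = yield
        results.append(value)


def stage(next_stage):
    """Middle element: receives a value, adds 1, sends it on."""
    while True:
        value = yield
        next_stage.send(value + 1)


def producer(next_stage, count):
    """First element: only sends data (the integers 0 .. count-1)."""
    for i in range(count):
        next_stage.send(i)


def pipeline(count, n_stages, results):
    """Chain from the end: each element is built with the next one as its target."""
    target = primed(consumer(results))
    for _ in range(n_stages):
        target = primed(stage(target))
    producer(target, count)


results = []
pipeline(3, 2, results)
print(results)  # [2, 3, 4]
```

<p>With two stages, each integer is incremented twice before reaching the consumer.</p>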

<h2 id="toc_1">Wrapping it up</h2>
<p>A pattern emerges, so we&#39;d better wrap it up in a class. Moreover, let&#39;s split
the &ldquo;architecture&rdquo; of the pipeline from the behavior of each stage.</p>

<script src="https://gist.github.com/2428213.js?file=pipeline_3.py"></script>

<h2 id="toc_2">More useful example</h2>
<p>As a more interesting application, here is how to use a pipeline to implement a
simple crawler, to download links from <a href="http://news.ycombinator.com/">news.ycombinator.com/</a> and find
all the posts where the word &ldquo;Python&rdquo; is mentioned.</p>

<script src="https://gist.github.com/2428213.js?file=pipeline_4.py"></script>

<h2 id="toc_3">Cleaning things up</h2>
<p>Things are still far from clean and bulletproof. One step in the right
direction is to follow the suggestions found in <a href="http://www.dabeaz.com/Fcoroutines/Coroutines.pdf">David Beazley&#39;s presentation
on coroutines</a>.</p>

<script src="https://gist.github.com/2428213.js?file=pipeline_5.py"></script>
<p>The previous examples are by no means &ldquo;production ready&rdquo;, but maybe someone will
find some good idea to apply to real world problems.</p>
]]>
        </content>
    </entry>
</feed>