<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-9190981547700935267</id><updated>2012-01-06T17:06:11.728-08:00</updated><category term='compiler'/><title type='text'>Ken Pu</title><subtitle type='html'>Assistant Professor, UOIT</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>21</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-2302598994101086372</id><published>2012-01-06T17:06:00.000-08:00</published><updated>2012-01-06T17:06:11.739-08:00</updated><title type='text'>Farewell to Google blogger for now...</title><content type='html'>Too bad that I have to switch away from Google blogger and move to Tumblr. 
&amp;nbsp;Tumblr's customizability and better designed API are the two reasons for my migration.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-2302598994101086372?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/2302598994101086372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=2302598994101086372' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2302598994101086372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2302598994101086372'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2012/01/farewell-to-google-blogger-for-now.html' title='Farewell to Google blogger for now...'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7244553481591080586</id><published>2011-01-18T01:52:00.000-08:00</published><updated>2011-01-18T02:08:32.549-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>A really good tutorial on Lex and yacc</title><content type='html'>I'm teaching compilers, and came across&amp;nbsp;this excellent tutorial on Lex and Yacc &lt;a href="http://epaperpress.com/lexandyacc/index.html"&gt;online&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7244553481591080586?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7244553481591080586/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7244553481591080586' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7244553481591080586'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7244553481591080586'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/really-good-tutorial-on-lex-and-yacc.html' title='A really good tutorial on Lex and yacc'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7610707474379769507</id><published>2011-01-04T09:21:00.001-08:00</published><updated>2011-01-18T01:55:01.360-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Unix in Windows - the native way</title><content type='html'>Compilers are an old technology, which means that they are not native citizens of Microsoft Windows.  UOIT is still predominantly a Windows-based environment in which students' laptops are imaged with Windows 7.  I actually began to like Windows with Windows 7...  It's... shall we say, tolerable to a UNIX fanboy like me.  
So, if you are less adventurous and want to work on Compilers in Windows, you will need to roll up your sleeves and go through a few very simple setup steps.&lt;br /&gt;&lt;br /&gt;You will need to set up the MinGW (Minimalist GNU for Windows) environment, and then install a few MinGW packages using the mingw-get tool.&lt;br /&gt;&lt;br /&gt;Installing MinGW:&lt;br /&gt;&lt;br /&gt;Download the MinGW installer from its SourceForge site.  The file is called:&lt;br /&gt;&lt;div style="text-align: center;"&gt;mingw-get-inst-YYYYMMDD.exe&amp;nbsp;&lt;/div&gt;where YYYYMMDD is the date stamp of the installer version.  At the time of writing, the version is 20101030.&lt;br /&gt;&lt;br /&gt;Run it.  Choose gcc and g++.  This will create the MinGW program group under Start &amp;gt; All Programs.&lt;br /&gt;Try out the MinGW Unix shell, running natively under Windows.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;Start &amp;gt; All Programs &amp;gt; MinGW &amp;gt; MinGW Shell&lt;/div&gt;&lt;br /&gt;[Optional] Set up some environment variables.  This is done through the properties of your computer.&lt;br /&gt;Create an environment variable "HOME" set to a directory of your choice.&lt;br /&gt;Create an environment variable "SHELL" with the value "/bin/bash".  
This creates a better experience at the command line.&lt;br /&gt;&lt;br /&gt;Install MinGW packages:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Run the mingw shell.&lt;/li&gt;&lt;li&gt;mingw-get install msys-flex&lt;/li&gt;&lt;li&gt;mingw-get install msys-bison&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;That's it (for now).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7610707474379769507?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7610707474379769507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7610707474379769507' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7610707474379769507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7610707474379769507'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/unix-in-windows-native-way.html' title='Unix in Windows - the native way'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7069583779446741458</id><published>2011-01-04T09:20:00.003-08:00</published><updated>2011-01-04T09:21:21.209-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Tricky bug to find in Antlr</title><content type='html'>&lt;span class="Apple-style-span" style="color: 
#444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;I scratched my head for an hour before I realized that my bug was due to a spelling mismatch between the name of a lexer rule in the lexer grammar and its use in the parser grammar. Antlr never complained that the misspelled rule was missing. &amp;nbsp;Only at run time did it crash, complaining of invalid input.&lt;/span&gt;&lt;br /&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;So, take care to spell rule names consistently when a grammar is split across multiple files.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7069583779446741458?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7069583779446741458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7069583779446741458' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7069583779446741458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7069583779446741458'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/tricky-bug-to-find-in-antlr.html' title='Tricky bug to find in Antlr'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-8842481164520244769</id><published>2011-01-04T09:20:00.001-08:00</published><updated>2011-01-04T09:21:42.162-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Modifying token text in ANTLR</title><content type='html'>&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;When parsing markup languages, often, we want to keep only the text and strip away the markup syntactic artifacts. &amp;nbsp;Using the rule driven code execution, we can do this right in the lexer:&lt;/span&gt;&lt;br /&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;TAG: '&amp;lt;' (~'&amp;gt;')+ '&amp;gt;' {state.text = $text.substring(1, $text.length()-1);};&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;All done.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-8842481164520244769?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/8842481164520244769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=8842481164520244769' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8842481164520244769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8842481164520244769'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/modifying-token-text.html' title='Modifying token text in ANTLR'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-8517204097181802757</id><published>2011-01-04T09:19:00.001-08:00</published><updated>2011-01-04T09:21:59.024-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Emitting multiple tokens from a single lexer rule in ANTLR</title><content type='html'>&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;Sounds crazy, but it's actually extremely useful.&lt;/span&gt;&lt;br /&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;a href="http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497" rel="nofollow" style="color: #0033cc; text-decoration: underline;"&gt;http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497&lt;/a&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, 
sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;In summary:&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;Lexer.emit(Token t) stores the token produced by the current rule. &amp;nbsp;By default it keeps only the most recent one, so override it to append each token to a queue.&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;Lexer.nextToken() returns the next token in the stream; override it to hand out the queued tokens before matching more input.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-8517204097181802757?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/8517204097181802757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=8517204097181802757' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8517204097181802757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8517204097181802757'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/emitting-multiple-tokens-from-single.html' title='Emitting multiple tokens from a single lexer rule in ANTLR'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7672898362916948421</id><published>2011-01-04T09:18:00.000-08:00</published><updated>2011-01-04T09:18:10.176-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Manually controlling the look-ahead in semantic predicate</title><content type='html'>&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;Example:&lt;/span&gt;&lt;br /&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;HTML&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;: {input.LA(1) == '&amp;lt;' &amp;amp;&amp;amp; input.LA(2) == 'h' &amp;amp;&amp;amp; input.LA(3) == 't'}?=&amp;gt; ''&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;&lt;span&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;&lt;span&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 
14px; line-height: 21px;"&gt;&lt;span&gt;&lt;span&gt;&lt;span style="font-family: arial, sans-serif;"&gt;This rule is activated only if the next three characters are '&amp;lt;', 'h' and 't'.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7672898362916948421?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7672898362916948421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7672898362916948421' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7672898362916948421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7672898362916948421'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/manually-controlling-look-ahead-in.html' title='Manually controlling the look-ahead in semantic predicate'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-2237150752506704916</id><published>2011-01-04T09:16:00.000-08:00</published><updated>2011-01-04T09:17:01.246-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='compiler'/><title type='text'>Installing Eclipse Antlr Plugin</title><content type='html'>&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; 
font-size: 14px; line-height: 21px;"&gt;Antlr has an&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;&lt;a href="http://antlrv3ide.sourceforge.net/" style="color: #0033cc; cursor: text; text-decoration: underline;" target="_blank"&gt;Eclipse plugin&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif; font-size: 14px; line-height: 21px;"&gt;. &amp;nbsp;I've had a lot of issues with the plugin on Eclipse 3.6, so I recommend sticking with Eclipse 3.5 and using Antlr IDE version 2.0.2. &amp;nbsp;You will need to follow these steps:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px;"&gt;Install the Dynamic Languages Toolkit (DLTK) first from within Eclipse.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px;"&gt;Install the Antlr IDE from:&amp;nbsp;http://antlrv3ide.sourceforge.net/updates/2.0.2&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: #444444; font-family: Arial, Verdana, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 14px; line-height: 21px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-2237150752506704916?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/2237150752506704916/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=2237150752506704916' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2237150752506704916'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2237150752506704916'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2011/01/installing-eclipse-antlr-plugin.html' title='Installing Eclipse Antlr Plugin'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7629937897055522599</id><published>2010-12-12T10:00:00.000-08:00</published><updated>2010-12-12T10:00:51.522-08:00</updated><title type='text'>Setting up github repo</title><content type='html'>You really need a Linux environment because Git has not been made Windows friendly (to my knowledge).&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Install git, apt-get install git-core.&lt;/li&gt;&lt;li&gt;Create a project on github, say called "proj1".&lt;/li&gt;&lt;li&gt;On the local machine create a Git repository:&lt;/li&gt;&lt;ol&gt;&lt;li&gt;mkdir my-proj1&lt;/li&gt;&lt;li&gt;cd my-proj1&lt;/li&gt;&lt;li&gt;git init&lt;/li&gt;&lt;li&gt;touch README&lt;/li&gt;&lt;li&gt;git add README&lt;/li&gt;&lt;li&gt;git commit -m 'Creating the project'&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;Now you have a master branch. 
&amp;nbsp; Add the GitHub repository as a remote and push the master branch to it.&lt;/li&gt;&lt;ol&gt;&lt;li&gt;git remote add github-repo git@github.com:kenpu/proj1.git&lt;/li&gt;&lt;li&gt;git push github-repo master&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;This starts a project hosted on GitHub.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7629937897055522599?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7629937897055522599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7629937897055522599' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7629937897055522599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7629937897055522599'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2010/12/setting-up-github-repo.html' title='Setting up github repo'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-2172071388051440349</id><published>2010-12-05T08:12:00.000-08:00</published><updated>2010-12-05T08:12:32.275-08:00</updated><title type='text'>Installing Twisted 10 on Windows</title><content type='html'>I have been on Twisted 8.0 for the longest time. &amp;nbsp;After getting a new laptop, I decided to set up my dev environment with all new and shiny versions. 
&amp;nbsp;So, I moved to Twisted 10.2.0 and, to my surprise, it didn't come with zope.interface. &amp;nbsp;Here is a summary of what worked for me, pieced together from a lot of blog and forum searches.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Download the Twisted-10.2.0 MSI for Python on Windows from the &lt;a href="http://twistedmatrix.com/trac/wiki/Downloads"&gt;Twisted download section&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Run it. &amp;nbsp;It will install the Twisted packages, but not the zope interfaces used by Twisted.&lt;/li&gt;&lt;li&gt;Download and run setuptools-0.6c11 for Windows from the &lt;a href="http://pypi.python.org/pypi/setuptools"&gt;Python Package Index download section&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Now, you have an easy_install.exe in your Python\Scripts directory. &amp;nbsp;It's a good idea to add Python\Scripts to your PATH.&lt;/li&gt;&lt;li&gt;Run easy_install zope.interface, and that will locate the proper zope.interface package from the repository.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;All done.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-2172071388051440349?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/2172071388051440349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=2172071388051440349' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2172071388051440349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/2172071388051440349'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2010/12/installing-twisted-10-on-windows.html' title='Installing Twisted 10 on Windows'/><author><name>Ken 
Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-8320768263682412852</id><published>2010-11-13T09:39:00.000-08:00</published><updated>2010-11-13T09:39:24.512-08:00</updated><title type='text'>Dropbox, Windows 7 Hibernation, and Startup cleanup...</title><content type='html'>Okay, so I was handed a new laptop (Lenovo X201 Tablet) by our IT support as part of the perks of working at my university. &amp;nbsp;I really like the laptop - well built, good battery life, and GREAT processor (Intel Core i7). &amp;nbsp;It has Windows 7 running. &amp;nbsp;While I am a Linux native, Windows 7 has managed to charm me with its vast improvement over previous versions of Windows.&lt;br /&gt;&lt;br /&gt;Now, the trouble starts: I installed the Dropbox client along with lots and lots of my other favourites (Google essential pack, the µTorrent client, etc.). &amp;nbsp;So, a ton of things were added to Windows 7. &amp;nbsp;The end result was that my Windows could no longer go into sleep mode. &amp;nbsp;The symptom was that it just hung whenever sleep mode was triggered. &amp;nbsp;I didn't know which program was causing the hang. &amp;nbsp;This is what I did in the process of &lt;i&gt;debugging&lt;/i&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Enabled the hibernation option. 
&amp;nbsp;This turned out not to be necessary for finding the problem.&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;powercfg /hibernate on&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Downloaded the&amp;nbsp;&lt;i&gt;&lt;a href="http://technet.microsoft.com/en-us/sysinternals/bb963902.aspx"&gt;autoruns&lt;/a&gt;&lt;/i&gt; analysis program. &amp;nbsp;It lists all the programs that start automatically. &amp;nbsp;I suspected that one of the programs I installed was causing the hang.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Disabled the entries one after another using &lt;i&gt;autoruns&lt;/i&gt;.&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Voilà! &amp;nbsp;The Dropbox client prevents Windows 7 from sleeping on my Lenovo X201 tablet. &amp;nbsp;I don't know why, but for now I live with manually starting the Dropbox client when I need it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ken&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-8320768263682412852?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/8320768263682412852/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=8320768263682412852' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8320768263682412852'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/8320768263682412852'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2010/11/dropbox-windows-7-hibernation-and.html' title='Dropbox, Windows 7 
Hibernation, and Startup cleanup...'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-4779306967873401818</id><published>2010-09-27T12:43:00.000-07:00</published><updated>2010-09-27T12:43:42.232-07:00</updated><title type='text'>Integrating apache and google site via proxy</title><content type='html'>&lt;a href="http://sites.google.com/"&gt;Google site&lt;/a&gt; is really quite simple to use, and it looks surprisingly good.&amp;nbsp; I've started using Google sites to organize some of my &lt;a href="http://sites.google.com/site/kenputeaching"&gt;course material&lt;/a&gt;, and it works really well.&amp;nbsp; I even started to use Google site to organize my own research activities, and I find that the non-flat structure really helps me to organize my thoughts, which makes it a preferred choice over Google doc.&lt;br /&gt;&lt;br /&gt;Now, the down side: the URL is plain awful.&amp;nbsp; My home page is at http://leda.science.uoit.ca/kenpu, but the teaching sites are at http://sites.google.com/site/kenputeaching.&amp;nbsp; People typically pay a few bucks a year to buy a domain name, and create a CNAME record with the DNS provider to link the domain name to the Google site, but this does not work for me, because I want the Google site to be a URL under my main site: leda.science.uoit.ca.&lt;br /&gt;&lt;br /&gt;The solution is to configure my main server (running Apache 2) to act as a proxy server to the Google site.&amp;nbsp; You might think that this should be simple to do, but it ended up eating up a couple of hours of my time.&amp;nbsp; But now, it works...&amp;nbsp; Here is my story of getting it 
working for me:&lt;br /&gt;&lt;br /&gt;The software I use:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Ubuntu 10.04 LTS as the OS.&lt;/li&gt;&lt;li&gt;Apache 2.2, the default version from &lt;code&gt;apt-get install apache2&lt;/code&gt;.&lt;/li&gt;&lt;li&gt;mod_proxy, this comes with Apache 2, so you don't need to install it explicitly.&lt;/li&gt;&lt;li&gt;mod_proxy_html, you need to install this separately, but it should be easy enough with &lt;code&gt;apt-get install libapache2-mod-proxy-html&lt;/code&gt;.&amp;nbsp; This will install proxy_html 3.0.&amp;nbsp; The latest is 3.1 which comes with some minor differences.&amp;nbsp; I assume that we use proxy_html 3.0.&lt;/li&gt;&lt;/ol&gt;The google site I wish to integrate is called:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;http://sites.google.com/site/kenputeaching &lt;/pre&gt;I want to map &lt;code&gt;http://leda.science.uoit.ca/teaching&lt;/code&gt; to the google site.&amp;nbsp; I found plenty of documentation and tutorials (&lt;a href="http://httpd.apache.org/docs/2.2/mod/mod_proxy.html"&gt;here&lt;/a&gt;, &lt;a href="http://apache.webthing.com/mod_proxy_html/config.html"&gt;here&lt;/a&gt; and &lt;a href="http://www.askapache.com/htaccess/reverse-proxy-apache.html"&gt;here&lt;/a&gt;).&amp;nbsp; While they cover the basics, there were many pitfalls that took me a while to solve.&amp;nbsp; For the impatient, here are the steps that worked for me.&lt;br /&gt;&lt;br /&gt;Everything is done on my main machine &lt;i&gt;leda.science.uoit.ca&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Map the URL /teaching to the Google site.&amp;nbsp; This uses the &lt;i&gt;ProxyPass &lt;/i&gt;directive of Apache2.&amp;nbsp; In /etc/apache2/sites-enabled/000-default, add (inside the &lt;i&gt;&amp;lt;VirtualHost *:80&amp;gt; &lt;/i&gt;section):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;ProxyRequests Off&lt;br /&gt;ProxyPass /teaching/ http://sites.google.com/site/kenputeaching/&lt;/pre&gt;&lt;br /&gt;If one is to visit 
&lt;i&gt;http://leda.science.uoit.ca/teaching&lt;/i&gt;, one sees the Google site instead.&amp;nbsp; However, some links are broken or point to &lt;i&gt;http://sites.google.com/...&lt;/i&gt;&lt;br /&gt;Next, we configure &lt;i&gt;proxy_html&lt;/i&gt; to rewrite all occurrences of &lt;i&gt;sites.google.com/site/kenputeaching/&lt;/i&gt; to &lt;i&gt;leda.science.uoit.ca/teaching&lt;/i&gt;.&amp;nbsp; This is done by inserting the following lines after the &lt;i&gt;ProxyPass&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;ProxyHTMLURLMap http://sites.google.com/site/kenputeaching /teaching&lt;br /&gt;&amp;nbsp; &amp;lt;Location /teaching/&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ProxyPassReverse http://sites.google.com/&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SetOutputFilter proxy-html&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ProxyHTMLURLMap /site/kenputeaching /teaching&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RequestHeader unset Accept-Encoding&lt;br /&gt;&amp;nbsp; &amp;lt;/Location&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now, everything should work.&amp;nbsp; Make sure that you restart the Apache server when the conf file is updated, and that you clear the browser cache when reloading a page.&lt;br /&gt;&lt;br /&gt;For the more patient ones, these were the problems I wasted a lot of time figuring out:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;i&gt;SetOutputFilter proxy-html&lt;/i&gt;.&amp;nbsp; The online documentation of proxy-html didn't really emphasize this, so I never sent the output to proxy-html for rewriting.&amp;nbsp; Lots of time was wasted trying to answer the question "Why isn't my rewrite rule being applied?"&amp;nbsp;&lt;/li&gt;&lt;li&gt;Google site sends the data in compressed form, and proxy-html cannot parse compressed HTML (at least not in version 3.0).&amp;nbsp; This is addressed by &lt;i&gt;RequestHeader unset Accept-Encoding&lt;/i&gt;.&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/9190981547700935267-4779306967873401818?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/4779306967873401818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=4779306967873401818' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/4779306967873401818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/4779306967873401818'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2010/09/integrating-apache-and-google-site-via.html' title='Integrating apache and google site via proxy'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-5074944417991458221</id><published>2010-02-22T18:17:00.001-08:00</published><updated>2010-02-22T18:25:46.002-08:00</updated><title type='text'>PyLucene, multiprocessing and all that...</title><content type='html'>It's been a while since my last post.  Hopefully, I will be better at keeping up with the writing.&lt;br /&gt;&lt;br /&gt;I've been working with PyLucene for the past while, and in an attempt to speed up search queries, I discovered quite a few things I didn't know before.  I thought that some are worth sharing.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Lucene is just as CPU intensive as it is disk intensive.  
The latter is more or less as expected, but I had no idea that Lucene requires so many CPU cycles.  So when planning to distribute your Lucene queries over multiple threads / processes, think in terms of the number of cores of your server.  Two dual-core CPUs give you four cores.  So having more than four Lucene threads won't really help anymore.  This is especially true if your queries are complex.&lt;/li&gt;&lt;li&gt;I tried to use Python's multiprocessing package to perform distributed PyLucene search.  PyLucene requires a lucene.initVM() call which isn't multiprocess safe.  It created a ton of headaches for me.  In the end, I concluded that:&lt;br /&gt;&lt;b&gt;You can only import the lucene package and call lucene.initVM(...) in your worker processes.  So the main process &lt;u&gt;cannot&lt;/u&gt; call lucene.initVM(...)&lt;/b&gt;&lt;/li&gt;&lt;li&gt;Python processes are heavy, so don't use them for just a single query.  Instead, use one multiprocessing.Queue to pass a stream of queries to the worker processes, and collect the search results using another multiprocessing.Queue.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;That's it for today.  
Baby's getting fussy now...&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-5074944417991458221?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/5074944417991458221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=5074944417991458221' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5074944417991458221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5074944417991458221'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2010/02/pylucene-multiprocessing-and-all-that.html' title='PyLucene, multiprocessing and all that...'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-5998367999749400293</id><published>2009-10-02T10:45:00.000-07:00</published><updated>2009-10-02T12:19:01.166-07:00</updated><title type='text'>Pylucene tips and tricks</title><content type='html'>&lt;a href="http://lucene.apache.org/pylucene/"&gt;PyLucene &lt;/a&gt;is the Python interface to Lucene.  
Here are a few how-to's for PyLucene.&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Load Lucene &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import lucene&lt;br /&gt;lucene.initVM(lucene.CLASSPATH)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Open an index for writing &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;writer = lucene.IndexWriter("/home/lucene/index", lucene.StandardAnalyzer())&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Open an index for searching &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;searcher = lucene.IndexSearcher("/home/lucene/index")&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Create a document &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;doc = lucene.Document()&lt;br /&gt;doc.add(lucene.Field(&lt;br /&gt;  "title", &lt;br /&gt;  "This is a long title for an essay",&lt;br /&gt;  lucene.Field.Store.YES, &lt;br /&gt;  lucene.Field.Index.TOKENIZED))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Put the document into the index &lt;/b&gt;&lt;br /&gt;&lt;em&gt; Don't forget to optimize the index before closing.  It merges multiple segment files created during the writer.addDocument() phase.  This will significantly improve the search time during querying phase.&lt;/em&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;writer.addDocument(doc)&lt;br /&gt;...&lt;br /&gt;writer.optimize()&lt;br /&gt;writer.close()&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Making a query &lt;/b&gt;&lt;br /&gt;&lt;em&gt; When working in Python, it's particularly convenient to build queries from a string using the Lucene query syntax.  
However, you may look into building queries programmatically using the more basic building blocks like lucene.TermQuery, lucene.BooleanQuery etc.&lt;/em&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;parser = lucene.QueryParser("title", lucene.StandardAnalyzer())&lt;br /&gt;query = parser.parse("+happy movie year:1990")&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Searching the index &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;hits = searcher.search(query)&lt;br /&gt;for h in hits:&lt;br /&gt;  hit = lucene.Hit.cast_(h)&lt;br /&gt;  doc_id, doc = hit.getId(), hit.getDocument()&lt;br /&gt;  print("%s: %s" % (doc_id, doc))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt; Getting the fields in a document &lt;/b&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# doc is a PyLucene document&lt;br /&gt;for f in doc.getFields():&lt;br /&gt;  field = lucene.Field.cast_(f)&lt;br /&gt;  (k, v) = field.name(), field.stringValue()&lt;br /&gt;  print("field name = %s, its value = %s" % (k,v))&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-5998367999749400293?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/5998367999749400293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=5998367999749400293' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5998367999749400293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5998367999749400293'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/10/pylucene-tips-and-tricks.html' title='Pylucene tips and tricks'/><author><name>Ken 
Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7926769978805078128</id><published>2009-09-22T22:55:00.000-07:00</published><updated>2009-09-22T23:02:37.460-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-style: italic;"&gt;Where are my packages?  &lt;/span&gt;If you are programming in Python and using a Debian distribution, then you may from time to time wonder where the heck the source for your favourite Python packages is in fact located.  Many of us don't know off the top of our heads because we have been spoiled by the wonderful &lt;span style="font-style: italic;"&gt;apt&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;easy_install&lt;/span&gt; tools for quick-n-dirty installation of Python packages.&lt;br /&gt;Here is a fast and sure way of locating the source of your packages:  use the __file__ attribute of any imported package.  Consider the scenario where you need to look up the django.contrib.auth source code to see how user passwords are encrypted by the Django Web application framework.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&gt;&gt;&gt; import django.contrib.auth&lt;br /&gt;&gt;&gt;&gt; django.contrib.auth.__file__&lt;br /&gt;'/usr/local/lib/python2.6/dist-packages/django/contrib/auth/__init__.pyc'&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;There it is.  
Now you can vim your way through the code...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7926769978805078128?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7926769978805078128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7926769978805078128' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7926769978805078128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7926769978805078128'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/09/where-are-my-packages-if-you-are.html' title=''/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-5325699334734351738</id><published>2009-08-18T09:08:00.000-07:00</published><updated>2009-08-18T09:26:45.590-07:00</updated><title type='text'>Stealing data from the Web</title><content type='html'>The Web hosts so much data (often referred to as the Deep Web) that if we could freely query it, one can't even begin to imagine the questions one could ask.  Many research papers have been written on how to harness the Deep Web.  See Communications of the ACM, Volume 50, Issue 5, May 2007.  But if you actually sit down and try to pull data off other sites, you very quickly hit many walls.  
These walls are set up to protect the data from data-collecting crawlers, so that only users of Web browsers can get to it.  I'll just talk about two of the walls: protection by POST and COOKIES.  I will use wget, but you can use a number of alternatives: curl, httpclient for Java, urllib2 for Python.&lt;br /&gt;&lt;br /&gt;1. Figure out what POST parameters they are expecting.  Use a Firefox plugin called Firebug to monitor the HTTP headers being sent to the Web page that you are trying to pull data off of.&lt;br /&gt;&lt;br /&gt;2. Mimic a browser with wget using the --user-agent parameter.&lt;br /&gt;&lt;br /&gt;3. Send the expected POST parameters using --post-data 'name=value&amp;name=value...'&lt;br /&gt;&lt;br /&gt;4. Send the expected COOKIE parameters using --load-cookies file.  But this is actually somewhat inconvenient because you have to prepare the cookie file.  I prefer to use --no-cookies --header "Cookie: name=value" instead.&lt;br /&gt;&lt;br /&gt;Many years ago, when I was looking to buy a house, I was crawling the www.mls.ca site on a nightly basis to mine for a good deal.  
Well, nowadays, what I wrote using Perl + wget has become a standard feature offered by any decent real estate agent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-5325699334734351738?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/5325699334734351738/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=5325699334734351738' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5325699334734351738'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5325699334734351738'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/08/stealing-data-from-web.html' title='Stealing data from the Web'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-1843059440015132325</id><published>2009-07-03T20:16:00.000-07:00</published><updated>2009-07-19T22:46:08.158-07:00</updated><title type='text'>My work station setup</title><content type='html'>I recently purchased a new workstation with plenty of RAM and lots of hard-drive space.  So, what should I put on it?  I guess it depends on what one does for work.  I do a few things:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Write lots of code as rapidly as I can manage.  So I program a lot in Python and Java.  
Furthermore, the type of software I write deals with large volumes of data and tends to run in server mode.  &lt;span style="font-style: italic;"&gt;Ubuntu please.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Write articles using LaTeX.  Again, I prefer to do it in a Linux environment.&lt;/li&gt;&lt;li&gt;Write presentation slides using PowerPoint.  This is mainly for my courses.  I actually enjoy making good slides for my lectures in Prosper.  Unfortunately, time is not on my side, so I'm somewhat forced to use Office.&lt;/li&gt;&lt;li&gt;Fill out forms (endlessly) in Office 2007 format.&lt;/li&gt;&lt;li&gt;Save cost on software licenses.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So you can see, I'm sort of stuck between Linux and Windows.  Mac OS is out of the question because my workstation is all PC.  Below is a step-by-step description of how I set up my machine.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Download Ubuntu 9.04 Desktop ISO image, and burn it on to a CD/DVD.&lt;/li&gt;&lt;li&gt;Install Ubuntu.  I allocated the entire drive to Ubuntu (ext3 and swap) partitions.  
Don't worry, we will still install Windows.&lt;/li&gt;&lt;li&gt;Install build-essential, sun-java6-jdk for the basic programming needs.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Install texlive-latex and all the fonts, extra and utils packages for my technical writing.&lt;/li&gt;&lt;li&gt;Install various scientific packages for my work.&lt;/li&gt;&lt;li&gt;Install samba, openssh-server for remotely accessing my workstation.&lt;br /&gt;Samba is important for Windows (to be installed later) to share files with Ubuntu.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Set up samba:&lt;br /&gt;- edit /etc/samba/smb.conf to enable home directory sharing.&lt;br /&gt;- create a samba user using smbpasswd -a &amp;lt;username&amp;gt;&lt;br /&gt;- test samba accessibility with the command &lt;span style="font-style: italic;"&gt;&lt;br /&gt;                      smbclient //localhost/&amp;lt;username&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Download vmware-server 2.0 from VMWARE.  Here is a &lt;a href="http://www.howtoforge.com/how-to-install-vmware-server-2-on-ubuntu-8.10"&gt;good guide&lt;/a&gt; for Ubuntu 8.10, but it applies to 9.04 as well.  It's worth noting that vmware-server has changed completely - the system console is GONE.  It's replaced with a Web interface which requires a Firefox plug-in to function.&lt;/li&gt;&lt;li&gt;Install Windows XP service pack 2 as a guest OS.  I have chosen to use NAT for the network adaptor of the guest OS.  When NAT is used, the host OS takes 192.168.187.1, while the guest OSes take higher IP addresses 192.168.187.x.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;You will first find that the arrow keys don't work.  &lt;a href="http://communities.vmware.com/thread/177321"&gt;Here&lt;/a&gt; is a fix.&lt;/li&gt;&lt;li&gt;Now, you need to have XP and Ubuntu share files.  
It's easy with Samba.&lt;br /&gt;In Windows, create a new network place, and use \\192.168.187.1\&amp;lt;username&amp;gt;, and specify login as &amp;lt;username&amp;gt; and the password.&lt;/li&gt;&lt;li&gt;I had to also install Office 2007 for sharing documents with co-workers.&lt;/li&gt;&lt;/ol&gt;There, done: a workstation running Ubuntu and Windows happily.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-1843059440015132325?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/1843059440015132325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=1843059440015132325' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/1843059440015132325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/1843059440015132325'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/07/my-work-station-setup.html' title='My work station setup'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-5429187098072787383</id><published>2009-06-16T08:54:00.000-07:00</published><updated>2009-06-16T09:02:23.225-07:00</updated><title type='text'>Ubuntu 9.04 upgrade, MySQL and AppArmor</title><content type='html'>My Ubuntu server was getting old, so I decided to upgrade it to the 
latest version (9.04), only to find that MySQL no longer starts.  Long story short: I had my &lt;span style="font-style: italic;"&gt;datadir&lt;/span&gt; set to somewhere else, not registered with AppArmor, which is now used by Ubuntu.&lt;br /&gt;&lt;br /&gt;There are two solutions:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Configure AppArmor - (see &lt;a href="http://www.vanimpe.eu/blog/2009/01/13/requested_mask-denied_mask-errors-on-ubuntu-with-mysql/"&gt;here&lt;/a&gt;) so it allows the MySQL server process to access the data directory.&lt;/li&gt;&lt;li&gt;Disable AppArmor altogether  (see here) so the traditional UNIX file permission model prevails again:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;sudo /etc/init.d/apparmor stop&lt;br /&gt;sudo update-rc.d -f apparmor remove&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-5429187098072787383?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/5429187098072787383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=5429187098072787383' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5429187098072787383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/5429187098072787383'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/06/ubuntu-904-mysql-and.html' title='Ubuntu 9.04 upgrade, MySQL and AppArmor'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-1561875824458443758</id><published>2009-05-27T12:37:00.000-07:00</published><updated>2009-05-27T12:45:59.683-07:00</updated><title type='text'>Google Android dev phone with an existing Rogers plan</title><content type='html'>I have had a cheap Rogers plan (125 min) for the past 5+ years.  I always stayed away from the fancy data plans, Internet access, or even text messaging - call me old-fashioned.  But the new Android platform and its open development environment forced me to see smart phones in a new light.  So I went and bought a Google Android dev phone (T-Mobile G1) from the Android market.  After hours of fiddling and following some well written blogs (&lt;a href="http://oliverfisher.blogspot.com/2008/10/android-g1-phone-in-canada-on-rogers.html"&gt;this one&lt;/a&gt; and &lt;a href="http://kidd411.com/blog/2009/01/getting-the-g1-to-work-in-canada.html"&gt;this one&lt;/a&gt; ) about activating the T-Mobile G1 phone in Canada, I still got the "Unable to communicate with Google server" message.  Finally, this is what solved my problem.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Put the SIM card back into my old Sony Ericsson phone.&lt;/li&gt;&lt;li&gt;Select SETTINGS &gt; MASTER RESET&lt;/li&gt;&lt;li&gt;Take out the SIM card from the Sony phone, and put it back into the Google phone.&lt;/li&gt;&lt;li&gt;Delete all APN settings, and recreate the two APN's: Rogers and Rogers MMS using the APN settings.  See&lt;br /&gt;&lt;a href="http://kidd411.com/blog/2009/01/getting-the-g1-to-work-in-canada.html"&gt;here&lt;/a&gt; for details.&lt;/li&gt;&lt;/ol&gt;Voilà.  
It works.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-1561875824458443758?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/1561875824458443758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=1561875824458443758' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/1561875824458443758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/1561875824458443758'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/05/google-android-dev-phone-with-existing.html' title='Google Android dev phone with an existing Rogers plan'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-7433344953205537498</id><published>2009-05-23T14:20:00.000-07:00</published><updated>2009-05-27T12:36:49.692-07:00</updated><title type='text'>Python Twisted: bring RFID, wireless sensors and Web data together</title><content type='html'>I have a pretty cool job (Google me, and you will see).  I get paid to play around with neat things to create even neater things - all in the name of Science and Research.  
I'll describe the neat things I have been using, and we can get into the neater things we are working on.&lt;br /&gt;&lt;br /&gt;Neat thing #1: Data Streams&lt;br /&gt;&lt;br /&gt;[RFID data streams]&lt;br /&gt;At the lab, we have RFID equipment: tags and readers.  Active tags are especially interesting because they can carry dynamic data such as environmental temperature readings and vibration detection.&lt;br /&gt;&lt;br /&gt;[Wireless sensor data streams]&lt;br /&gt;Equally interesting to work with are the various wireless sensors we purchased from Point Six (Google Point Six).  Among the most useful sensors we deployed are the power sensors, which let us calculate the cost of the energy used by various electrical appliances, from computing servers to desk lamps, down to the cent.&lt;br /&gt;&lt;br /&gt;[Webcam data streams]&lt;br /&gt;We've been gathering real-time webcam feeds, especially traffic cameras around Southern Ontario.  At 1 frame per 2 seconds, with a few hundred cameras, the volume of incoming data from the traffic Webcams is actually quite high.&lt;br /&gt;&lt;br /&gt;Neat thing #2: Python Twisted&lt;br /&gt;&lt;br /&gt;I had a student working with me over the past two summers.  We tried a bunch of distributed infrastructure to manage the data streams to support:&lt;br /&gt;    - Live feed to Web portal&lt;br /&gt;    - Ad-hoc user queries&lt;br /&gt;    - Historic archival for user queries&lt;br /&gt;We tried to develop our own Java RMI based solution, which was then extended to XML-RPC Web services, but neither offered the flexibility, scalability and robustness that I wanted.  In the past year, I started looking at the Python Twisted framework, and sure enough, it is definitely the right thing for our problem.  The biggest problem with our previous approaches was the treatment of concurrency using threads.  Threads fundamentally limit the number of concurrent data streams that each process can handle.  
Furthermore, we constantly needed to worry about locking when accessing shared databases.  What we really needed was a turn-key event loop system that supports asynchronous I/O for sockets, files, HTTP, etc.  And that's exactly what Twisted was built to do.  So, thanks Twisted Matrix Labs.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The even neater thing we are working on:&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;We are building a distributed archival service that acts as storage for the heterogeneous data streams and as a query processor for ad-hoc user queries.  Here is a summary of our solution implemented using Python Twisted.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;For image streams from Web cameras, we use&lt;br /&gt;&lt;pre&gt; twisted.web.client &lt;/pre&gt;&lt;br /&gt;to query the remote Web pages for image download.  It integrates nicely with the main event loop using &lt;br /&gt;&lt;pre&gt; twisted.internet.reactor.callLater &lt;/pre&gt;&lt;br /&gt;&lt;em&gt; Thanks to &lt;a href="http://0xfe.blogspot.com/2006/03/following-log-file-with-twisted.html"&gt;this blog&lt;/a&gt; for the example of continuously polling a data source. &lt;/em&gt;&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt; For the wireless sensors, the receiver communicates using the RS-232 serial port.  I am very thankful that the Python community has not forgotten the good old COM ports.  Python supports serial ports on Linux and Windows through the package &lt;em&gt; pyserial &lt;/em&gt;.  Twisted hooks serial ports into its event loop with the module:&lt;br /&gt;&lt;pre&gt; twisted.internet.serialport &lt;/pre&gt;&lt;br /&gt;to interface with Win32 or Linux POSIX serial port communication.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt; The RFID readers communicate via TCP/IP in either the push model (readers are clients) or the pull model (readers are servers).  Either way, Twisted can handle it.  
In the push model, we have a &lt;em&gt; Protocol &lt;/em&gt; to handle the data coming in.  Conversely, in the pull model, we implemented a TCP client (&lt;em&gt; twisted.internet.tcp.Client &lt;/em&gt;) to probe the RFID readers.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;The event-based framework of Twisted offered several advantages over the thread-based Java solution.  It has proven to be more scalable: we can handle about 30 connections using just one Twisted instance.  We are also able to fire up multiple instances to balance the load.&lt;br /&gt;&lt;br /&gt;As an added advantage, we get for free several different ways for users to interact with the archived data: a Web front end (&lt;em&gt;twisted.web2&lt;/em&gt;), Web services (&lt;em&gt;twisted.web.xmlrpc&lt;/em&gt;), SSH (&lt;em&gt;twisted.conch&lt;/em&gt;), and even chat clients (&lt;em&gt;twisted.words.protocols&lt;/em&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-7433344953205537498?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/7433344953205537498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=7433344953205537498' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7433344953205537498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/7433344953205537498'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/05/python-twisted-bring-rfid-wireless.html' title='Python Twisted: bring RFID, wireless sensors and Web data together'/><author><name>Ken 
Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-9190981547700935267.post-4611627813667689911</id><published>2009-02-01T18:56:00.001-08:00</published><updated>2009-02-02T14:14:49.060-08:00</updated><title type='text'>Baby cries</title><content type='html'>My baby cries endlessly.&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="344"&gt;&lt;param name="movie" value="http://www.youtube.com/v/7OkfBjMHGDI&amp;hl=en&amp;fs=1"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/7OkfBjMHGDI&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/9190981547700935267-4611627813667689911?l=kenpuca.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kenpuca.blogspot.com/feeds/4611627813667689911/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=9190981547700935267&amp;postID=4611627813667689911' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/4611627813667689911'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/9190981547700935267/posts/default/4611627813667689911'/><link rel='alternate' type='text/html' href='http://kenpuca.blogspot.com/2009/02/baby-cries.html' 
title='Baby cries'/><author><name>Ken Pu</name><uri>https://profiles.google.com/110746162749419123574</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-p_hYiNFOMGk/AAAAAAAAAAI/AAAAAAAAAAA/bPaKHuA9dVs/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry></feed>
