<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>parser &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/parser/</link>
	<description>Feed of posts on WordPress.com tagged "parser"</description>
	<pubDate>Sat, 28 Nov 2009 15:31:23 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Insights of the Day]]></title>
<link>http://ayekat.wordpress.com/2009/11/28/erkenntnisse-des-tages-2/</link>
<pubDate>Fri, 27 Nov 2009 23:14:53 +0000</pubDate>
<dc:creator>ayekat</dc:creator>
<guid>http://ayekat.wordpress.com/2009/11/28/erkenntnisse-des-tages-2/</guid>
<description><![CDATA[Insight 1: My AGAIN project is making progress. Or at least it is in theory: A]]></description>
<content:encoded><![CDATA[Insight 1: My AGAIN project is making progress. Or at least it is in theory: A]]></content:encoded>
</item>
<item>
<title><![CDATA[Solving the &quot;halting problem&quot;...]]></title>
<link>http://roberto.open-lab.com/2009/11/18/solving-the-halting-problem/</link>
<pubDate>Wed, 18 Nov 2009 09:23:40 +0000</pubDate>
<dc:creator>Roberto Bicchierai</dc:creator>
<guid>http://roberto.open-lab.com/2009/11/18/solving-the-halting-problem/</guid>
<description><![CDATA[When I asked Gino (alias Roberto Baldi mostmobbed) for a software solution to the &#8220;halting proble]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>When I asked Gino (alias Roberto Baldi <a href="http://mostmobbed.blogspot.com/">mostmobbed</a>) for a software solution to the &#8220;halting problem&#8221;, he told me it &#8220;should not be so difficult&#8221;!</p>
<p>&#8220;In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.&#8221; (from <a href="http://en.wikipedia.org/wiki/Halting_problem">Wikipedia</a>)</p>
<p>Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist; but as Gino says, Turing was a pessimist&#8230;</p>
<p>Why am I scratching my head over the halting problem? Because I need to put power at <a href="http://bugsvoice.com">BugsVoice</a> users&#8217; fingertips. In particular, I need to allow users&#8217; JavaScript code to run on my server, and knowing whether these scripts terminate would be comforting.<img style="display:inline;margin-left:0;margin-right:0;border-width:0;" title="20080102_9458" src="http://rbicchierai.files.wordpress.com/2009/11/20080102_94583.jpg?w=417&#038;h=496" border="0" alt="20080102_9458" width="417" height="496" align="right" /></p>
<p>If you have already heard about XSS, you probably know that &#8220;third party&#8221; code execution, authorized or not, can be a nightmare even on the client&#8230; can you imagine how ugly “pirate” code can be on your server?</p>
<p>In more detail, <a href="http://bugsvoice.com">BugsVoice</a> is a service that receives a &#8220;bug&#8221; from a customer’s server, processes the request locally, serves friendly feedback to the user, and stores the bug in its database.</p>
<p>JavaScript (JS from here on) server-side execution is involved in the &#8220;processing&#8221; phase.</p>
<p>We supply a pre-filled and certified set of &#8220;rules&#8221; for processing bugs, but we also allow customers to create their own rules.</p>
<p>JS gives you the power to inspect the error, understand what happened, and give your customer something better than a &#8220;500 server error&#8221;, in order to reassure them and recover from a situation in which your application&#8217;s credibility is sinking. An interesting read about error recovery and error feedback is the book &#8220;<a href="http://www.amazon.com/Defensive-Design-Web-improve-messages/dp/073571410X">Defensive Design for the Web</a>&#8221; from 37signals.</p>
<p>The complete BugsVoice process consists mainly of the following steps:</p>
<p>1) on the customer&#8217;s server side, an error page catches the exception, collects as much information as possible (logged user, time, server status, database status, memory etc.) and redirects the user to our BugsVoice server (see <a href="http://blog.bugsvoice.com/2009/11/17/setting-up-an-example-error-trapping-page/" target="_blank">how to configure an error trapping page</a> on the BugsVoice blog for more details).</p>
<p>2) our server reads the user preferences and retrieves the error template. Each template is fully dynamic and customizable; it introduces some “variables” that can be filled in from the error that occurred. Then our server creates two JS objects: the “bug” object, filled with the collected error, and the “template” object, filled from the layout skeleton.</p>
<p>3) the JS rules are executed to fill “template” from “bug” or to reject the request.</p>
<p>4) the layout is rendered to the user by  using “template” and “bug” objects. The bug is stored on our server.</p>
<p>5) the user feedback is collected and stored.</p>
<p>6) a “thank you” page is displayed to the user.</p>
<p>Then there is error management, but this is interesting “only” for BugsVoice users, not for this post. Here are some error pages from BugsVoice:</p>
<p><a href="http://bugsvoice.com" target="_blank"><img style="display:inline;border-width:0;" title="image" src="http://rbicchierai.files.wordpress.com/2009/11/image8.png?w=221&#038;h=264" border="0" alt="image" width="221" height="264" /></a> <a href="http://bugsvoice.com"><img style="display:inline;border-width:0;" title="image" src="http://rbicchierai.files.wordpress.com/2009/11/image9.png?w=221&#038;h=264" border="0" alt="image" width="221" height="264" /></a> <a href="http://bugsvoice.com" target="_blank"><img style="display:inline;border-width:0;" title="image" src="http://rbicchierai.files.wordpress.com/2009/11/image10.png?w=224&#038;h=267" border="0" alt="image" width="224" height="267" /></a> <a href="http://bugsvoice.com" target="_blank"><img style="display:inline;border-width:0;" title="image" src="http://rbicchierai.files.wordpress.com/2009/11/image11.png?w=222&#038;h=266" border="0" alt="image" width="222" height="266" /></a></p>
<p>So every user can create their own rules in order to inspect, for instance, the received bug&#8217;s stack trace to discover whether a database problem occurred, or whether there is a problem with the latest version of some browser.</p>
<p>Coming back to rules execution:<br />
during step 3) we get the rules from the user configuration and execute them on our server. We use the <a href="http://java.sun.com/javase/6/features.jsp" target="_blank">Java SE 6</a> scripting features, which supply an ECMAScript engine, to run the rules. A scripting engine instance is isolated from the JVM environment, and you must declare the resources (libraries) you want to make available in the execution context.</p>
<p>Before the rules are executed, the context is fed with the “bug” and “template” objects. Then we run the rules…(drum roll!).</p>
<p>A basic (and friendly) rule example :</p>
<pre>if (bug.code==404)  errorPage.errorMessage="Page missing: you get this error because of...";</pre>
<p>Of course this code is safe, but what happens if an evil user composes a pleasant rule like</p>
<pre>while(true);</pre>
<p>or</p>
<pre>function snake(s){
  return "s"+snake(s);
}
snake(":-&#60;");</pre>
<p>&#8230; or even worse?</p>
<p>Sadly, Turing beats Gino 1-0: there is no general solution to the question “does this rule end?”.</p>
<p>The only possible solution is to narrow the scope of the problem by introducing some fences.</p>
<p>One solution is to set up an external observer, using multi-threading and watchdogs to kill processes after a while, but the best solution is to avoid infinite-loop situations altogether.</p>
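<p>The external-observer idea can be sketched in a few lines. This is only an illustration under loud assumptions: it uses the Node.js "vm" module and its "timeout" option instead of the Java SE 6 script engine and watchdog threads mentioned in this post, and the helper name runRule is hypothetical. The "vm" module is not a security boundary on its own; the sketch only shows the time limit.</p>
<pre>
const vm = require('vm');

// Hypothetical helper: runs one rule against the "bug" object and the
// template (exposed as "errorPage", the name used in the rule example),
// giving up after timeoutMs milliseconds.
function runRule(ruleSource, bug, template, timeoutMs) {
  const sandbox = { bug: bug, errorPage: template };
  try {
    // The timeout option aborts scripts such as "while(true);".
    vm.runInNewContext(ruleSource, sandbox, { timeout: timeoutMs });
    return { ok: true, template: sandbox.errorPage };
  } catch (err) {
    // Timeouts, stack overflows and ordinary script errors all end up here.
    return { ok: false, reason: String(err.message) };
  }
}
</pre>
<p>With this sketch, a rule like the friendly 404 example fills the template, while an endless loop or runaway recursion comes back as a rejected rule instead of blocking the server.</p>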
<p>Rules in our context are used mainly for discovering string patterns in the error stack trace and for building better feedback; we do not need to iterate or create complex functions, so reducing the set of allowed JS statements is possible without losing “power”.</p>
<p>Luckily, JS has a limited set of statements for iteration and recursion; so if we are able to &#8220;kill&#8221; bad intentions by forbidding dangerous statements like &#8220;while&#8221;, “for” or function definitions, we can run rules with confidence.</p>
<p>This way  we reduce the complex halting problem to the (quite) easy problem of  HTML sanitization  (where you must  remove some unaccepted tags. See <a href="http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/" target="_blank">XSS war: a Java HTML sanitizer</a> ).</p>
<p>Actually, identifying “while” or “for” statements in complex code is not as easy as finding the “while” string. The find/replace approach is too rough; here we need a more accurate solution that understands the difference between</p>
<pre>while (true);</pre>
<p>and</p>
<pre>var dummy= "while(true)";</pre>
<p>which is obvious to us but not to a string searcher…</p>
<p>You must use something to analyze the code token by token.</p>
<p><a href="http://www.antlr.org/" target="_blank">ANTLR 3</a> supplies all you need for tokenizing, parsing and walking your code. You need a JS grammar, and then ANTLR will build everything else. We used the ES3 grammar from Xebic Research (BSD license), based on the original work of Patrick Hulsmeyer, which fits our needs perfectly.</p>
<p>With the ES3 grammar we built a parser, lexer and walker that analyze a rule’s code, intercept every forbidden statement and refuse dangerous scripts (at least I hope so). Only the rules that pass the test are saved on the system and made available to the script engine.</p>
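<p>To make the token-by-token idea concrete, here is a much smaller sketch than a real ANTLR-generated lexer and parser. It is an assumption-laden illustration, not the code used in BugsVoice: a single regular expression splits the source into strings, comments, identifiers and other characters, and only identifier tokens are compared with a hypothetical list of forbidden keywords. It does not understand regex literals and flags property names such as a.while, so it errs on the side of rejecting.</p>
<pre>
// Hypothetical list of statements a rule may not contain.
const FORBIDDEN = ['while', 'for', 'do', 'function'];

// One alternative per token kind: double-quoted string, single-quoted
// string, line comment, block comment, identifier or keyword,
// whitespace, any other single character.
const TOKEN = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\/\/[^\n]*|\/\*[\s\S]*?\*\/|[A-Za-z_$][\w$]*|\s+|[\s\S]/g;

// Returns the forbidden keywords that appear as real tokens,
// ignoring anything inside strings and comments.
function forbiddenKeywords(source) {
  const tokens = source.match(TOKEN) || [];
  return tokens.filter(function (tok) {
    return FORBIDDEN.indexOf(tok) !== -1;
  });
}
</pre>
<p>Here the loop statement is reported, while the same characters inside a string literal form one string token and are ignored.</p>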
<p>OK, I confess: the post’s title is a little misleading. There is no way to solve the halting problem, at least not without cheating!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Script slicing by PgMDD]]></title>
<link>http://pgolub.wordpress.com/2009/11/17/script-slicing-by-pgmdd/</link>
<pubDate>Tue, 17 Nov 2009 09:55:50 +0000</pubDate>
<dc:creator>pashagolub</dc:creator>
<guid>http://pgolub.wordpress.com/2009/11/17/script-slicing-by-pgmdd/</guid>
<description><![CDATA[Preface November, 4th. Release Candidate 1 of Database Designer for PostgreSQL 1.2.9 become availabl]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://pgolub.wordpress.com/files/2009/11/slicer.jpg"><img class="alignright size-medium wp-image-959" title="slicer" src="http://pgolub.wordpress.com/files/2009/11/slicer.jpg?w=285" alt="slicer" width="222" height="234" /></a></p>
<h2>Preface</h2>
<p>November, 4th. Release Candidate 1 of <a href="http://microolap.com/products/database/postgresql-designer/">Database Designer for PostgreSQL</a> 1.2.9 <a href="http://microolap.com/products/database/postgresql-designer/news/detail.php?ID=1296">became available</a>. Among the three changes compared to the last beta, one attracts attention: the &#8220;Execute Script In Single Transaction (Alt + F9)&#8221; functionality was added. The world community was shocked.</p>
<p>&#8220;What do you mean, added? We thought it was always executed in a single transaction&#8230;&#8221; resounded from all sides.</p>
<p>November, 14th. <a href="http://microolap.com/">MicroOLAP</a> Headquarters. Explanatory mission entrusted to the best agent&#8230; Me. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<div id="attachment_970" class="wp-caption alignleft" style="width: 180px"><a href="http://pgolub.wordpress.com/files/2009/11/pgadmin-multiple-set.png"><img class="size-medium wp-image-970 " title="pgadmin-multiple-set" src="http://pgolub.wordpress.com/files/2009/11/pgadmin-multiple-set.png?w=283" alt="pgadmin-multiple-set" width="170" height="180" /></a><p class="wp-caption-text">pgAdmin shows the last result set</p></div>
<p>Nobody can remember now who came up with the idea of script slicing in SQL Executor. The gist was that each returned result set must be displayed.</p>
<p>Have a look at how <a href="http://pgadmin.org/">pgAdmin</a> handles multiple result sets. As you can see, only the last one is available while the others are discarded (you can read about this on the Messages tab).</p>
<p>One more note. Multiple statements in pgAdmin are always executed in a single transaction context. This is no miracle, since the <a href="http://www.postgresql.org/docs/8.4/static/libpq-exec.html">PQsendQuery</a> function from the client library is used.</p>
<p>By the way, the fact that PQsendQuery is used gives us hope that someday pgAdmin will handle all result sets.</p>
<div id="attachment_972" class="wp-caption alignright" style="width: 161px"><a href="http://pgolub.wordpress.com/files/2009/11/pgmdd-multiple-set.png"><img class="size-medium wp-image-972 " title="pgmdd-multiple-set" src="http://pgolub.wordpress.com/files/2009/11/pgmdd-multiple-set.png?w=252" alt="pgmdd-multiple-set" width="151" height="180" /></a><p class="wp-caption-text">PgMDD shows all result sets in separate tabs</p></div>
<table style="border-collapse:collapse;margin-top:15px;margin-bottom:15px;" border="0" cellspacing="0" cellpadding="5" bgcolor="LightYellow">
<tbody>
<tr>
<td><img class="alignnone size-full wp-image-164" title="note" src="http://pgolub.wordpress.com/files/2009/01/note.gif" alt="note" width="10" height="10" /></td>
<td>Just for the record, I&#8217;m not saying pgAdmin is a dinosaur or anything. I like it a lot. It is a &#8220;must have&#8221; tool for sure. I&#8217;m using it for lack of another GUI administration utility. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </td>
</tr>
</tbody>
</table>
<p>As you probably guessed, PgMDD has created a separate tab for each result set since the very first release.</p>
<p>There is one more important reason why script slicing was implemented. Database Designer is a kind of ideal world. You may use any names, any functions, and any data types when creating a model.</p>
<p>But real life is cruel. A generated script must work under any conditions, even if some statements fail, e.g. because of an old server version, a non-existent role, a lack of privileges for some operations, or an object with the same name that already exists. That&#8217;s why PgMDD&#8217;s SQL Executor should give the developer a choice: abort execution or proceed anyway.</p>
<h2>How it&#8217;s made</h2>
<p>Let me say one thing before I begin: there is no SQL parser library (or suite) on the market that meets even the basic needs (I mean the PostgreSQL dialect, of course). I guarantee this!</p>
<p>God is my witness, our team tried every third-party library we came across. Without success.</p>
<p>In Russian-speaking IT folklore there is an adage, &#8220;Переписать всё нахрен!&#8221;. A loose translation is &#8220;Rewrite it all from scratch!&#8221;</p>
<p>OK, the moment of glory. We made it ourselves using the <a href="http://pgolub.wordpress.com/2009/05/13/metamorphosis-of-gram-y-8-4-beta1/">native PostgreSQL grammar</a>. Yeah, bite me, unbelievers!</p>
<p>Let&#8217;s omit the technical details. I know nobody cares anyway. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Our parser is absolutely&#8230; no, I mean <strong>absolutely</strong> compatible with PostgreSQL. But with version 8.3.x. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Updating it is just a matter of time, but we missed the moment.</p>
<p>So we have two reasons to add &#8220;Execute in Single Transaction&#8221; functionality:</p>
<ul>
<li>The ability to ROLLBACK all changes made by a script if needed</li>
<li>The PgMDD parser cannot handle some PostgreSQL 8.4.x features</li>
</ul>
<p>That&#8217;s all folks!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[So I have been playing with VPRML (Vorpe ... ]]></title>
<link>http://vorper.wordpress.com/2009/11/13/so-i-have-been-playing-with-vprml-vorpe/</link>
<pubDate>Fri, 13 Nov 2009 02:30:43 +0000</pubDate>
<dc:creator>xiofire</dc:creator>
<guid>http://vorper.wordpress.com/2009/11/13/so-i-have-been-playing-with-vprml-vorpe/</guid>
<description><![CDATA[So I have been playing with VPRML (Vorper Markup Language) and its parser, and I think I have]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>So I have been playing with VPRML (Vorper Markup Language) and its parser, and I think I have a good start on how it&#8217;s going to be structured.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[WYSIWYG editor]]></title>
<link>http://cmsdev.wordpress.com/2009/11/04/wysiwyg-editor/</link>
<pubDate>Wed, 04 Nov 2009 16:31:06 +0000</pubDate>
<dc:creator>LHK07</dc:creator>
<guid>http://cmsdev.wordpress.com/2009/11/04/wysiwyg-editor/</guid>
<description><![CDATA[When the user wants to write a post/page he would like to write without any code as if it was a word]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>When users want to write a post or page, they would like to write without any code, as if in a word processor, but all web pages need to be stored in HTML format. To convert normal text and images into HTML code directly, a JavaScript program called a WYSIWYG editor is used, which parses the text and adds HTML tags accordingly. A good WYSIWYG editor also displays the HTML code simultaneously.<br />
A lot of editors are freely available for download, and we chose &#8220;WIDGEDITOR&#8221; by <a rel="nofollow" href="http://www.themaninblue.com">Cameron Adam</a></p>
<p>because it&#8217;s easy to use and customize.</p>
<p>After tweaking and editing it to fit our requirements and look, this is what we got:</p>
<p><img class="alignnone size-full wp-image-21" title="preview3" src="http://cmsdev.wordpress.com/files/2009/11/preview3.jpg" alt="preview3" width="600" height="287" /></p>
<p>&#160;</p>
<p><img class="alignnone size-full wp-image-22" title="preview4" src="http://cmsdev.wordpress.com/files/2009/11/preview4.jpg" alt="preview4" width="600" height="294" /></p>
<p>&#160;</p>
<p>The converted HTML code is put into the database and displayed on the frontend. All images and media are stored in the HTML text with their locations referenced.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Convert Checkpoint R70 Binary file to readable format]]></title>
<link>http://kishur.wordpress.com/2009/11/03/convert-checkpoint-r70-binary-file-to-readable-format/</link>
<pubDate>Tue, 03 Nov 2009 15:16:31 +0000</pubDate>
<dc:creator>kishur</dc:creator>
<guid>http://kishur.wordpress.com/2009/11/03/convert-checkpoint-r70-binary-file-to-readable-format/</guid>
<description><![CDATA[Did you know that Checkpoint stores its logs in binary format? They are only decoded to a readabl]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Did you know that Checkpoint stores its logs in binary format? They are only decoded into a readable format when they are opened by the SmartTracker program, which comes with the Checkpoint R70 software.</p>
<p>Alternatively, you can use commercial products such as &#8220;ManageEngine &#8211; Firewall Analyzer&#8221; &#38; &#8220;Sawmill&#8221; to read the log file.</p>
<p>What happens if you don&#8217;t have the money to buy that software and would like to investigate a security event that occurred recently in your organization?</p>
<p>You can still use the Checkpoint &#8220;fwm logexport&#8221; command to convert the binary log file to a readable format, such as ASCII.</p>
<p>Follow the steps below to convert your file.</p>
<p><span style="text-decoration:underline;"><strong>Steps</strong></span></p>
<p>1. Log in to your Checkpoint R70 in expert mode via SSH.</p>
<p>2. Go to the directory where your log files are kept.</p>
<p>3. Issue the command below:</p>
<p>&#8220;fwm logexport -i 2009-11-03_235900.log -o 2009-11-03_235900_read.log -p&#8221;</p>
<p>4. There should now be a new file, &#8220;2009-11-03_235900_read.log&#8221;, in your current directory. Use cat or vi to read the file.</p>
<p>Where:</p>
<p>&#8220;-i&#8221; is your input log file, which is in binary format.</p>
<p>&#8220;-o&#8221; is your output log file, which will be in readable format.</p>
<p>&#8220;-p&#8221; prevents the port number from being resolved.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Using Parser Combinators - mystery on the periphery]]></title>
<link>http://fwhaslam.wordpress.com/2009/10/14/using-parser-combinators-mystery-on-the-periphery/</link>
<pubDate>Wed, 14 Oct 2009 21:34:34 +0000</pubDate>
<dc:creator>fwhaslam</dc:creator>
<guid>http://fwhaslam.wordpress.com/2009/10/14/using-parser-combinators-mystery-on-the-periphery/</guid>
<description><![CDATA[I started playing with Scala&#8217;s parser combinators recently.  I had no purpose, I was just trying to f]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I started playing with Scala&#8217;s parser combinators recently.  I had no purpose, I was just trying to figure them out.  They promise to provide a parsing engine built with BNF (Backus-Naur Form) that performs lexical analysis and evaluation.  I find that they deliver on this promise, but the implicit promise of simplicity was not delivered.</p>
<p>It is probably my fault for assuming things would be simple.  The key to my mistake is the word &#8216;combinators&#8217;.  The concept is to use simple parsers and combine them into parsers of ever increasing complexity.  So my assumption of simplicity is falsified when a parsing engine is built up into something useful.</p>
<p>Here are two things I learned:
<ol>
<li>When you construct a parser, you end up with a function that creates a function.  This created function then needs an input to process.</li>
<li>There does not appear to be a simple test for EOI (end of input).  The function &#8216;phrase()&#8217; will do this test, but cannot be used at the top level.</li>
</ol>
<p><big><b>Lesson One:</b></big> Using a Parser Combinator</p>
<p>I went nuts looking for this information.  Everyone talks about <em>creating</em> PCs, no one gives an example of <em>using</em> one.  Here is my parser:</p>
<pre>
class Sentencer extends StandardTokenParsers {
    def sentence: Parser[Any] = {ident &#124; ident ~ sentence}
}
</pre>
<p>Very simple, it tries to match a list of words.  A sentence is defined as either a single word (ident), or a word followed by a sentence.  Since the &#8217;sentence&#8217; part is to the right of the word part, this is a &#8216;right recursive&#8217; expression.  The &#8216;ident&#8217; is a StandardTokenParsers function that produces a Parser which matches individual words (ie. tokens) from the input stream.</p>
<p>Here is the same parser with a run() method to process a string input.  The method returns a ParseResult object.  Note that &#8216;lexical&#8217; is a member inherited from StandardTokenParsers whose type implements the &#8216;Scanners&#8217; trait.  &#8216;Scanner&#8217; is nested within &#8216;Scanners&#8217;.  Using Scanner as the input is only valid for TokenParsers.   If you are using a byte level parser, there are other scanners available.</p>
<pre>
class Sentencer extends StandardTokenParsers {
    def run(text:String) = {
        val fn = sentence   // 'sentence' is a parameterless def, so no parentheses
        fn(new lexical.Scanner(text));
    }
    def sentence: Parser[Any] = ident &#124; ident ~ sentence
}</pre>
<p>Now, here is how you invoke the parser:</p>
<pre>
new Sentencer().run("one two three")
</pre>
<p>The result is a ParseResult object which displays as the following string:</p>
<pre>
[1.5] parsed: one
</pre>
<p>As you can see, my parse only matched the first element of the input.  Now on to -</p>
<p><big><b>Lesson Two:</b></big> Matching to the End of Input</p>
<p>There is a function in Parsers named phrase().  As with most of the Parsers functions, it takes one or more parsers and returns a new combined parser.  In this case, it takes a single parser, then returns a Success object if the underlying parser reached the End of Input; and a Failure object otherwise.  Success and Failure are both extensions of the ParseResult abstract class.</p>
<p>My first instinct was to wrap it around my entire parser object like this:</p>
<pre>
    def run(text: String) = {
        var fn = sentence
        var fnp = phrase(fn)
        fnp(new lexical.Scanner(text))
    }
</pre>
<p>Now when I run, it still only matches the first token (&#8216;one&#8217;), then returns Failure.  It took me a day or two to wrap my head around exactly what I had done.  I finally came up with a better place to use the phrase() function:</p>
<pre>
def sentence = phrase(ident) &#124; phrase(ident~sentence)
</pre>
<p>When I run this version, it matches the entire input.  The key is that it first matches the &#8216;one&#8217;, then fails.  On failure, the pipe parser (&#8216;&#124;&#8217;) will try the alternative phrase(ident~sentence).  &#8216;ident&#8217; is matched, then the remainder of the input is matched against &#8217;sentence&#8217;.  Rinse and repeat to get the following ParseResult:</p>
<pre>
[1.14] parsed: (one~(two~three))
</pre>
<p>Given the recursive nature of the definition, it turned out I could get the same thing with the slightly simpler:</p>
<pre>
def sentence = phrase(ident) &#124; ident~sentence
</pre>
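<p>The backtracking behaviour described above can be reproduced with a tiny hand-rolled combinator set. This is a hedged JavaScript analogue, not Scala&#8217;s library: the names ident, seq, alt and phrase only mirror the Scala ones, the input is a plain array of words, and a parser is assumed to be a function from (tokens, pos) to a result object.</p>
<pre>
// A parser takes (tokens, pos) and returns { ok, value, pos }.
function ident(tokens, pos) {
  if (pos === tokens.length) return { ok: false, pos: pos };
  return { ok: true, value: tokens[pos], pos: pos + 1 };
}

// p ~ q: run p, then q on the remaining input.
function seq(p, q) {
  return function (tokens, pos) {
    const a = p(tokens, pos);
    if (!a.ok) return a;
    const b = q(tokens, a.pos);
    if (!b.ok) return b;
    return { ok: true, value: [a.value, b.value], pos: b.pos };
  };
}

// p | q: try p, and only if it fails try q from the same position.
function alt(p, q) {
  return function (tokens, pos) {
    const a = p(tokens, pos);
    return a.ok ? a : q(tokens, pos);
  };
}

// phrase(p): p must succeed and consume the whole input.
function phrase(p) {
  return function (tokens, pos) {
    const a = p(tokens, pos);
    if (!a.ok) return a;
    if (a.pos !== tokens.length) return { ok: false, pos: a.pos };
    return a;
  };
}

// sentence = ident | ident ~ sentence  (stops after the first word)
function greedyFirst(tokens, pos) {
  return alt(ident, seq(ident, greedyFirst))(tokens, pos);
}

// sentence = phrase(ident) | ident ~ sentence  (consumes everything)
function sentence(tokens, pos) {
  return alt(phrase(ident), seq(ident, sentence))(tokens, pos);
}
</pre>
<p>Run on the words of "one two three", greedyFirst succeeds after the first word and leaves the rest unread, while sentence fails its first alternative until the last word and so builds the nested result one, (two, three).</p>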
<p>Eventually I decided to ditch &#8216;phrase()&#8217; for my own EOI parser (end of input):</p>
<pre>
class Sentencer extends StandardTokenParsers {
    def run(text:String) = {
        var fn = sentence
        fn(new lexical.Scanner(text));
    }

    def sentence: Parser[Any] = { ident ~ EOI &#124; ident ~ sentence }

    def EOI: Parser[Any] = new Parser[Any] {
        def apply(in: Input) = {
            if (in.atEnd) new Success( "EOI", in )
            else Failure("End of Input expected", in)
        }
    }
}
</pre>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Tutorial: Recursive Descent Parser]]></title>
<link>http://openfecks.wordpress.com/2009/09/28/turorial-parser-descendente-recursivo/</link>
<pubDate>Tue, 29 Sep 2009 03:51:27 +0000</pubDate>
<dc:creator>noahfx</dc:creator>
<guid>http://openfecks.wordpress.com/2009/09/28/turorial-parser-descendente-recursivo/</guid>
<description><![CDATA[Well, for those wondering how to build a parser from a grammar we have designed,]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Well, for those wondering how to build a parser from a grammar we have designed, here I will briefly show how to write a recursive descent parser.</p>
<p>We have to take the following into account:</p>
<p>1. Our grammar must not be left-recursive. For example, the production</p>
<p><em>E::=Er </em></p>
<p>is left-recursive. Our parser would not work for this kind of grammar, because the procedure for E would call itself before consuming any input, so the left recursion has to be removed.</p>
<p>2. We must build the First and Follow sets for each production of our grammar.</p>
<p><strong>Parser structure:</strong></p>
<p><strong>LookAhead: </strong>This (global) variable initially holds the leftmost token of the input.</p>
<p>For each <strong>NONTERMINAL</strong> of the grammar there must be a parser procedure. To make programming easier, the procedure is named after the nonterminal.</p>
<p>The alternatives of the nonterminal form the body of the procedure.</p>
<p><strong>Match procedure</strong>: With this procedure we check whether the current token is the expected TERMINAL. I detail it below.</p>
<pre>void Match(token simbolo){
  if(lookahead==simbolo)
    lookahead=siguienteSimbolo; /* to get the next symbol you can write a procedure that requests the next token from the lexical analyzer */
  else
    ERROR
}
</pre>
<p>Well, now let&#8217;s code. I will write a parser for the following grammar:</p>
<p>terminals: x,y,z;</p>
<p>E:=xP &#124; H     /*  E produces x followed by production P, or production H  */</p>
<p>P:=Tz</p>
<p>H:=y</p>
<p>T:=yx;</p>
<p>****************************************************************</p>
<pre>var token lookahead; // variable of whatever token type you use

void main {
  // Start the scanner if necessary
  lookahead=nextToken(); // function that returns the next token
  E();
}

void E(){
  if(lookahead==x){
    Match(x);
    P();
  }
  else
    H();
}

void P(){
  T();
  Match(z);
}

void T(){
  Match(y);
  Match(x);
}

void H(){
  Match(y); // H:=y in the grammar
}</pre>
<p>And that is how simple a recursive descent parser is <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>As you may notice, the parser walks the tree from top to bottom, which is where &#8220;descent&#8221; comes from,</p>
<p>and &#8220;recursive&#8221; is because the parser can call itself, directly or indirectly, through its functions <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
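<p>For readers who want something they can run, here is a minimal JavaScript sketch of the same idea for the grammar E:=xP | H, P:=Tz, H:=y, T:=yx. It rests on a few assumptions that are not in the pseudocode: each input character is one token, errors are reported by throwing, and the helper names parse and accepts are hypothetical.</p>
<pre>
// Grammar: E := x P | H ; P := T z ; H := y ; T := y x
function parse(input) {
  const tokens = input.split('');   // assumption: one character per token
  let position = 0;
  let lookahead = tokens[position];

  // Consume the expected terminal or report an error.
  function match(symbol) {
    if (lookahead !== symbol) {
      throw new Error('expected ' + symbol + ' at position ' + position);
    }
    position += 1;
    lookahead = tokens[position];   // undefined once the input is exhausted
  }

  // One procedure per nonterminal, named after it.
  function E() {
    if (lookahead === 'x') { match('x'); P(); } else { H(); }
  }
  function P() { T(); match('z'); }
  function T() { match('y'); match('x'); }
  function H() { match('y'); }

  E();
  if (lookahead !== undefined) {
    throw new Error('unexpected input at position ' + position);
  }
  return true;
}

// Convenience wrapper: true if the input belongs to the language.
function accepts(input) {
  try { return parse(input); } catch (e) { return false; }
}
</pre>
<p>The grammar only generates the two strings xyxz and y, so those are the only inputs this sketch accepts; anything else is rejected by match or by the final end-of-input check.</p>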
<p>I hope you find it useful.</p>
<p>If you have questions, comments, or suggestions, don&#8217;t hesitate to write.</p>
<p>Regards</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Parsers in Java - Sreesh B Nair]]></title>
<link>http://knwrites.wordpress.com/2009/09/18/jaxp-2/</link>
<pubDate>Fri, 18 Sep 2009 10:16:44 +0000</pubDate>
<dc:creator>knwrites</dc:creator>
<guid>http://knwrites.wordpress.com/2009/09/18/jaxp-2/</guid>
<description><![CDATA[Introduction  The Java API for XML Processing (JAXP) is for processing XML data using applications w]]></description>
<content:encoded><![CDATA[Introduction  The Java API for XML Processing (JAXP) is for processing XML data using applications w]]></content:encoded>
</item>
<item>
<title><![CDATA[Parser Error]]></title>
<link>http://yetanotherdayatwork.wordpress.com/2009/09/16/parser-error/</link>
<pubDate>Wed, 16 Sep 2009 07:41:26 +0000</pubDate>
<dc:creator>yetanotherdayatwork</dc:creator>
<guid>http://yetanotherdayatwork.wordpress.com/2009/09/16/parser-error/</guid>
<description><![CDATA[Parser ErrorDescription: An error occurred during the parsing of a resource required to service this]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Parser Error<br /><span style="font-family:Arial,Helvetica,Geneva,SunSans-Regular,sans-serif;"><strong>Description: </strong>An error occurred during the parsing of a resource required to service this request. Please review the following specific parse error details and modify your source file appropriately. </span><br />
<strong>Parser Error Message: </strong>Ambiguous match found.</p>
<p>=_=</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Programming Praxis - Regular Expressions, Part 1]]></title>
<link>http://bonsaicode.wordpress.com/2009/09/15/programming-praxis-regular-expressions-part-1/</link>
<pubDate>Tue, 15 Sep 2009 15:13:16 +0000</pubDate>
<dc:creator>Remco Niemeijer</dc:creator>
<guid>http://bonsaicode.wordpress.com/2009/09/15/programming-praxis-regular-expressions-part-1/</guid>
<description><![CDATA[In today&#8217;s Programming Praxis problem our task is to write a parser for simple regular express]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>In <a href="http://programmingpraxis.com/2009/09/15/regular-expressions-part-1/" target="_blank">today&#8217;s</a> Programming Praxis problem our task is to write a parser for simple regular expressions. Since Haskell has a very good parser library called Parsec, we&#8217;re going to be using that. Let&#8217;s get started.</p>
<p>First, some imports:</p>
<pre style="color:#000000;background-color:#ffffff;font-size:9pt;font-family:'Courier New';">import Control<span style="color:#ff0000;">.</span>Applicative <span style="color:#ff0000;">((&#60;</span>$<span style="color:#ff0000;">&#62;), (*&#62;), (&#60;*), (&#60;*&#62;))</span>
import Data<span style="color:#ff0000;">.</span><span style="color:#0000ff;">Char</span>
import Text<span style="color:#ff0000;">.</span>Parsec
import Text<span style="color:#ff0000;">.</span>Parsec<span style="color:#ff0000;">.</span><span style="color:#0000ff;">String</span></pre>
<p>Next we define our data structure. There are seven constructs we have to implement, split into two groups based on whether or not they can be followed by a star.</p>
<pre style="color:#000000;background-color:#ffffff;font-size:9pt;font-family:'Courier New';">data Elem <span style="color:#ff0000;">=</span> Lit <span style="color:#0000ff;">Char</span> <span style="color:#ff0000;">&#124;</span> Esc <span style="color:#0000ff;">Char</span> <span style="color:#ff0000;">&#124;</span> Any <span style="color:#ff0000;">&#124;</span> Set <span style="color:#0000ff;">Bool</span> <span style="color:#ff0000;">[</span>Elem<span style="color:#ff0000;">]</span> deriving <span style="color:#0000ff;">Show</span>
data Chunk <span style="color:#ff0000;">=</span> Elem Elem <span style="color:#ff0000;">&#124;</span> BoL <span style="color:#ff0000;">&#124;</span> EoL <span style="color:#ff0000;">&#124;</span> Star Elem deriving <span style="color:#0000ff;">Show</span></pre>
<p>The parser itself is not too difficult if you know how the operators from Control.Applicative work. &#60;$&#62; applies the function on the left to the result of the parser on the right. &#60;*, *&#62; and &#60;*&#62; each run both parsers in sequence: &#60;* keeps only the left result, *&#62; keeps only the right result, and &#60;*&#62; applies the function produced on the left to the value produced on the right.</p>
<pre style="color:#000000;background-color:#ffffff;font-size:9pt;font-family:'Courier New';">regex <span style="color:#ff0000;">::</span> Parser <span style="color:#ff0000;">[</span>Chunk<span style="color:#ff0000;">]</span>
regex <span style="color:#ff0000;">= (++) &#60;</span>$<span style="color:#ff0000;">&#62;</span> bol <span style="color:#ff0000;">&#60;*&#62;</span> many chunk where
    bol <span style="color:#ff0000;">=</span> option <span style="color:#ff0000;">[] (</span><span style="color:#ec7f15;">const</span> <span style="color:#ff0000;">[</span>BoL<span style="color:#ff0000;">] &#60;</span>$<span style="color:#ff0000;">&#62;</span> char <span style="color:#ff0000;">'</span>^<span style="color:#ff0000;">')</span>
    chunk <span style="color:#ff0000;">=</span> choice <span style="color:#ff0000;">[</span>Star <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> <span style="color:#ec7f15;">try</span> <span style="color:#ff0000;">(</span>element <span style="color:#ff0000;">&#60;*</span> char <span style="color:#ff0000;">'*'),</span>
                    <span style="color:#ec7f15;">const</span> EoL <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> <span style="color:#ec7f15;">try</span> <span style="color:#ff0000;">(</span>char <span style="color:#ff0000;">'</span>$<span style="color:#ff0000;">' &#60;*</span> eof<span style="color:#ff0000;">),</span>
                    Elem <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> element<span style="color:#ff0000;">]</span>
    element <span style="color:#ff0000;">=</span> choice <span style="color:#ff0000;">[</span>esc <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> <span style="color:#ec7f15;">try</span> <span style="color:#ff0000;">(</span>char <span style="color:#ff0000;">'</span>\\<span style="color:#ff0000;">' *&#62;</span> anyChar<span style="color:#ff0000;">),</span>
                      <span style="color:#ec7f15;">const</span> Any <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> char <span style="color:#ff0000;">'.',</span>
                      Set <span style="color:#0000ff;font-weight:bold;">False</span> <span style="color:#ff0000;">.</span> expandSet <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> set <span style="color:#ff0000;">"[^"</span><span style="color:#ff0000;">,</span>
                      Set <span style="color:#0000ff;font-weight:bold;">True</span> <span style="color:#ff0000;">.</span> expandSet <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> set <span style="color:#ff0000;">"["</span><span style="color:#ff0000;">,</span>
                      Lit <span style="color:#ff0000;">&#60;</span>$<span style="color:#ff0000;">&#62;</span> noneOf <span style="color:#ff0000;">"]"</span><span style="color:#ff0000;">]</span>
    esc c <span style="color:#ff0000;">=</span> if <span style="color:#ec7f15;">elem</span> c <span style="color:#ff0000;">"nt"</span> then Esc c else Lit c
    set s <span style="color:#ff0000;">=</span> <span style="color:#ec7f15;">try</span> <span style="color:#ff0000;">(</span><span style="color:#0000ff;font-weight:bold;">string</span> s <span style="color:#ff0000;">*&#62;</span> many1 element <span style="color:#ff0000;">&#60;*</span> char <span style="color:#ff0000;">']')</span>
    expandSet <span style="color:#ff0000;">(</span>Lit a<span style="color:#ff0000;">:</span>Lit <span style="color:#ff0000;">'-':</span>Lit b<span style="color:#ff0000;">:</span>xs<span style="color:#ff0000;">)</span>
        <span style="color:#ff0000;">&#124;</span> validRange a b <span style="color:#ff0000;">=</span> <span style="color:#ec7f15;">map</span> Lit <span style="color:#ff0000;">[</span>a<span style="color:#ff0000;">..</span>b<span style="color:#ff0000;">] ++</span> expandSet xs
    expandSet <span style="color:#ff0000;">(</span>x<span style="color:#ff0000;">:</span>xs<span style="color:#ff0000;">) =</span> x <span style="color:#ff0000;">:</span> expandSet xs
    expandSet _ <span style="color:#ff0000;">= []</span>
    validRange a b <span style="color:#ff0000;">=</span> b <span style="color:#ff0000;">&#62;</span> a <span style="color:#ff0000;">&#38;&#38; ((</span><span style="color:#ec7f15;">isLower</span> a <span style="color:#ff0000;">&#38;&#38;</span> <span style="color:#ec7f15;">isLower</span> b<span style="color:#ff0000;">) &#124;&#124;</span>
                               <span style="color:#ff0000;">(</span><span style="color:#ec7f15;">isUpper</span> a <span style="color:#ff0000;">&#38;&#38;</span> <span style="color:#ec7f15;">isUpper</span> b<span style="color:#ff0000;">) &#124;&#124;</span>
                               <span style="color:#ff0000;">(</span><span style="color:#ec7f15;">isDigit</span> a <span style="color:#ff0000;">&#38;&#38;</span> <span style="color:#ec7f15;">isDigit</span> b<span style="color:#ff0000;">))</span></pre>
<p>With the parser written, the function to parse a string is trivial:</p>
<pre style="color:#000000;background-color:#ffffff;font-size:9pt;font-family:'Courier New';">parseRegex <span style="color:#ff0000;">::</span> <span style="color:#0000ff;">String</span> <span style="color:#ff0000;">-&#62;</span> <span style="color:#0000ff;">Either</span> ParseError <span style="color:#ff0000;">[</span>Chunk<span style="color:#ff0000;">]</span>
parseRegex <span style="color:#ff0000;">=</span> parse regex <span style="color:#ff0000;">""</span></pre>
<p>Some tests to see if everything is working properly:</p>
<pre style="color:#000000;background-color:#ffffff;font-size:9pt;font-family:'Courier New';">main <span style="color:#ff0000;">::</span> <span style="color:#0000ff;">IO</span> <span style="color:#ff0000;">()</span>
main <span style="color:#ff0000;">=</span> <span style="color:#ec7f15;">mapM_ print</span> <span style="color:#ff0000;">[</span>parseRegex <span style="color:#ff0000;">"[0-9][0-9]*"</span><span style="color:#ff0000;">,</span>
                    parseRegex <span style="color:#ff0000;">"^..*$"</span><span style="color:#ff0000;">,</span>
                    parseRegex <span style="color:#ff0000;">"hello"</span><span style="color:#ff0000;">,</span>
                    parseRegex <span style="color:#ff0000;">"^ *hello *$"</span><span style="color:#ff0000;">,</span>
                    parseRegex <span style="color:#ff0000;">"^[^x].*[0-9] *x$"</span><span style="color:#ff0000;">]</span></pre>
<p>Piece of cake. Next time we do the implementation.</p>
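<p>For readers less familiar with Haskell, the four applicative operators used in the parser can be mimicked with plain functions. The sketch below is an illustration in Python, not part of the solution and not how Parsec is implemented: a parser is assumed to be a function that takes a string and returns a (result, remaining text) pair, or None on failure. The names char, fmap, ap, keep_left and keep_right are made up for this example; fmap plays the role of &#60;$&#62;, ap of &#60;*&#62;, keep_left of &#60;* and keep_right of *&#62;.</p>

```python
# Assumption: a parser is a function taking text and returning
# (result, remaining_text), or None when it fails. Parsec itself is richer
# (error messages, controlled backtracking); this only shows the operator idea.

def char(c):
    """Parser that accepts exactly the character c."""
    def parse(text):
        if text[:1] == c:
            return c, text[1:]
        return None
    return parse

def fmap(f, p):
    """Apply f to the result of parser p (the first operator in the post)."""
    def parse(text):
        r = p(text)
        if r is None:
            return None
        value, rest = r
        return f(value), rest
    return parse

def ap(pf, p):
    """Run pf, then p, and apply the function from pf to the value from p."""
    def parse(text):
        r = pf(text)
        if r is None:
            return None
        f, rest = r
        r2 = p(rest)
        if r2 is None:
            return None
        value, rest2 = r2
        return f(value), rest2
    return parse

def keep_left(p, q):
    """Run p then q, keep only the result of p."""
    return ap(fmap(lambda a: lambda _b: a, p), q)

def keep_right(p, q):
    """Run p then q, keep only the result of q."""
    return ap(fmap(lambda _a: lambda b: b, p), q)

# Same shape as the set parser in the post: opening bracket, body, closing bracket.
bracketed = keep_left(keep_right(char("["), char("a")), char("]"))
print(bracketed("[a]x"))
```

<p>The last two lines mirror the shape of the set parser: the brackets are parsed but thrown away, and only the body between them is kept.</p>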
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Spanish Word Definitions with Python]]></title>
<link>http://thunderlabs.wordpress.com/2009/09/09/spanish-word-definitions-with-python/</link>
<pubDate>Wed, 09 Sep 2009 18:07:13 +0000</pubDate>
<dc:creator>bont</dc:creator>
<guid>http://thunderlabs.wordpress.com/2009/09/09/spanish-word-definitions-with-python/</guid>
<description><![CDATA[I love Python! Last night I created a small Python script for a friend, in a couple hours, to get th]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I love Python!</p>
<p>Last night I created a small Python script for a friend, in a couple hours, to get the definitions of a list of Spanish words using <a href="http://spanishdict.com">SpanishDict</a>. It works by sending a request to SpanishDict and parsing the HTML it gets back, only keeping the definition of the word. If you run it from the command line you have the choice of passing files or individual words. Here&#8217;s the help message:</p>
<pre>Usage: spanishdict.py [options] file [file2, file3, ...]
Options:
  -h, --help            show this help message and exit
  -o OUTPUT             where to send the output, file name or '-' to
                        indicate stdout [default: {first file}.defs]
  --words               interpret the input as words, not file names [default: False]
  --delimiter=DELIMITER
                        delimiter between words in the files [default: \n]</pre>
<p>The script is available in <a href="http://code.google.com/p/bont/">my code repository</a> in the &#8216;python&#8217; directory or you can <a href="http://code.google.com/p/bont/source/browse/python/spanishdict.py">view it online</a>.</p>
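<p>The help message above has the shape that Python&#8217;s optparse module prints, so the command-line handling could look roughly like the sketch below. This is a reconstruction from the help text only, not the code in the repository: the function names build_parser, split_words and output_name are hypothetical, and the request to SpanishDict and the HTML parsing are left out entirely.</p>

```python
from optparse import OptionParser

def build_parser():
    """Option parser reconstructed from the help message (an assumption)."""
    parser = OptionParser(usage="%prog [options] file [file2, file3, ...]")
    parser.add_option("-o", dest="output", metavar="OUTPUT", default=None,
                      help="where to send the output, file name or '-' "
                           "to indicate stdout [default: {first file}.defs]")
    parser.add_option("--words", action="store_true", default=False,
                      help="interpret the input as words, not file names "
                           "[default: %default]")
    parser.add_option("--delimiter", dest="delimiter", default="\n",
                      help="delimiter between words in the files")
    return parser

def split_words(text, delimiter="\n"):
    """Split file contents into words, dropping empty entries."""
    return [w.strip() for w in text.split(delimiter) if w.strip()]

def output_name(options, args):
    """Use -o when given; otherwise fall back to '{first file}.defs'."""
    if options.output is not None:
        return options.output
    return args[0] + ".defs"

if __name__ == "__main__":
    # Example invocation with words instead of file names.
    options, args = build_parser().parse_args(["--words", "gato", "perro"])
    print(options.words, args)
```

<p>Keeping the word splitting and the output-name choice in small functions makes them easy to check without touching the network.</p>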
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Sentence Patterns]]></title>
<link>http://brainwave.opencog.org/2009/09/08/sentence-patterns/</link>
<pubDate>Tue, 08 Sep 2009 14:54:23 +0000</pubDate>
<dc:creator>linasv</dc:creator>
<guid>http://brainwave.opencog.org/2009/09/08/sentence-patterns/</guid>
<description><![CDATA[I&#8217;ve recently resumed work on the question-answering chatbot, and am trying to get it to compr]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I&#8217;ve recently resumed work on the question-answering chatbot, and am trying to get it to comprehend a broader range of questions and statements. The &#8220;big idea&#8221; is to create a number of &#8220;sentence patterns&#8221; that the pattern matcher can recognize and respond to. The reason this is a &#8220;big&#8221; idea is that I am trying to avoid anything algorithmic or procedural: everything is to be done by specifying OpenCog hypergraphs, and NOT by writing C++ code, or <a href="http://www.gnu.org/software/guile/guile.html">scheme</a> code (or python code&#8230; etc.). The reason for working entirely with patterns and hypergraphs, rather than with C++ or scheme, is that this puts the &#8220;knowledge&#8221; of the system into a form that AI routines can manipulate: learning algos can learn new hypergraphs; statistical algos can gather usage information on which hypergraphs get triggered, and so on. This is all easier said than done: although I&#8217;ve eliminated a fair amount of question-answering code previously written in C++, I&#8217;ve also had to write some new scheme code. Bummer. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
<p>Pattern matching is now used throughout the OpenCog NLP pipeline, although not in a unified manner. The <a href="http://www.abisource.com/projects/link-grammar/">Link Grammar parser</a> uses patterns (called &#8220;disjuncts&#8221;) to determine how the words in a sentence can link to one another, thus &#8220;parsing&#8221;, or pulling the grammatical structure out of a sentence (<a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/pub/www/papers/ps/tr91-196.pdf">this paper</a> provides an excellent overview). The <a href="http://opencog.org/wiki/RelEx">RelEx dependency relation extractor</a> applies patterns to the link-grammar output to extract syntactic relations. For example, the sentence &#8220;John threw a ball&#8221; becomes</p>
<blockquote><p>_obj(throw, ball)<br />
_subj(throw, John)</p></blockquote>
<p>after RelEx gets done with it. And now, there are a dozen patterns inside of OpenCog that can pick out certain kinds of questions and statements from RelEx output, and pattern-match questions to find answers to them.</p>
<p>For example, the new OpenCog patterns convert &#8220;The capital of France is Paris&#8221; into</p>
<blockquote><p>capital_of(France, Paris)</p></blockquote>
<p>and similarly, &#8220;What is the capital of France?&#8221; into</p>
<blockquote><p>capital_of(France,what)</p></blockquote>
<p>Treating &#8220;what&#8221; as a variable, there is yet another pattern that matches up the form of the question to the form of the answer, thus deducing that &#8220;what&#8221; must be &#8220;Paris&#8221;.</p>
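<p>The matching step described above can be sketched in a few lines. This is only a toy illustration in Python, under the assumption that statements and questions are flat (relation, argument, argument) tuples; OpenCog itself works on hypergraphs, and the names VARIABLES, match and answer are invented for this example.</p>

```python
# Toy model: a fact or question is a (relation, arg1, arg2) tuple.
# Words in VARIABLES act as placeholders to be filled in by a matching fact.
VARIABLES = {"what", "who"}

def match(question, fact):
    """Return the variable bindings if fact answers question, else None."""
    if len(question) != len(fact):
        return None
    bindings = {}
    for q, f in zip(question, fact):
        if q in VARIABLES:
            # A variable that appears twice must bind to the same value.
            if bindings.get(q, f) != f:
                return None
            bindings[q] = f
        elif q != f:
            return None
    return bindings

def answer(question, facts):
    """Collect the bindings from every fact that matches the question."""
    results = []
    for fact in facts:
        bindings = match(question, fact)
        if bindings is not None:
            results.append(bindings)
    return results

facts = [("capital_of", "France", "Paris"), ("capital_of", "Italy", "Rome")]
print(answer(("capital_of", "France", "what"), facts))  # [{'what': 'Paris'}]
```

<p>Because the question and the stored statement share the same form, finding the answer reduces to lining up the two tuples and reading off what the variable was bound to.</p>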
<p>Somewhat harder is using patterns to distinguish similar from dissimilar concepts, so that sentences like &#8220;John threw a green ball&#8221; aren&#8217;t used as answers to questions such as &#8220;Did John throw a red ball?&#8221;: the word &#8220;ball&#8221; with modifier &#8220;green&#8221; has to be detected as a different entity than the word &#8220;ball&#8221; with modifier &#8220;red&#8221; (such entities are called &#8220;semes&#8221; in the code). In fact, out of laziness, I&#8217;ve punted on this one: the promotion of word-instances to &#8220;semes&#8221; is done by code, rather than by pattern matching. But soon, I hope, this will change. In the meantime, the <a href="http://buildbot.opencog.org/doxygen/d7/d41/opencog_2nlp_2seme_2README-source.html">README file</a> provides a more detailed discussion.</p>
<p>Here are some patterns that work these days:</p>
<blockquote><p>&#60;me&#62;         John threw a green ball.<br />
&#60;me&#62;         Fred threw a red ball<br />
&#60;me&#62;         Mary threw a blue rock<br />
&#60;me&#62;         who threw a ball?<br />
&#60;cogita-bot&#62; Syntax pattern match found: Fred John<br />
&#60;me&#62;         who threw a red ball?<br />
&#60;cogita-bot&#62; Syntax pattern match found: Fred</p>
<p>&#60;me&#62;         Did Fred throw a ball?<br />
&#60;cogita-bot&#62; Truth query determined &#8220;yes&#8221;: throw</p>
<p>&#60;me&#62;         Did Fred throw a red ball?<br />
&#60;cogita-bot&#62; Truth query determined &#8220;yes&#8221;: throw</p>
<p>&#60;me&#62;         The color of the book is red.<br />
&#60;me&#62;         What is the color of the book?<br />
&#60;cogita-bot&#62; Triples abstraction found: red</p>
<p>&#60;me&#62;         the cat sat on the mat<br />
&#60;me&#62;         what did the cat sit on?<br />
&#60;cogita-bot&#62; Triples abstraction found: mat</p></blockquote>
<p>And here are some that don&#8217;t yet work: &#8220;Did Fred throw a green ball?&#8221; &#8212; gets no reply, because the system can&#8217;t find an answer, and doesn&#8217;t make the common-sense leap of &#8220;can&#8217;t find answer-&#62; answer must be no&#8221;.  Another common-sense problem is illustrated by: &#8220;Did Fred throw a round ball?&#8221; &#8212; the system doesn&#8217;t know that balls are round, and simply assumes that a &#8220;round ball&#8221; is some special kind of &#8220;ball&#8221;.  Oh well. There&#8217;s work to be done.</p>
<p>You can try out the chatbot yourself (when it&#8217;s up, and not broken!) on the IRC chat channel #opencog on the freenode.net chat servers.</p>
<p>&#8211; Linas Vepstas</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[pyopt, a python optparse with class]]></title>
<link>http://uberpython.wordpress.com/2009/09/05/pyopt-a-python-optparse-with-class/</link>
<pubDate>Sat, 05 Sep 2009 23:18:39 +0000</pubDate>
<dc:creator>ubershmekel</dc:creator>
<guid>http://uberpython.wordpress.com/2009/09/05/pyopt-a-python-optparse-with-class/</guid>
<description><![CDATA[I made a python option parser, I wonder what python-ideas will think of it&#8230; http://code.google]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I made a python option parser, I wonder what python-ideas will think of it&#8230;</p>
<p><a title="Pyopt - A Pythonic optparse" href="http://code.google.com/p/pyopt/">http://code.google.com/p/pyopt/</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Simple Parse Tree]]></title>
<link>http://pgolub.wordpress.com/2009/09/05/simple-parse-tree/</link>
<pubDate>Sat, 05 Sep 2009 18:48:56 +0000</pubDate>
<dc:creator>pashagolub</dc:creator>
<guid>http://pgolub.wordpress.com/2009/09/05/simple-parse-tree/</guid>
<description><![CDATA[It was an amazing cognac time, spent talking about parsers with a good old college friend, e.g. to demonstr]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>It was an amazing <a href="http://pgolub.wordpress.com/2009/08/31/cognac-cola/">cognac</a> time, spent talking about parsers with a good old college friend, e.g. demonstrating <a href="http://microolap.com/temp/pg/PgNative.html#stmtmulti">how PostgreSQL analyses</a> some queries.</p>
<p>Statement</p>
<table style="border-collapse:collapse;margin-top:15px;margin-bottom:15px;" border="0" cellspacing="0" cellpadding="5" bgcolor="LightYellow">
<tbody>
<tr>
<td>SELECT field1 AS customer FROM &quot;FooSchema&quot;.&quot;Bar&quot;</td>
</tr>
</tbody>
</table>
<p>inside the server&#8217;s brain becomes:</p>
<pre>stmtmulti
    stmt
        SelectStmt
            select_no_parens
                simple_select
                    <strong>SELECT</strong>
                    target_list
                        target_el
                            a_expr
                                c_expr
                                    columnref
                                        relation_name
                                            ColId
                                                <strong>field1</strong>
                            <strong>AS</strong>
                            ColLabel
                                <strong>customer</strong>
                    from_clause
                        <strong>FROM</strong>
                        from_list
                            table_ref
                                relation_expr
                                    qualified_name
                                        relation_name
                                            ColId
                                                <strong>"FooSchema"</strong>
                                        indirection
                                            indirection_el
                                                <strong>.</strong>
                                                attr_name
                                                    ColLabel
                                                        <strong>"Bar"</strong></pre>
<p>I will add some more if there is interest <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[jQuery Ajax Tutorial - Simple XML Parser]]></title>
<link>http://tutoriaisparaweb.wordpress.com/2009/09/03/tutorial-jquery-ajax-parser-xml-simples/</link>
<pubDate>Thu, 03 Sep 2009 17:38:10 +0000</pubDate>
<dc:creator>andreluizrodper</dc:creator>
<guid>http://tutoriaisparaweb.wordpress.com/2009/09/03/tutorial-jquery-ajax-parser-xml-simples/</guid>
<description><![CDATA[Good afternoon, everyone. Today I'm going to share an XML parser built with Ajax and jQuery that is very simple and easy to]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Good afternoon, everyone.</p>
<p>Today I&#8217;m going to share an XML parser built with Ajax and jQuery that is very simple and easy to write.</p>
<p>Tomorrow I&#8217;ll post one that creates and assembles the XML with MySQL and PHP.</p>
<p>Well, the code is simple:</p>
<p><span style="color:#0000ff;">var noticia = ""; <span style="color:#808080;">//Holds the assembled HTML; declared here so that += below has a string to append to</span><br />
$.ajax({ <span style="color:#808080;">//Start of the request</span><br />
type: "GET", <span style="color:#808080;">//HTTP method Ajax uses to fetch the data</span><br />
url: "noticia.xml", <span style="color:#808080;">//Path to the file</span><br />
dataType: "xml", <span style="color:#808080;">//Type of the data</span><br />
success: function(xml) { <span style="color:#808080;">//On success</span><br />
$(xml).find("noticia").each(function(){ <span style="color:#808080;">//jQuery visits each section of the XML; here that is the noticia tag, whose children are the titulo and texto tags</span><br />
var titulo = $(this).find("titulo").text(); <span style="color:#808080;">//Receives the content of the titulo tag</span><br />
var texto = $(this).find("texto").text(); <span style="color:#808080;">//Receives the content of the texto tag</span><br />
noticia += "&#60;strong&#62;"+titulo+"&#60;/strong&#62;&#60;br&#62;&#60;br&#62;"+texto; <span style="color:#808080;">//Builds the news text; arrange it however you need (inside a div, a list, it makes no difference)</span><br />
}); <span style="color:#808080;">//Closes the loop</span><br />
} <span style="color:#808080;">//Only the success case is handled here; you can extend the call with an error handler</span><br />
}); <span style="color:#808080;">//Closes the ajax call</span></span></p>
<p><span style="color:#0000ff;"><span style="color:#808080;"><span style="color:#000000;">The code is simple and can be customized without any trouble.</span></span></span></p>
<p><span style="color:#0000ff;"><span style="color:#808080;"><span style="color:#000000;">If you have any questions, feel free to leave a comment.</span></span></span></p>
<p><span style="color:#0000ff;"><span style="color:#808080;"><span style="color:#000000;">See you!<br />
</span></span></span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Open API integration of Resume Parser for Job-Boards and ATS vendors.]]></title>
<link>http://recruitplus.wordpress.com/2009/08/31/open-api-integration-of-resume-parser-for-job-boards-and-ats-vendors/</link>
<pubDate>Mon, 31 Aug 2009 07:21:13 +0000</pubDate>
<dc:creator>Gaurav Mittal</dc:creator>
<guid>http://recruitplus.wordpress.com/2009/08/31/open-api-integration-of-resume-parser-for-job-boards-and-ats-vendors/</guid>
<description><![CDATA[When a new user registers on a Job-Board, he/she has to fill all the required information manually. ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>When a new user registers on a job board, he/she has to fill in all the required information manually. Only a handful of job boards let the candidate upload his/her resume from a PC so that all the fields the job board requires are filled in automatically.</p>
<p>If you think this could be another crucial success factor for your job board, you might want to look at our offering at http://onlineresumeparser.com.</p>
<p>The candidate&#8217;s resume can be in any format, such as .doc, .docx, .pdf, .rtf, .html or .txt, and can come from a file folder, an email, or another job portal or career networking site; the resume parser will parse the candidate details with an accuracy level of 80% to 95%. The resume is also saved to the database in .doc format, regardless of the format of the uploaded file. The candidate therefore spends time only on correcting the information that was not parsed accurately, and the total time needed to register on the job board is reduced.</p>
<p>Sign up and sign in; parse a few resumes and check out the accuracy level yourself. You need to have .NET Framework 2.0 installed on your PC to use this application.</p>
<p>We provide open API integration of our resume parser for all job boards, applicant tracking systems and HR applications across the globe.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Progressing, raiding... (or not to raid)?]]></title>
<link>http://wowfrostmage.wordpress.com/2009/08/17/to-raid-or-not-to-raid/</link>
<pubDate>Mon, 17 Aug 2009 19:04:40 +0000</pubDate>
<dc:creator>Ccelenn</dc:creator>
<guid>http://wowfrostmage.wordpress.com/2009/08/17/to-raid-or-not-to-raid/</guid>
<description><![CDATA[In a raid, we are 10 or 25 players, each with their strengths and weaknesses, their wishes, their]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignright size-full wp-image-2038" title="guerriere" src="http://wowfrostmage.wordpress.com/files/2009/08/guerriere.jpg" alt="guerriere" width="337" height="390" />In a raid, we are 10 or 25 players, each with their strengths and weaknesses, their wishes, their skills and their &#8220;handicaps&#8221;&#8230; When you are alone, playing is easy: you do what you want, how you want, when you want. But once you are in a group, there is no longer any question of thinking only of yourself and doing just as you please. It becomes IN-DIS-PEN-SA-BLE to think of the others first, and this holds just as much for the average player that I am as for the &#8220;leader(s)&#8221;.</p>
<p>I&#8217;m not one to lecture; I&#8217;m like everyone else, and I&#8217;m only trying to get better at this game that I love. To do that, I have long used a truly wonderful tool that helps me become aware of my strengths and weaknesses, get to know my character class better and, more broadly, know myself better as a player. All of these help me improve. This tool is called the &#8220;<strong>Combat Log</strong>&#8221;. I record it whenever I think of it, when the planned raid or encounter is going to demand effort and learning. Once the combat log has been recorded, it can be analyzed with online programs such as wow web stats, wow meter online, or World of logs, which has the advantage of presenting the information graphically, in a very telling way, so that the important figures can be found quickly.</p>
<p>It&#8217;s not about knowing who does the most DPS. It&#8217;s not about knowing who does the most HPS. It&#8217;s not about knowing whether one player is &#8220;better&#8221; than another or, more generally, about spying on one another in the raid. <strong>It&#8217;s about each player knowing what they themselves are worth</strong>, taking that as a baseline, and then evaluating their own later performances: is my spell &#8220;X&#8221; worth it? Could I have done something at that moment, when I can see that I was standing still? There are plenty of questions worth asking yourself while analyzing the combat log through a site like <a href="http://www.worldoflogs.com/" target="_blank">Worldoflogs.com</a>.</p>
<p><a href="http://www.worldoflogs.com/"><img class="aligncenter size-full wp-image-2041" title="worldoflogs" src="http://wowfrostmage.wordpress.com/files/2009/08/worldoflogs.png" alt="worldoflogs" width="340" height="123" /></a></p>
<p>That is, if you want to improve.</p>
<p>If you are one of those people who want to improve in World of Warcraft, as a player and not as a character, then I invite you to do as I do and look closely at the statistics of your raids, searching for whatever can help you fill your role better, all while keeping in mind that&#8230;</p>
<p>&#8230;the fun of playing comes first.</p>
<p>Because <strong>World of Warcraft is only a game</strong>, a hobby, a pleasure, and no one should click the &#8220;Login&#8221; button the way they punch their time card at work: <em>raiding is a very demanding part of the game that does not suit every temperament</em>, and there is no value judgment to be made about that, since everyone is free in their tastes, wishes, needs and pleasures.</p>
<p>Have a nice day.</p>
<p>-Cc</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[The new face of a conspiracy]]></title>
<link>http://sagito.wordpress.com/2009/08/13/the-new-face-of-a-conspiracy/</link>
<pubDate>Thu, 13 Aug 2009 18:48:34 +0000</pubDate>
<dc:creator>sagito</dc:creator>
<guid>http://sagito.wordpress.com/2009/08/13/the-new-face-of-a-conspiracy/</guid>
<description><![CDATA[Hi everyone! It was a long time since I posted something new here, because my network keeps going do]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Hi everyone! It has been a long time since I posted something new here, because my network keeps going down&#8230; But this time, I&#8217;m bringing you news about the Conspiracy engine. Missed it? Me too&#8230; <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' />  Well, it has a brand new face, since many new features have been added that make it stand much higher than before.</p>
<p>To start, a small problem with the vertices was corrected. Somehow the vertex data was getting scrambled on its way through the CMOL (Conspiracy Mesh Optimization Pipeline), and I ended up removing some of the optimization routines that could be handled in another way by DirectX itself. By doing so, the <a href="http://sagito.wordpress.com/2009/07/31/constraints/">GenerateAdjacency</a> problem was also solved! This way, the meshes are more accurate and faster than ever! <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>Also, I&#8217;ve implemented an XML parser system from scratch. With that, I created implicit support for something else: scene graphs! This is a very important part of the engine, as some objects can now be loaded directly from a simple configuration file. This is great for scenarios, lights, etc., but also for the future creation of a visual editor for the engine!</p>
<p>With so many new features, something logical occurred! A game is being developed on top of it and I can guarantee that it is already in a very advanced state! Stand by for more news! <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[It's Working!]]></title>
<link>http://research2009.wordpress.com/2009/08/10/its-working/</link>
<pubDate>Mon, 10 Aug 2009 09:27:21 +0000</pubDate>
<dc:creator>jasetiojanco</dc:creator>
<guid>http://research2009.wordpress.com/2009/08/10/its-working/</guid>
<description><![CDATA[Yay! After hours of coding I&#8217;ve finally finished the prototype for the XLS Parser. It only sup]]></description>
<content:encoded><![CDATA[Yay! After hours of coding I&#8217;ve finally finished the prototype for the XLS Parser. It only sup]]></content:encoded>
</item>
<item>
<title><![CDATA[POIng!]]></title>
<link>http://research2009.wordpress.com/2009/08/08/poing/</link>
<pubDate>Sat, 08 Aug 2009 09:29:28 +0000</pubDate>
<dc:creator>jasetiojanco</dc:creator>
<guid>http://research2009.wordpress.com/2009/08/08/poing/</guid>
<description><![CDATA[For the Excel Parser, I have chosen the Apache POI HSSF/XSSF parser. It seems to have better support]]></description>
<content:encoded><![CDATA[For the Excel Parser, I have chosen the Apache POI HSSF/XSSF parser. It seems to have better support]]></content:encoded>
</item>
<item>
<title><![CDATA[Functional Parser Combinators in Python]]></title>
<link>http://vlasovskikh.wordpress.com/2009/07/28/python-functional-parser-combinators/</link>
<pubDate>Tue, 28 Jul 2009 17:38:40 +0000</pubDate>
<dc:creator>vlasovskikh</dc:creator>
<guid>http://vlasovskikh.wordpress.com/2009/07/28/python-functional-parser-combinators/</guid>
<description><![CDATA[For some time now I have been doing part of my everyday work on language analysis in Python, transl]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>For some time now I have been doing part of my everyday work on language analysis, translators, and so on in Python. At first it was for auxiliary purposes, and later also for parsing small languages and prototyping grammars, AST trees, and code transformations. Many would think of OCaml for this, but in a Unix environment (hello, <a href="http://archlinux.folding-maps.org/">spb-archlinux</a>!) Python with its libraries is more useful.</p>
<p>For parsing tasks I wrote the <em><a href="http://code.google.com/p/funcparserlib/">funcparserlib</a> library</em>. It is designed for building recursive-descent parsers based on functional combinators. I also wrote an <a href="http://archlinux.folding-maps.org/2009/funcparserlib/Tutorial">introductory tutorial for funcparserlib</a> (in English), which should interest anyone keen on functional programming (FP) and/or Python. I recommend reading it!</p>
<p>Here, for example, are <a href="http://archlinux.folding-maps.org/2009/funcparserlib/Illustrated">the kind of tree pictures</a> that are easy to produce with funcparserlib:</p>
<pre><code>&#62;&#62;&#62; print dotparser.pretty_parse_tree(tree)
Graph [id=g1, strict=False, type=digraph]
`-- stmts
    &#124;-- Edge
    &#124;   &#124;-- nodes
    &#124;   &#124;   &#124;-- n1
    &#124;   &#124;   &#124;-- n2
    &#124;   &#124;   `-- SubGraph [id=n3]
    &#124;   &#124;       `-- stmts
    &#124;   &#124;           &#124;-- Edge
    &#124;   &#124;           &#124;   &#124;-- nodes
    &#124;   &#124;           &#124;   &#124;   &#124;-- nn1
    &#124;   &#124;           &#124;   &#124;   &#124;-- nn2
    &#124;   &#124;           &#124;   &#124;   `-- nn3
    &#124;   &#124;           &#124;   `-- attrs
    &#124;   &#124;           `-- Edge
    &#124;   &#124;               &#124;-- nodes
    &#124;   &#124;               &#124;   &#124;-- nn3
    &#124;   &#124;               &#124;   `-- nn1
    &#124;   &#124;               `-- attrs
    &#124;   `-- attrs
    `-- Edge
        &#124;-- nodes
        &#124;   &#124;-- SubGraph [id=n3]
        &#124;   &#124;   `-- stmts
        &#124;   `-- n1
        `-- attrs

</code></pre>
<p>So, I suggest taking a look at the <a href="http://archlinux.folding-maps.org/2009/funcparserlib/Tutorial">tutorial</a>, and Pythonistas should try funcparserlib and look through the other docs and examples on the <a href="http://code.google.com/p/funcparserlib/">library's site</a>.</p>
<p><!--more Next come the features of funcparserlib, a comparison with pyparsing and LEPL, the history of the library... --></p>
<p>Distinctive features of funcparserlib:</p>
<ul>
<li>A small set of essential, convenient parser combinators (the API has only 14 calls). The resulting code is compact and reads very much like an xBNF grammar</li>
<li>The library itself is small: only 0.5 KLOC including comments</li>
<li>Error detection by the longest parsed prefix gives reasonable parse error messages</li>
<li>A small tokenizer based on regular expressions makes it possible to track the position of tokens in the text and report it in messages</li>
</ul>
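<p>To make the combinator idea and the longest-parsed-prefix error reporting concrete, here is a minimal hand-rolled sketch. It is not funcparserlib's API: the names <code>tok</code>, <code>number</code>, <code>seq</code>, <code>alt</code>, <code>many</code>, <code>apply</code>, and <code>run</code> are hypothetical and chosen for illustration only, and the input is assumed to be an already tokenized list of strings.</p>

```python
# Hand-rolled sketch of parser combinators; NOT funcparserlib's API.
# A parser is a function (tokens, pos, seen) returning (value, new_pos).
# On failure it raises NoParse. `seen` is a one-element list holding the
# furthest token index at which any parser has failed so far.


class NoParse(Exception):
    def __init__(self, pos):
        super().__init__("cannot parse token at index %d" % pos)
        self.pos = pos


def tok(expected):
    """Match one token equal to `expected`."""
    def parse(tokens, pos, seen):
        if tokens[pos:pos + 1] == [expected]:
            return expected, pos + 1
        raise NoParse(pos)
    return parse


def number(tokens, pos, seen):
    """Match one all-digit token and return it as an int."""
    if tokens[pos:pos + 1] and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise NoParse(pos)


def seq(*parsers):
    """Run the parsers one after another and collect their values."""
    def parse(tokens, pos, seen):
        values = []
        for p in parsers:
            value, pos = p(tokens, pos, seen)
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    """Try each alternative in order; remember how far the failures got."""
    def parse(tokens, pos, seen):
        for p in parsers:
            try:
                return p(tokens, pos, seen)
            except NoParse as e:
                seen[0] = max(seen[0], e.pos)
        raise NoParse(seen[0])
    return parse


def many(parser):
    """Apply a parser zero or more times."""
    def parse(tokens, pos, seen):
        values = []
        while True:
            try:
                value, pos = parser(tokens, pos, seen)
            except NoParse as e:
                seen[0] = max(seen[0], e.pos)
                return values, pos
            values.append(value)
    return parse


def apply(parser, func):
    """Transform the value produced by a parser."""
    def parse(tokens, pos, seen):
        value, pos = parser(tokens, pos, seen)
        return func(value), pos
    return parse


def run(parser, tokens):
    """Parse all tokens, or fail at the end of the longest parsed prefix."""
    seen = [0]
    try:
        value, pos = parser(tokens, 0, seen)
    except NoParse as e:
        raise NoParse(max(seen[0], e.pos))
    if pos != len(tokens):
        raise NoParse(max(seen[0], pos))
    return value


def evaluate(parsed):
    total, rest = parsed
    for sign, n in rest:
        total = total + n if sign == "+" else total - n
    return total


# Grammar: expr = number (("+" | "-") number)*
expr = apply(seq(number, many(seq(alt(tok("+"), tok("-")), number))), evaluate)
```

<p>With these definitions, <code>run(expr, "1 + 2 - 4".split())</code> returns <code>-1</code>, while <code>run(expr, "1 + + 2".split())</code> raises <code>NoParse</code> pointing at token index 2, the second plus sign, because that is where the longest successfully parsed prefix ends.</p>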
<p>Despite its small size, the library is sufficient for writing parsers for quite large grammars. Its main purpose, however, is parsing small languages and DSLs (domain-specific languages).</p>
<p>Several parsing libraries exist for Python. Let's compare some of them with funcparserlib:</p>
<ul>
<li><a href="http://pyparsing.wikispaces.com/">pyparsing</a>. The most popular library. Its code size is not very large (3.7 KLOC), its API is highly redundant and inconsistent (about a hundred calls), and it is rather slow (about 3 times slower than funcparserlib in simple tests)</li>
<li><a href="http://www.acooke.org/lepl/">LEPL</a>. A library with extensive functionality, options, and so on (the API contains about a hundred calls). Its source code is very large for this task (about 15 KLOC). Fast, according to its authors</li>
</ul>
<p>funcparserlib originally grew out of a toy <a href="http://vlasovskikh.wordpress.com/2008/07/24/simple_recursive_json_parser/">JSON parser example</a> that I wrote in 2008. The example was created to show that one can write parsers that correspond exactly to the formal grammar of a language. In the summer of 2009 I came back to parsers in Python and decided to finish the library, add a regexp-based tokenizer, carry out optimizations, and so on. Version 0.3.2 is currently available, and I have written quite a lot of documentation for it (in English).</p>
<p>funcparserlib now includes a quite decent JSON parser as one of its examples. This parser supports JSON with all its nuances and is only 3 times slower than the specialized simplejson library. Its source code is 8 times smaller and much more readable <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[SQL Formatter and Pretty Printer]]></title>
<link>http://markfarnsworth.wordpress.com/2009/07/13/sql/</link>
<pubDate>Mon, 13 Jul 2009 01:01:39 +0000</pubDate>
<dc:creator>Mark</dc:creator>
<guid>http://markfarnsworth.wordpress.com/2009/07/13/sql/</guid>
<description><![CDATA[A couple of weeks ago I started work to build a universal parser for SQL.  The work started with a b]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>A couple of weeks ago I started work on a universal parser for SQL.  The work started with a basic need I had for minimal parsing of SQL statements in the GarinDriver JDBC project.  I was unable to find any open source parser code that would fully support the basic structure used by MySQL, PostgreSQL, Oracle, and other similar systems.   In particular, I wanted support for the q'&#60;&#62;' strings used by Oracle in addition to the SQL standard string encoding, MySQL&#8217;s use of C style string escaping, and $tag$ style strings for PostgreSQL and H2.  Conceptually I wanted to build ONE parser that could be configured to properly parse ANY dialect and that would support a framework for adding additional exceptions in the future.</p>
<p>The basics of my parser framework are now complete. It is smart enough to parse MySQL, H2, HSQLDB, Oracle, PostgreSQL, and ISO SQL2003 variants. The shallow parsing framework uses a base Token class and a range of subclasses to describe the most basic elements of the SQL grammar. The parser identifies the core keywords, statement boundaries, and the limits of strings, comments, and identifiers, but does not look into deeper language issues like statement structure.   One benefit of shallow parsing is that it allows for flexibility and does not require a complete BNF style grammar.  Even with only this basic level, it is possible to leverage this code for useful stuff beyond the GarinDriver/LiquiBase data migration solution.</p>
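<p>As a rough illustration of the shallow parsing idea, here is a minimal sketch in Python rather than the Java of the actual framework. The class names, the tiny keyword list, and the <code>tokenize</code> and <code>split_statements</code> helpers are all hypothetical, and the rules only cover the standard comment and quoting forms without nesting.</p>

```python
import re


class Token:
    """Base class: a slice of SQL text; subclasses name its lexical role."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self.text)


class Whitespace(Token): pass
class Comment(Token): pass
class StringLiteral(Token): pass
class QuotedIdentifier(Token): pass
class Word(Token): pass            # bare identifier
class Keyword(Word): pass          # a Word found in KEYWORDS
class StatementEnd(Token): pass
class Other(Token): pass           # operators, punctuation, digits


# Deliberately tiny keyword list, for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "INSERT", "INTO", "VALUES",
            "UPDATE", "DELETE"}

# Rules are tried in order; the last one always matches a single
# character, so the tokenizer always makes progress.
RULES = [
    (Whitespace, re.compile(r"\s+")),
    (Comment, re.compile(r"--[^\n]*")),
    (Comment, re.compile(r"/\*.*?\*/", re.S)),      # no nesting here
    (StringLiteral, re.compile(r"'(?:[^']|'')*'")),
    (QuotedIdentifier, re.compile(r'"(?:[^"]|"")*"')),
    (Word, re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    (StatementEnd, re.compile(r";")),
    (Other, re.compile(r".", re.S)),
]


def tokenize(sql):
    """Split SQL text into Token objects without parsing its structure."""
    tokens = []
    pos = 0
    while pos != len(sql):
        for cls, pattern in RULES:
            match = pattern.match(sql, pos)
            if match:
                text = match.group()
                if cls is Word and text.upper() in KEYWORDS:
                    cls = Keyword
                tokens.append(cls(text))
                pos = match.end()
                break
    return tokens


def split_statements(sql):
    """Find statement boundaries, ignoring ';' in strings and comments."""
    statements, current = [], []
    for token in tokenize(sql):
        if isinstance(token, StatementEnd):
            statements.append("".join(t.text for t in current).strip())
            current = []
        else:
            current.append(token)
    tail = "".join(t.text for t in current).strip()
    if tail:
        statements.append(tail)
    return statements
```

<p>Because the semicolons inside the string literal and the comment are swallowed by their own tokens, <code>split_statements("SELECT 'a;b' FROM t; -- done; really\nDELETE FROM t")</code> yields two statements rather than four. A formatter or colorizer could walk the same token list and style each token by its class.</p>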
<p>The following bare bones test page demonstrates how the parser can be used to format and add color to SQL batches.  The code would also be useful for someone building a universal SQL editor.<br />
<a style="text-decoration:none;" href="http://markfarnsworth-dev.appspot.com/RenderSql">http://markfarnsworth-dev.appspot.com/RenderSql</a></p>
<p>Overall, I am happy with the approach. While I plan to use a more established framework like ANTLR or XTEXT for future deep parsing efforts, I feel my homegrown framework is a better fit for shallow parsing than the larger and more complex language tools that I have reviewed.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[A Universal Parser for SQL]]></title>
<link>http://markfarnsworth.wordpress.com/2009/07/08/parse/</link>
<pubDate>Wed, 08 Jul 2009 03:11:24 +0000</pubDate>
<dc:creator>Mark</dc:creator>
<guid>http://markfarnsworth.wordpress.com/2009/07/08/parse/</guid>
<description><![CDATA[I recently completed some work on the GarinDriver to research a more flexible model for database sch]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I recently completed some work on the GarinDriver to research a more flexible model for database schema change management.   This work gave me a chance to dig a little deeper into the nuts and bolts of Java database interactions.  Working to track schema changes and support a wide range of databases increased my appreciation for the powerful open source database platforms available today.  In particular, working with H2 Database gave me a chance to explore what I now think is one of the hidden treasures of the open source DB marketplace.  Overall it was a fun bit of hobby coding and I believe it will be useful in future real world projects.</p>
<p>Working on the driver, one challenge that went beyond my expectations was parsing the various dialects of SQL.   In a review of current open source projects, I did not find any public license parsers that can parse statements from ALL database systems.  In a sense, we are supposed to have a standard, but it is not really well followed by the vendors, and as such writing a universal parser is a bit difficult.   I had some free time this weekend so I took on the challenge, since it is an area where the open source community really does not currently have a good solution.  For example, the tools that ship with Eclipse cannot parse the $tag$ style strings from PostgreSQL or the q'[O'Brian]' style supported by Oracle.   In any case, I had the interest, so I decided to build my own parser.  Without such a parser it would be impossible to provide full support for the statement execution model needed to support the GarinDriver design.  Workarounds were possible but a proper parser felt like the best approach.</p>
<p>Building a parser is complex and often involves specialized tools (e.g. ANTLR, BISON, JAVACC, FLEX, LEX, YACC, etc).  There are benefits and drawbacks to the traditional grammar definition and parser generator approach.   On my initial review, it seemed that supporting the often contradictory approaches used by different systems would be quite difficult with these tools.</p>
<p>Since all that I needed for the GarinDriver/LiquiBase project was &#8220;shallow parsing&#8221;, my approach was to use a small framework of Java classes.  Using my own framework allows me to share logic across the SQL dialects and gave me a chance to explore ideas for a more dynamic approach to &#8220;shallow parsing&#8221;.   So far the approach seems to be working out nicely. I am considering building a deep parser at some point later on with the Eclipse <a href="http://www.eclipse.org/Xtext/">XTEXT</a> project, but for now the flexibility of a homegrown, hand-coded parser has proved to be a viable approach for &#8220;shallow parsing&#8221;.</p>
<p><strong>Parser Fun:</strong></p>
<ul>
<li>Standard SQL comments come in an EOL style (--) and a block style (/* */).</li>
<li>Block style comments nest within each other, so the parser must count the nesting levels.</li>
<li>MySQL supports pound sign comments in addition to the standard forms.</li>
<li>HSQLDB supports // style comments in addition to the standard forms.</li>
<li>Standard SQL uses double quotes for identifiers and single quotes for text strings.</li>
<li>The parser must support both, including quote doubling for escapes (e.g. 'O''Brian' or "My ""big"" table").</li>
<li>Oracle supports q'[O'Brian]' in addition to the standard style.</li>
<li>MySQL uses C style string encoding (e.g. 'O\'Brian').</li>
<li>PostgreSQL supports the $tag$O'Brian$tag$ style in addition to the standard forms AND the MySQL format.</li>
</ul>
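<p>Several of the cases in the list above can be handled by small scanning functions that take the SQL text and a start index and return the index just past the token. The following Python sketch is an illustration under that assumption, not the project's Java code. The function names are hypothetical, and Oracle's q-quote form is left out for brevity.</p>

```python
import re

# Opening tag of a dollar-quoted string: $$ or $tag$.
DOLLAR_TAG = re.compile(r"\$(?:[A-Za-z_][A-Za-z_0-9]*)?\$")


def scan_block_comment(sql, pos):
    """Return the index just past the /* ... */ comment starting at pos,
    counting nesting levels so that inner comments do not end it early."""
    if not sql.startswith("/*", pos):
        raise ValueError("not a block comment")
    depth = 0
    i = pos
    while sql[i:i + 1]:
        if sql.startswith("/*", i):
            depth += 1
            i += 2
        elif sql.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


def scan_quoted(sql, pos, backslash_escapes=False):
    """Return the index just past the '...' or "..." token starting at pos.
    A doubled quote character stays inside the token. With
    backslash_escapes=True a backslash also escapes the next character,
    which is the C style used by MySQL."""
    quote = sql[pos]
    i = pos + 1
    while sql[i:i + 1]:
        if backslash_escapes and sql[i] == "\\":
            i += 2                      # skip the escaped character
        elif sql[i] != quote:
            i += 1
        elif sql[i + 1:i + 2] == quote:
            i += 2                      # doubled quote, still inside
        else:
            return i + 1                # closing quote
    raise ValueError("unterminated quoted token")


def scan_dollar_quoted(sql, pos):
    """Return the index just past a $tag$...$tag$ string starting at pos."""
    match = DOLLAR_TAG.match(sql, pos)
    if not match:
        raise ValueError("not a dollar-quoted string")
    tag = match.group()
    end = sql.find(tag, match.end())
    if end == -1:
        raise ValueError("unterminated dollar-quoted string")
    return end + len(tag)
```

<p>A shallow tokenizer can dispatch on the characters at the current position (a quote, <code>/*</code>, or <code>$</code>) and call the matching scanner. Which scanners are enabled, and whether <code>backslash_escapes</code> is set, would then depend on the dialect being parsed.</p>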
<p>My parser uses object oriented techniques to define a base token concept, extend the base concept for SqlStatements, and then extend the SqlStatements to define the dialect variants in what I hope will be an extendable framework.  This approach provides more flexibility for future growth vs. more static models and parser generator tools.  In any case the approach seems to work and the ability to use object orientation to extend the parser in new directions seems like a good thing.</p>
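<p>One way to picture this kind of extension is a base class that holds the standard behaviour and dialect subclasses that override only what differs. The Python sketch below is hypothetical and much simpler than the actual Java framework. The class and attribute names are invented for illustration, and the comment markers come from the list above.</p>

```python
class SqlDialect:
    """Base dialect: only the standard EOL comment marker, no backslashes."""

    line_comment_starts = ("--",)
    backslash_escapes = False

    def line_comment_at(self, sql, pos):
        """Return the EOL comment marker that starts at pos, or None."""
        for marker in self.line_comment_starts:
            if sql.startswith(marker, pos):
                return marker
        return None


class MySqlDialect(SqlDialect):
    # MySQL adds pound sign comments and C style string escapes.
    line_comment_starts = SqlDialect.line_comment_starts + ("#",)
    backslash_escapes = True


class HsqldbDialect(SqlDialect):
    # HSQLDB adds // style comments.
    line_comment_starts = SqlDialect.line_comment_starts + ("//",)
```

<p>Here <code>MySqlDialect().line_comment_at("# note", 0)</code> returns <code>"#"</code>, while the base <code>SqlDialect</code> returns <code>None</code> for the same input. Supporting another dialect means adding one more small subclass rather than editing a shared grammar.</p>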
<p>As with my other hobby projects, my work in this area is licensed under the EPL and hosted on Google Code.  The documentation is sparse, but if you are looking for SQL parser code you may find that this code can help you develop new and interesting tools for working with SQL systems.</p>
<p><a href="http://code.google.com/p/garinparser/">http://code.google.com/p/garinparser/</a></p>
<p>If you decide to use the parser please let me know.  Also, if you can define a legal SQL statement that does not parse correctly with my parser let me know since I like to eat tasty bugs.</p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
