<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>s-expressions &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/s-expressions/</link>
	<description>Feed of posts on WordPress.com tagged "s-expressions"</description>
	<pubDate>Sat, 02 Jan 2010 04:30:27 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[XML or S-expressions?]]></title>
<link>http://rwmj.wordpress.com/2009/10/30/xml-or-s-expressions/</link>
<pubDate>Fri, 30 Oct 2009 16:00:31 +0000</pubDate>
<dc:creator>rich</dc:creator>
<guid>http://rwmj.wordpress.com/2009/10/30/xml-or-s-expressions/</guid>
<description><![CDATA[I was writing a little program to track monthly outgoings. &#8220;Only&#8221; £30/month for internet]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I was writing a little program to track monthly outgoings.  &#8220;Only&#8221; £30/month for internet access or whatever can quickly add up &#8230;</p>
<p>But what format should I save the data in?  XML is heavyweight and redundant compared to S-expressions.  Compare:</p>
<table>
<tr>
<td>
<pre style="width:18em;">
&#60;outgoing rate="monthly"&#62;
  &#60;price&#62;30.&#60;/price&#62;
  &#60;name&#62;Internet&#60;/name&#62;
&#60;/outgoing&#62;</pre>
</td>
<td>
<pre style="width:18em;">(outgoing
  (rate monthly)
  (price 30.)
  (name "Internet"))</pre>
</td>
</tr>
</table>
<p><i>(Update: fixed XML x 2)</i></p>
<p>One difference I always notice is the redundancy of attributes like <code>rate="monthly"</code>.  S-expressions let you decide later to make the attribute structured, but with XML you&#8217;re stuck with a simple string unless you make an incompatible change to the schema.</p>
<p>Another difference is that S-expressions are typed.  <code>30.</code> is a float and &#8220;Internet&#8221; is a string.  XML is all just strings, which sucks when your language is typed.</p>
<p>On the other hand <a href="http://www.prescod.net/xml/sexprs.html">this article makes a good argument that XML is not (and is better than) S-expressions</a>.  More debate <a href="http://c2.com/cgi/wiki?XmlIsaPoorCopyOfEssExpressions">here</a>.</p>
<p>A killer feature of OCaml is the <a href="http://www.ocaml.info/home/ocaml_sources.html#sexplib310">sexplib</a> syntax extension which makes S-expressions really easy.  You just define any OCaml type in the usual way, and add <code>with sexp</code> after it, and that magically generates serializer and deserializer functions for your type, so you can slurp your data into and out of S-expression files effortlessly.  A page of boilerplate disappears in just two words.  That&#8217;s probably the reason why I&#8217;ll go with S-expressions for this.</p>
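<p>For comparison, here is a minimal sketch, in C, of the kind of hand-written serializer that <code>with sexp</code> makes unnecessary. The <code>struct outgoing</code> type and the <code>outgoing_to_sexp</code> function are invented for this illustration and belong to no library; the sketch also assumes that the name contains no quote characters.</p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;

/* Hypothetical record mirroring the example above. */
struct outgoing {
    const char *rate;   /* e.g. "monthly" */
    double      price;  /* e.g. 30.0 */
    const char *name;   /* e.g. "Internet" */
};

/* Writes the record as an S-expression into buf (at most size bytes).
   Returns the length the full text needs, exactly as snprintf does. */
int outgoing_to_sexp(const struct outgoing *o, char *buf, size_t size)
{
    return snprintf(buf, size,
                    &quot;(outgoing (rate %s) (price %g) (name \&quot;%s\&quot;))&quot;,
                    o-&gt;rate, o-&gt;price, o-&gt;name);
}
</pre>
<p>Every new field means touching both the type and the format string, and reading the data back would need matching parsing code on top of that.</p>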
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Extrapol, part 1: from C to Effects]]></title>
<link>http://dutherenverseauborddelatable.wordpress.com/2008/06/03/extrapol-part-1-from-c-to-effects/</link>
<pubDate>Tue, 03 Jun 2008 17:41:01 +0000</pubDate>
<dc:creator>yoric</dc:creator>
<guid>http://dutherenverseauborddelatable.wordpress.com/2008/06/03/extrapol-part-1-from-c-to-effects/</guid>
<description><![CDATA[Here comes the long-promised description of Extrapol, my main ongoing research project. In a few wor]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:justify;">
<p style="text-align:justify;">Here comes the long-promised description of Extrapol, my main ongoing research project. In a few words, our objective with Extrapol is to fill a hole in the current suite of tools built to ensure the security of systems. While there&#8217;s an ample amount of stuff designed to analyse the behaviour of processes either during their execution (dynamic analysis) or after their completion (trace analysis), there is little work on applying static analysis to actual system security.</p>
<p><!--more--></p>
<h2>What does this program do?</h2>
<p style="text-align:justify;">Whether we&#8217;re talking about SELinux, AppArmor, Unix memory protection or ACLs, every form of dynamic analysis is trying to find the answer to two questions:</p>
<ol>
<li>what is this program doing?</li>
<li>is that legal?</li>
</ol>
<p style="text-align:justify;">With Extrapol, the questions are similar, although we&#8217;re asking them earlier &#8212; and restricting ourselves to C programs:</p>
<ol>
<li>what is this program going to do whenever it gets executed?</li>
<li>is that legal?</li>
<li>if information is missing, where will that information come from?</li>
</ol>
<p>Let&#8217;s look at an example and see what gives:</p>
<pre class="brush: cpp;">
void* read_some_stuff()
{
   FILE* file = fopen(&quot;/home/foo/bar.log&quot;, &quot;r&quot;);
   void* buf  = malloc(1024*sizeof(int));
   fread(buf , sizeof(int), 1024, file);
   return buf;
}
</pre>
<p style="text-align:justify;">Now, what does this function do? Regardless of the quality of the code, a cursory read makes it quite clear that it</p>
<ul>
<li>opens a file named &#8220;/home/foo/bar.log&#8221; for reading</li>
<li>allocates some memory</li>
<li>reads some of the content of that file</li>
<li>returns a pointer to the contents of that file.</li>
</ul>
<p style="text-align:justify;">In terms of security, the program should only be executed by a user who has the authorization to read the file &#8220;/home/foo/bar.log&#8221; and only if that user is willing to let the program read the contents of that file. Let&#8217;s see what Extrapol tells us about the extract:</p>
<pre class="brush: python;">
read_some_stuff: Function
        effect                 : &quot;open&quot; (Constant &quot;/home/foo/bar.log&quot; , Constant &quot;r&quot; )
        effect                 : &quot;read&quot; (Constant &quot;/home/foo/bar.log&quot; )
        return                 : &quot;data read from file&quot; (Constant &quot;/home/foo/bar.log&quot; )
End
</pre>
<p>In clear(er) English, Extrapol has just given us the following information:</p>
<ul>
<li>read_some_stuff is a function</li>
<li>read_some_stuff does not take any argument</li>
<li>if executed, this function will have the secondary effect of opening a file named &#8220;/home/foo/bar.log&#8221; for reading</li>
<li>if executed, this function will also have the secondary effect of reading stuff from a file named <tt>"/home/foo/bar.log"</tt></li>
<li>the result of calling this function is some data read from the file named <tt>"/home/foo/bar.log"</tt></li>
</ul>
<p style="text-align:justify;">The result is somewhat larger than the original program but, of course, we can apply Extrapol to more complex programs. Say, to</p>
<pre class="brush: cpp;">
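#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;fcntl.h&gt;

/* Sample input for the analysis. The stream is squeezed through an int,
   which is not portable C; it is kept as is because the summaries below
   were computed from this code. */
int do_open(char* file_name)
{
  int fd = (int)fopen(file_name, O_RDONLY);
  return fd;
}

/* Returns the buffer, so the return type is char* rather than int. */
char* do_read(int fd)
{
  char* buf;
  buf    = (char*)malloc(1024*sizeof(char));
  int rd = 0;
  for(int i = 0; i &lt; 1024; ++i)
    rd = rd + fread(buf, i, 1, fd);

  return buf;
}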
</pre>
<p>Well, in that case, Extrapol will analyze both functions and produce a summary of each.</p>
<ul>
<li> The analysis of <code>do_open</code> will result in:
<pre class="brush: php;">
do_open: Function
        input arg file_name    : Bottom
        effect                 : &quot;open&quot; (Identifier &quot;file_name&quot;, Constant &quot;0&quot; )
        return                 : &quot;FILE&quot; (Identifier &quot;file_name&quot; )
End
</pre>
<p>that is,</p>
<ul>
<li><tt>do_open</tt> is a function</li>
<li><tt>do_open</tt> accepts one argument <code>file_name</code></li>
<li>the content of this argument won&#8217;t be modified</li>
<li>this argument doesn&#8217;t seem to be required to contain anything specific</li>
<li>the function has one system effect: opening for reading the file whose name was given as <code>file_name</code></li>
<li>the function returns some abstract data, in that case a file whose name was given as <code>file_name</code>.</li>
</ul>
</li>
<li> The analysis of <code>do_read</code> will result in:
<pre class="brush: php;">
do_read: Function
        input arg fd           : &quot;FILE&quot; (Identifier &quot;extrapol_generated_699 (formerly &lt;anonymous&gt; ) &quot; )
        effect                 : &quot;read&quot; (Identifier &quot;extrapol_generated_699 (formerly &lt;anonymous&gt; ) &quot; )
        return                 : &quot;data read from file&quot; (Identifier &quot;extrapol_generated_699 (formerly &lt;anonymous&gt; ) &quot; )
End</pre>
<p>that is,</p>
<ul>
<li><tt>do_read</tt> is a function</li>
<li><tt>do_read</tt> accepts one argument <code>fd</code></li>
<li>the content of this argument won&#8217;t be modified</li>
<li>this argument must be abstract data, in that case a file</li>
<li>for the rest of this function we will call the filename <code>extrapol_generated_699 (formerly &#60;anonymous&#62; )</code></li>
<li>the function has one system effect: reading from the file whose name was given as <code>extrapol_generated_699 (formerly &#60;anonymous&#62; )</code></li>
<li>the function returns some abstract data, in that case data read from the file whose name was given as <code>extrapol_generated_699 (formerly &#60;anonymous&#62; )</code></li>
</ul>
</li>
</ul>
<p style="text-align:justify;">Now, chances are that we may want to use both functions. As it turns out, whenever a function has been analysed, Extrapol places the information it has deduced in its knowledge base (the <em>environment</em>) and may use it for further deductions.</p>
<p>Now, let&#8217;s feed the following additional source to Extrapol:</p>
<pre class="brush: cpp;">
int main(int argc, char **argv) {
  assert(argc &gt;= 2);            /* argv[1] must exist */
  int   fd  = do_open(argv[1]);   /* argv[0] is the program's own name */
  char* buf = do_read(fd);
  free(buf);
  return 0;
}
</pre>
<p>In return, Extrapol will answer:</p>
<pre class="brush: php;">
main: Function
        input arg argc         : &quot;command-line argument&quot; ()
        input arg argv         : &quot;command-line argument&quot; ()
        input vararg           : &quot;command-line argument&quot; ()
        effect                 : &quot;read&quot; (Identifier &quot;argv&quot; )
        effect                 : &quot;open&quot; (Identifier &quot;argv&quot;, Constant &quot;0&quot; )
        return                 : Constant &quot;0&quot;
End</pre>
<p style="text-align:justify;">In other words, the effect of the full program will be to open and read from a file whose name was provided as a command-line argument.</p>
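<p style="text-align:justify;">How the summaries of <code>do_open</code> and <code>do_read</code> end up in the summary of <code>main</code> is not spelled out in this entry. One plausible reading, sketched in C under that assumption, is that each effect of the callee is copied to the caller with the callee's parameter replaced by the actual argument. The <code>struct effect</code> type and the <code>instantiate</code> function are hypothetical and are not taken from Extrapol.</p>
<pre class="brush: cpp;">
#include &lt;string.h&gt;

/* Hypothetical effect: a name applied to one identifier,
   e.g. "open" applied to file_name. */
struct effect {
    const char *name;
    const char *identifier;
};

/* Returns the callee's effect as seen from the caller: if the effect
   mentions the callee's parameter (formal), it now mentions the
   argument passed by the caller (actual) instead. */
struct effect instantiate(struct effect callee_effect,
                          const char *formal, const char *actual)
{
    if (strcmp(callee_effect.identifier, formal) == 0)
        callee_effect.identifier = actual;
    return callee_effect;
}
</pre>
<p style="text-align:justify;">Applied to the <code>"open"</code> effect of <code>do_open</code>, with <code>file_name</code> replaced by <code>argv</code>, this gives an <code>"open"</code> effect on <code>argv</code>, in line with the summary printed for <code>main</code>.</p>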
<p style="text-align:justify;">
<h2>How does this work ?</h2>
<p style="text-align:justify;">For today&#8217;s entry, I&#8217;ll take that question with the meaning of &#8220;How do I use it ?&#8221; Well, it&#8217;s simple. Extrapol takes as input one or more C source files (they don&#8217;t have to be pre-processed) and a knowledge base of primitive functions. This knowledge base is an ASCII file with the same format as Extrapol&#8217;s answers.</p>
<p style="text-align:justify;">For instance, to obtain the above demos, we provided Extrapol with the following piece of information for function <code>malloc</code>:</p>
<pre class="brush: cpp;">
malloc: Function
  input arg size: Bottom
  return: Bottom
End
</pre>
<p style="text-align:justify;">In other words, <code>malloc</code> is a function accepting one argument named <code>size</code>. Nothing special is expected from <code>size</code> and the returned value is compatible with any value.</p>
<p style="text-align:justify;">Now, <code>fread</code> is a more complex function. Here&#8217;s the prototype of the function, as presented in its <code>man</code> page:</p>
<pre class="brush: cpp;">
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
</pre>
<p>For Extrapol, that&#8217;s a function with 4 arguments and a return value. Let&#8217;s start from the end.</p>
<ul>
<li>The return value could be anything, we won&#8217;t make any guarantee (that&#8217;s called <code>Top</code> &#8212; we could refine that).</li>
<li>The last argument is a file. We&#8217;ve seen earlier how to deal with files, they&#8217;re abstract structures, which we&#8217;ve decided by convention to call <code>"FILE"(foobar)</code>, where <code>foobar</code> is either the name of the file or a pointer to a variable/argument which contains that name. In that context, <code>"FILE"</code> is called a <em>constructor</em>, by the way. It&#8217;s just a character string which we use to differentiate abstract values. Correspondingly, <code>foobar</code> is the <em>argument</em> of <code>"FILE"</code>.</li>
<li>The second to last argument is a number of blocks to read. We&#8217;re just going to ignore it. In other words, that argument is compatible with any value. That&#8217;s <code>Bottom</code>.</li>
<li>The second argument is a size. We&#8217;re also going to ignore it, so <code>Bottom</code> again.</li>
<li>Finally, the first argument is an output argument &#8212; functionally, it&#8217;s a special form of <code>return</code>, so we&#8217;re going to mark that argument as <code>output</code>. We could just specify that it may contain anything and refuse to make any guarantee, but here, we&#8217;re going to refine a bit and specify that it&#8217;s abstract data, read from the file. Let&#8217;s call the constructor <code>"data read from file"</code> and give it as argument the name of the file. This gives <code>"data read from file"(foobar)</code>.</li>
<li>Oh, yeah, and before we forget, this function has a side-effect we&#8217;re interested in: it reads some content from that file <code>foobar</code>. Let&#8217;s call that effect <code>"read"(foobar)</code>.</li>
</ul>
<p>Or, in full Extrapol syntax:</p>
<pre class="brush: php;">
fread: Function
   output arg ptr: &quot;data read from file&quot; ( Identifier &quot;path&quot; )
   input arg size : Bottom
   input arg nmemb: Bottom
   input arg stream: &quot;FILE&quot; ( Identifier &quot;path&quot; )
   return: Top
   effect: &quot;read&quot; ( Identifier &quot;path&quot; )
End
</pre>
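<p style="text-align:justify;">To make the three kinds of abstract values used in this entry concrete, here is a minimal C sketch of how they could be represented and printed. The type and function names are invented for this illustration and say nothing about how Extrapol itself stores them; the sketch only covers constructors whose single argument is an <code>Identifier</code>.</p>
<pre class="brush: cpp;">
#include &lt;stdio.h&gt;

/* Hypothetical C rendering of the abstract values. */
enum abs_kind { ABS_TOP, ABS_BOTTOM, ABS_CONSTRUCTED };

struct abs_value {
    enum abs_kind kind;
    const char *constructor; /* e.g. "FILE"; only for ABS_CONSTRUCTED */
    const char *identifier;  /* e.g. "path"; only for ABS_CONSTRUCTED */
};

/* Formats v the way it is written in the fread entry above.
   Returns the length the full text needs, exactly as snprintf does. */
int abs_value_format(const struct abs_value *v, char *buf, size_t size)
{
    switch (v-&gt;kind) {
    case ABS_TOP:
        return snprintf(buf, size, &quot;Top&quot;);
    case ABS_BOTTOM:
        return snprintf(buf, size, &quot;Bottom&quot;);
    default:
        return snprintf(buf, size, &quot;\&quot;%s\&quot; ( Identifier \&quot;%s\&quot; )&quot;,
                        v-&gt;constructor, v-&gt;identifier);
    }
}
</pre>
<p style="text-align:justify;">In this representation the <code>stream</code> argument of <code>fread</code> would be an <code>ABS_CONSTRUCTED</code> value with constructor <code>"FILE"</code> and identifier <code>"path"</code>, while its return value would simply be <code>ABS_TOP</code>.</p>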
<p style="text-align:justify;">Now, from a set of C source files and this knowledge base, Extrapol will pre-process the C code, parse it, remove unused symbols, then examine the source code and proceed by successive deductions until either</p>
<ul>
<li>something goes wrong (typically, a function is used without being defined)</li>
<li>every function has been examined.</li>
</ul>
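<p style="text-align:justify;">The two stopping conditions can be pictured with the following C sketch. It is a deliberately simplified model, not Extrapol's algorithm: a function is reduced to its name and the names of the functions it calls, and the environment to a list of known names. All types and function names here are hypothetical.</p>
<pre class="brush: cpp;">
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

#define MAX_CALLS 8

/* Hypothetical, much simplified view of a function to analyse. */
struct function {
    const char *name;
    const char *calls[MAX_CALLS]; /* callee names; unused slots are NULL */
};

/* Returns 1 if name is among the first count entries of known. */
static int is_known(const char **known, size_t count, const char *name)
{
    for (size_t i = 0; i &lt; count; ++i)
        if (strcmp(known[i], name) == 0)
            return 1;
    return 0;
}

/* Examines funcs in order. known holds the names already in the
   environment and must have room for *known_count + n entries.
   Returns the index of the first function that uses an unknown callee,
   or n if every function has been examined. */
size_t analyse_all(const struct function *funcs, size_t n,
                   const char **known, size_t *known_count)
{
    for (size_t i = 0; i &lt; n; ++i) {
        for (size_t c = 0; c &lt; MAX_CALLS &amp;&amp; funcs[i].calls[c] != NULL; ++c)
            if (!is_known(known, *known_count, funcs[i].calls[c]))
                return i; /* something went wrong: callee not defined */
        known[(*known_count)++] = funcs[i].name; /* joins the environment */
    }
    return n;
}
</pre>
<p style="text-align:justify;">With an environment holding <code>fopen</code>, <code>malloc</code> and <code>fread</code>, the functions <code>do_open</code> and <code>do_read</code> would be accepted in that order, while a function calling anything unknown would stop the run at that point.</p>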
<p style="text-align:justify;">Extrapol then outputs the result of its deductions, in a format fit for reinjection into the knowledge base.</p>
<p>If you are curious, the theories behind this notion of deduction are</p>
<ul>
<li>lambda-calculus</li>
<li>dependent types</li>
<li>types with effects</li>
<li>Hindley-Milner-style type inference.</li>
</ul>
<p style="text-align:justify;">This will be detailed further in another blog entry &#8212; and in a journal paper, whenever I find the time to complete it.</p>
<h2>Does it work ?</h2>
<p style="text-align:justify;">The answer is yes, no and maybe.</p>
<ul>
<li>
<p style="text-align:justify;">Maybe: testing on real applications has yet to be done. Currently, Extrapol works on our sample set, a set containing no entry larger than 100 lines. We&#8217;ll know more when we find the courage to test, say, <code>tar</code> or <code>df</code>.</p>
</li>
<li>Yes: everything written above, and a dozen other samples, work.</li>
<li>No: some features are not implemented yet. Function pointers won&#8217;t work, nor will recursion. I haven&#8217;t put together the theory behind these constructions yet, so I don&#8217;t know yet how hard all of this will be. In addition, for the current prototype, we can&#8217;t deduce anything interesting about global variables yet, and functions need to be declared in the order in which they are used. These two aspects will be easy to fix, they just don&#8217;t have the highest priority.</li>
</ul>
<h2>How is it written ?</h2>
<p>There are currently two different versions of Extrapol:</p>
<ul>
<li>Extrapol/Java is a Java-based implementation of the specification, by Steve-William Kissi, Bastien Jansen and David Teller. It is about 18,000 lines of code and, at this very moment, lags slightly behind the experimental version, Extrapol/ML.</li>
<li>Extrapol/ML is the set of specifications, written in OCaml by David Teller &#8212; as well as the experimental version. It is about 3,000 lines of code, purely functional except for logging.</li>
</ul>
<p style="text-align:justify;">Both versions are meant to be open source, although the licence hasn&#8217;t been 100% decided. Extrapol/Java may end up MIT-style, with Extrapol/ML ending up LGPL + linking exception.</p>
<h2>Can I play with it ?</h2>
<p style="text-align:justify;">As soon as we&#8217;ve decided the licence, we&#8217;ll release the current prototypes. Ideally, this should happen before mid-June.</p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
