<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>r-r-code « WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/r-r-code/</link>
	<description>Feed of posts on WordPress.com tagged "r-r-code"</description>
	<pubDate>Sat, 26 May 2012 08:15:06 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Useful R snippets]]></title>
<link>http://ryouready.wordpress.com/2012/03/18/useful-r-snippets/</link>
<pubDate>Sun, 18 Mar 2012 17:56:32 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2012/03/18/useful-r-snippets/</guid>
<description><![CDATA[In this post we collect several R one- or few-liners that we consider useful. As our minds tend to f]]></description>
<content:encoded><![CDATA[<p><a title="on flickr by user zzpza Jan 2012, http://www.flickr.com/photos/zzpza/3269784239/. THX!" href="http://ryouready.files.wordpress.com/2012/01/tools.jpg"><img class="alignleft size-full wp-image-817" style="margin:8px;" title="tools" src="http://ryouready.files.wordpress.com/2012/01/tools.jpg" alt="on flickr by user zzpza Jan 2012, http://www.flickr.com/photos/zzpza/3269784239/. THX!" width="240" height="180" /></a>In this post we collect several R one- or few-liners that we consider useful. As our minds tend to forget these little fragments we jot them down here so we will find them again.<!--more--></p>
<h3>Repeatedly applying a function that takes two arguments</h3>
<p>Suppose we want to call a function that takes two arguments and feed its result back into the same function as an argument. For example, we may want to sum up the values 1 to 5. Of course the function <tt>sum</tt> will do this for us, but what if this function didn&#8217;t exist? We might of course write:</p>
<p><pre class="brush: r;">
1 + 2 + 3 + 4 + 5
</pre></p>
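<p>But how do we do that in a single function call? Using <tt>do.call</tt> or the like will not work: <tt>do.call</tt> passes all list elements to the function at once, and the function <tt>"+"</tt> takes at most two arguments.</p>
<p><pre class="brush: r;">
do.call(&#34;+&#34;, as.list(1:5))   # fails: &#34;+&#34; accepts at most two arguments
</pre></p>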
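<p>The trick is to use the function <tt>Reduce</tt>, which applies the function to the first two values, then to that result and the next value, and so on.</p>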
<p><pre class="brush: r;">
Reduce(&#34;+&#34;, 1:5)
# [1] 15
</pre></p>
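<p>To see the intermediate steps, <tt>Reduce</tt> can also return every partial result. A minimal sketch using its <tt>accumulate</tt> argument (the variable name <tt>partial</tt> is only chosen for illustration):</p>
<p><pre class="brush: r;">
# each element is the result of applying &#34;+&#34; to the previous result and the next value
partial &#60;- Reduce(&#34;+&#34;, 1:5, accumulate = TRUE)
partial
# [1]  1  3  6 10 15
</pre></p>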
<h3>Evaluating an R command stored in a character string</h3>
<p>From time to time, you may encounter situations where you have to evaluate a command which is stored in a character string. For example, let&#8217;s assume that we have the following variables:</p>
<p><pre class="brush: r;">
name1 &#60;- &#34;Steve&#34;
name2 &#60;- &#34;Bill&#34;
value1 &#60;- 1
value2 &#60;- 0
</pre></p>
<p>Now, what would you do if you have to create a vector with entries whose value is stored in the variables <tt>value1</tt> and <tt>value2</tt> and entry names whose value is stored in the variables <tt>name1</tt> and <tt>name2</tt>? You can write:</p>
<p><pre class="brush: r;">
command &#60;- paste(&#34;values=c(&#34;,name1,&#34;=&#34;,value1,&#34;,&#34;,
name2,&#34;=&#34;,value2,&#34;)&#34;,sep=&#34;&#34;)
values &#60;- eval(parse(text=command))
</pre></p>
<p>After issuing these commands, a vector named <tt>values</tt> is created with the following entry names and values:</p>
<p><pre class="brush: r;">
Steve  Bill
    1     0
</pre></p>
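<p>When the names and values are already stored in variables, the same vector can also be built without pasting and parsing a command. A small sketch using <tt>setNames</tt> (from the stats package, which is attached by default), repeating the variables defined above; <tt>values2</tt> is a name made up for this example:</p>
<p><pre class="brush: r;">
name1 &#60;- &#34;Steve&#34;; name2 &#60;- &#34;Bill&#34;
value1 &#60;- 1; value2 &#60;- 0
# attach the names directly instead of evaluating a string
values2 &#60;- setNames(c(value1, value2), c(name1, name2))
values2
</pre></p>
<p>This variant also works for names that are not valid R identifiers (e.g. names containing spaces), where the pasted command would fail to parse.</p>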
<h3>Creating an empty dataframe with zero rows</h3>
<p>Sometimes I want to fill up a dataframe from the first row on. It might be useful to start off with a dataframe with zero rows for that purpose. The functions <tt>numeric</tt> and <tt>character</tt> do the job when called without arguments, as they return vectors of length zero. In case we want a factor with predefined levels, <tt>factor</tt> may be useful as well.</p>
<p><pre class="brush: r;">
data.frame(a=numeric(), b=numeric())
data.frame(a=numeric(), b=character(), c=factor(levels=1:10), stringsAsFactors=FALSE)
</pre></p>
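<p>A short sketch of how such an empty dataframe might then be filled row by row (the column names and values are made up for illustration):</p>
<p><pre class="brush: r;">
df &#60;- data.frame(a=numeric(), b=character(), stringsAsFactors=FALSE)
nrow(df)                                   # 0 rows to start with
# append one row at a time; the column types of df are kept
df &#60;- rbind(df, data.frame(a=1, b=&#34;x&#34;, stringsAsFactors=FALSE))
df &#60;- rbind(df, data.frame(a=2, b=&#34;y&#34;, stringsAsFactors=FALSE))
nrow(df)                                   # now 2 rows
</pre></p>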
<p>&#8230; to be continued.</p>
<p>Tamas and Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[multi-platform real-time 'intro' in R using rdyncall]]></title>
<link>http://ryouready.wordpress.com/2011/07/29/multi-platform-intro-video-scripted-in-r-using-rdyncall/</link>
<pubDate>Fri, 29 Jul 2011 18:01:13 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2011/07/29/multi-platform-intro-video-scripted-in-r-using-rdyncall/</guid>
<description><![CDATA[Guest post by Daniel Adler. Below is a real-time audio-visual multimedia demonstration &#8211; or in]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2011/07/sousalicious4-320x240.png"><img class="alignleft size-medium wp-image-768" style="margin:7px 10px;" title="sousalicious4-320x240" src="http://ryouready.files.wordpress.com/2011/07/sousalicious4-320x240.png?w=300&h=225" alt="" width="300" height="225" /></a>Guest post by <a href="http://www.statoek.wiso.uni-goettingen.de/cms/user/index.php?lang=de&#38;section=institut.team.dadler" target="_blank">Daniel Adler</a>.</p>
<p>Below is a real-time audio-visual multimedia demonstration &#8211; or in short &#8216;an intro&#8217; &#8211; written in 100% pure R. It requires no compilation and runs across major platforms using the package <a href="http://cran.r-project.org/web/packages/rdyncall/index.html" target="_blank">rdyncall</a> together with preinstalled, precompiled standard libraries such as OpenGL and SDL. This &#8216;happy-birthday&#8217; production runs for about 3 minutes and comprises typical effects of the old-school home computer demoscene era, such as a rotating cube, a multi-layer star field, text scrollers, still images and flashes, while playing a nice Amiga Soundtracker module tune. Check out the video screencast (with sound) or enjoy a smooth frame rate using the R version at <a href="http://dyncall.org/demos/soulsalicious/" target="_blank">this website</a>.</p>
<p><!--more--><div class='embed-vimeo' style='text-align:center;'><iframe src='http://player.vimeo.com/video/27059431' width='400' height='300' frameborder='0'></iframe></div></p>
<p>The <a href="http://cran.r-project.org/web/packages/rdyncall/index.html" target="_blank">rdyncall</a> package used for this video provides a dynamic middleware between R and C libraries and offers an improved Foreign Function Interface (FFI) for R. It enables developers to &#8216;script&#8217; system-level code in R, such as OpenGL visualizations, multimedia applications or computer games, or simply to call a single system service without the need for writing C code. The FFI toolkit offered by the package is flexible enough to address low-level C interfacing issues directly in R. R bindings to the C libraries are created dynamically with a single interface function &#8216;dynport&#8217;, similar to &#8216;library&#8217;, and the C interface is made available in R as if it were an extension to the language. Support for handling foreign C data types and callbacks is offered by helper utilities. An extendable repository of cross-platform bindings is delivered with the package; it contains bindings to OpenGL 1, OpenGL 3, SDL, Expat, ODE, CUDA, OpenCL and more.</p>
<p>The implementation of <a href="http://cran.r-project.org/web/packages/rdyncall/index.html" target="_blank">rdyncall</a> is based on libraries of the <a href="http://dyncall.org">DynCall</a> project, which offers a dynamic call facility between interpreted languages and precompiled native code with support for almost all basic C types (in contrast to &#8216;.C&#8217; in R). Call kernels &#8211; implemented in Assembly &#8211; offer a Foreign Function Interface solution that is small in size and generic in its application. The libraries have been ported across a large set of processor architectures (i386, AMD64, ARM, PowerPC 32-bit, MIPS 32/64-bit, SPARC 32/64-bit) and operating systems, including major R platforms.</p>
<p>The <a href="http://cran.r-project.org/web/packages/rdyncall/index.html" target="_blank">rdyncall</a> package comes with a couple of demos, a comprehensive manual and a vignette that gives further details.</p>
<p>Have fun!</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Using R, Sweave and Latex to integrate animations into PDFs]]></title>
<link>http://ryouready.wordpress.com/2011/04/18/using-r-sweave-and-latex-to-integrate-animations-into-pdfs/</link>
<pubDate>Mon, 18 Apr 2011 14:16:35 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2011/04/18/using-r-sweave-and-latex-to-integrate-animations-into-pdfs/</guid>
<description><![CDATA[The first week of April I attended an excellent workshop on biplots held by Michael Greenacre and Ol]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2011/04/pacman1.gif"><img class="alignleft size-full wp-image-695" title="pacman" src="http://ryouready.files.wordpress.com/2011/04/pacman1.gif" alt="" width="216" height="216" /></a>The first week of April I attended an excellent workshop on <a href="http://en.wikipedia.org/wiki/Biplot" target="_blank">biplots</a> held by<a title="Michael Greenacre" href="http://www.econ.upf.edu/%7Emichael/"> Michael Greenacre</a> and <a href="http://www.statoek.wiso.uni-goettingen.de/cms/user/index.php?lang=de&#38;section=institut.team.onenadic">Oleg Nenadić</a> at the <a href="http://www.gesis.org/en/institute/">Gesis Institute</a> in Cologne, Germany. Throughout his presentations, Michael used animations to visualize the concepts he was explaining. He also included  animations in some of his papers. This inspired me to do this post in which I will show how to use LaTex, R and Sweave to include animations in a PDF document. Here is the <a title="Animations in PDF" href="http://ryouready.files.wordpress.com/2011/04/2011_animated_pdf_v1.pdf">PDF document</a> we will create (on MacOS the standard PDF viewer may not be able to play the animations, but Adobe Reader will). For this post some basic knowledge about Sweave is assumed.<!--more--></p>
<p>First, let&#8217;s create a simple animation in R. I will use a neat  example Oleg used during the biplot workshop: Pacman eating. Herefore we need a pacman. We use a pie chart to construct him</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#000000;">pie</span><span style="color:#000000;">(</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">.1</span><span style="color:#000000;">,</span> <span style="color:#000000;">.9</span><span style="color:#000000;">,</span> <span style="color:#000000;">.1</span><span style="color:#000000;">))</span>                      <span style="color:#2f9956;"># a plain pie chart</span>
<span style="color:#000000;">pie</span><span style="color:#000000;">(</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">.1</span><span style="color:#000000;">,</span> <span style="color:#000000;">.9</span><span style="color:#000000;">,</span> <span style="color:#000000;">.1</span><span style="color:#000000;">),</span>                      <span style="color:#2f9956;"># a pie chart </span>
    col<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"white"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"yellow"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"white"</span><span style="color:#000000;">),</span>  <span style="color:#2f9956;"># resembling pac man</span>
    border<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">,</span> labels<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">)</span>
<span style="color:#000000;">points</span><span style="color:#000000;">(</span><span style="color:#000000;">.3</span><span style="color:#000000;">,</span><span style="color:#000000;">.4</span><span style="color:#000000;">,</span> pch<span style="color:#000000;">=</span><span style="color:#000000;">16</span><span style="color:#000000;">,</span> cex<span style="color:#000000;">=</span><span style="color:#000000;">4)</span>            <span style="color:#2f9956;"># adding an eye</span></pre>
<p>Next, we will produce a series of pictures of Pacman by varying a parameter that specifies how far he opens his mouth. Here, all the single pictures are saved in one PDF file, each on its own page.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">p <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0.999</span><span style="color:#000000;">,</span> <span style="color:#000000;">.9</span><span style="color:#000000;">,</span> len<span style="color:#000000;">=</span><span style="color:#000000;">10</span><span style="color:#000000;">)</span>       <span style="color:#2f9956;"># parameters for opening mouth </span>
p <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">rev</span><span style="color:#000000;">(</span>p<span style="color:#000000;">),</span> p<span style="color:#000000;">)</span>                 <span style="color:#2f9956;"># add reversed parameters</span>
<span style="color:#000000;">pdf</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"pacman.pdf"</span><span style="color:#000000;">)</span>            <span style="color:#2f9956;"># open pdf device</span>
<span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">length</span><span style="color:#000000;">(</span>p<span style="color:#000000;">)){</span>
  <span style="color:#000000;">pie</span><span style="color:#000000;">(</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">1</span><span style="color:#000000;">-</span>p<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> p<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> <span style="color:#000000;">1</span><span style="color:#000000;">-</span>p<span style="color:#000000;">[</span>i<span style="color:#000000;">]),</span>    <span style="color:#2f9956;"># pac man like pie chart</span>
      col<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"white"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"yellow"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"white"</span><span style="color:#000000;">),</span>
      border<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">,</span> labels<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">)</span>
  <span style="color:#000000;">points</span><span style="color:#000000;">(</span><span style="color:#000000;">.3</span><span style="color:#000000;">,</span><span style="color:#000000;">.4</span><span style="color:#000000;">,</span> pch<span style="color:#000000;">=</span><span style="color:#000000;">16</span><span style="color:#000000;">,</span> cex<span style="color:#000000;">=4</span><span style="color:#000000;">)</span>    <span style="color:#2f9956;"># add the eye</span>
<span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>                         <span style="color:#2f9956;"># close pdf device</span></pre>
<p>Now, each page of the PDF file contains a single frame of the animation. To include it as an animation in LaTeX, the <tt>animate</tt> package can be used. Here is the whole code, with the Pacman frames being rendered by R and included via LaTeX.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">\documentclass<span style="color:#000000;">{</span>article<span style="color:#000000;">}</span>
\usepackage<span style="color:#000000;">{</span>animate<span style="color:#000000;">} %</span> <span style="color:#7f0055;font-weight:bold;">for</span> animated figures

<span style="color:#0000ff;">\t</span>itle<span style="color:#000000;">{</span>Animations <span style="color:#7f0055;font-weight:bold;">in</span> \LaTeX<span style="color:#000000;">{}</span> via <span style="color:#000000;">{</span>\sf R<span style="color:#000000;">}</span> and Sweave<span style="color:#000000;">}</span>
<span style="color:#0000ff;">\a</span>uthor<span style="color:#000000;">{</span>Mark Heckmann<span style="color:#000000;">}</span>

<span style="color:#0000ff;">\b</span>egin<span style="color:#000000;">{</span>document<span style="color:#000000;">}</span>
\maketitle

<span style="color:#000000;">&#60;&#60;</span>echo<span style="color:#000000;">=</span>false<span style="color:#000000;">,</span> results<span style="color:#000000;">=</span>hide<span style="color:#000000;">&#62;&#62;=</span>
p <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0.999</span><span style="color:#000000;">,</span> <span style="color:#000000;">.9</span><span style="color:#000000;">,</span> len<span style="color:#000000;">=</span><span style="color:#000000;">10</span><span style="color:#000000;">)</span>       <span style="color:#2f9956;"># parameters for opening mouth </span>
p <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">rev</span><span style="color:#000000;">(</span>p<span style="color:#000000;">),</span> p<span style="color:#000000;">)</span>                 <span style="color:#2f9956;"># add reversed parameters</span>
<span style="color:#000000;">pdf</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"pacman.pdf"</span><span style="color:#000000;">)</span>            <span style="color:#2f9956;"># open pdf device</span>
<span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">length</span><span style="color:#000000;">(</span>p<span style="color:#000000;">)){</span>
  <span style="color:#000000;">pie</span><span style="color:#000000;">(</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">1</span><span style="color:#000000;">-</span>p<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> p<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> <span style="color:#000000;">1</span><span style="color:#000000;">-</span>p<span style="color:#000000;">[</span>i<span style="color:#000000;">]),</span>    <span style="color:#2f9956;"># pac man like pie chart</span>
      col<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"white"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"yellow"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"white"</span><span style="color:#000000;">),</span>
      border<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">,</span> labels<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">NA</span><span style="color:#000000;">)</span>
  <span style="color:#000000;">points</span><span style="color:#000000;">(</span><span style="color:#000000;">.3</span><span style="color:#000000;">,</span><span style="color:#000000;">.4</span><span style="color:#000000;">,</span> pch<span style="color:#000000;">=</span><span style="color:#000000;">16</span><span style="color:#000000;">,</span> cex<span style="color:#000000;">=</span><span style="color:#000000;">4</span><span style="color:#000000;">)</span>    <span style="color:#2f9956;"># add the eye</span>
<span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>                         <span style="color:#2f9956;"># close pdf device</span>
<span style="color:#000000;">@</span>

<span style="color:#0000ff;">\b</span>egin<span style="color:#000000;">{</span>center<span style="color:#000000;">}</span>
<span style="color:#0000ff;">\a</span>nimategraphics<span style="color:#000000;">[</span>loop<span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">.7</span>\linewidth<span style="color:#000000;">]{</span><span style="color:#000000;">12</span><span style="color:#000000;">}{</span>pacman<span style="color:#000000;">}{}{}</span><span style="color:#0000ff;">\\</span>
<span style="color:#0000ff;">\v</span>space<span style="color:#000000;">{-</span><span style="color:#000000;">5</span>mm<span style="color:#000000;">}</span> Click me<span style="color:#000000;">!</span>
\end<span style="color:#000000;">{</span>center<span style="color:#000000;">}</span>
\end<span style="color:#000000;">{</span>document<span style="color:#000000;">}</span></pre>
<p>If you click on Pacman in the PDF file, he will start eating. Now let&#8217;s create another example and add a panel to control the animation in the PDF. To add controls to the animation, use the <tt>controls</tt> option of the <tt>\animategraphics</tt> command, e.g. <tt>\animategraphics[controls, loop, width=.7\linewidth]{12}{limit}{}{}</tt>. In the example, several values are repeatedly sampled from a uniform distribution and their mean is calculated. The mean values are plotted in a histogram that grows with every sample. The example is supposed to demonstrate the central limit theorem.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#000000;">&#60;&#60;</span>eval<span style="color:#000000;">=</span>true<span style="color:#000000;">,</span> echo<span style="color:#000000;">=</span>false<span style="color:#000000;">,</span> results<span style="color:#000000;">=</span>hide<span style="color:#000000;">&#62;&#62;=</span>
<span style="color:#000000;">pdf</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"limit.pdf"</span><span style="color:#000000;">)</span>                            <span style="color:#2f9956;"># open pdf device</span>
msam <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">numeric</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">)</span>                          <span style="color:#2f9956;"># set up empty vector</span>
ns <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">3</span>                                     <span style="color:#2f9956;"># sample size</span>
<span style="color:#7f0055;font-weight:bold;">for</span><span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">500</span><span style="color:#000000;">){</span>
  sam <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">runif</span><span style="color:#000000;">(</span>ns<span style="color:#000000;">) *</span> <span style="color:#000000;">10</span>                     <span style="color:#2f9956;"># draw sample</span>
  msam<span style="color:#000000;">[</span>i<span style="color:#000000;">] &#60;-</span> <span style="color:#000000;">mean</span><span style="color:#000000;">(</span>sam<span style="color:#000000;">)</span>                      <span style="color:#2f9956;"># save mean of sample</span>
  h <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">hist</span><span style="color:#000000;">(</span>msam<span style="color:#000000;">,</span> breaks<span style="color:#000000;">=</span><span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">,</span> len<span style="color:#000000;">=</span><span style="color:#000000;">50</span><span style="color:#000000;">),</span> <span style="color:#2f9956;"># histogram of all means</span>
            xlim<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">),</span> col<span style="color:#000000;">=</span><span style="color:#000000;">grey</span><span style="color:#000000;">(</span><span style="color:#000000;">.9</span><span style="color:#000000;">),</span>
            xlab<span style="color:#000000;">=</span><span style="color:#0000ff;">""</span><span style="color:#000000;">,</span> main<span style="color:#000000;">=</span><span style="color:#0000ff;">""</span><span style="color:#000000;">,</span> border<span style="color:#000000;">=</span><span style="color:#0000ff;">"white"</span><span style="color:#000000;">,</span> las<span style="color:#000000;">=</span><span style="color:#000000;">1</span><span style="color:#000000;">)</span>
  <span style="color:#000000;">points</span><span style="color:#000000;">(</span>sam<span style="color:#000000;">,</span> <span style="color:#000000;">rep</span><span style="color:#000000;">(</span><span style="color:#000000;">max</span><span style="color:#000000;">(</span>h$count<span style="color:#000000;">),</span> <span style="color:#000000;">length</span><span style="color:#000000;">(</span>sam<span style="color:#000000;">)),</span>
         pch<span style="color:#000000;">=</span><span style="color:#000000;">16</span><span style="color:#000000;">,</span> col<span style="color:#000000;">=</span><span style="color:#000000;">grey</span><span style="color:#000000;">(</span><span style="color:#000000;">.2</span><span style="color:#000000;">))</span>              <span style="color:#2f9956;"># add sampled values</span>
  <span style="color:#000000;">points</span><span style="color:#000000;">(</span>msam<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> <span style="color:#000000;">max</span><span style="color:#000000;">(</span>h$count<span style="color:#000000;">),</span>             <span style="color:#2f9956;"># add sample mean value</span>
         col<span style="color:#000000;">=</span><span style="color:#0000ff;">"red"</span><span style="color:#000000;">,</span> pch<span style="color:#000000;">=</span><span style="color:#000000;">15</span><span style="color:#000000;">)</span>
  <span style="color:#000000;">text</span><span style="color:#000000;">(</span><span style="color:#000000;">10</span><span style="color:#000000;">,</span> <span style="color:#000000;">max</span><span style="color:#000000;">(</span>h$count<span style="color:#000000;">),</span> <span style="color:#000000;">paste</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"sample no"</span><span style="color:#000000;">,</span> i<span style="color:#000000;">))</span>
  <span style="color:#000000;">hist</span><span style="color:#000000;">(</span>msam<span style="color:#000000;">[</span>i<span style="color:#000000;">],</span> breaks<span style="color:#000000;">=</span><span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">,</span> len<span style="color:#000000;">=</span><span style="color:#000000;">50</span><span style="color:#000000;">),</span>   <span style="color:#2f9956;"># overlay sample mean </span>
       xlim<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">),</span> col<span style="color:#000000;">=</span><span style="color:#0000ff;">"red"</span><span style="color:#000000;">,</span> add<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">T</span><span style="color:#000000;">,</span>      <span style="color:#2f9956;"># in histogram</span>
       xlab<span style="color:#000000;">=</span><span style="color:#0000ff;">""</span><span style="color:#000000;">,</span> border<span style="color:#000000;">=</span><span style="color:#0000ff;">"white"</span><span style="color:#000000;">,</span> las<span style="color:#000000;">=</span><span style="color:#000000;">1</span><span style="color:#000000;">)</span>
<span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>                                   <span style="color:#2f9956;"># close pdf device</span>
<span style="color:#000000;">@</span></pre>
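<p>As a rough numerical check of what the animation shows, the following sketch (not part of the Sweave document; the seed and the number of repetitions are arbitrary choices) draws many such samples at once and compares the mean and variance of the sample means with the values expected for the mean of three uniform values on [0, 10], namely 5 and (100/12)/3.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">set.seed(1)                                   # arbitrary seed, for reproducibility
ns &#60;- 3                                       # sample size, as above
reps &#60;- 10000                                 # number of samples (an assumption)
sam &#60;- matrix(runif(ns * reps) * 10, ncol=ns) # one sample per row
msam &#60;- rowMeans(sam)                         # mean of each sample
mean(msam)                                    # should be close to 5
var(msam)                                     # should be close to (100/12)/3, about 2.78
hist(msam, breaks=seq(0, 10, len=50))         # roughly bell-shaped</pre>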
<p>As a last example we will include a 3D animation created using <tt>rgl</tt>. For this we will create random points and rotate them about the x-axis and then about the y-axis.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#000000;">&#60;&#60;</span>echo<span style="color:#000000;">=</span>f<span style="color:#000000;">,</span> results<span style="color:#000000;">=</span>hide<span style="color:#000000;">&#62;&#62;=</span>
<span style="color:#000000;">library</span><span style="color:#000000;">(</span>rgl<span style="color:#000000;">)</span>                          <span style="color:#2f9956;"># load rgl library</span>
x <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">matrix</span><span style="color:#000000;">(</span><span style="color:#000000;">rnorm</span><span style="color:#000000;">(</span><span style="color:#000000;">30</span><span style="color:#000000;">),</span> ncol<span style="color:#000000;">=</span><span style="color:#000000;">3</span><span style="color:#000000;">)</span>        <span style="color:#2f9956;"># make random points </span>
<span style="color:#000000;">plot3d</span><span style="color:#000000;">(</span>x<span style="color:#000000;">)</span>                             <span style="color:#2f9956;"># plot points in 3d device</span>
<span style="color:#000000;">par3d</span><span style="color:#000000;">(</span>params<span style="color:#000000;">=</span><span style="color:#000000;">list</span><span style="color:#000000;">(</span>
      windowRect<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">100</span><span style="color:#000000;">,</span><span style="color:#000000;">100</span><span style="color:#000000;">,</span><span style="color:#000000;">600</span><span style="color:#000000;">,</span><span style="color:#000000;">600</span><span style="color:#000000;">)))</span> <span style="color:#2f9956;"># enlarge 3d device</span>
<span style="color:#000000;">view3d</span><span style="color:#000000;">(</span> theta <span style="color:#000000;">=</span> <span style="color:#000000;">0</span><span style="color:#000000;">,</span> phi <span style="color:#000000;">=</span> <span style="color:#000000;">0</span><span style="color:#000000;">)</span>           <span style="color:#2f9956;"># change 3d view angle</span>
M <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">par3d</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"userMatrix"</span><span style="color:#000000;">)</span>              <span style="color:#2f9956;"># get current position matrix</span>
M1 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">rotate3d</span><span style="color:#000000;">(</span>M<span style="color:#000000;">,</span> <span style="color:#000000;">.9</span><span style="color:#000000;">*</span>pi<span style="color:#000000;">/</span><span style="color:#000000;">2</span><span style="color:#000000;">,</span> <span style="color:#000000;">1</span><span style="color:#000000;">,</span> <span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">0</span><span style="color:#000000;">)</span>
M2 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">rotate3d</span><span style="color:#000000;">(</span>M1<span style="color:#000000;">,</span> pi<span style="color:#000000;">/</span><span style="color:#000000;">2</span><span style="color:#000000;">,</span> <span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">1</span><span style="color:#000000;">)</span>
<span style="color:#000000;">movie3d</span><span style="color:#000000;">(</span><span style="color:#000000;">par3dinterp</span><span style="color:#000000;">(</span> userMatrix<span style="color:#000000;">=</span><span style="color:#000000;">list</span><span style="color:#000000;">(</span>M<span style="color:#000000;">,</span> M1<span style="color:#000000;">,</span> M2<span style="color:#000000;">,</span> M1<span style="color:#000000;">,</span> M<span style="color:#000000;">),</span>
        method<span style="color:#000000;">=</span><span style="color:#0000ff;">"linear"</span><span style="color:#000000;">),</span> duration<span style="color:#000000;">=</span><span style="color:#000000;">4</span><span style="color:#000000;">,</span> convert<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">F</span><span style="color:#000000;">,</span>
        clean<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">F</span><span style="color:#000000;">,</span> dir<span style="color:#000000;">=</span><span style="color:#0000ff;">"pics"</span><span style="color:#000000;">)</span>          <span style="color:#2f9956;"># save frames in pics folder</span>
@</pre>
<p>The inclusion into LaTeX works a bit differently this time. We do not include the single pages of a PDF as frames; instead we use single .png pictures that have been generated by <tt>movie3d()</tt>. The default file name for the frames generated by <tt>movie3d()</tt> is &#8220;movie&#8221; plus the frame number.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#0000ff;">\b</span>egin<span style="color:#000000;">{</span>center<span style="color:#000000;">}</span>
<span style="color:#0000ff;">\a</span>nimategraphics<span style="color:#000000;">[</span>controls<span style="color:#000000;">,</span> loop<span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">.7</span>\linewidth<span style="color:#000000;">]{</span><span style="color:#000000;">6</span><span style="color:#000000;">}</span>
  <span style="color:#000000;">{</span>pics<span style="color:#000000;">/</span>movie<span style="color:#000000;">}{</span><span style="color:#000000;">001</span><span style="color:#000000;">}{</span><span style="color:#000000;">040</span><span style="color:#000000;">}</span>
\end<span style="color:#000000;">{</span>center<span style="color:#000000;">}</span></pre>
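<p>The first and last frame numbers passed to <tt>\animategraphics</tt> can be read off the generated file names. The following is a minimal sketch, assuming the frames are named &#8220;movie&#8221; plus a zero-padded number with a .png extension (as the <tt>001</tt> and <tt>040</tt> above suggest); <tt>frame_range()</tt> is a hypothetical helper, not part of <tt>rgl</tt>.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"># hypothetical helper: first and last frame number found in a set of file names
frame_range &#60;- function(files, prefix = "movie") {
  rx   &#60;- paste0("^", prefix, "([0-9]+)\\.png$")   # e.g. movie007.png
  hits &#60;- grep(rx, basename(files), value = TRUE)  # ignore unrelated files
  range(as.integer(sub(rx, "\\1", hits)))
}

frame_range(c("pics/movie001.png", "pics/movie040.png", "notes.txt"))
# 1 40
# with real frames on disk: frame_range(list.files("pics"))</pre>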
<p>Here is the <a href="http://ryouready.files.wordpress.com/2011/04/2011_animated_pdf_v1_code.pdf">whole code</a> for the <a href="http://ryouready.files.wordpress.com/2011/04/2011_animated_pdf_v1.pdf">PDF</a> containing the three animations ready to be Sweaved (make sure to set the directory in the last animation correctly).</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Animate .gif images in R / ImageMagick]]></title>
<link>http://ryouready.wordpress.com/2010/11/21/animate-gif-images-in-r-imagemagick/</link>
<pubDate>Sun, 21 Nov 2010 13:48:37 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2010/11/21/animate-gif-images-in-r-imagemagick/</guid>
<description><![CDATA[Yesterday I surfed the web looking for 3D wireframe examples to explain linear models in class. I st]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2010/11/example_corner.gif"><img class="alignleft size-full wp-image-670" style="margin:7px;" title="example_corner" src="http://ryouready.files.wordpress.com/2010/11/example_corner.gif" alt="" width="200" height="200" /></a>Yesterday I surfed the web looking for 3D wireframe examples to explain linear models in class. I stumbled across this site where <a href="http://www.ats.ucla.edu/stat/sas/examples/aw/example_graphs.htm">animated 3D wireframe plots</a> are produced by SAS. Below I did something similar in R. This post shows the few steps needed to create an animated .gif file using R and ImageMagick. Here I assume that you have <a href="http://www.imagemagick.org/">ImageMagick</a> installed on your computer. As far as I know it is also possible to produce animated .gif files using R only, e.g. with <tt>write.gif()</tt> from the <a href="http://cran.r-project.org/web/packages/caTools/index.html"><tt>caTools</tt></a> package. But using ImageMagick is straightforward, gives you control over the conversion and .gif production and is the free standard program for conversion.<!--more--></p>
<p>First a simple countdown example. To be sure not to overwrite anything I will create a new folder and set the working directory to the new folder.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">dir<span style="color:#000000;">.</span><span style="color:#000000;">create</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"examples"</span><span style="color:#000000;">)</span>
<span style="color:#000000;">setwd</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"examples"</span><span style="color:#000000;">)</span>

<span style="color:#2f9956;"># example 1: simple animated countdown from 10 to "GO!".</span>
<span style="color:#000000;">png</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"example%02d.png"</span><span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">200</span><span style="color:#000000;">,</span> height<span style="color:#000000;">=</span><span style="color:#000000;">200</span><span style="color:#000000;">)</span>
  <span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#000000;">10</span><span style="color:#000000;">:</span><span style="color:#000000;">1</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"GO!"</span><span style="color:#000000;">)){</span>
    plot<span style="color:#000000;">.</span><span style="color:#000000;">new</span><span style="color:#000000;">()</span>
    <span style="color:#000000;">text</span><span style="color:#000000;">(</span><span style="color:#000000;">.5</span><span style="color:#000000;">,</span> <span style="color:#000000;">.5</span><span style="color:#000000;">,</span> i<span style="color:#000000;">,</span> cex <span style="color:#000000;">=</span> <span style="color:#000000;">6</span><span style="color:#000000;">)</span>
  <span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>

<span style="color:#2f9956;"># convert the .png files to one .gif file using ImageMagick. </span>
<span style="color:#2f9956;"># The system() function executes the command as if it was done</span>
<span style="color:#2f9956;"># in the terminal. the -delay flag sets the time between showing</span>
<span style="color:#2f9956;"># the frames, i.e. the speed of the animation.</span>
<span style="color:#000000;">system</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"convert -delay 80 *.png example_1.gif"</span><span style="color:#000000;">)</span>

<span style="color:#2f9956;"># to not leave the single .png files lying around in the</span>
<span style="color:#2f9956;"># directory I remove them.</span>
file<span style="color:#000000;">.</span><span style="color:#000000;">remove</span><span style="color:#000000;">(</span>list<span style="color:#000000;">.</span><span style="color:#000000;">files</span><span style="color:#000000;">(</span>pattern<span style="color:#000000;">=</span><span style="color:#0000ff;">".png"</span><span style="color:#000000;">))</span></pre>
<p style="text-align:center;"><a href="http://ryouready.files.wordpress.com/2010/11/example_1a1.gif"></a><a href="http://ryouready.files.wordpress.com/2010/11/example_1a2.gif"><img class="aligncenter size-full wp-image-667" style="border:0 none;" title="example_1a" src="http://ryouready.files.wordpress.com/2010/11/example_1a2.gif" alt="" width="200" height="200" /></a></p>
<p>Above a loop is used to do the plotting. A new .png file is created automatically for each plot. The <tt>"%02d"</tt> part of the filename is a placeholder for a two-digit counter (01, 02 etc.), so we do not have to hard-code the filename each time.</p>
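<p>What such a placeholder expands to can be checked with <tt>sprintf()</tt>, which understands the same C-style format. A small sketch (the variable name <tt>frames</tt> is only for illustration):</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">frames &#60;- sprintf("example%02d.png", c(1, 2, 11))
frames
# "example01.png" "example02.png" "example11.png"

sprintf("example%03d.png", 7)   # three digits, for more than 99 frames
# "example007.png"</pre>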
<p>Now I want a linear model to be visualized as a 3d mesh.  A 3D surface can easily be plotted using the <tt>wireframe()</tt> function from the <tt>lattice</tt> package (or other functions available in R; also see the <tt>rgl</tt> package for rotatable 3D output).</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#000000;">library</span><span style="color:#000000;">(</span>lattice<span style="color:#000000;">)</span>
b0 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">10</span>
b1 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.5</span>
b2 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.3</span>
g <span style="color:#000000;">&#60;-</span> expand<span style="color:#000000;">.</span><span style="color:#000000;">grid</span><span style="color:#000000;">(</span>x <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">,</span> y <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">)</span>
g$z <span style="color:#000000;">&#60;-</span> b0 <span style="color:#000000;">+</span> b1<span style="color:#000000;">*</span>g$x <span style="color:#000000;">+</span> b2<span style="color:#000000;">*</span>g$y
<span style="color:#000000;">wireframe</span><span style="color:#000000;">(</span>z <span style="color:#000000;">~</span> x <span style="color:#000000;">*</span> y<span style="color:#000000;">,</span> data <span style="color:#000000;">=</span> g<span style="color:#000000;">)</span>

<span style="color:#2f9956;"># to rotate the plot</span>
<span style="color:#000000;">wireframe</span><span style="color:#000000;">(</span>z <span style="color:#000000;">~</span> x <span style="color:#000000;">*</span> y<span style="color:#000000;">,</span> data <span style="color:#000000;">=</span> g<span style="color:#000000;">,</span>
          screen <span style="color:#000000;">=</span> <span style="color:#000000;">list</span><span style="color:#000000;">(</span>z <span style="color:#000000;">=</span> <span style="color:#000000;">10</span><span style="color:#000000;">,</span> x <span style="color:#000000;">= -</span><span style="color:#000000;">60</span><span style="color:#000000;">))</span></pre>
<p>Now let&#8217;s create multiple files while changing the rotation angle. Note that <tt>wireframe()</tt> returns a trellis object which needs to be printed explicitly here using <tt>print()</tt>. The code below produces 36 images (one per 10-degree step) and merges them into one .gif file. With smaller rotation steps the number of images grows quickly and the conversion may take a minute or two.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#2f9956;"># example 2</span>
<span style="color:#000000;">png</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"example%03d.png"</span><span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">300</span><span style="color:#000000;">,</span> height<span style="color:#000000;">=</span><span style="color:#000000;">300</span><span style="color:#000000;">)</span>
  <span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">350</span> <span style="color:#000000;">,</span> <span style="color:#000000;">10</span><span style="color:#000000;">)){</span>
    <span style="color:#000000;">print</span><span style="color:#000000;">(</span><span style="color:#000000;">wireframe</span><span style="color:#000000;">(</span>z <span style="color:#000000;">~</span> x <span style="color:#000000;">*</span> y<span style="color:#000000;">,</span> data <span style="color:#000000;">=</span> g<span style="color:#000000;">,</span>
              screen <span style="color:#000000;">=</span> <span style="color:#000000;">list</span><span style="color:#000000;">(</span>z <span style="color:#000000;">=</span> i<span style="color:#000000;">,</span> x <span style="color:#000000;">= -</span><span style="color:#000000;">60</span><span style="color:#000000;">)))</span>
  <span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>
<span style="color:#2f9956;"># convert pngs to one gif using ImageMagick</span>
<span style="color:#000000;">system</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"convert -delay 40 *.png example_2_reduced.gif"</span><span style="color:#000000;">)</span>

<span style="color:#2f9956;"># cleaning up</span>
file<span style="color:#000000;">.</span><span style="color:#000000;">remove</span><span style="color:#000000;">(</span>list<span style="color:#000000;">.</span><span style="color:#000000;">files</span><span style="color:#000000;">(</span>pattern<span style="color:#000000;">=</span><span style="color:#0000ff;">".png"</span><span style="color:#000000;">))</span></pre>
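<p>Why <tt>print()</tt> is needed inside the loop: a lattice call only builds a plot object, and the drawing happens when that object is printed. At the top level R prints it automatically, inside a <tt>for</tt> loop it does not. A minimal sketch, using a small grid <tt>g_small</tt> made up for illustration:</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">library(lattice)
g_small &#60;- expand.grid(x = 1:5, y = 1:5)
g_small$z &#60;- 10 + .5*g_small$x + .3*g_small$y

p &#60;- wireframe(z ~ x * y, data = g_small)   # nothing is drawn yet
class(p)                                    # "trellis"
print(p)                                    # now the plot is drawn</pre>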
<p><!--HTML generated by highlight 3.1 beta2, http://www.andre-simon.de/--></p>
<p style="text-align:center;"><a href="http://ryouready.files.wordpress.com/2010/11/example_2_reduced.gif"><img class="aligncenter size-full wp-image-656" style="border:0 none;" title="example_2_reduced" src="http://ryouready.files.wordpress.com/2010/11/example_2_reduced.gif" alt="" width="300" height="300" /></a></p>
<p>Now I want the same as above but for a model with an interaction, and I want to make the plot a bit prettier. This time I use .pdf as the output format, just to demonstrate that formats other than .png can be used. Note that the <tt>"%03d"</tt> part of the filename has disappeared as I only create one .pdf file with multiple pages, not multiple .pdf files.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#2f9956;"># example 3</span>
b0 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">10</span>
b1 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.5</span>
b2 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.3</span>
int12 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.2</span>
g <span style="color:#000000;">&#60;-</span> expand<span style="color:#000000;">.</span><span style="color:#000000;">grid</span><span style="color:#000000;">(</span>x <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">,</span> y <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">)</span>
g$z <span style="color:#000000;">&#60;-</span> b0 <span style="color:#000000;">+</span> b1<span style="color:#000000;">*</span>g$x <span style="color:#000000;">+</span> b2<span style="color:#000000;">*</span>g$y <span style="color:#000000;">+</span> int12<span style="color:#000000;">*</span>g$x<span style="color:#000000;">*</span>g$y

<span style="color:#000000;">pdf</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"example_3.pdf"</span><span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">4</span><span style="color:#000000;">,</span> height<span style="color:#000000;">=</span><span style="color:#000000;">4</span><span style="color:#000000;">)</span>
  <span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">350</span> <span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">)){</span>
    <span style="color:#000000;">print</span><span style="color:#000000;">(</span><span style="color:#000000;">wireframe</span><span style="color:#000000;">(</span>z <span style="color:#000000;">~</span> x <span style="color:#000000;">*</span> y<span style="color:#000000;">,</span> data <span style="color:#000000;">=</span> g<span style="color:#000000;">,</span>
              screen <span style="color:#000000;">=</span> <span style="color:#000000;">list</span><span style="color:#000000;">(</span>z <span style="color:#000000;">=</span> i<span style="color:#000000;">,</span> x <span style="color:#000000;">= -</span><span style="color:#000000;">60</span><span style="color:#000000;">),</span>
              drape<span style="color:#000000;">=</span><span style="color:#7f0055;font-weight:bold;">TRUE</span><span style="color:#000000;">))</span>
  <span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>
<span style="color:#2f9956;"># convert pdf to gif using ImageMagick</span>
<span style="color:#000000;">system</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"convert -delay 40 *.pdf example_3_reduced.gif"</span><span style="color:#000000;">)</span>
<span style="color:#2f9956;"># cleaning up</span>
file<span style="color:#000000;">.</span><span style="color:#000000;">remove</span><span style="color:#000000;">(</span>list<span style="color:#000000;">.</span><span style="color:#000000;">files</span><span style="color:#000000;">(</span>pattern<span style="color:#000000;">=</span><span style="color:#0000ff;">".pdf"</span><span style="color:#000000;">))</span></pre>
<p style="text-align:center;"><a href="http://ryouready.files.wordpress.com/2010/11/example_3_reduced.gif"><img class="aligncenter size-full wp-image-663" style="border:0 none;" title="example_3_reduced" src="http://ryouready.files.wordpress.com/2010/11/example_3_reduced.gif" alt="" width="288" height="288" /></a></p>
<p>The last example is a visual comparison of the interaction and the non-interaction model. Here both models are shown on the same scale. Before, I did not specify the scale limits.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"><span style="color:#2f9956;"># example 4</span>
b0 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">10</span>
b1 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.5</span>
b2 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.3</span>
int12 <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">.2</span>
g <span style="color:#000000;">&#60;-</span> expand<span style="color:#000000;">.</span><span style="color:#000000;">grid</span><span style="color:#000000;">(</span>x <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">,</span> y <span style="color:#000000;">=</span> <span style="color:#000000;">1</span><span style="color:#000000;">:</span><span style="color:#000000;">20</span><span style="color:#000000;">)</span>
z <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">c</span><span style="color:#000000;">(</span> b0 <span style="color:#000000;">+</span> b1<span style="color:#000000;">*</span>g$x <span style="color:#000000;">+</span> b2<span style="color:#000000;">*</span>g$y<span style="color:#000000;">,</span>
        b0 <span style="color:#000000;">+</span> b1<span style="color:#000000;">*</span>g$x <span style="color:#000000;">+</span> b2<span style="color:#000000;">*</span>g$y <span style="color:#000000;">+</span> int12<span style="color:#000000;">*</span>g$x<span style="color:#000000;">*</span>g$y<span style="color:#000000;">)</span>
g <span style="color:#000000;">&#60;-</span><span style="color:#000000;">rbind</span><span style="color:#000000;">(</span>g<span style="color:#000000;">,</span> g<span style="color:#000000;">)</span>
g$z <span style="color:#000000;">&#60;-</span> z
g$group <span style="color:#000000;">&#60;-</span> <span style="color:#000000;">gl</span><span style="color:#000000;">(</span><span style="color:#000000;">2</span><span style="color:#000000;">,</span> <span style="color:#000000;">nrow</span><span style="color:#000000;">(</span>g<span style="color:#000000;">)/</span><span style="color:#000000;">2</span><span style="color:#000000;">,</span> labels<span style="color:#000000;">=</span><span style="color:#000000;">c</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"no interaction"</span><span style="color:#000000;">,</span> <span style="color:#0000ff;">"interaction"</span><span style="color:#000000;">))</span>

<span style="color:#000000;">png</span><span style="color:#000000;">(</span>file<span style="color:#000000;">=</span><span style="color:#0000ff;">"example%03d.png"</span><span style="color:#000000;">,</span> width<span style="color:#000000;">=</span><span style="color:#000000;">300</span><span style="color:#000000;">,</span> height<span style="color:#000000;">=</span><span style="color:#000000;">300</span><span style="color:#000000;">)</span>
  <span style="color:#7f0055;font-weight:bold;">for</span> <span style="color:#000000;">(</span>i <span style="color:#7f0055;font-weight:bold;">in</span> <span style="color:#000000;">seq</span><span style="color:#000000;">(</span><span style="color:#000000;">0</span><span style="color:#000000;">,</span> <span style="color:#000000;">350</span> <span style="color:#000000;">,</span><span style="color:#000000;">10</span><span style="color:#000000;">)){</span>
    <span style="color:#000000;">print</span><span style="color:#000000;">(</span><span style="color:#000000;">wireframe</span><span style="color:#000000;">(</span>z <span style="color:#000000;">~</span> x <span style="color:#000000;">*</span> y<span style="color:#000000;">,</span> data <span style="color:#000000;">=</span> g<span style="color:#000000;">,</span> groups<span style="color:#000000;">=</span>group<span style="color:#000000;">,</span>
              screen <span style="color:#000000;">=</span> <span style="color:#000000;">list</span><span style="color:#000000;">(</span>z <span style="color:#000000;">=</span> i<span style="color:#000000;">,</span> x <span style="color:#000000;">= -</span><span style="color:#000000;">60</span><span style="color:#000000;">)))</span>
  <span style="color:#000000;">}</span>
dev<span style="color:#000000;">.</span><span style="color:#000000;">off</span><span style="color:#000000;">()</span>
<span style="color:#2f9956;"># convert pngs to one gif using ImageMagick</span>
<span style="color:#000000;">system</span><span style="color:#000000;">(</span><span style="color:#0000ff;">"convert -delay 40 *.png example_4.gif"</span><span style="color:#000000;">)</span>

<span style="color:#2f9956;"># cleaning up</span>
file<span style="color:#000000;">.</span><span style="color:#000000;">remove</span><span style="color:#000000;">(</span>list<span style="color:#000000;">.</span><span style="color:#000000;">files</span><span style="color:#000000;">(</span>pattern<span style="color:#000000;">=</span><span style="color:#0000ff;">".png"</span><span style="color:#000000;">))</span></pre>
<p style="text-align:center;"><a href="http://ryouready.files.wordpress.com/2010/11/example_4_reduced.gif"><img class="aligncenter size-full wp-image-664" style="border:0 none;" title="example_4_reduced" src="http://ryouready.files.wordpress.com/2010/11/example_4_reduced.gif" alt="" width="300" height="300" /></a></p>
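<p>A note on the clean-up calls: the <tt>pattern</tt> argument of <tt>list.files()</tt> is a regular expression, not a shell wildcard, so <tt>".png"</tt> matches any character followed by "png" anywhere in a file name. In a freshly created folder this does no harm, but a stricter pattern is safer elsewhere. A small sketch with made-up file names:</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;">files &#60;- c("example001.png", "notes_png.txt", "example_4.gif")

loose  &#60;- grepl(".png", files)      # TRUE  TRUE FALSE: "_png" matches as well
strict &#60;- grepl("\\.png$", files)   # TRUE FALSE FALSE: only names ending in .png

# a stricter clean-up call would then be
# file.remove(list.files(pattern = "\\.png$"))</pre>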
<p>Above I chose a small image size (300 x 300 pixels). Smaller steps for rotation and a bigger picture size increase the file sizes for examples 2, 3 and 4 to 3-5 MB, which is far too big for a web format. I am not familiar with image optimization, but I suppose smaller file sizes for the .gif files can easily be achieved with some optimization flags in ImageMagick. Any hints are welcome!</p>
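<p>One flag that may be worth trying is ImageMagick&#8217;s <tt>-layers Optimize</tt>, which attempts to store only the parts of each frame that change. How much it saves for rotating wireframes has not been measured here, so treat the following as a sketch only; <tt>gif_command()</tt> is a hypothetical helper that merely assembles the command string.</p>
<pre style="color:#000000;background-color:#e9e9e9;font-size:9pt;font-family:Courier;"># hypothetical helper: build the convert call, optionally with frame optimization
gif_command &#60;- function(out, delay = 40, optimize = TRUE) {
  parts &#60;- c("convert", "-delay", delay, "*.png",
             if (optimize) c("-layers", "Optimize"),   # dropped when optimize = FALSE
             out)
  paste(parts, collapse = " ")
}

gif_command("example_4.gif")
# "convert -delay 40 *.png -layers Optimize example_4.gif"
# to run it: system(gif_command("example_4.gif"))</pre>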
]]></content:encoded>
</item>
<item>
<title><![CDATA[Playing with the 'playwith' package]]></title>
<link>http://ryouready.wordpress.com/2010/03/23/playing-with-the-playwith-package/</link>
<pubDate>Tue, 23 Mar 2010 11:44:05 +0000</pubDate>
<dc:creator>nattomi</dc:creator>
<guid>http://ryouready.wordpress.com/2010/03/23/playing-with-the-playwith-package/</guid>
<description><![CDATA[Abilities of R for creating graphics is great, but one thing I always missed is the possibility of c]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2010/03/playwith_demo.png"><img class="alignleft size-medium wp-image-552" style="margin:8px;" title="playwith_demo" src="http://ryouready.files.wordpress.com/2010/03/playwith_demo.png?w=240&h=215" alt="" width="240" height="215" /></a>R&#8217;s abilities for creating graphics are great, but one thing I always missed is the possibility of creating interactive plots and being able to look at graphs while changing one or more parameters. I know that there is <a href="http://www.ggobi.org/rggobi/">rggobi</a>, but so far I ran into problems with flexibility each time I wanted to use it. So I kept on searching until I found <a href="http://code.google.com/p/playwith/">playwith</a>, which is &#8220;an R package, providing a GTK+ graphical user interface for editing and interacting with R plots&#8221; as its homepage says. The homepage includes a lot of <a href="http://code.google.com/p/playwith/wiki/Screenshots">screenshots</a> with code snippets, so this post doesn&#8217;t intend to give the reader an extensive review of the possibilities of the playwith package. All I want to do now is present a small application of it.<!--more--></p>
<p>I had some geospatial data I wanted to visualize. The data was the result of a computer simulation and consisted of a set of geographical coordinates and corresponding frequency values expressed in Hz. The value associated with the coordinates <img src='http://s0.wp.com/latex.php?latex=%28x%2Cy%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(x,y)' title='(x,y)' class='latex' /> tells us the first eigenfrequency of the <a href="http://en.wikipedia.org/wiki/Schumann_resonances">Earth-ionosphere cavity</a> that would be measured at <a href="http://w.ggki.hu/index.php?id=38&#38;L=1">Nagycenk Observatory</a> in the case of an assumed lightning source at <img src='http://s0.wp.com/latex.php?latex=%28x%2Cy%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(x,y)' title='(x,y)' class='latex' /> with certain properties.</p>
<p>At first, as always, we need to load some packages.<br />
<code><br />
library(R.basic) # for creating perspective plot<br />
library(playwith) # for creating interactive plot<br />
library(fields) # for plotting a map of the world</code></p>
<p>Then we read in our data (which is available online, so the example should be reproducible):</p>
<p><code><br />
regs &#60;- list(Africa="Africa",Americas="Americas",Asia="Asia")<br />
cols &#60;- c(Africa="red",Americas="green",Asia="blue")<br />
url &#60;- "http://storage.ggki.hu/~nattomi/ryouready/20100303"<br />
x &#60;- lapply(regs, function(x) {read.table(file.path(url,x),header=TRUE)})</code></p>
<p>Before plotting, I determine the range of the plotted values.<br />
<code><br />
data(world.dat) # data used for plotting a world map<br />
zAxs &#60;- unlist(lapply(x,function(x) x$fnERT1))<br />
r &#60;- range(zAxs)<br />
</code></p>
<p>And finally, the interactive plot itself:<br />
<code><br />
playwith({plot3d(world.dat$x,world.dat$y,lowZ,pch=".",<br />
zlim=c(lowZ,upZ),xlab="longitude",</code><br />
<code> ylab="latitude",zlab="F1 (Hz)",<br />
theta=theta,phi=phi,ticktype="detailed")<br />
for (i in regs) {<br />
d &#60;- x[[i]]<br />
points3d(d$Dc,d$Hc,d$fnERT1,col=cols[[i]],pch=".")<br />
}},<br />
parameters=list(<br />
theta=seq(0,360,by=5),<br />
phi=seq(0,90,by=5),<br />
lowZ=seq(r[1],r[2],0.5),<br />
upZ=seq(r[2],r[1],-0.5)))</code></p>
<p>You should see a window popping up with 4 sliders allowing you to set different parameters of the plot, for example vertical and horizontal rotation. You can already see from the example that the syntax of the <em>playwith</em> command is very simple: between curly braces you specify the commands necessary for creating the plot (with possible parameters included, such as <em>theta</em> in this example), followed by a list specifying the values to be looped through for the parameters. What more could I say? If you are a visual type (or your boss is one) then play with <em>playwith</em>!</p>
<p><strong>Remark 1</strong>: Installing the package <em>R.basic</em> works in a slightly unusual way, see <a href="http://www.braju.com/R/">http://www.braju.com/R/</a> for details.</p>
<p><strong>Remark 2</strong>: My world map is just a plot of a cloud of points on the plane. It would be nice if the points were connected accordingly. This can be achieved by using the <em>world()</em> command in the <em>fields</em> package, although I wasn&#8217;t able to integrate this into the 3d display. Any suggestions are very welcome. The <em>world.dat</em> dataset has another drawback: it doesn&#8217;t include <a href="http://www.openstreetmap.org/?lat=40.77&#38;lon=11.01&#38;zoom=7&#38;layers=B000FTF">Corsica and Sardinia</a>, so this world map is not of much use for the locals.</p>
<p style="text-align:center;"><a href="http://ryouready.files.wordpress.com/2010/03/playwith.png"><img class="aligncenter size-full wp-image-546" style="border:0 none;" title="playwith" src="http://ryouready.files.wordpress.com/2010/03/playwith.png" alt="" width="500" height="295" /></a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R vs. Matlab - a small example]]></title>
<link>http://ryouready.wordpress.com/2010/02/15/r-vs-matlab/</link>
<pubDate>Mon, 15 Feb 2010 19:26:23 +0000</pubDate>
<dc:creator>nattomi</dc:creator>
<guid>http://ryouready.wordpress.com/2010/02/15/r-vs-matlab/</guid>
<description><![CDATA[At the institute where I work, quite a lot of people prefer using Matlab and only a few of them ]]></description>
<content:encoded><![CDATA[<p>At the <a href="http://www.ggki.hu/">institute</a> where I work, quite a lot of people prefer using Matlab and only a few of them know about R. Today one of my colleagues, who is also an eager user of Matlab, ran into the following problem:</p>
<ul>
<li>He had a vector <img src='http://s0.wp.com/latex.php?latex=v&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='v' title='v' class='latex' /> in hand which consisted of <img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7Bn%28n%2B1%29%7D%7B2%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;frac{n(n+1)}{2}' title='&#92;frac{n(n+1)}{2}' class='latex' /> elements.</li>
<li>He wanted to reshape this data into an n×n matrix <img src='http://s0.wp.com/latex.php?latex=M&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='M' title='M' class='latex' />, where the element <img src='http://s0.wp.com/latex.php?latex=M_%7Bij%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='M_{ij}' title='M_{ij}' class='latex' /> is equal to <img src='http://s0.wp.com/latex.php?latex=v_%7Bk%2Bj%7DI%28j%26%2360%3B%3Dn-i%2B1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='v_{k+j}I(j&lt;=n-i+1)' title='v_{k+j}I(j&lt;=n-i+1)' class='latex' /> with <img src='http://s0.wp.com/latex.php?latex=k%3D%5Cfrac%7B%282n-i%2B2%29%28i-1%29%7D%7B2%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='k=&#92;frac{(2n-i+2)(i-1)}{2}' title='k=&#92;frac{(2n-i+2)(i-1)}{2}' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=I%28j+%26%2360%3B%3D+n-i%2B1%29%3D1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I(j &lt;= n-i+1)=1' title='I(j &lt;= n-i+1)=1' class='latex' /> if the condition <img src='http://s0.wp.com/latex.php?latex=j+%26%2360%3B%3D+n-i%2B1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='j &lt;= n-i+1' title='j &lt;= n-i+1' class='latex' /> is satisfied and <img src='http://s0.wp.com/latex.php?latex=0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='0' title='0' class='latex' /> otherwise. In other words, the first <img src='http://s0.wp.com/latex.php?latex=%28n-i%2B1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(n-i+1)' title='(n-i+1)' class='latex' /> elements of the <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i' title='i' class='latex' />th row of <img src='http://s0.wp.com/latex.php?latex=M&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='M' title='M' class='latex' /> are equal to the vector <img src='http://s0.wp.com/latex.php?latex=%28v_%7Bk%2B1%7D%2Cv_%7Bk%2B2%7D%2C%5Cldots%2Cv_%7Bk%2Bn-i%2B1%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(v_{k+1},v_{k+2},&#92;ldots,v_{k+n-i+1})' title='(v_{k+1},v_{k+2},&#92;ldots,v_{k+n-i+1})' class='latex' /> and the remaining elements are zero.</li>
</ul>
<p>He struggled for several minutes with how to design a loop for this task. Of course writing such a loop is not very difficult, but why waste time on it if we can get the same result in a single line of R code?</p>
<p><!--more-->For the sake of illustration, I&#8217;ve generated an input vector for the case of <img src='http://s0.wp.com/latex.php?latex=n%3D99&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n=99' title='n=99' class='latex' /> (the value of <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> was 99 in my colleague&#8217;s problem as well):</p>
<p><code>v &#60;- rep(99:1,times=99:1)</code></p>
<p>and used the one-liner</p>
<p><code>M &#60;- t(matrix(unlist(tapply(v,rep(1:99,times=99:1),function(x) c(x,rep(0,99-length(x))))),nrow=99))</code></p>
<p>This is the kind of compactness I really like in R. Finally, I would like to emphasize that this post is not against Matlab; it just points out how the different logic of the R language can simplify problem solving in many situations. As a bonus, let me share a visualization of the resulting matrix using the color2D.matplot function of the <a href="http://cran.r-project.org/web/packages/plotrix/index.html">plotrix</a> package:<br />
<code><br />
library(plotrix)<br />
color2D.matplot(M,c(0,1),c(1,0),c(0,0))</code></p>
<p><a href="http://ryouready.files.wordpress.com/2010/02/matrix.jpg"><img class="alignnone size-medium wp-image-518" title="matrix" src="http://ryouready.files.wordpress.com/2010/02/matrix.jpg?w=300&h=300" alt="" width="300" height="300" /></a></p>
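<p>To check the one-liner against the definition above, it may help to run it on a small case next to an explicit loop. The following is only a sketch: <code>n = 4</code> and the input <code>v = 1:10</code> are arbitrary choices for illustration, and the names <code>M_loop</code> and <code>M_mask</code> are made up here and are not part of the original problem.</p>
<pre>n &#60;- 4
v &#60;- seq_len(n * (n + 1) / 2)            # 1, 2, ..., 10

# the one-liner from above, with 99 replaced by n
M &#60;- t(matrix(unlist(tapply(v, rep(1:n, times = n:1),
       function(x) c(x, rep(0, n - length(x))))), nrow = n))

# explicit loop following the definition of k
M_loop &#60;- matrix(0, n, n)
for (i in 1:n) {
  k   &#60;- (2 * n - i + 2) * (i - 1) / 2   # elements of v used by rows 1..(i-1)
  len &#60;- n - i + 1                       # number of non-zero entries in row i
  M_loop[i, 1:len] &#60;- v[(k + 1):(k + len)]
}

# loop-free alternative: fill the transposed matrix column by column
M_mask &#60;- matrix(0, n, n)
M_mask[row(M_mask) &#60;= n - col(M_mask) + 1] &#60;- v
M_mask &#60;- t(M_mask)

stopifnot(all(M == M_loop), all(M == M_mask))
M
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    0
# [3,]    8    9    0    0
# [4,]   10    0    0    0</pre>
<p>All three versions agree on this small case; which one reads best is largely a matter of taste.</p>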
]]></content:encoded>
</item>
<item>
<title><![CDATA[Progress bars in R (part II) - a wrapper for apply functions]]></title>
<link>http://ryouready.wordpress.com/2010/01/11/progress-bars-in-r-part-ii-a-wrapper-for-apply-functions/</link>
<pubDate>Sun, 10 Jan 2010 23:55:47 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2010/01/11/progress-bars-in-r-part-ii-a-wrapper-for-apply-functions/</guid>
<description><![CDATA[In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2010/01/progress_bar_part2.png"><img class="alignleft size-medium wp-image-478" style="margin:10px;" title="progress_bar_part2" src="http://ryouready.files.wordpress.com/2010/01/progress_bar_part2.png?w=300&h=225" alt="" width="300" height="225" /></a>In a <a href="http://ryouready.wordpress.com/2009/03/16/r-monitor-function-progress-with-a-progress-bar/" target="_blank">previous post</a> I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often, though, I would like to have a progress bar when using <tt>apply()</tt>. The <tt>plyr</tt> package provides several <tt>apply</tt>-like functions that include progress bars, so one could have a look there and use a <tt>plyr</tt> function instead of <tt>apply</tt> where possible. Anyway, here comes a wrapper for <tt>apply</tt>, <tt>lapply</tt> and <tt>sapply</tt> that has a progress bar. It seems to work, although one known issue is the use of vectors (like <tt>c(1,2)</tt>) with the <tt>MARGIN</tt> argument in <tt>apply_pb()</tt>. You can also see in the performance comparison below that the wrapper causes considerable overhead, which is the main drawback of this approach. <!--more--></p>
<pre><span style="color:#008000;">###############################################################

# STATUS: WORKING, but only tested once or twice,
# tested with most ?apply examples
# ISSUES/TODO: MARGIN argument cannot take a
# vector like 1:2 that is more than one numeric</span>

<span style="color:#333399;">apply_pb &#60;- function(X, MARGIN, FUN, ...)
{
  env &#60;- environment()
  pb_Total &#60;- sum(dim(X)[MARGIN])
  counter &#60;- 0
  pb &#60;- txtProgressBar(min = 0, max = pb_Total,
                       style = 3)</span>

  <span style="color:#333399;">wrapper &#60;- function(...)
  {
    curVal &#60;- get("counter", envir = env)
    assign("counter", curVal +1 ,envir= env)
    setTxtProgressBar(get("pb", envir= env),
                           curVal +1)
    FUN(...)
  }
  res &#60;- apply(X, MARGIN, wrapper, ...)
  close(pb)
  res
}</span>

<span style="color:#008000;">## NOT RUN:
# apply_pb(anscombe, 2, sd, na.rm=TRUE)

## large dataset
# df &#60;- data.frame(rnorm(30000), rnorm(30000))
# apply_pb(df, 1, sd)

###############################################################</span>

<span style="color:#333399;">lapply_pb &#60;- function(X, FUN, ...)
{
 env &#60;- environment()
 pb_Total &#60;- length(X)
 counter &#60;- 0
 pb &#60;- txtProgressBar(min = 0, max = pb_Total, style = 3)   

 # wrapper around FUN
 wrapper &#60;- function(...){
   curVal &#60;- get("counter", envir = env)
   assign("counter", curVal +1 ,envir=env)
   setTxtProgressBar(get("pb", envir=env), curVal +1)
   FUN(...)
 }
 res &#60;- lapply(X, wrapper, ...)
 close(pb)
 res
}</span>

<span style="color:#008000;">## NOT RUN:
# l &#60;- sapply(1:20000, function(x) list(rnorm(1000)))
# lapply_pb(l, mean)</span>

<span style="color:#008000;">###############################################################</span>

<span style="color:#333399;">sapply_pb &#60;- function(X, FUN, ...)
{
  env &#60;- environment()
  pb_Total &#60;- length(X)
  counter &#60;- 0
  pb &#60;- txtProgressBar(min = 0, max = pb_Total, style = 3)

  wrapper &#60;- function(...){
    curVal &#60;- get("counter", envir = env)
    assign("counter", curVal +1 ,envir=env)
    setTxtProgressBar(get("pb", envir=env), curVal +1)
    FUN(...)
  }
  res &#60;- sapply(X, wrapper, ...)
  close(pb)
  res
}</span><span style="color:#008000;">

## NOT RUN:</span>
<span style="color:#008000;"># l &#60;- sapply(1:20000, function(x) list(rnorm(1000)))
# sapply_pb(l, mean)</span><span style="color:#008000;">

###############################################################</span></pre>
<p>So far so good, but now let&#8217;s see how much performance is lost to the wrapper overhead.</p>
<pre><span style="color:#008000;">###############################################################</span>

&#62; l &#60;- sapply(1:20000, function(x) list(rnorm(1000)))
&#62; system.time(sapply(l, mean))
User      System    elapsed
0.474       0.003       0.475
&#62; system.time(sapply_pb(l, mean))
&#124;======================================================&#124; 100%
User      System    elapsed
1.863       0.025       1.885

&#62; df &#60;- data.frame(rnorm(90000), rnorm(90000))
&#62; system.time(apply(df, 1, sd))
User      System elapsed
7.152       0.062       7.260
&#62; system.time(apply_pb(df, 1, sd))
&#124;======================================================&#124; 100%
User      System     elapsed
13.112       0.099      13.192

<span style="color:#008000;">###############################################################</span></pre>
<p>So what we see is that performance drops sharply: elapsed time roughly quadruples for <tt>sapply</tt> (0.475 vs. 1.885 seconds) and nearly doubles for <tt>apply</tt> (7.260 vs. 13.192 seconds). This is very problematic in our context, as one tends to use progress bars precisely where processing times are already long. So if someone has an improvement for that, I would be glad to hear about it.</p>
<p>Latest version with more comments on <a href="http://github.com/markheckmann/MHmisc/blob/master/apply_with_progressbar.r" target="_blank">github</a>.</p>
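<p>One direction that might help with both the overhead and the <tt>MARGIN</tt> issue is sketched below. It has not been benchmarked against the timings above, and where the overhead actually comes from is only an assumption: the counter is updated with <tt>&#60;&#60;-</tt> instead of <tt>get()</tt>/<tt>assign()</tt>, the bar is redrawn only about once per percent, and the total is computed with <tt>prod()</tt> so that a vector-valued <tt>MARGIN</tt> is counted correctly. The name <tt>apply_pb2</tt> is made up for this sketch.</p>
<pre>apply_pb2 &#60;- function(X, MARGIN, FUN, ...)
{
  FUN &#60;- match.fun(FUN)
  total &#60;- prod(dim(X)[MARGIN])   # number of FUN calls, also for a vector MARGIN
  step &#60;- max(1, total %/% 100)   # redraw the bar about once per percent
  counter &#60;- 0
  pb &#60;- txtProgressBar(min = 0, max = total, style = 3)
  on.exit(close(pb))              # close the bar even if FUN fails

  wrapper &#60;- function(...)
  {
    counter &#60;&#60;- counter + 1     # update the counter in the enclosing function
    if (counter %% step == 0 || counter == total)
      setTxtProgressBar(pb, counter)
    FUN(...)
  }
  apply(X, MARGIN, wrapper, ...)
}

## NOT RUN:
# m &#60;- matrix(rnorm(12), nrow = 3)
# apply_pb2(m, c(1, 2), abs)
# apply_pb2(anscombe, 2, sd, na.rm = TRUE)</pre>
<p>The same pattern should carry over to the <tt>lapply</tt> and <tt>sapply</tt> wrappers, with <tt>length(X)</tt> as the total.</p>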
]]></content:encoded>
</item>
<item>
<title><![CDATA[Infomaps using R - Visualizing German unemployment rates by district on a map]]></title>
<link>http://ryouready.wordpress.com/2009/11/16/infomaps-using-r-visualizing-german-unemployment-rates-by-color-on-a-map/</link>
<pubDate>Mon, 16 Nov 2009 12:12:01 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/11/16/infomaps-using-r-visualizing-german-unemployment-rates-by-color-on-a-map/</guid>
<description><![CDATA[Lately, David Smith from REvolution Computing set out to challenge the R community with the reproduc]]></description>
<content:encoded><![CDATA[<p><a href="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment_shapefile.png"><img class="alignleft size-medium wp-image-404" title="germany_by_unemployment_shapefile" src="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment_shapefile.png?w=300&h=300" alt="germany_by_unemployment_shapefile" width="300" height="300" /></a>Lately, David Smith from <a href="http://www.revolution-computing.com/" target="_blank">REvolution Computing</a> set out to <a href="http://blog.revolution-computing.com/2009/11/choropleth-challenge-result.html" target="_blank">challenge the R community</a> with the reproduction of a beautiful choropleth map (= multiple regions map/thematic map) on US unemployment rates he had seen on the <a href="http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/" target="_blank">Flowing Data blog</a>. <a href="http://blog.revolution-computing.com/2009/11/choropleth-challenge-result.html" target="_self">Here</a> you can find the impressive results. Being a fan of beautiful visualizations, I tried to produce a similar map for Germany.</p>
<p><strong>1. </strong><strong>Getting the spatial country data</strong></p>
<p>The first step was to get data to draw a map of the German administrative districts. Unfortunately, maps for Germany are not included in the <tt>maps</tt> package; otherwise I could simply have adapted the code from the challenge. Getting data: The <a href="http://gadm.org/" target="_blank">GADM database of Global Administrative Areas</a> aims to provide data on administrative districts for the whole world at different levels (country, state and county level). The data can be downloaded as a shapefile, an ESRI geodatabase file, a Google Earth .kmz file and, very conveniently for R users, as an Rdata file.<strong> </strong><strong><br />
</strong></p>
<p><strong>2. </strong><strong>Getting socio-demographic data</strong> (e. g. unemployment rates by administrative district): A lot of data is available online at <a href="http://www.statistikportal.de/" target="_blank">www.statistikportal.de</a>. On this site you find links to several databases. To get the unemployment stats by county I clicked my way through: <em>Regionaldatenbank Deutschland -&#62; Arbeitsmarkt -&#62; Arbeitsmarktstatistik der Bundesagentur für Arbeit -&#62; Arbeitslose nach ausgewählten Personengruppen sowie Arbeitslosenquoten &#8211; Jahresdurchschnitt &#8211; (ab 2008) regionale Tiefe: Kreise und krfr. Städte -&#62; Werteabruf -&#62; save as CSV format</em>. This table contains all the information I need, although for some reason no data is listed for a few districts. I also looked for another source. <a href="http://ims.destatis.de/indikatoren/" target="_blank">Regionalatlas</a> offers a nice online visualization tool. In the menu I selected the unemployment rate for 2008 as indicator. Besides the nice visualization, there is a menu button &#8220;tables&#8221; where you can retrieve an HTML table of the data. I simply copied and pasted it into a .txt file, which gives me a tab-separated format I can read in R. But still, some districts are not listed. <a href="http://ryouready.files.wordpress.com/2009/11/data_germany_unemployment_by_county.pdf" target="_blank">Here</a> is a PDF file containing the data.<!--more--></p>
<p><strong>3. </strong><strong>Preparing the data</strong></p>
<p>Now I have two data files: one (gadm) containing the spatial information, the other (unempl) containing the unemployment rates. It turns out that the same district is not always named the same way in both. Sometimes the name merely comes with a suffix; in other cases the deviations are more severe, so that simple parsing will not do.</p>
<p><strong><a href="http://ryouready.files.wordpress.com/2009/11/unemployment_names_comparison.png"><img class="size-full wp-image-398 alignnone" title="unemployment_names_comparison" src="http://ryouready.files.wordpress.com/2009/11/unemployment_names_comparison.png" alt="unemployment_names_comparison" width="500" height="231" /></a></strong></p>
<p>I decided to take the quick-and-dirty route and do a fuzzy matching, which surely is prone to errors, very slow and not at all elegant&#8230; Well, never underestimate the rawness of raw data.</p>
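<p>The idea behind the fuzzy matching can be sketched on a few made-up names. The vectors <tt>map_names</tt> and <tt>stat_names</tt> below are invented for illustration and are not taken from either data set. <tt>agrep()</tt> returns the positions of all approximate matches, so taking the first element gives one candidate per district, or <tt>NA</tt> if nothing is close enough.</p>
<pre>map_names  &#60;- c("Ansbach", "Bad Doberan", "Neverland")       # hypothetical map names
stat_names &#60;- c("Bad Doberan, Landkreis", "Ansbach, Landkreis")  # hypothetical table names

# first approximate match per map name, NA if there is none
idx &#60;- sapply(map_names, function(nm)
  agrep(nm, stat_names, max.distance = 0.2)[1])

data.frame(map = map_names, matched = stat_names[idx])
which(is.na(idx))   # districts without any match need manual checking</pre>
<p>Taking only the first hit is what makes this approach quick and dirty: if several names are similar, the first one wins, whether or not it is the right one.</p>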
<p><strong>4. Plotting the data</strong></p>
<p>On Claudia Engel&#8217;s <a href="http://www.stanford.edu/~cengel/cgi-bin/anthrospace/download-global-administrative-areas-as-rdata-files" target="_blank">Anthrospace blog</a> I found an R script that is ideal for working with the data provided. The Rdata files turn out to contain <tt>SpatialPolygonsDataFrame</tt> objects, so we can plot the data without any further preparation using the <tt>sp</tt> package.</p>
<pre><span style="color:#339966;">###############################################################</span><span style="color:#333399;"><span style="color:#339966;">
</span></span><span style="color:#333399;">library(sp)
library(RColorBrewer)</span>

<span style="color:#333399;"># get spatial data for Germany on county level</span>
con &#60;- url("http://gadm.org/data/rda/DEU_adm3.RData")
print(load(con))
close(con)
<span style="color:#333399;"><span style="color:#339966;"># plot Germany with random colors</span>
col = rainbow(length(levels(gadm$NAME_3)))
spplot(gadm, "NAME_3", col.regions=col, main="German Regions",
       colorkey = FALSE, lwd=.4, col="white")</span>
<span style="color:#339966;">###############################################################</span>
</pre>
<p><a href="http://ryouready.files.wordpress.com/2009/11/germany_random1.png"><img class="alignnone size-full wp-image-402" title="germany_random" src="http://ryouready.files.wordpress.com/2009/11/germany_random1.png" alt="germany_random" width="480" height="480" /></a></p>
<p>This looks nice. To produce a color vector to visualize the unemployment rate the two data sets have to be merged.</p>
<pre><span style="color:#339966;">###############################################################
</span><span style="color:#333399;"><span style="color:#339966;">### DATA PREP ###</span>
<span style="color:#339966;"># loading the unemployment data</span>
unempl &#60;- read.delim2(
    file="./data/data_germany_unemployment_by_county.txt",
    header = TRUE, sep = "\t", dec=",", stringsAsFactors=FALSE)</span><span style="color:#339966;">

# due to Mac OS encoding, otherwise not needed</span><span style="color:#333399;">
gadm_names &#60;- iconv(gadm$NAME_3, "ISO_8859-2", "UTF-8")  </span><span style="color:#339966;"> </span>

<span style="color:#333399;"><span style="color:#339966;"># fuzzy matching of data: quick &#38; dirty
</span></span><span style="color:#339966;"># caution: this step takes some time ~ 2 min.</span><span style="color:#333399;">

</span><span style="color:#339966;"># parsing out "Städte"</span><span style="color:#333399;">
gadm_names_n &#60;- gsub("Städte", "", gadm_names)</span><span style="color:#339966;"> </span><span style="color:#333399;">

total &#60;- length(gadm_names)</span><span style="color:#339966;">
# create progress bar</span><span style="color:#333399;">
pb &#60;- txtProgressBar(min = 0, max = total, style = 3) </span>
<span style="color:#333399;">order &#60;- vector()
for (i in 1:total){
   order[i] &#60;- agrep(gadm_names_n[i], unempl$Landkreis,
                     max.distance = 0.2)[1]
   setTxtProgressBar(pb, i)</span>               <span style="color:#339966;"># update progress bar</span><span style="color:#333399;">
}
close(pb)</span>

<span style="color:#333399;"><span style="color:#339966;"># choose color by unemployment rate</span></span>
<span style="color:#333399;">col_no &#60;- as.factor(as.numeric(cut(unempl$Wert[order],</span>
<span style="color:#333399;">                    c(0,2.5,5,7.5,10,15,100))))
levels(col_no) &#60;- c("&#60;2,5%", "2,5-5%", "5-7,5%",
                    "7,5-10%", "10-15%", "&#62;15%")
gadm$col_no &#60;- col_no
myPalette&#60;-brewer.pal(6,"Purples")</span>

<span style="color:#339966;"># plotting</span>
<span style="color:#333399;">spplot(gadm, "col_no", col=grey(.9), col.regions=myPalette,
main="Unemployment in Germany by district")</span><span style="color:#339966;">

###############################################################</span></pre>
<p><a href="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment.png"><img class="alignnone size-full wp-image-403" title="germany_by_unemployment" src="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment.png" alt="germany_by_unemployment" width="480" height="480" /></a></p>
<p><span style="color:#000000;">It seems that the districts for which no data was available mainly belong to the states Sachsen-Anhalt and Sachsen. You can also see that the east of Germany has a much higher unemployment rate than the west. The same holds true for a north-south comparison.</span></p>
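<p>A note on the class assignment used for the colors: <tt>as.factor(as.numeric(cut(...)))</tt> keeps only the classes that actually occur, so assigning six labels afterwards relies on every class being present in the data. A sketch of an alternative that passes the labels to <tt>cut()</tt> directly is shown here. The values in <tt>rate</tt> are made up and are not the real statistics, and the grey palette is only a stand-in for the <tt>brewer.pal()</tt> palette.</p>
<pre>rate   &#60;- c(1.9, 3.2, 6.0, 6.4, 12.5, NA)   # hypothetical unemployment rates in %
breaks &#60;- c(0, 2.5, 5, 7.5, 10, 15, 100)
labs   &#60;- c("&#60;2,5%", "2,5-5%", "5-7,5%", "7,5-10%", "10-15%", "&#62;15%")

# labels are attached to the intervals themselves, not to the observed classes
col_no &#60;- cut(rate, breaks = breaks, labels = labs)
table(col_no, useNA = "ifany")   # empty classes stay in the factor

# indexing a palette by the factor uses its integer codes
myPalette &#60;- grey(seq(0.9, 0.3, length.out = 6))   # stand-in palette
myPalette[col_no]                # one color per district, NA where data is missing</pre>
<p>With this variant the legend always shows all six classes, even if one of them happens to be empty.</p>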
<p><span style="color:#000000;">Besides the <tt>sp</tt> package there are many other ways to produce such a graphic. I will now take another approach using shapefile data, which is also available from GADM. The data is available as a .zip file that includes several dBase files for all levels (3=district, 1=state etc.).</span></p>
<pre><span style="color:#339966;">###############################################################</span>

<span style="color:#333399;">library(sp)
library(maptools)</span>

<span style="color:#333399;">nc1 &#60;- readShapePoly("./data/DEU_adm/DEU_adm1.dbf",</span>
<span style="color:#333399;">                    proj4string=CRS("+proj=longlat +datum=NAD27"))</span><span style="color:#333399;">
nc3 &#60;- readShapePoly("./data/DEU_adm/DEU_adm3.dbf",
                    proj4string=CRS("+proj=longlat +datum=NAD27"))</span>

<span style="color:#339966;"># col_no comes from the calculations above</span><span style="color:#333399;">
par(mar=c(0,0,0,0))</span>
<span style="color:#333399;">plot(nc3, col=myPalette[col_no], border=grey(.9), lwd=.5)
plot(nc1, col=NA, border=grey(.5), lwd=1, add=TRUE)</span>

<span style="color:#339966;">###############################################################</span></pre>
<p><a href="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment_shapefile.png"><img title="germany_by_unemployment_shapefile" src="http://ryouready.files.wordpress.com/2009/11/germany_by_unemployment_shapefile.png?w=480&h=480" alt="germany_by_unemployment_shapefile" width="480" height="480" /></a></p>
<p><span style="color:#000000;">What I like about all this is that it is pretty simple to draw almost any country you like. Apart from the very messy data preparation, it takes only a few lines of code and the results are nice.</span></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[useR! Conference in Rennes 2009]]></title>
<link>http://markheckmann.wordpress.com/2009/07/15/user-konferenz-in-rennes-2009/</link>
<pubDate>Wed, 15 Jul 2009 18:17:31 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://markheckmann.wordpress.com/2009/07/15/user-konferenz-in-rennes-2009/</guid>
<description><![CDATA[From 6 to 10 July 2009, the beautiful city of Rennes in Brittany, France, hosted this year's]]></description>
<content:encoded><![CDATA[<p><a href="http://www.agrocampus-ouest.fr/math/useR-2009//"><img class="alignleft size-full wp-image-511" style="margin-top:5px;margin-bottom:5px;" title="useR" src="http://markheckmann.files.wordpress.com/2009/07/user-middle.png" alt="useR" width="237" height="114" /></a>From 6 to 10 July 2009, the beautiful city of Rennes in Brittany, France, hosted this year's <a href="http://www.agrocampus-ouest.fr/math/useR-2009/" target="_blank">useR!</a> conference. It was my first time attending this conference, and definitely not my last. The organization was very good: the sessions started on time, lunch and snacks were provided without exception, and there was room and time to exchange ideas. The venue, the campus of the <a href="www.agrocampus-ouest.fr" target="_blank">Agrocampus-Ouest</a> university, also offered a more than pleasant atmosphere.</p>
<p>The topics covered at useR! fall into three main areas: computing, statistics and application-oriented sessions. For psychologists, too, the psychometrics track offered some interesting sessions. The keynote talks by <a href="http://www.math.usu.edu/~adele/" target="_blank">Adele Cutler</a>, <a href="http://www-stat.stanford.edu/~hastie/" target="_blank">Trevor Hastie</a>, <a href="http://www-stat.stanford.edu/~jhf/" target="_blank">Jerome Friedman</a> and <a href="http://www.econ.upf.edu/~michael/" target="_blank">Michael Greenacre</a> further showed that several leading figures in statistics research are very fond of the R movement and use the language intensively.<!--more--></p>
<p>As a useR! newcomer, I was struck by the open atmosphere. The community is happy when people contribute, and you can feel it. You also get a sense of how powerful this movement is: the number of packages being developed and the growing range their analysis functions cover are truly impressive (see <a href="http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf" target="_blank">here</a>).</p>
<p>I was especially pleased by some new developments for integrating R into Office applications, since I use R for market research, among other things, and there is a strong need for Office-compatible output in that field. There was a full focus session each on Office integration and on data connection. Here, for example, <a href="http://homepage.univie.ac.at/erich.neuwirth/php/dokuwiki/doku.php" target="_blank">Erich Neuwirth</a> presented the integration of Excel and R. More on this will appear in September in the book <a href="http://www.springer.com/statistics/computational/book/978-1-4419-0051-7" target="_blank"><em>R Through Excel</em></a> in Springer's Use R! series. <a href="http://www.google.de/url?sa=t&#38;source=web&#38;ct=res&#38;cd=4&#38;url=http%3A%2F%2Fwww.agrocampus-ouest.fr%2Fmath%2FuseR-2009%2Fabstracts%2Fpdf%2FJones.pdf&#38;ei=bUdwSuSPBYaknQPHrKS4Bw&#38;usg=AFQjCNEF0FTNVKYjZR_u6aKVXfxPDOYmcA&#38;sig2=YWLcBkXhyjHN9iwv7rohhg" target="_blank">Wayne Jones</a> developed an <em>R to PowerPoint</em> package. Inspired by these talks, Christian Ritter drafted an <em>R to Word</em> package practically overnight, which will be available on CRAN at the end of this year. All of these are major steps toward finally closing this output gap.</p>
<p>Since useR! alternates each year between Europe and overseas, I will not be able to attend next year. In 2011, when it is back in Europe, I will certainly be there again.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Function to create tables in LaTeX or LyX to display regression model results]]></title>
<link>http://ryouready.wordpress.com/2009/06/19/r-function-to-create-tables-in-latex-or-lyx-to-display-regression-models-results/</link>
<pubDate>Fri, 19 Jun 2009 16:41:38 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/06/19/r-function-to-create-tables-in-latex-or-lyx-to-display-regression-models-results/</guid>
<description><![CDATA[Most people using LaTeX feel that creating tables is no fun. Some days ago I stumbled across a neat ]]></description>
<content:encoded><![CDATA[<p><img class="alignleft size-medium wp-image-358" style="margin:6px;" title="regression_models_in_latex" src="http://ryouready.files.wordpress.com/2009/06/regression_models_in_latex.png?w=240&h=199" alt="regression_models_in_latex" width="240" height="199" />Most people using LaTeX feel that creating tables is no fun. Some days ago I stumbled across a neat function written by <a href="http://pj.freefaculty.org/" target="_blank">Paul Johnson</a> that produces plain LaTeX code as well as LaTeX code that can be used within <a href="www.lyx.org" target="_blank">LyX</a>. The output can be used for regression models and looks like output from the Stata <tt>outreg</tt> command. His R function that produces the LaTeX code has the same name:  <tt>outreg()</tt>. The <tt>outreg </tt>code can be found on his <a href="http://pj.freefaculty.org/R/outreg-worked.R" target="_blank">website</a> or in the <a href="http://ryouready.files.wordpress.com/2009/06/outreg_code_by_paul_johnson.pdf" target="_blank">PDF copy</a> of the code from his website.</p>
<p>I took the code, put it into a <tt>.rnw</tt> file and sweaved it. It worked like a charm and produced beautiful results (see the picture on the left and the <a href="http://ryouready.files.wordpress.com/2009/06/regression_for_latex_and_sweave.pdf" target="_blank">PDF</a>). Below you can find the code for the noweb file (.rnw). LaTeX code is colored grey, R code is colored blue. All results can be viewed in the <a href="http://ryouready.files.wordpress.com/2009/06/regression_for_latex_and_sweave.pdf">PDF</a> file. In addition, Paul Johnson has created a nice <a href="http://pj.freefaculty.org/R/Rtips.html" target="_blank">list of R-Tips</a> that can be found on his website as well.</p>
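<p>For readers who have not used Sweave before, the workflow can be sketched roughly as follows. The file name <tt>demo.Rnw</tt> and its contents are invented for this illustration and have nothing to do with <tt>outreg()</tt>; compiling the resulting <tt>.tex</tt> file additionally requires a LaTeX installation.</p>
<pre># write a tiny noweb file to a temporary directory (hypothetical example file)
rnw &#60;- file.path(tempdir(), "demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "&#60;&#60;echo=FALSE, results=tex&#62;&#62;=",
  "cat('The mean of 1 to 10 is', mean(1:10))",
  "@",
  "\\end{document}"), rnw)

# Sweave() runs the R chunks and writes demo.tex into the working directory
old &#60;- setwd(tempdir())
Sweave("demo.Rnw")
setwd(old)

tex &#60;- file.path(tempdir(), "demo.tex")
file.exists(tex)
# tools::texi2pdf(tex) would then produce the PDF</pre>
<p>A chunk with <tt>results=tex</tt> passes whatever the R code prints straight into the LaTeX document, which is how the tables produced by <tt>outreg()</tt> end up in the final PDF.</p>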
<p><!--more--></p>
<p><span style="color:#000080;"><span style="color:#008000;"> </span></span></p>
<pre><span style="color:#000080;"><span style="color:#008000;">###############################################################</span>

</span><span style="color:#808080;">\documentclass[a4paper,10pt]{article}

\title{Regression Tables for LaTex}
\auth<span style="color:#888888;">or{</span></span><span style="color:#888888;">Mark Heckmann. Code by Paul Johnson</span><span style="color:#808080;"><span style="color:#888888;">}
\date{\today}</span>

\begin{document}
\maketitle

&#60;&#60;echo=FALSE&#62;&#62;=
<span style="color:#333399;">x1 &#60;- rnorm(100)
x2 &#60;- rnorm(100)
y1 &#60;- 5*rnorm(100)+3*x1 + 4*x2
y2 &#60;- rnorm(100)+5*x2
m1 &#60;- lm (y1~x1)
m2 &#60;- lm (y1~x2)
m3 &#60;- lm (y1 ~ x1 + x2)
gm1 &#60;- glm(y1~x1)</span>
@

&#60;&#60;echo=FALSE, results=tex&#62;&#62;=
<span style="color:#333399;"> outreg(m1,title="My One Tightly Printed Regression", lyx=F )
 outreg(m1,tight=F,modelLabels=c("Fingers"),
        title="My Only Spread Out Regressions" ,lyx=F)
 outreg(list(m1,m2),modelLabels=c("Mine","Yours"),
        varLabels=list(x1="Billie"),
        title="My Two Linear Regressions Tightly Printed" ,lyx=F)
 outreg(list(m1,m2),modelLabels=c("Whatever","Whichever"),
        title="My Two Linear Regressions Not Tightly  Printed",
        showAIC=F, lyx=F)
 outreg(list(m1,m2,m3),title="My Three Linear Regressions", lyx=F)
 outreg(list(m1,m2,m3),tight=F,
        modelLabels=c("I Love love love really long titles",
        "Hate Long","Medium"), lyx=F)
 outreg(list(gm1),modelLabels=c("GLM"), lyx=F)
 outreg(list(m1,gm1),modelLabels=c("OLS","GLM"), lyx=F)</span>
@

\end{document}</span><span style="color:#000080;">
<span style="color:#008000;">
###############################################################
</span></span></pre>
<p><span style="color:#000080;"><span style="color:#008000;"> </span></span></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Building functions - using default settings that can be modified via the dot-dot-dot / three point argument]]></title>
<link>http://ryouready.wordpress.com/2009/04/07/dot/</link>
<pubDate>Tue, 07 Apr 2009 13:58:04 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/04/07/dot/</guid>
<description><![CDATA[Before you read this post, please have a look at Enrique&#8217;s comment below. He pointed out that ]]></description>
<content:encoded><![CDATA[<blockquote><p><span style="color:#808080;">Before you read this post, please have a look at Enrique&#8217;s comment below. He pointed out that the built-in R function modifyList() already does what I wanted to describe in this post. Well, I live to learn :)</span></p></blockquote>
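<p>For reference, the <tt>modifyList()</tt> approach mentioned in the note above might look roughly like this. This is only a sketch and not Enrique&#8217;s exact code; the function name <tt>myFunction2</tt> is made up for illustration.</p>
<pre>myFunction2 &#60;- function(...)
{
  # default settings
  settings &#60;- list(par1=10, par2=12)
  # settings supplied via the dot-dot-dot argument, NULL if none
  supplied &#60;- list(...)$settings
  if (!is.null(supplied))
    settings &#60;- modifyList(settings, supplied)
  settings
}

myFunction2()
myFunction2(settings=list(par1=8))</pre>
<p>One difference to keep in mind: <tt>modifyList()</tt> also adds elements whose names do not occur in the defaults, whereas the solution below only overwrites existing ones.</p>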
<p><span style="color:#000000;">I was wondering how I could write a function that uses default settings but accepts a list to overwrite the default settings via the dot-dot-dot / three-point argument. Here comes my solution.<br />
</span></p>
<pre><span style="color:#000080;"><span style="color:#008000;"># building a function with a list of default settings
# that can be modified by an optional list passed
# via the dot-dot-dot / three point argument

</span></span></pre>
<p><span style="color:#000080;"><span style="color:#008000;"><!--more--></span></span></p>
<pre><span style="color:#000080;"><span style="color:#008000;">###############################################################</span>

myFunction &#60;- function(...)
{
  print(hasArg(settings))
}

myFunction()
myFunction(settings=list(par1=8))
<span style="color:#008000;">
###############################################################

# now I define a default setting list
</span>
myFunction &#60;- function(...)
{
  print(hasArg(settings))
  <span style="color:#008000;"># define default settings</span>
  settings = list(par1=10, par2=12)
  print(settings)
}

myFunction()
myFunction(settings=list(par1=8))

<span style="color:#008000;">###############################################################

# as a next step I replace all the elements passed on to
# settings via the dot-dot-dot argument</span>

myFunction &#60;- function(...)
{
</span><span style="color:#000080;"><span style="color:#008000;">  # default settings</span></span><span style="color:#000080;">
  settings = list(par1=10, par2=12)       </span>
<span style="color:#000080;"><span style="color:#008000;">  # if settings argument is used
</span></span><span style="color:#000080;">  if(hasArg(settings)){
      suppliedSettings &#60;- list(...)$settings
      matching &#60;- intersect(names(settings),
                            names(suppliedSettings))
      settings[matching] &#60;- suppliedSettings[matching]
  }

  <span style="color:#008000;"># function operations</span>
  print(settings)
}

myFunction()
myFunction(settings=list(par1=8))

<span style="color:#008000;"># Now the settings have changed.

###############################################################

# If now some settings are supplied that do not match they
# are simply not considered
</span>
myFunction(settings=list(par1=8, parX=6))
<span style="color:#008000;"># For convenience I add a line to warn the user in case of not
# matching parameters
</span>
myFunction &#60;- function(...)
{</span>
<span style="color:#000080;"><span style="color:#008000;">  # default settings</span></span><span style="color:#000080;">
  settings = list(par1=10, par2=12)
</span><span style="color:#000080;"><span style="color:#008000;">  # if settings is supplied</span></span><span style="color:#000080;">
  if(hasArg(settings)){
      suppliedSettings &#60;- list(...)$settings
      matching &#60;- intersect(names(settings),
                            names(suppliedSettings))
      settings[matching] &#60;- suppliedSettings[matching]
      notMatching &#60;- setdiff(names(suppliedSettings),
                             names(settings))
      if(length(notMatching)!=0) warning("The following arguments are ignored: ",
            paste(notMatching, collapse=", "))
  }

  <span style="color:#008000;"># function operations</span>
  print(settings)
}

<span style="color:#008000;">###############################################################</span>

myFunction(settings=list(par1=8, parX=6))
<span style="color:#008000;"># now the user is warned when some arguments do not match

by Mark Heckmann, April, 2009</span></span></pre>
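<p>As Enrique pointed out, <tt>modifyList()</tt> from the built-in <tt>utils</tt> package covers most of this. Here is a minimal sketch of that variant; the name <tt>myFunction2</tt> is made up for this illustration. Note one difference to the hand-written version above: <tt>modifyList()</tt> does not ignore non-matching entries such as <tt>parX</tt> but appends them to the list.</p>
<pre><span style="color:#000080;"># hypothetical variant of myFunction() based on modifyList()
myFunction2 &#60;- function(...)
{
  # default settings
  settings &#60;- list(par1=10, par2=12)
  # settings passed via the dot-dot-dot argument (NULL if none)
  suppliedSettings &#60;- list(...)$settings
  if(!is.null(suppliedSettings)){
      settings &#60;- modifyList(settings, suppliedSettings)
  }
  settings
}

myFunction2()                                 # defaults
myFunction2(settings=list(par1=8))            # par1 is replaced
myFunction2(settings=list(par1=8, parX=6))    # parX is appended</span></pre>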
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Zip fastener for two data frames / combining rows or columns of two dataframes in an alternating manner]]></title>
<link>http://ryouready.wordpress.com/2009/03/27/r-zip-fastener-for-two-data-frames-combining-rows-or-columns-of-two-dataframes-in-an-alternating-manner/</link>
<pubDate>Fri, 27 Mar 2009 11:03:34 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/03/27/r-zip-fastener-for-two-data-frames-combining-rows-or-columns-of-two-dataframes-in-an-alternating-manner/</guid>
<description><![CDATA[Sometimes I find it useful to merge two data frames like the following ones   X1 X2 X3 X4 Y1 Y2 Y3 Y]]></description>
<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-301" title="zipper" src="http://ryouready.files.wordpress.com/2009/03/zipper.gif" alt="zipper" width="1" height="1" /><a href="http://www.flickr.com/photos/insashi/2046976044/"><img class="alignleft size-medium wp-image-302" style="border:0 none;margin-top:8px;margin-bottom:8px;" title="zippers" src="http://ryouready.files.wordpress.com/2009/03/zippers.png?w=240&h=179" alt="zippers" width="240" height="179" /></a>Sometimes I find it useful to merge two data frames like the following ones</p>
<pre>  <span style="color:#008000;">X1 X2 X3 X4</span>    <span style="color:#000080;">  Y1 Y2 Y3 Y4</span>  <span style="color:#000080;"> </span>
<span style="color:#008000;">1  o  o  o  o</span>    <span style="color:#3366ff;">   <span style="color:#000080;">X  X  X  X</span></span>
<span style="color:#008000;">2  o  o  o  o</span>      <span style="color:#3366ff;"> <span style="color:#000080;">X  X  X  X</span></span>
<span style="color:#008000;">3  o  o  o  o</span>     <span style="color:#000080;">  X  X  X  X</span></pre>
<p>by using zip feeding either along the columns</p>
<pre><span style="color:#008000;">   X1 </span><span style="color:#000080;">Y1 </span><span style="color:#008000;">X2 </span><span style="color:#000080;">Y2 </span><span style="color:#008000;">X3 </span><span style="color:#000080;">Y3 </span><span style="color:#008000;">X4 </span><span style="color:#000080;">Y4
</span>1  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X
</span>2  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X
</span>3  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o</span>  <span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X</span>  <span style="color:#008000;">o  </span><span style="color:#000080;">X</span></pre>
<p>or along the rows of the data frames.</p>
<pre>  <span style="color:#008000;">V1 V2 V3 V4
</span><span style="color:#008000;">1  o  o  o  o</span>
<span style="color:#000080;">4  X  X  X  X</span>
<span style="color:#008000;">2  o  o  o  o</span>
<span style="color:#000080;">5  X  X  X  X</span>
<span style="color:#008000;">3  o  o  o  o</span>
<span style="color:#000080;">6  X  X  X  X</span></pre>
<p><!--more-->The following function acts like a <em>&#8220;zip fastener&#8221; </em>for combining two dataframes. It takes the first column (or row) of the first data frame and places it next to the first column (or row) of the second data frame and so on. Only one dimension of the data frame has to be equal to do this. E.g. to combine the <em>columns </em>by zip feeding the number of <em>rows </em>must be equal and vice versa.</p>
<p>So here comes the code for the <tt>zipFastener()</tt> function. Actually it&#8217;s only the last few lines (from <tt># zip fastener operations</tt> on) that do the job, but as I did not want to restrict the function to equal dimensions there is a little prelude.</p>
<pre><span style="color:#008000;">###############################################################

</span><span style="color:#000080;"><span style="color:#008000;"># zipFastener for TWO dataframes of unequal length</span>
zipFastener &#60;- function(df1, df2, along=2)
{
    <span style="color:#008000;"># parameter checking</span>
    if(!is.element(along, c(1,2))){
        stop("along must be 1 or 2 for rows and columns respectively")
    }
    <span style="color:#008000;"># if merged by using zip feeding along the columns, the
    # same no. of rows is required and vice versa</span>
    if(along==1 &#38;&#38; (ncol(df1)!= ncol(df2))) {
        stop("the no. of columns has to be equal to merge them by zip feeding")
    }
    if(along==2 &#38;&#38; (nrow(df1)!= nrow(df2))) {
        stop("the no. of rows has to be equal to merge them by zip feeding")
    }

    <span style="color:#008000;"># zip fastener preparations</span>
    d1 &#60;- dim(df1)[along]
    d2 &#60;- dim(df2)[along]
    i1 &#60;- 1:d1           <span style="color:#008000;"># index vector 1</span>
    i2 &#60;- 1:d2 + d1      <span style="color:#008000;"># index vector 2</span>

    <span style="color:#008000;"># set biggest dimension dMax</span>
    if(d1==d2) {
        dMax &#60;- d1
    } else if (d1 &#62; d2) {
        length(i2) &#60;- length(i1)    <span style="color:#008000;"># make vectors same length, </span>
        dMax &#60;- d1                  <span style="color:#008000;"># </span></span><span style="color:#000080;"><span style="color:#008000;"><span style="color:#008000;">f</span>ill blanks with NAs</span></span><span style="color:#000080;">   
    } else  if(d1 &#60; d2){
        length(i1) &#60;- length(i2)    <span style="color:#008000;"># make vectors same length,</span>
        dMax &#60;- d2                  <span style="color:#008000;"># f</span></span><span style="color:#000080;"><span style="color:#008000;">ill blanks with NAs</span></span><span style="color:#000080;">   
    }
    </span>
<span style="color:#000080;"><span style="color:#008000;">    # zip fastener operations</span></span>
<span style="color:#000080;">    index &#60;- as.vector(matrix(c(i1, i2), ncol=dMax, byrow=T))
    index &#60;- index[!is.na(index)]        <span style="color:#008000;"> # remove NAs</span>
    </span>
<span style="color:#000080;">    if(along==1){
        colnames(df2) &#60;- colnames(df1)  </span><span style="color:#000080;"><span style="color:#008000;"> # keep 1st colnames </span></span><span style="color:#000080;">                 
        res &#60;- rbind(df1,df2)[ index, ]  <span style="color:#008000;"># reorder data frame</span>
    }
    if(along==2) res &#60;- cbind(df1,df2)[ , index]           

    return(res)
}

</span><span style="color:#008000;">###############################################################</span></pre>
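<p>To see what the index juggling does, here is the same computation on its own, as a small sketch. The sizes (four columns in the first data frame, two in the second) are made up for illustration.</p>
<pre><span style="color:#000080;">d1 &#60;- 4                       # no. of columns in the first data frame
d2 &#60;- 2                       # no. of columns in the second data frame
i1 &#60;- 1:d1                    # positions of df1 columns in cbind(df1, df2)
i2 &#60;- 1:d2 + d1               # positions of df2 columns in cbind(df1, df2)
length(i2) &#60;- length(i1)      # pad the shorter vector with NAs

# write both vectors as rows of a matrix and read it column by column
index &#60;- as.vector(matrix(c(i1, i2), ncol=d1, byrow=TRUE))
index &#60;- index[!is.na(index)] # remove the padding
index
# [1] 1 5 2 6 3 4</span></pre>
<p>The columns alternate as long as both data frames have columns left; the remaining columns of the bigger one are appended at the end.</p>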
<p>Here come some examples.</p>
<pre><span style="color:#008000;">###############################################################</span></pre>
<pre><span style="color:#000080;"><span style="color:#008000;">### examples ###</span>
require(plyr)

<span style="color:#008000;"># data frames equal dimensions</span>
df1 &#60;- rdply(3, rep("o",4))[ ,-1]       <span style="color:#008000;"># from plyr package</span>
df2 &#60;- rdply(3, rep("X",4))[ ,-1]       

zipFastener(df1, df2)
zipFastener(df1, df2, 2)
zipFastener(df1, df2, 1)

<span style="color:#008000;"># data frames unequal in no. of rows</span>
df1 &#60;- rdply(10, rep("o",4))[ ,-1]
zipFastener(df1, df2, 1)
zipFastener(df2, df1, 1)

<span style="color:#008000;"># data frames unequal in no. of columns</span>
df2 &#60;- rdply(10, rep("X",3))[ ,-1]
zipFastener(df1, df2)
zipFastener(df2, df1, 2)
</span><span style="color:#008000;">
###############################################################</span></pre>
<p>I hope you find that useful.</p>
<p>Ciao, Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Monitoring the function progress with a progress bar]]></title>
<link>http://ryouready.wordpress.com/2009/03/16/r-monitor-function-progress-with-a-progress-bar/</link>
<pubDate>Mon, 16 Mar 2009 16:40:38 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/03/16/r-monitor-function-progress-with-a-progress-bar/</guid>
<description><![CDATA[Every once in while I have to write a function that contains a loop doing thousands or millions of c]]></description>
<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/unsoundtransient/2168462838/"><img class="alignleft size-medium wp-image-252" style="margin:2px 15px;" title="progress_bar" src="http://ryouready.files.wordpress.com/2009/03/progress_bar.jpg?w=240&h=180" alt="progress_bar" width="240" height="180" /></a>Every once in a while I have to write a function that contains a loop doing thousands or millions of calculations. To make sure that the function does not get stuck in an endless loop, or just to fulfill the human need for control, it is useful to monitor the progress. So first I tried the following:</p>
<pre><span style="color:#000080;"><span style="color:#008000;">

</span></span><span style="color:#008000;">###############################################################
</span><span style="color:#000080;">
total &#60;- 10
for(i in 1:total){
   print(i)
   Sys.sleep(0.1)
}

</span><span style="color:#008000;">###############################################################
</span></pre>
<p>Unfortunately this does not work as expected, because the console output of the basic R GUI is buffered: everything is printed to the console at once after the loop has finished. The <a title="FAQs" href="http://cran.r-project.org/bin/windows/rw-FAQ.html#The-output-to-the-console-seems-to-be-delayed" target="_blank">R FAQs (7.1)</a> explain a solution: either change the R GUI buffering setting in the <em>Misc </em>menu, which can be toggled via <em>&#60;Ctrl-W&#62;</em>, or tell R explicitly to empty the buffer with <tt>flush.console()</tt>. Like this it works:</p>
<pre><span style="color:#008000;">###############################################################
</span><span style="color:#000080;">
total &#60;- 20
for(i in 1:total){
</span><span style="color:#000080;">   Sys.sleep(0.1)</span><span style="color:#000080;">
   print(i)
</span><span style="color:#000080;"><span style="color:#008000;">   # update GUI console</span></span><span style="color:#000080;">
</span><span style="color:#000080;">   flush.console()</span><span style="color:#000080;"><span style="color:#008000;">                          </span></span><span style="color:#000080;">
}

</span><span style="color:#008000;">###############################################################
</span></pre>
<p>Of course it would be even nicer to have a real progress bar. R ships with several of them. First a text-based progress bar, <tt>txtProgressBar()</tt> from the built-in <tt>utils</tt> package:</p>
<p><!--more--></p>
<pre><span style="color:#008000;">###############################################################

</span><span style="color:#000080;">total &#60;- 20</span>
<span style="color:#000080;"><span style="color:#008000;"># create progress bar</span></span><span style="color:#000080;">
pb &#60;- txtProgressBar(min = 0, max = total, style = 3)
for(i in 1:total){
   Sys.sleep(0.1)</span>
<span style="color:#000080;"><span style="color:#008000;">   # update progress bar</span></span><span style="color:#000080;">
   setTxtProgressBar(pb, i)
}
close(pb)</span><span style="color:#000080;"><span style="color:#008000;">

</span></span><span style="color:#008000;">###############################################################
</span></pre>
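<p>Inside a function it is handy to close the bar automatically. The following is only a sketch; <tt>slowSquares()</tt> is a made-up example function, and it assumes that <tt>x</tt> has at least one element, as <tt>txtProgressBar()</tt> needs <tt>max</tt> to be bigger than <tt>min</tt>.</p>
<pre><span style="color:#000080;"># hypothetical example function with a built-in progress bar
slowSquares &#60;- function(x)
{
   pb &#60;- txtProgressBar(min = 0, max = length(x), style = 3)
   on.exit(close(pb))          # close the bar even if an error occurs
   res &#60;- numeric(length(x))
   for(i in seq_along(x)){
      Sys.sleep(0.1)           # stands in for a long calculation
      res[i] &#60;- x[i]^2
      setTxtProgressBar(pb, i) # update progress bar
   }
   res
}

slowSquares(1:20)</span></pre>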
<p><span style="color:#000000;">To get a GUI progress bar the </span><tt>tkProgressBar()</tt> function from the <tt>tcltk</tt> package can be used.</p>
<pre><span style="color:#008000;">###############################################################</span><span style="color:#000080;">
<span style="color:#000080;">
library(tcltk)
total &#60;- 20
</span></span><span style="color:#000080;"><span style="color:#008000;"># create progress bar</span></span><span style="color:#000080;">
</span><span style="color:#000080;">pb &#60;- tkProgressBar(title = "progress bar", min = 0,
                    max = </span><span style="color:#000080;"><span style="color:#000080;">total</span></span><span style="color:#000080;">, width = 300)</span>
<span style="color:#000080;">
for(i in 1:total){
   Sys.sleep(0.1)
 <span style="color:#000080;">  </span></span><span style="color:#000080;">setTkProgressBar(pb, i, label=paste( round(i/total*100, 0),
                                        "% done"))
</span><span style="color:#000080;"><span style="color:#000080;">}</span>
close(pb)</span><span style="color:#000080;"><span style="color:#008000;">

</span></span><span style="color:#008000;">###############################################################</span></pre>
<p>Last but not least, a native progress bar that is available on Windows only.</p>
<pre><span style="color:#008000;">###############################################################

</span><span style="color:#000080;">total &#60;- 20
<span style="color:#008000;"># create progress bar</span></span><span style="color:#000080;">
</span><span style="color:#000080;">pb &#60;<span style="color:#000080;">- </span></span><span style="color:#000080;">winProgressBar(title = "progress bar", min = 0,
                     max = total, width = 300)
</span><span style="color:#000080;"><span style="color:#000080;">
for(i in 1:total){</span>
   Sys.sleep(0.1)
 <span style="color:#000080;">  </span></span><span style="color:#000080;">setWinProgressBar(pb, i, title=paste( round(i/total*100, 0),
                                        "% done"))
</span><span style="color:#000080;"><span style="color:#000080;">}</span>
close(pb)</span><span style="color:#000080;"><span style="color:#008000;">

</span></span><span style="color:#008000;">###############################################################
</span></pre>
<p>Ciao, Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Good practice - adding footnotes to graphics]]></title>
<link>http://ryouready.wordpress.com/2009/02/17/r-good-practice-adding-footnotes-to-graphics/</link>
<pubDate>Tue, 17 Feb 2009 17:46:16 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/02/17/r-good-practice-adding-footnotes-to-graphics/</guid>
<description><![CDATA[In some statistical programs there is the option available to attach a footnote to the graphical out]]></description>
<content:encoded><![CDATA[<p><a href="http://www.creativesynthesis.net/blog/2007/05/17/im-here-to-fix-your-ahem-database/" target="_blank"><img class="alignleft size-medium wp-image-242" style="margin:0;" title="//www.creativesynthesis.net/blog/2007/05/17/im-here-to-fix-your-ahem-database/. 10. February 2009." src="http://ryouready.files.wordpress.com/2009/02/footnote.png?w=300&h=177" alt="footnote" width="300" height="177" /></a>Some statistical programs offer an option to attach a footnote to the graphical output they create. This footnote may contain the name of the script or the file that produced the graphic, the author&#8217;s name and the date of creation. In SAS, for example, there is a <em>footnote </em>command to achieve this. Ever since I realized that this makes life a lot easier, I have been using a simple three-line R function at the end of the construction of any graphic. I suppose that this is what my professors meant by &#8220;good practice&#8221;. The nice thing about implementing this in the <em>grid </em>graphics system is that you can produce multiple graphics [e.g. by <tt>par(mfrow=c(2, 2))</tt>] and the footnote will still be positioned correctly.</p>
<p><!--more--></p>
<pre><span style="color:#008000;">###############################################################
##                                                           ##
##      R: Good practice - adding footnotes to graphics      ##
##                                                           ##
###############################################################</span>
<span style="color:#008000;">
# basic information at the beginning of each script</span>
<span style="color:#000080;">scriptName &#60;- "filename.R"
author &#60;- "mh"
footnote &#60;- paste(scriptName, format(Sys.time(), "%d %b %Y"),
                  author, sep=" / ")

<span style="color:#008000;"># default footnote is today's date, cex=.7 (size) and color
# is a kind of grey</span>

makeFootnote &#60;- function(footnoteText=
                         format(Sys.time(), "%d %b %Y"),
                         size= .7, color= grey(.5))
{
   require(grid)
   pushViewport(viewport())
   grid.text(label= footnoteText ,
             x = unit(1,"npc") - unit(2, "mm"),
             y= unit(2, "mm"),
             just=c("right", "bottom"),
             gp=gpar(cex= size, col=color))
   popViewport()
}

makeFootnote(footnote)</span>

<span style="color:#008000;">## Exa<span style="color:#008000;">mple</span></span><span style="color:#008000;"> ##</span>
<span style="color:#000080;">plot(1:10)
makeFootnote(footnote)</span>

<span style="color:#008000;">###############################################################</span><span style="color:#008000;">
</span></pre>
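<p>To illustrate the remark about multiple graphics, here is a small self-contained sketch that draws four panels into a PDF file and then adds the footnote with the same grid calls as in <tt>makeFootnote()</tt>. The file name and the footnote text are made up for this example.</p>
<pre><span style="color:#000080;">library(grid)

# hypothetical output file in the temporary directory
outFile &#60;- file.path(tempdir(), "footnote_demo.pdf")
pdf(outFile)
par(mfrow=c(2, 2))
for(i in 1:4) plot(1:10, main=paste("panel", i))

# same grid calls as in makeFootnote(): the viewport spans the
# whole page, so the text ends up in the lower right corner
pushViewport(viewport())
grid.text(label="filename.R / mh",
          x=unit(1, "npc") - unit(2, "mm"),
          y=unit(2, "mm"),
          just=c("right", "bottom"),
          gp=gpar(cex=.7, col=grey(.5)))
popViewport()
dev.off()</span></pre>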
<p><span style="color:#000000;">Here is an example of a footnote added to the graphical output.</span></p>
<div id="attachment_235" class="wp-caption alignleft" style="width: 310px"><img class="size-medium wp-image-235" title="footnotes_in_graphs" src="http://ryouready.files.wordpress.com/2009/02/r_posting_2008_12_03_good_practice_footnotes_in_graphs.png?w=300&h=300" alt="Correlation matrix with footnote" width="300" height="300" /><p class="wp-caption-text">Correlation matrix with footnote</p></div>
<p><span style="color:#000000;">Cheers, Mark</span></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Calculating all possible linear regression models for a given set of predictors]]></title>
<link>http://ryouready.wordpress.com/2009/02/06/r-calculating-all-possible-linear-regression-models-for-a-given-set-of-predictors/</link>
<pubDate>Fri, 06 Feb 2009 11:05:26 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/02/06/r-calculating-all-possible-linear-regression-models-for-a-given-set-of-predictors/</guid>
<description><![CDATA[Although the graphic at the left might not seem a 100% appropriate, it gives a hint to what I am abo]]></description>
<content:encoded><![CDATA[<p><a rel="http://www.flickr.com/photos/ethanhein/3151846786/" href="http://www.flickr.com/photos/ethanhein/3151846786/" target="_blank"><img class="alignleft size-medium wp-image-174" title="Ethan Hein. &#34;permutations&#34;. Online image. Flickr. 3 January 2008" src="http://ryouready.files.wordpress.com/2009/01/variations.jpg?w=300&h=221" alt="variations" width="300" height="221" /></a>Although the graphic at the left might not seem 100% appropriate, it gives a hint of what I am about to do. I want to calculate all possible linear regression models with one dependent and several independent variables. In this posting I do not want to address bias and fitting issues or the question of whether this makes sense from a statistical point of view. Here I want to emphasize the technical issues only.</p>
<p>To solve the task, several approaches are possible. The first one is a step-by-step approach using a lot of code. Another one would be to make use of a specialized package. The packages <tt>leaps</tt> and <tt>meifly</tt> would be appropriate for the task but have some slight drawbacks in terms of flexibility. I will not address solutions using these packages here, but I would like to point out that, in contrast to the approach below, they would do the job in only a few lines of code.</p>
<h2>The step-by-step approach</h2>
<p>Let&#8217;s suppose we have the following set of four possible regressors.</p>
<pre><span style="color:#000080;">regressors &#60;- c("y1", "y2", "y3", "y4")</span></pre>
<p>Now we want to construct a formula that contains the first and third regressors.</p>
<pre><span style="color:#000080;">vec &#60;- c(TRUE, FALSE, TRUE, FALSE)
paste(regressors[vec])
# [1] "y1" "y3"</span></pre>
<p>So the paste command works vectorwise, which helps a lot in this case. Now we add a plus sign between the regressors&#8230;</p>
<p><!--more--></p>
<pre><span style="color:#000080;">paste(regressors[vec], collapse=" + ")</span></pre>
<p>&#8230; and add the left side of the equation. The 1 in the formula models the intercept; 0 would be a model without intercept.</p>
<pre><span style="color:#000080;">paste(c("y ~ 1", regressors[vec]), collapse=" + ")</span></pre>
<p>Now let&#8217;s make a formula out of it.</p>
<pre><span style="color:#000080;">as.formula(paste(c("y ~ 1", regressors[vec]), collapse=" + "))</span></pre>
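<p>As a side note, base R also offers <tt>reformulate()</tt> in the <tt>stats</tt> package, which builds the formula from a character vector directly. This is just an alternative sketch, not what is used in the rest of this post; the intercept is included by default.</p>
<pre><span style="color:#000080;">regressors &#60;- c("y1", "y2", "y3", "y4")
vec &#60;- c(TRUE, FALSE, TRUE, FALSE)

# build the formula from the selected regressor names
f &#60;- reformulate(regressors[vec], response="y")
f
# y ~ y1 + y3</span></pre>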
<p>So we can construct a formula from each row of a TRUE / FALSE matrix which determines if a regressor is used or not. Now we need a TRUE / FALSE matrix of all the possible regressor combinations. The <tt>expand.grid()</tt> function produces one (see <tt>?expand.grid</tt>).</p>
<pre><span style="color:#000080;">regMat &#60;- expand.grid(c(TRUE,FALSE), c(TRUE,FALSE),
                      c(TRUE,FALSE), c(TRUE,FALSE)) </span>
<span style="color:#000080;"># &#62; regMat
#     Var1  Var2  Var3  Var4
# 1   TRUE  TRUE  TRUE  TRUE
# 2  FALSE  TRUE  TRUE  TRUE
# 3   TRUE FALSE  TRUE  TRUE
# 4  FALSE FALSE  TRUE  TRUE
# 5   TRUE  TRUE FALSE  TRUE</span></pre>
<p>The last row of the full matrix (row 16, not shown in the excerpt above) describes a trivial model: it contains only FALSE values and thus no regressors, so it is removed.</p>
<pre><span style="color:#000080;">regMat &#60;- regMat[-(dim(regMat)[1]),]</span>
<span style="color:#008000;">
# let's name the columns</span><span style="color:#000080;">
names(regMat) &#60;- paste("y", 1:4, sep="")</span></pre>
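<p>For a different number of regressors the four <tt>c(TRUE,FALSE)</tt> arguments do not have to be typed by hand, as <tt>expand.grid()</tt> also accepts a list. A minimal sketch of that generalization (the row without any regressor is dropped via <tt>rowSums()</tt> here instead of by its position):</p>
<pre><span style="color:#000080;">regressors &#60;- c("y1", "y2", "y3", "y4")

# one TRUE / FALSE column per regressor
regMat &#60;- expand.grid(rep(list(c(TRUE, FALSE)), length(regressors)))
names(regMat) &#60;- regressors

# drop the row that contains no regressor at all
regMat &#60;- regMat[rowSums(regMat) &#62; 0, ]
nrow(regMat)
# [1] 15</span></pre>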
<p>Now we can apply the above way of formula construction to each row of the matrix so we get a list with all the possible models.</p>
<pre><span style="color:#008000;"># x1 will be dependent</span>
<span style="color:#000080;">regressors &#60;- c("y1", "y2", "y3", "y4")
</span>
<span style="color:#000080;">allModelsList &#60;- apply(regMat, 1, function(x) as.formula(
                       paste(c("x1 ~ 1", regressors[x]),
                             collapse=" + ")) )</span>
<span style="color:#000080;"># &#62; allModelsList
# [[1]]
# x1 ~ 1 + y1 + y2 + y3 + y4
#
# [[2]]
# x1 ~ 1 + y2 + y3 + y4
#
# [[3]]
# x1 ~ 1 + y1 + y3 + y4
</span></pre>
<p><span style="color:#000000;">The last step is to use each list element for the calculation.</span></p>
<pre><span style="color:#000080;">data=anscombe
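<span style="color:#008000;"># unname() drops any list names, so that ldply() below
# does not add an index column for them</span>
allModelsResults &#60;- lapply(unname(allModelsList),
                           function(x) lm(x, data=data))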
</span></pre>
<p><span style="color:#000000;">So basically, here our computation work is done, but as in most cases a lot of work follows to prepare the data in a nice way.</span> So now let&#8217;s get all the important information into one dataframe. Let&#8217;s say we want a data frame like the following.</p>
<pre><span style="color:#000080;"><span style="color:#000000;">+-------+-----------------------------------------------------+
&#124; model &#124;   no. of   &#124; coefficients &#124; se coef. &#124; t-Val &#124; etc. &#124;
&#124;       &#124; regressors &#124;  x1 &#124; x2 ... &#124;          &#124;       &#124;      &#124;
&#124;       &#124;            &#124;              &#124;          &#124;       &#124;      &#124;
</span>
</span></pre>
<p><span style="color:#000080;"><span style="color:#000000;">So we need to extract all of this information (coefficients, SE etc.) and cast it into one data frame.</span></span></p>
<pre><span style="color:#000080;">x &#60;- allModelsResults[[1]]
coef(x)
coef(summary(x))[, "Std. Error"]
### ... and so on
</span></pre>
<p><span style="color:#000000;">This used to be one of the nasty tasks in R. Here Hadley Wickham&#8217;s <tt>plyr</tt> package really helps a lot. <tt>ldply()</tt> takes a list, applies a function to each element and casts the results into ONE data frame (see <tt>?ldply</tt>). As the function&#8217;s return value it expects a data frame or a vector. The advantage of returning data frames is that <tt>ldply()</tt> uses <tt>rbind.fill()</tt> to combine the results when they are data frames. <tt>rbind.fill()</tt> allows a different number of columns in each data frame, which is the case here as a different number of regressors is used each time. So we have to make sure that the function returns a data frame. Thus we use <tt>as.data.frame()</tt>, paying attention to the orientation of the data frame and using <tt>t()</tt> in case it would otherwise come out as one column.</span></p>
<pre><span style="color:#000000;"><span style="color:#000080;">library(plyr)
dfCoefNum   &#60;- ldply(allModelsResults, function(x) as.data.frame(
                     t(coef(x))))
dfStdErrors &#60;- ldply(allModelsResults, function(x) as.data.frame(
                     t(coef(summary(x))[, "Std. Error"])))
dftValues   &#60;- ldply(allModelsResults, function(x) as.data.frame(
                     t(coef(summary(x))[, "t value"])))
dfpValues   &#60;- ldply(allModelsResults, function(x) as.data.frame(
                     t(coef(summary(x))[, "Pr(&#62;&#124;t&#124;)"]))) 

<span style="color:#008000;"># rename DFs so we know what the column contains</span>
names(dfStdErrors) &#60;- paste("se", names(dfStdErrors), sep=".")
names(dftValues) &#60;- paste("t", names(dftValues), sep=".")
names(dfpValues) &#60;- paste("p", names(dfpValues), sep=".")

<span style="color:#008000;"># p-value for overall model fit</span>
calcPval &#60;- function(x){
    fstat &#60;- summary(x)$fstatistic
    pVal &#60;- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
    return(pVal)
}

<span style="color:#008000;"># Before creating ONE data frame with all important entries,
# we need to compute some more indices </span>
NoOfCoef &#60;- unlist(apply(regMat, 1, sum))
R2       &#60;- unlist(lapply(allModelsResults, function(x)
                          summary(x)$r.squared))
adjR2    &#60;- unlist(lapply(allModelsResults, function(x)
                          summary(x)$adj.r.squared))
RMSE     &#60;- unlist(lapply(allModelsResults, function(x)
                          summary(x)$sigma))
fstats   &#60;- unlist(lapply(allModelsResults, calcPval))

<span style="color:#008000;"># now we can combine all the data into one data frame</span>
results &#60;- data.frame( model = as.character(allModelsList),
                       NoOfCoef = NoOfCoef,
                       dfCoefNum,
                       dfStdErrors,
                       dftValues,
                       dfpValues,
                       R2 = R2,
                       adjR2 = adjR2,
                       RMSE = RMSE,
                       pF = fstats  )
<span style="color:#008000;"># round the results</span>
results[,-c(1,2)] &#60;- round(results[,-c(1,2)], 3)
results
</span></span></pre>
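<p><span style="color:#000000;">Once the indices are at hand, picking a model is a one-liner. Here is a small self-contained sketch using three arbitrarily chosen models on the <tt>anscombe</tt> data and the adjusted R squared as criterion. The object names are made up for this illustration, and the choice of criterion is an assumption, not a recommendation.</span></p>
<pre><span style="color:#000080;"># three example models, chosen arbitrarily
fits &#60;- list(lm(x1 ~ y1, data=anscombe),
             lm(x1 ~ y1 + y2, data=anscombe),
             lm(x1 ~ y1 + y2 + y3, data=anscombe))

# adjusted R squared for each model
adjR2 &#60;- sapply(fits, function(m) summary(m)$adj.r.squared)

# model with the highest adjusted R squared
best &#60;- fits[[which.max(adjR2)]]
formula(best)</span></pre>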
<p><span style="color:#000000;">This was really a lot of code. But now we have assembled all the information and indices that were important for my task. To choose what is needed is simple now. And </span><span style="color:#000000;">we have the flexibility to add any indices. Next time I will try to extend this example doing a k-fold estimation for each set of regressors.<br />
</span></p>
<p><span style="color:#000000;">Cheers,</span></p>
<p><span style="color:#000000;">Mark Heckmann<br />
</span></p>
<p><span style="color:#000000;"><br />
</span></p>
<p><span style="color:#000000;"><br />
</span></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Combining vectors or data frames of unequal length into one data frame]]></title>
<link>http://ryouready.wordpress.com/2009/01/23/r-combining-vectors-or-data-frames-of-unequal-length-into-one-data-frame/</link>
<pubDate>Fri, 23 Jan 2009 15:58:02 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/01/23/r-combining-vectors-or-data-frames-of-unequal-length-into-one-data-frame/</guid>
<description><![CDATA[Today I will treat a problem I encounter every once in a while. Let&#8217;s suppose we have several ]]></description>
<content:encoded><![CDATA[<p>Today I will treat a problem I encounter every once in a while. Let&#8217;s suppose we have several data frames or vectors of unequal length but with partly matching column names, just like the following ones:</p>
<pre><span style="color:#000080;">df1 &#60;- data.frame(Intercept = .4, x1=.4, x2=.2, x3=.7)
df2 &#60;- data.frame(Intercept</span><a href="http://www.flickr.com/photos/gritzi/466951294/" target="_blank"><img class="alignleft size-medium wp-image-203" style="border:0 none;margin:10px;" title="lego" src="http://ryouready.files.wordpress.com/2009/01/lego_.jpg?w=270&h=186" alt="lego" width="270" height="186" /></a><span style="color:#000080;"> = .5,        x2=.8       )</span></pre>
<p>This may occur, for example, when fitting several multiple regression models, each time using a different combination of regressors. Now I would like to combine the results into one data frame. Plain rbind() does not help here, as it requires the same columns in every data frame, and merge() is meant for joining on shared key columns rather than for stacking rows.</p>
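<p>A quick way to see the problem, as a minimal sketch using the two data frames from above: rbind() stops with an error because the columns differ. tryCatch() is used here only so that the error is captured as an object instead of aborting the script.</p>
<pre><span style="color:#000080;">df1 &#60;- data.frame(Intercept = .4, x1=.4, x2=.2, x3=.7)
df2 &#60;- data.frame(Intercept = .5,        x2=.8       )
</span>
<span style="color:#008000;"># rbind() needs the same columns in both data frames, so this
# call fails; tryCatch() returns the error object instead</span>
<span style="color:#000080;">res &#60;- tryCatch(rbind(df1, df2), error = function(e) e)
inherits(res, "error")      </span><span style="color:#008000;"># TRUE: the call did not succeed</span>
<span style="color:#000080;">conditionMessage(res)       </span><span style="color:#008000;"># the reason given by R</span></pre>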
<p>I posted this <a href="https://stat.ethz.ch/pipermail/r-help/2008-December/183540.html" target="_blank">matter on r-help</a>, as my own first solution was somewhat awkward and could not be generalized to arbitrary data frames or lists of data frames. One solution was posted by Charles C. Berry. <em>myList</em> is a list containing the data frames as elements.</p>
<pre><span style="color:#000080;">myList &#60;- list(df1, df2)</span></pre>
<p>He uses a nested loop. The inner loop runs over each column name of each data frame. It takes each column name and the corresponding element <em>[i, j]</em> from the data frame<em> ( myList[[i]] ) </em>and writes it into an initially empty data frame <em>(dat)</em>. Thereby a new column, named just like the column from the list element data frame, is created. The cells that are left out are automatically set to NA.</p>
<pre><span style="color:#000080;">dat &#60;- data.frame()
for(i in seq(along=myList)) for(j in names(myList[[i]]))
                                 dat[i,j] &#60;- myList[[i]][j]
dat
</span></pre>
<p><!--more-->Note that the order of the output columns depends on the input order. The list below contains the same elements in a different order and therefore yields a different column order.</p>
<pre><span style="color:#000080;">myList &#60;- list(df2, df1)

  Intercept  x2  x1  x3
1       0.5 0.8  NA  NA
2       0.4 0.2 0.4 0.7</span></pre>
<p>Another solution was posted by Henrique Dallazuanna. This one has the advantage that it does not use loops.</p>
<pre><span style="color:#000080;">l &#60;- myList
do.call(rbind, lapply(lapply(l, unlist), "[",
        unique(unlist(c(sapply(l,names))))))</span></pre>
<p>It looks a bit scary at first, so let's examine it starting from the inside.</p>
<pre><span style="color:#008000;"># a list of names from each list element</span>
<span style="color:#000080;">c(sapply(l,names))</span>
<span style="color:#008000;">
# unlist them and find unique names</span>
<span style="color:#000080;">unique(unlist(c(sapply(l,names))))</span>
<span style="color:#008000;">
# gives unlisted vectors with column names for each list element</span>
<span style="color:#000080;">lapply(l, unlist) </span></pre>
<p>As a next step, every name from the full set of column names is selected from each vector; names that are not present in a vector yield NA values.</p>
<pre><span style="color:#000080;">listOfVectors &#60;- lapply(lapply(l, unlist), "[",
                        unique(unlist(c(sapply(l,names)))))</span></pre>
<p>As a last step the vectors having the same columns are combined.</p>
<pre><span style="color:#000080;">do.call(rbind, listOfVectors)</span><span style="color:#008000;">
# or in full</span>
<span style="color:#000080;">DF &#60;- do.call(rbind, lapply(lapply(l, unlist), "[",
              unique(unlist(c(sapply(l,names))))))</span></pre>
<p>The only little flaw in this approach is that the column names of the first vector are taken as column names of the resulting matrix. Using the second list from above gives the following.</p>
<pre><span style="color:#000080;">l &#60;- list(df2, df1) </span>
<span style="color:#000080;">     Intercept  x2 &#60;NA&#62; &#60;NA&#62;
[1,]       0.5 0.8   NA   NA
[2,]       0.4 0.2  0.4  0.7</span></pre>
<p>Thus, in a last step we need to convert the result to a data frame and change its column names.</p>
<pre><span style="color:#000080;">DF &#60;- as.data.frame(DF)
names(DF) &#60;- unique(unlist(c(sapply(l,names))))
DF</span></pre>
<p>Well, this works, but it would be much more convenient to get this done in one single function, and since October 2008 there is one. It can be found in the <em>plyr</em> package written by Hadley Wickham. So the solution is as easy as:</p>
<pre><span style="color:#000080;">library(plyr)</span>
<span style="color:#000080;">l &#60;- myList
do.call(rbind.fill, l)</span>

<span style="color:#008000;"># another example</span>

<span style="color:#000080;">l &#60;- list(data.frame(a=1, b=2), data.frame(a=2, c=3, d=5))
do.call(rbind.fill, l)

</span></pre>
<p><span style="color:#000000;">The result for the first example, <em>list(df1, df2)</em>:</span></p>
<pre><span style="color:#000080;">  Intercept  x1  x2  x3
1       0.4 0.4 0.2 0.7
2       0.5  NA 0.8  NA</span></pre>
<p>Now, this is nice! It is really worthwhile having a look at Hadley Wickham&#8217;s <em>plyr </em>package, as it provides a lot of functions that make life a lot easier when it comes to splitting lists or data frames, optionally doing a calculation on the pieces and merging them again afterwards. More on that another day.</p>
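<p>To make the idea behind such a function concrete, here is a rough base R sketch. It is not how <em>plyr</em> implements rbind.fill; <em>fillBind</em> is a name made up for this illustration. Each data frame gets the columns it lacks, filled with NA, and the padded data frames are then stacked with rbind().</p>
<pre><span style="color:#008000;"># hypothetical helper, for illustration only</span>
<span style="color:#000080;">fillBind &#60;- function(l) {
    </span><span style="color:#008000;"># all column names, in order of first appearance</span><span style="color:#000080;">
    allNames &#60;- unique(unlist(lapply(l, names)))
    padded &#60;- lapply(l, function(d) {
        </span><span style="color:#008000;"># add every missing column as NA</span><span style="color:#000080;">
        for (nm in setdiff(allNames, names(d))) d[[nm]] &#60;- NA
        </span><span style="color:#008000;"># same column order in every data frame</span><span style="color:#000080;">
        d[allNames]
    })
    do.call(rbind, padded)
}

l &#60;- list(data.frame(a=1, b=2), data.frame(a=2, c=3, d=5))
fillBind(l)</span></pre>
<p>The result has the four columns a, b, c and d, with NA wherever a data frame did not provide a value.</p>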
<p>Cheers, Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: A brief overview of R (kick-off talk) ]]></title>
<link>http://markheckmann.wordpress.com/2009/01/14/r-kurzubersicht-r-impulsvortrag/</link>
<pubDate>Wed, 14 Jan 2009 15:19:43 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://markheckmann.wordpress.com/2009/01/14/r-kurzubersicht-r-impulsvortrag/</guid>
<description><![CDATA[Today's R talk is now online. The presentation and the examples shown]]></description>
<content:encoded><![CDATA[<p>Today&#8217;s R talk is now online. The presentation and the examples shown can be downloaded here. The talk was part of the seminar &#8220;<a title="HCWs GLM Workshop" href="http://www.fire.uni-bremen.de/waldmann/courses/meth/lab3.html" target="_blank"><em>GLM Workshop</em></a>&#8221; held by <a title="Site of PD Dr. Hans-Christian Waldmann" href="http://www.fire.uni-bremen.de/" target="_blank">PD Dr. H.-C. Waldmann</a>. It was a kick-off talk (1.5 h) meant to show what <a title="Home of the R project" href="http://www.r-project.org" target="_blank">R</a> can do and to convey a feeling for the program. The aim was not to demonstrate programming in R step by step.</p>
<p>For anyone interested, <a href="http://www.slideshare.net/onmywaytogod/r-bersicht-glm-workshop-2009-presentation" target="_blank">the presentation can be viewed here</a>. It contains some links for getting started with R.</p>
<iframe src='http://www.slideshare.net/slideshow/embed_code/2885431' width='425' height='348'></iframe>
<p>The code discussed in the seminar can be found <a title="Seminar Code" href="http://docs.google.com/Doc?id=dgn6bwb3_49hhd4skgw" target="_blank">here</a>. The OpenOffice file into which the R code was inserted and then weaved (<tt>odfWeave()</tt>) can be found <a title="My write-up" href="http://docs.google.com/Doc?id=dgn6bwb3_151gqhgw3hs" target="_blank">here</a>.</p>
<p>Regards, Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Normalized Google distance (NGD) in R part II ]]></title>
<link>http://ryouready.wordpress.com/2009/01/12/r-normalized-google-distance-ngd-in-r-part-ii/</link>
<pubDate>Mon, 12 Jan 2009 16:01:44 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/01/12/r-normalized-google-distance-ngd-in-r-part-ii/</guid>
<description><![CDATA[After my last posting on how to extract the google number count I was searching the web and found a ]]></description>
<content:encoded><![CDATA[<p><a href="http://blogoscoped.com/archive/2005-01-27-n48.html"><img class="alignleft size-full wp-image-93" style="margin-right:20px;margin-left:20px;" title="brain" src="http://ryouready.files.wordpress.com/2008/12/brain.gif" alt="brain" width="288" height="195" /></a>After my <a title="Retrieving information from google with RCurl package" href="http://ryouready.wordpress.com/2009/01/01/r-retrieving-information-from-google-with-rcurl-package/" target="_blank">last posting</a> on how to extract the google number count I was searching the web and found a nice <a href="http://cwl-projects.cogsci.rpi.edu/msr/" target="_blank">website</a> allowing you to calculate many semantic relatedness measures. On request it seems to be possible to get free access to their API. The API allows you to post a request via the GET or POST method which can be implemented in R using the RCurl package.</p>
<p>Anyway, I will post the code to do the <em>normalized google distance</em> (NGD) calculation using R only. Last time the R code for the google count extraction was posted as a first step; here comes the second step, the calculation, using the function described <a href="http://ryouready.wordpress.com/2009/01/01/r-retrieving-information-from-google-with-rcurl-package/" target="_self">last time</a>.</p>
<p>The calculation formula might look a bit scary at first glance:</p>
<p style="text-align:center;"><img class="aligncenter size-full wp-image-131" style="border:0 none;margin-top:25px;margin-bottom:25px;" title="google_distance_formula1" src="http://ryouready.files.wordpress.com/2009/01/google_distance_formula1.png" alt="google_distance_formula1" width="383" height="42" /></p>
<p>Looking at its step-by-step development in the article <a title="Cilibrasi, Vitanyi (2007). Automatic Meaning Discovery Using Google" href="http://homepages.cwi.nl/~paulv/papers/amdug.pdf" target="_blank">Automatic Meaning Discovery Using Google</a>, the rationale behind it becomes quite easy to understand. Written out, NGD(<em>x</em>, <em>y</em>) = (max{log <em>f</em>(<em>x</em>), log <em>f</em>(<em>y</em>)} - log <em>f</em>(<em>x</em>, <em>y</em>)) / (log <em>M</em> - min{log <em>f</em>(<em>x</em>), log <em>f</em>(<em>y</em>)}). What we need to know here is that <em>M</em> is the total number of web pages searched by Google.  <em>f</em>(<em>x</em>) and <em>f</em>(<em>y</em>) are the counts for search terms <em>x</em> and <em>y</em>, respectively. <em>f</em>(<em>x</em>, <em>y</em>) is the number of web pages found on which both <em>x</em> and <em>y</em> occur (also see <a href="http://en.wikipedia.org/wiki/Semantic_relatedness#Google_distance" target="_blank">Wikipedia</a>). So the ingredients are clear. Here comes the function.</p>
<p><!--more--></p>
<pre><span style="color:#008000;">###############################################################</span>
<span style="color:#000080;"><span style="color:#008000;">#
#  description:  returns the normalized google distance as
#                numeric value
#
#  usage:        NGD(words, language, print, list, ...)
#
#  arguments:
#                words      TWO terms to measure for in
#                           vector form e.g. c("wiki","R")
#                language   in which language to search.
#                           Either "en" (english) or
#                           "de" (german)
#                print      print all results (NGD, counts)
#                           to console (default FALSE)
#                list       return a list of results containing
#                           NGD and all counts (default FALSE)
#                ...        optional M, the total number of
#                           pages (see below)</span></span>

<span style="color:#000080;">
NGD &#60;- function(words, language="de", print=FALSE,
                list=FALSE, ...){

    <span style="color:#008000;"># check for arguments</span>
    if(!hasArg(words)) stop('NGD needs TWO strings like
                       c("word","word2") as words argument!')
    if(length(words)!=2) stop('words argument has to be of
                         length two, e.g. c("word","word2")')
</span><span style="color:#000080;">    <span style="color:#008000;">
    # M: total number of web pages searched by google (2007)</span></span>
<span style="color:#000080;">    if(hasArg(M)) M &#60;- list(...)$M else M &#60;- 8058044651    

    x &#60;- words[1]
    y &#60;- words[2]

    <span style="color:#008000;"># using getGoogleCount() function <a href="http://ryouready.wordpress.com/2009/01/01/r-retrieving-information-from-google-with-rcurl-package/" target="_self">(see here)</a></span>
    freq.x  &#60;- getGoogleCount(x, language=language)
    freq.y  &#60;- getGoogleCount(y, language=language)
    freq.xy &#60;- getGoogleCount(c(x,y), language=language)

   <span style="color:#008000;"> # apply formula</span>
    NGD = (max(log(freq.x), log(freq.y)) - log(freq.xy)) /
          (log(M) - min( log(freq.x), log(freq.y)) )

    <span style="color:#008000;"># print results to console if requested</span>
    if(print==TRUE){
        cat("\t", x,":", freq.x, "\n",
            "\t", y,":", freq.y, "\n",
            "\t", x,"+", y,":", freq.xy, "\n",
            "\t", "normalized google distance (NGD):",
                                          NGD, "\n", "\n")
    }

    </span>
<span style="color:#000080;"><span style="color:#008000;">    # return a list of results if requested (list=TRUE),
    # containing the NGD and all counts. By default only
    # the NGD is returned as a numeric value
    
</span></span><span style="color:#000080;">    results &#60;- list(NGD=NGD,
                    x=c(x, freq.x),
                    y=c(y, freq.y),
                    xy=c(paste(x,"+",y), freq.xy)) 

    if(list==TRUE) return(results) else  return(NGD)
}

</span><span style="color:#008000;">
# NOT RUN:
</span><span style="color:#000080;">
NGD(c("rider","horse"), print=TRUE)
</span><span style="color:#000080;">NGD(c("rider","horse"), list=TRUE)</span>   <span style="color:#008000;">          # returns a list</span>
<span style="color:#000080;"><span style="color:#008000;"># may be applied to dataframes</span>
</span><span style="color:#000080;">DF &#60;- data.frame(A=c("rider","religion"), B=c("</span><span style="color:#000080;">horse</span><span style="color:#000080;">","god"))
apply(DF, 1, NGD, print=TRUE)    

</span><span style="color:#008000;">###############################################################
<span style="color:#000000;">
</span></span></pre>
<p><span style="color:#000000;">The function returns the <em>normalized google distance</em> and can be applied to a data frame containing pairs of words (like DF, see above). I am not sure whether the calculation yields correct results, so if you notice a flaw, please comment.<br />
</span></p>
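<p>One way to check the formula without querying Google is to feed it fixed counts. The sketch below uses a hypothetical helper, <em>ngdFromCounts()</em>, which is not part of the code above and only repeats the formula line of NGD(). The counts are assumed to be those of the horse/rider example in the article and should be treated as illustrative.</p>
<pre><span style="color:#008000;"># hypothetical helper: the NGD formula applied to given counts
# fx, fy: counts for each term, fxy: count for both terms,
# M: total number of pages (same default as in NGD() above)</span>
<span style="color:#000080;">ngdFromCounts &#60;- function(fx, fy, fxy, M = 8058044651) {
    (max(log(fx), log(fy)) - log(fxy)) /
    (log(M) - min(log(fx), log(fy)))
}

</span><span style="color:#008000;"># assumed counts for "horse", "rider" and both together</span><span style="color:#000080;">
ngdFromCounts(fx = 46700000, fy = 12200000, fxy = 2630000)</span></pre>
<p>With these inputs the formula gives about 0.443. If NGD() is fed the same counts it should agree, since it applies the same expression.</p>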
<p><span style="color:#008000;"><span style="color:#000000;">Mark</span></span></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Retrieving information from google using the RCurl package]]></title>
<link>http://ryouready.wordpress.com/2009/01/01/r-retrieving-information-from-google-with-rcurl-package/</link>
<pubDate>Thu, 01 Jan 2009 20:33:09 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2009/01/01/r-retrieving-information-from-google-with-rcurl-package/</guid>
<description><![CDATA[Lately I read the article Automatic Meaning Discovery Using Google by Cilibrasi and Vitanyi, which i]]></description>
<content:encoded><![CDATA[<p><img class="size-full wp-image-96 alignleft" style="border:0 none;margin:6px 10px;" title="semantics1" src="http://ryouready.files.wordpress.com/2008/12/semantics1.gif" alt="semantics1" width="253" height="252" />Lately I read the article <em><a title="Cilibrasi, Vitanyi (2007). Automatic Meaning Discovery Using Google" href="http://homepages.cwi.nl/~paulv/papers/amdug.pdf" target="_blank">Automatic Meaning Discovery Using Google</a> </em>by Cilibrasi and Vitanyi, which introduces the <em>normalized google distance</em> (NGD) as a measure of <a href="http://en.wikipedia.org/wiki/Semantic_relatedness" target="_blank"><em>semantic relatedness</em></a> of two search terms. As its basis for calculation the NGD uses simple google search result counts.</p>
<p>Now I want to figure out how to implement this calculation using R. The first step is to retrieve the needed information from the google website. The second step is to do the calculations. Today covers step one only; the rest will follow.</p>
<p>I found a nice <a title="Explanations to RCurl" href="http://www.omegahat.org/RCurl/philosophy.html" target="_blank">site</a> written by <span class="firstname">Duncan</span> <span class="surname">Temple Lang </span>that explains how to retrieve the HTML code of any web page using the <em>RCurl </em>package. With the package it is possible to specify URL, user name, password and many other things. It provides features to send requests to a site either via the browser&#8217;s HTTP line or directly to a site&#8217;s forms. An example of how to submit a search request to google is given as well. This is all we need to get going.</p>
<p><!--more--></p>
<pre><span style="color:#008000;">###############################################################</span></pre>
<pre><span style="color:#000080;">library(RCurl)
</span><span style="color:#008000;">
</span><span style="color:#008000;"># now let's extract the HTML code from my blog using getURL()
# from the RCurl package

</span><span style="color:#000080;">getURL("http://www.markheckmann.de")</span>

<span style="color:#008000;"># this looks pretty unstructured. But we may have an organized
# view using htmlTreeParse() from the XML package</span>
<span style="color:#008000;"># This is just to see what we are dealing with

</span><span style="color:#000080;">library(XML)
</span><span style="color:#000080;">htmlTreeParse(</span><span style="color:#000080;">getURL("http://www.markheckmann.de"))</span><span style="color:#000080;">

</span><span style="color:#000080;"><span style="color:#008000;"># Now let's do a google request using the browser's address
# line. This can be achieved via the RCurl getForm() function,
# which constructs and sends such a line. Here we can choose
# hl=language, q=search terms and several other parameters.
# Let's search for the term "r-project".
</span>
site &#60;- getForm("http://www.google.com/search", hl="en",
                lr="", q="r-project", btnG="Search")
htmlTreeParse(</span><span style="color:#000080;">site</span><span style="color:#000080;">)
</span>
<span style="color:#008000;"># Now we have the Google result HTML code and have to
# extract the relevant information from it.

</span><span style="color:#000080;">typeof(site)</span>
<span style="color:#008000;">
# As we see, site contains plain character HTML code, so
# I can use simple text manipulation functions here.</span>

<span style="color:#008000;"># What part of the code do I have to extract now? Somewhere
# in the HTML code there is a line like this:
#   </span>              <span style="color:#000080;"> &#60;b&#62; some numerics &#60;/b&#62;</span><span style="color:#008000;">
</span><span style="color:#008000;"># So the number is in between the &#60;b&#62; &#60;/b&#62; tags. How
# can we get this?</span>

<span style="color:#000080;">text &#60;- "We are looking for something like &#60;b&#62;12.345&#60;/b&#62;
         or similar"
gregexpr("&#60;b&#62;12.345&#60;/b&#62;", text, fixed = TRUE)</span>

<span style="color:#008000;"># gregexpr will return the position of the text we are searching
# for. Now we need to generalize this to all numbers. I am
# still not too familiar with regular expressions. Chapter
# seven in <a href="http://www.springer.com/statistics/computational/book/978-0-387-74730-9" target="_blank">Spector, P. (2008). Data Manipulation with R (UseR)</a>
# contains a good explanation of these.</span>

<span style="color:#000080;">gregexpr('&#60;b&#62;[0-9.,]{1,20}&#60;/b&#62;', text)</span>

<span style="color:#008000;"># This does the job! The problem now is that there are
# several tag pairs like the one above containing numbers,
# so we need to find the exact part to extract.
# In an English google search the words "of about" are
# followed by the search count. In German it is preceded by
# the word "ungefähr". I will use these as indicator words to
# spot the position from where to extract.

<span style="color:#000080;">indicatorWord &#60;- "of about"</span>

# start extraction after indicator word
<span style="color:#000080;">posExtractStart &#60;- gregexpr(indicatorWord, site,
                            fixed = TRUE)[[1]]</span>

# extract a string of 30 characters length (from site, the
# getForm() result above), which should be enough to get the numbers
<span style="color:#000080;">stringExtract &#60;- substring(site, first=posExtractStart,
                           last = posExtractStart + 30)
</span>
# search for &#60;b&#62;number&#60;/b&#62; (see above)
<span style="color:#000080;">posResults &#60;- gregexpr('&#60;b&#62;[0-9.,]{1,20}&#60;/b&#62;', stringExtract)</span>
<span style="color:#000080;">posFirst &#60;- posResults[[1]][1]
textLength  &#60;- attributes(posResults[[1]])$match.length
stringExtract &#60;- substring(stringExtract, first=posFirst,
                           last = posFirst + textLength)
</span>
</span><span style="color:#008000;"># Actually the last four lines are usually not necessary. They
# only guard against the case where the search term itself is
# numeric, in which we would risk extracting superfluous
# numbers that distort the count result.

</span><span style="color:#008000;"># erase everything but the numbers
</span><span style="color:#008000;"><span style="color:#000080;">stringExtract &#60;- </span></span><span style="color:#008000;"><span style="color:#000080;">gsub("[^0-9]", "", stringExtract)</span>

</span><span style="color:#000080;">print(stringExtract)</span><span style="color:#008000;">

# now we can use this for the calculation of the normalized
# google distance

</span><span style="color:#008000;">###############################################################
</span></pre>
<p>The above implementation is certainly not technically mature (e.g. the extraction code), especially as I suppose this could be done much more easily using Google APIs. Comments are welcome!</p>
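<p>The extraction logic can be tried without any web request by running it on a fixed string. The following is a minimal sketch under stated assumptions: <em>extractCount()</em> is a hypothetical helper, the sample HTML is made up to resemble the old result line, and the window of 60 characters is an arbitrary choice.</p>
<pre><span style="color:#008000;"># hypothetical helper, for illustration only</span>
<span style="color:#000080;">extractCount &#60;- function(html, indicatorWord = "of about") {
    </span><span style="color:#008000;"># position of the first occurrence of the indicator word</span><span style="color:#000080;">
    start &#60;- regexpr(indicatorWord, html, fixed = TRUE)
    if (start == -1) return(NA_real_)
    </span><span style="color:#008000;"># only look at a short window after the indicator word</span><span style="color:#000080;">
    window &#60;- substring(html, start, start + 60)
    </span><span style="color:#008000;"># first &#60;b&#62;number&#60;/b&#62; inside that window</span><span style="color:#000080;">
    m &#60;- regmatches(window, regexpr("&#60;b&#62;[0-9.,]+&#60;/b&#62;", window))
    if (length(m) == 0) return(NA_real_)
    </span><span style="color:#008000;"># erase everything but the digits</span><span style="color:#000080;">
    as.numeric(gsub("[^0-9]", "", m))
}

</span><span style="color:#008000;"># made-up sample resembling a result line</span><span style="color:#000080;">
html &#60;- "Results &#60;b&#62;1&#60;/b&#62; - &#60;b&#62;10&#60;/b&#62; of about &#60;b&#62;1,230,000&#60;/b&#62; for &#60;b&#62;r-project&#60;/b&#62;"
extractCount(html)</span></pre>
<p>For the sample string this yields 1230000; the two numbers before the indicator word are ignored because the search starts at the indicator word.</p>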
<p>As a last step let&#8217;s wrap the above way to extract the google search results count into a function.</p>
<pre><span style="color:#008000;">###############################################################</span>
<span style="color:#000080;"><span style="color:#008000;">#
#  description:   returns the google results count
#  usage:         getGoogleCount(searchTerms, language, ...)
#  arguments:
#                 searchTerms   The terms searched for in
#                               vector form e.g. c("wikipedia")
#                               or  c("wikipedia","R")
#                 language      in which language to search.
#                               Either "en" (english) or
#                               "de" (german)          </span>
</span><span style="color:#008000;">
</span><span style="color:#000080;">getGoogleCount &#60;- function(searchTerms=NULL,
                           language="de",
                           ...){

    <span style="color:#008000;"># check for arguments</span>
    if(is.null(searchTerms)) stop("Please enter search terms!")
    if(!any(language==c("de","en"))) stop("Please enter correct
                                           language (de, en)!")

   <span style="color:#008000;"> # construct google like expression
</span></span><span style="color:#000080;">    require(RCurl)</span><span style="color:#000080;"><span style="color:#008000;">
    # Collapse search terms.
</span>    entry &#60;- paste(searchTerms, collapse="+")
    siteHTML &#60;- getForm("http://www.google.com/search",
                        hl=language, lr="", q=entry,
                        btnG="Search")
    </span>
<span style="color:#000080;"><span style="color:#008000;">    # select language specific indicator word
</span></span><span style="color:#000080;">    if(language=="de") indicatorWord &#60;- "ungefähr" else
                       indicatorWord &#60;- "of about"      <span style="color:#008000;">  </span>

    <span style="color:#008000;"># start extraction at indicator word</span> <span style="color:#008000;">position</span>
    posExtractStart &#60;- gregexpr(indicatorWord, siteHTML,
                                fixed = TRUE)[[1]]
    <span style="color:#008000;"># extract string of 30 characters length</span>
    stringExtract &#60;- substring(siteHTML, first=posExtractStart,
                               last = posExtractStart + 30)
    <span style="color:#008000;"># search for &#60;b&#62;number&#60;/b&#62; (can be left out, see above)</span>
    posResults &#60;- gregexpr('&#60;b&#62;[0-9.,]{1,20}&#60;/b&#62;', stringExtract)
    posFirst &#60;- posResults[[1]][1]
    textLength  &#60;- attributes(posResults[[1]])$match.length
    stringExtract &#60;- substring(stringExtract, first=posFirst,
                               last = posFirst + textLength)
    <span style="color:#008000;"># erase everything but the numbers</span>
    matchCount &#60;- as.numeric(gsub("[^0-9]", "", stringExtract))

    return(matchCount)
}

</span><span style="color:#008000;"># NOT RUN</span>
<span style="color:#000080;">
getGoogleCount(c("r-project"), language="en")
getGoogleCount(c("r-project", "europe"), language="en")

</span><span style="color:#008000;">###############################################################</span></pre>
<p>Next time I will use this function to calculate the <em>normalized google distance</em>. Comments are welcome!</p>
<p>Happy New Year!</p>
<p>Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: jackknife the coefficients of a linear regression model]]></title>
<link>http://ryouready.wordpress.com/2008/12/19/r-jackknife-the-coefficients-of-a-linear-regression-model/</link>
<pubDate>Fri, 19 Dec 2008 14:06:58 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2008/12/19/r-jackknife-the-coefficients-of-a-linear-regression-model/</guid>
<description><![CDATA[For one of my statistics classes I had to do a jackknife (leave-one-out) estimation of the paramete]]></description>
<content:encoded><![CDATA[<p><a title="capcase. &#34;Rough Rider Barlow &#34;. Online image. Flickr. 1 January 2009." href="http://www.flickr.com/photos/capcase/2861356574/"><img class="alignleft size-full wp-image-112" style="margin:5px 10px;" src="http://ryouready.files.wordpress.com/2009/01/jackknife.jpg" alt="jackknife" width="288" height="216" /></a>For one of my statistics classes I had to do a jackknife (leave-one-out) estimation of the parameters of a simple linear regression model.</p>
<p>The difficulty with the jackknife function from the <em>bootstrap package </em>is that the function passed to it is expected to return a scalar only. Thus some programming is needed to get all the coefficients. Thanks to Simon Knapp for helping me out here (see r-help, Dec. 2008).</p>
<pre><span style="color:#008000;">###############################################################</span>
<span style="color:#000080;">
library(bootstrap)
</span><span style="color:#008000;">
</span><span style="color:#008000;"># ?jackknife gives an example of a leave-one-out jackknife
# estimate for the mean of the data.</span>
<span style="color:#000080;"><span style="color:#008000;"># Having a look at the jackknife function we see that it demands
# two parameters: x and theta. x is supposed to contain the data
# and theta the function that is applied to the data

</span></span><span style="color:#000080;">x &#60;- rnorm(20)
theta &#60;- function(x){mean(x)}
results &#60;- jackknife(x,theta)

</span><span style="color:#008000;">###############################################################</span></pre>
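<p>To see what happens at each leave-one-out run, the same idea can be sketched in base R without the package. This is an illustration using the usual jackknife formulas, not the package code; the names <em>jackValues</em> and <em>jackSe</em> are made up for this sketch, and set.seed() is only there to make it reproducible.</p>
<pre><span style="color:#000080;">set.seed(1)
x &#60;- rnorm(20)
n &#60;- length(x)

</span><span style="color:#008000;"># leave-one-out estimates: the mean of x without the i-th value</span><span style="color:#000080;">
jackValues &#60;- sapply(seq_len(n), function(i) mean(x[-i]))

</span><span style="color:#008000;"># usual jackknife standard error of the estimate</span><span style="color:#000080;">
jackSe &#60;- sqrt((n - 1) / n * sum((jackValues - mean(jackValues))^2))

</span><span style="color:#008000;"># for the mean, this equals the familiar standard error</span><span style="color:#000080;">
all.equal(jackSe, sd(x) / sqrt(n))</span></pre>
<p>For the mean the jackknife standard error coincides with sd(x)/sqrt(n), which makes this a handy check before moving on to regression coefficients.</p>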
<p><span style="color:#000000;"> Now I want to program the estimation for one coefficient of a linear regression model.<br />
</span></p>
<pre><span style="color:#008000;">###############################################################</span><span style="color:#000080;">

DF &#60;- as.data.frame(matrix(rnorm(250), ncol=5)) <span style="color:#008000;"> </span></span><span style="color:#008000;">  # my data
<span style="color:#000080;">model.lm &#60;- formula(V1 ~ V2 + V3 + V4) </span>            # my model
</span>
<span style="color:#000080;"><span style="color:#008000;"># Now I need to specify the theta function. Here x is not the
# data itself but is used as the row index vector to select
# a subset from the data frame (xdata). Also the coefficient
# to be returned is specified. 

</span></span><span style="color:#000080;">theta &#60;- function(x, xdata, coefficient){
              coef(lm(model.lm, data=xdata[x,]))[coefficient] }<span style="color:#008000;">

# So now at each leave-one-out run the model is calculated with
# a subset defined by the vector x (here one is left out) and one
# coefficient is returned:
</span>
results &#60;- jackknife(1:50, theta, xdata=DF,
                     coefficient="(Intercept)")

</span><span style="color:#008000;">###############################################################</span><span style="color:#000080;">
</span></pre>
<p><span style="color:#000000;">Extending this code to the estimation of all the regression coefficients is only a small step now. As the theta function is supposed to return a scalar and not a list of estimates for each coefficient, the following workaround is used: the sapply function calls the jackknife function four times, requesting a different coefficient at each run. The requested coefficient is passed on to the theta function via the three-dot </span><span style="color:#000000;">(&#8230;) </span><span style="color:#000000;">argument of jackknife.</span></p>
<p><span style="color:#000000;"><!--more--><br />
</span></p>
<pre><span style="color:#008000;">
###############################################################

# The following function calculates all the coefficients

<span style="color:#000080;">jackknife.apply &#60;- function(x, xdata, coefs)
{
sapply(coefs,
<span style="color:#000080;">       function(coefficient) jackknife(x, theta, xdata=xdata,
                                       coefficient=coefficient),
        simplify=FALSE)
</span></span><span style="color:#000080;">}</span>

# no<span style="color:#008000;">w </span></span><span style="color:#008000;">jackknife.apply() </span><span style="color:#008000;"><span style="color:#008000;">can</span> be called

<span style="color:#000080;">results &#60;- jackknife.apply(1:50, DF, c("(Intercept)", "V2", 
                                       "V3", "V4"))</span>

</span><span style="color:#008000;">###############################################################</span><span style="color:#000080;"> </span><span style="color:#000080;">
</span></pre>
<p>The output is a list of four elements, each with the components returned by the jackknife function. So the last step is to bring the output into a nice format, let&#8217;s say like this:</p>
<pre>+--------------------------------------------------------------+
&#124;               Intercept      V2          V3         V4       &#124;
&#124;    1             0.34        2.2        .03         1.1      &#124;
&#124;    2             0.29        1.9        .11         1.2      &#124;
&#124;    ...           ...        ...         ...         ...      &#124;</pre>
<p><span style="color:#000000;">I will use the functions from Hadley Wickham&#8217;s <em>plyr</em> package for that &#8211; quite a new package (Oct. 2008) that is really worth a look! I am just getting started with it&#8230; The following solution surely is a kludge, but it does the job.<br />
</span></p>
<pre><span style="color:#008000;">
###############################################################

# ldply() takes a list, applies a function to each element and
# puts the results into a data frame (unfortunately one row per
# list element, thus the transposition). Here the function only
# selects the $jack.values from each list element
<span style="color:#000080;">
library(plyr)

jack.values &#60;- t(ldply(results, function(x) x$jack.values))
dimnames(jack.values)[[2]]  &#60;- names(results)
dimnames(jack.values)[[1]]  &#60;- 1:50</span>
</span>
<span style="color:#008000;">###############################################################</span>
<span style="color:#000080;"> </span></pre>
<p>Now let&#8217;s do the same for the other two components, $jack.se and $jack.bias.</p>
<pre><span style="color:#008000;">###############################################################</span><span style="color:#008000;"><span style="color:#000080;">

jack.par &#60;- t(ldply(results, function(x) cbind(x$jack.se, x$jack.bias)))
dimnames(jack.</span></span><span style="color:#008000;"><span style="color:#000080;">par</span></span><span style="color:#008000;"><span style="color:#000080;">)[[2]]  &#60;- names(results)
dimnames(jack.</span></span><span style="color:#008000;"><span style="color:#000080;">par</span></span><span style="color:#008000;"><span style="color:#000080;">)[[1]]  &#60;- c("jack.se", "jack.bias")</span>
</span>
<span style="color:#008000;">###############################################################</span></pre>
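<p>For comparison, the same two tables can be built with base R only. This is a minimal sketch, not a replacement for the plyr code above: it assumes that <em>results</em> is a named list whose elements each carry $jack.values, $jack.se and $jack.bias. To keep the sketch runnable on its own, <em>results</em> is a made-up stand-in with random numbers here.</p>

```r
# Stand-in for the list returned by jackknife.apply(); all numbers are made up
set.seed(1)
results <- lapply(c("(Intercept)" = 1, V2 = 2, V3 = 3, V4 = 4), function(k)
  list(jack.se = runif(1), jack.bias = rnorm(1), jack.values = rnorm(50)))

# sapply() binds the vectors column-wise, so no transposition is needed
jack.values <- sapply(results, function(x) x$jack.values)      # 50 x 4 matrix
rownames(jack.values) <- seq_len(nrow(jack.values))

# one column per coefficient, one row per statistic
jack.par <- sapply(results, function(x) c(jack.se   = x$jack.se,
                                          jack.bias = x$jack.bias))
```

<p>Because sapply() keeps the names of the list, the column names come for free.</p>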
<p>Although the function works perfectly, its obvious disadvantage is its inefficiency. Normally only <em>n</em> models would be calculated, leaving one data element out each time. Now the number of calculated models is multiplied by the number of estimated model coefficients, so it is <em>n</em> times <em>the number of estimated regression parameters</em>. This surely increases calculation time! If anyone knows a solution here, please feel free to comment!</p>
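<p>One possible way around this, sketched here under assumptions, is to skip the jackknife function and run the leave-one-out loop by hand, so that each of the <em>n</em> fits returns all coefficients at once. The standard error and bias are then computed with the usual jackknife formulas. The names <em>full.coefs</em> and <em>jack.mean</em> and the seed are only illustrative.</p>

```r
set.seed(1)                                         # assumed seed, for reproducibility only
DF <- as.data.frame(matrix(rnorm(250), ncol = 5))   # same kind of data as above
model.lm <- formula(V1 ~ V2 + V3 + V4)

n <- nrow(DF)
full.coefs <- coef(lm(model.lm, data = DF))         # estimates from the full data

# n x p matrix: row i holds all coefficients fitted without observation i
jack.values <- t(sapply(seq_len(n), function(i)
  coef(lm(model.lm, data = DF[-i, ]))))
rownames(jack.values) <- seq_len(n)

# usual jackknife formulas for standard error and bias, per coefficient
jack.mean <- colMeans(jack.values)
jack.se   <- sqrt((n - 1) / n * colSums(sweep(jack.values, 2, jack.mean)^2))
jack.bias <- (n - 1) * (jack.mean - full.coefs)

jack.par <- rbind(jack.se, jack.bias)
```

<p>This way only <em>n</em> models are fitted, whatever the number of coefficients.</p>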
<p>mh</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Generate random string name]]></title>
<link>http://ryouready.wordpress.com/2008/12/18/generate-random-string-name/</link>
<pubDate>Thu, 18 Dec 2008 17:24:21 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2008/12/18/generate-random-string-name/</guid>
<description><![CDATA[I want to produce a lot of .png files and need to name them. One possibility would be to let R autom]]></description>
<content:encoded><![CDATA[<p>I want to produce a lot of .png files and need to name them. One possibility would be to let R automatically name them e.g. when used in  a for loop.</p>
<p>In my case I need the name of each graphic as an object, as I want to address it via a link. Thus I decided to write a small function that produces a random string made up of lowercase and capital letters as well as numbers. The default setting is to generate one string of 12 characters.</p>
<pre><span style="color:#008000;">###############################################################
#
# MHmakeRandomString(n, length)
# function generates n random strings of the given
# length (length), made up of numbers, small and capital letters</span>

<span style="color:#000080;">MHmakeRandomString &#60;- function(n=1, length=12)
{
    randomString &#60;- character(n)            <span style="color:#008000;"># initialize vector</span>
    for (i in seq_len(n))
    {
        randomString[i] &#60;- paste(sample(c(0:9, letters, LETTERS),
                                 length, replace=TRUE),
                                 collapse="")
    }
    return(randomString)
}

<span style="color:#008000;"># </span> <span style="color:#008000;">&#62; MHmakeRandomString()
#  [1] "XM2xjggXX19r"
</span>
</span><span style="color:#008000;">###############################################################</span></pre>
<p>Now I can use the function to generate a filename when opening a new png device.</p>
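<p>A minimal sketch of that use follows. To keep it runnable on its own it defines a compact stand-in, <em>makeRandomString</em> (a hypothetical name), and writes to the temporary directory of the R session, which is an assumption and not the output directory I actually use.</p>

```r
# Compact stand-in for MHmakeRandomString() so the sketch runs on its own
makeRandomString <- function(n = 1, length = 12) {
  pool <- c(0:9, letters, LETTERS)
  vapply(seq_len(n),
         function(i) paste(sample(pool, length, replace = TRUE), collapse = ""),
         character(1))
}

outDir   <- tempdir()                               # assumed output directory
filename <- file.path(outDir, paste0(makeRandomString(), ".png"))

# open the device only if this R build supports png output
if (capabilities("png")) {
  png(filename = filename, width = 200, height = 150)
  hist(rnorm(100), main = "")
  dev.off()
}
```

<p>The object <em>filename</em> now holds the full path, so it can be written into a link later on.</p>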
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Changing Trellis graphics settings - remove margins]]></title>
<link>http://ryouready.wordpress.com/2008/12/18/changing-trellis-graphics-settings/</link>
<pubDate>Thu, 18 Dec 2008 17:07:23 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://ryouready.wordpress.com/2008/12/18/changing-trellis-graphics-settings/</guid>
<description><![CDATA[My aim was to produce a tiny histogram as a png output. I want to use these tiny histograms to put t]]></description>
<content:encoded><![CDATA[<p>My aim was to produce a tiny histogram as png output. I want to put these tiny histograms into a table, so the distribution of a variable can be assessed at a glance and compared to others right away.</p>
<p>Producing such a graphic in the <em>base </em>graphics system is quite easy. Note that I use the function MHmakeRandomString (described in the previous post) to generate a random name for the picture.</p>
<pre><span style="color:#008000;">###############################################################</span>
<span style="color:#008000;">
# basic information about output directory</span>
<span style="color:#333399;"><span style="color:#000080;">OutputDir<span style="color:#000080;">Relative &#60;- "output"
<span style="color:#008000;">
# combine the working directory with output path</span>
OutputDir &#60;- paste(getwd(), OutputDirRelative, sep="/")</span></span><span style="color:#000080;">
<span style="color:#008000;">
# function that generates random string (12 characters)    </span>
randomName &#60;- MHmakeRandomString()
filename &#60;- paste(OutputDir, "/", randomName, ".png", sep="") 

<span style="color:#008000;"># open png device</span>
png(filename = filename, res= 72, width= 35, height= 18)
     par(mar=c(0,0,0,0), oma=c(0,0,0,0))
     x &#60;- rnorm(100)                    # example data (assumed here)
     hist(x, main="", col="grey")       # "grey" is an assumed fill color
     box()

dev.off()

</span></span><span style="color:#008000;">###############################################################</span></pre>
<div id="attachment_27" class="wp-caption aligncenter" style="width: 45px"><img class="size-full wp-image-27" title="mhmaketinyhistogram_base" src="http://ryouready.files.wordpress.com/2008/12/mhmaketinyhistogram_base.png" alt="base" width="35" height="18" /><p class="wp-caption-text">base</p></div>
<p style="text-align:left;">This produces a histogram with no annotation or margins at all. Now I wanted the same but implemented in the <em>grid </em>graphics system. It caused me some trouble to remove the margins until some experts from the <a href="http://www.r-project.org/mail.html" target="_blank">r-help list</a> helped me out.</p>
<pre><span style="color:#008000;">
###############################################################

</span>
<span style="color:#333399;"><span style="color:#000080;">library(lattice)   # prepanel.default.histogram(), panel.histogram()
library(grid)      # grid.newpage(), pushViewport(), viewport()

randomName &#60;- MHmakeRandomString()
filename &#60;- paste(OutputDir, "/", randomName, ".png", sep="") 

<span style="color:#008000;"># open png device</span>
png(filename = filename, res= 72, width= 35, height= 18)</span></span><span style="color:#000080;">
    x &#60;- rnorm(100)
    limits &#60;- prepanel.default.histogram(x, breaks = NULL) ##
</span><span style="color:#000080;"><span style="color:#008000;">    # to start a new page </span>
</span><span style="color:#000080;">    grid.newpage()
    pushViewport(viewport(xscale = extendrange(limits$xlim),
                          yscale = extendrange(limits$ylim)))
    panel.histogram(x, breaks = NULL)
dev.off()

</span><span style="color:#008000;">###############################################################
</span></pre>
<div id="attachment_28" class="wp-caption aligncenter" style="width: 45px"><img class="size-full wp-image-28" title="mhmaketinyhistogram_grid" src="http://ryouready.files.wordpress.com/2008/12/mhmaketinyhistogram_grid.png" alt="grid" width="35" height="18" /><p class="wp-caption-text">grid</p></div>
<p style="text-align:left;">A tiny histogram produced by grid/trellis graphics. Nice!</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Launch of my "R" you ready? learning blog]]></title>
<link>http://markheckmann.wordpress.com/2008/12/18/r-start-meines-r-you-ready-learning-blogs/</link>
<pubDate>Thu, 18 Dec 2008 09:57:23 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://markheckmann.wordpress.com/2008/12/18/r-start-meines-r-you-ready-learning-blogs/</guid>
<description><![CDATA[Hello. I have been working with R for about a year now, so I have decided to write down my new in]]></description>
<content:encoded><![CDATA[<p><a href="http://www.r-project.org"><img class="size-full wp-image-138 alignleft" style="margin:8px 16px;" title="R Project" src="http://markheckmann.files.wordpress.com/2009/01/rlogo.jpg" alt="R Project" width="100" height="76" /></a></p>
<p>Hello. I have been working with <em><a title="Home of the R project" href="http://www.r-project.org" target="_blank">R</a></em> for about a year now, so I have decided to write down what I learn in <em>R</em> from time to time, in order not to forget it again right away. This includes simple insights as well as more complex problems. For this I use a blog named<em> <a title="&#34;R&#34; you ready learning Blog" href="http://ryouready.wordpress.com" target="_blank">&#8220;R&#8221; you ready?</a></em> The latest posts from that blog can also be found in the RSS feed on the right-hand side of this page.</p>
<p>I always try to comment the R code on <a title="&#34;R&#34; you ready learning Blog" href="http://ryouready.wordpress.com" target="_blank"><em>&#8220;R&#8221; you ready</em></a> well and to keep it directly executable, i.e. with no pieces of code missing. Comments on and improvements to the R code are very welcome. After all, it is a learning blog :) and there are always things that one initially programs in far too complicated a way, or even incorrectly. Have fun with it!</p>
<p>Mark</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[R: Good practice - adding footnotes to graphics]]></title>
<link>http://markheckmann.wordpress.com/2008/12/03/r-good-practice-adding-footnotes-to-graphics/</link>
<pubDate>Wed, 03 Dec 2008 17:33:10 +0000</pubDate>
<dc:creator>markheckmann</dc:creator>
<guid>http://markheckmann.wordpress.com/2008/12/03/r-good-practice-adding-footnotes-to-graphics/</guid>
<description><![CDATA[In some statistical programs there is the option available to attach a footnote to the graphical out]]></description>
<content:encoded><![CDATA[<p>Some statistical programs offer an option to attach a footnote to the graphical output they create. This footnote may contain the name of the script or file that produced the graphic, the author&#8217;s name and the date of creation. In SAS, for example, there is a <em>footnote</em> command to achieve this. Once I realized that this makes life a lot easier, I wrote a simple three-line function in R, which I call at the end of the construction of any graphic. I suppose that this is what my professors meant by &#8220;good practice&#8221;. The nice thing about implementing this in the <em>grid</em> graphics system is that you can produce multiple graphics on one page [e.g. by par(mfrow=c(2, 2))] and the footnote will still be positioned correctly.</p>
<pre><span style="color:#008000;">###############################################################
##                                                           ##
##      R: Good practice - adding footnotes to graphics      ##
##                                                           ##
###############################################################</span>
<span style="color:#008000;"># basic information at the beginning of each script</span>
<span style="color:#000080;">scriptName &#60;- "filename.R"
author &#60;- "mh"
footnote &#60;- paste(scriptName, format(Sys.time(), "%d %b %Y"),
                  author, sep=" / ")

<span style="color:#008000;"># default footnote is today's date, cex=.7 (size) and color
# is a kind of grey</span>

makeFootnote &#60;- function(footnoteText=
                         format(Sys.time(), "%d %b %Y"),
                         size= .7, color= grey(.5))
{
   require(grid)
   pushViewport(viewport())
   grid.text(label= footnoteText ,
             x = unit(1,"npc") - unit(2, "mm"),
             y= unit(2, "mm"),
             just=c("right", "bottom"),
             gp=gpar(cex= size, col=color))
   popViewport()
}

makeFootnote(footnote)</span>

<span style="color:#008000;">###############################################################</span></pre>

<div id="attachment_85" class="wp-caption alignnone" style="width: 310px"><a href="http://markheckmann.files.wordpress.com/2008/12/r_posting_2008_12_03_good_practice_footnotes_in_graphs.png?w=300"><img class="size-medium wp-image-85" title="r_posting_2008_12_03_good_practice_footnotes_in_graphs" src="http://markheckmann.files.wordpress.com/2008/12/r_posting_2008_12_03_good_practice_footnotes_in_graphs.png?w=300&h=300" alt="Correlation matrix with footnote" width="300" height="300" /></a><p class="wp-caption-text">Correlation matrix with footnote</p></div>

<p>ciao,<br />
Mark</p>
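<p>A short usage sketch for the multi-panel case mentioned at the top: four base histograms on one page, followed by a single footnote. makeFootnote() is repeated from the code above so that the sketch runs on its own. The output file (a temporary pdf) and the footnote text are assumptions made only for this illustration.</p>

```r
library(grid)

# makeFootnote() as defined in the post, repeated so the sketch is self-contained
makeFootnote <- function(footnoteText = format(Sys.time(), "%d %b %Y"),
                         size = .7, color = grey(.5)) {
  pushViewport(viewport())
  grid.text(label = footnoteText,
            x = unit(1, "npc") - unit(2, "mm"),
            y = unit(2, "mm"),
            just = c("right", "bottom"),
            gp = gpar(cex = size, col = color))
  popViewport()
}

outFile <- tempfile(fileext = ".pdf")     # assumed output file
pdf(outFile)
par(mfrow = c(2, 2))                      # four panels on one page
for (i in 1:4) hist(rnorm(100), main = paste("Sample", i))
makeFootnote("filename.R / mh")           # one footnote for the whole page
dev.off()
```

<p>The footnote is placed relative to the whole device, not to a single panel, which is why it ends up in the bottom right corner of the page.</p>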
]]></content:encoded>
</item>

</channel>
</rss>

