<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>density-estimation &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/density-estimation/</link>
	<description>Feed of posts on WordPress.com tagged "density-estimation"</description>
	<pubDate>Mon, 20 May 2013 13:20:40 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[MCMC: The Gibbs Sampler]]></title>
<link>http://theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/</link>
<pubDate>Mon, 05 Nov 2012 19:38:44 +0000</pubDate>
<dc:creator>dustinstansbury</dc:creator>
<guid>http://theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/</guid>
<description><![CDATA[In the previous post, we compared using block-wise and component-wise implementations of the Metropo]]></description>
<content:encoded><![CDATA[<p>In the previous <a title="Multivariate MCMC" href="http://theclevermachine.wordpress.com/2012/11/04/mcmc-multivariate-distributions-block-wise-component-wise-updates/" target="_blank">post,</a> we compared using block-wise and component-wise implementations of the Metropolis-Hastings algorithm for sampling from a multivariate probability distribution<img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)&amp;s=-1' title='p(&#92;bold x)&amp;s=-1' class='latex' />. Component-wise updates for MCMC algorithms are generally more efficient for multivariate problems than blockwise updates in that we are more likely to accept a proposed sample by drawing each component/dimension independently of the others. However, samples may still be rejected, leading to excess computation that is never used. The Gibbs sampler, another popular MCMC sampling technique, provides a means of avoiding such wasted computation. Like the component-wise implementation of the Metropolis-Hastings algorithm, the Gibbs sampler also uses component-wise updates. However, unlike in the Metropolis-Hastings algorithm, all proposed samples are accepted, so there is no wasted computation.</p>
<p>The Gibbs sampler is applicable to certain classes of problems, based on two main criteria. Given a target distribution <img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)' title='p(&#92;bold x)' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=%5Cbold+x+%3D+%28x_1%2C+x_2%2C+%5Cdots%2C+x_D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;bold x = (x_1, x_2, &#92;dots, x_D)' title='&#92;bold x = (x_1, x_2, &#92;dots, x_D)' class='latex' />, the first criterion is: 1) we must have an analytic (mathematical) expression for the conditional distribution of each variable in the joint distribution, given all other variables in the joint. Formally, if the target distribution <img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)' title='p(&#92;bold x)' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' />-dimensional, we must have <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' /> individual expressions for</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x_i%26%23124%3Bx_1%2Cx_2%2C%5Cdots%2Cx_%7Bi-1%7D%2Cx_%7Bi%2B1%7D%2C%5Cdots%2Cx_D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_i&#124;x_1,x_2,&#92;dots,x_{i-1},x_{i+1},&#92;dots,x_D)' title='p(x_i&#124;x_1,x_2,&#92;dots,x_{i-1},x_{i+1},&#92;dots,x_D)' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%3D+p%28x_i+%26%23124%3B+x_j%29%2C+j%5Cneq+i+%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='= p(x_i &#124; x_j), j&#92;neq i &amp;s=-1' title='= p(x_i &#124; x_j), j&#92;neq i &amp;s=-1' class='latex' />.</p>
<p>Each of these expressions defines the probability of the <img src='http://s0.wp.com/latex.php?latex=i%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i&amp;s=-1' title='i&amp;s=-1' class='latex' />-th dimension given that we have values for all other (<img src='http://s0.wp.com/latex.php?latex=j+%5Cneq+i%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='j &#92;neq i&amp;s=-1' title='j &#92;neq i&amp;s=-1' class='latex' />) dimensions. Having the conditional distribution for each variable means that we don&#8217;t need a proposal distribution or an accept/reject criterion, as in the Metropolis-Hastings algorithm. Instead, we can simply sample from each conditional while keeping all other variables fixed. This leads to the second criterion: 2) we must be able to sample from each conditional distribution. This requirement is obvious if we want an implementable algorithm.</p>
<p>The Gibbs sampler works in much the same way as the component-wise Metropolis-Hastings algorithm, except that instead of drawing from a proposal distribution for each dimension and then accepting or rejecting the proposed sample, we simply draw a value for that dimension from the variable&#8217;s conditional distribution, and every drawn value is accepted. As in the component-wise Metropolis-Hastings algorithm, we step through each variable sequentially, sampling it while keeping all other variables fixed. The Gibbs sampling procedure is outlined below:</p>
<ol>
<li>set <img src='http://s0.wp.com/latex.php?latex=t+%3D+0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t = 0' title='t = 0' class='latex' /></li>
<li>generate an initial state <img src='http://s0.wp.com/latex.php?latex=%5Cbold+x%5E%7B%280%29%7D+%5Csim+%5Cpi%5E%7B%280%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;bold x^{(0)} &#92;sim &#92;pi^{(0)}' title='&#92;bold x^{(0)} &#92;sim &#92;pi^{(0)}' class='latex' /></li>
<li>repeat until <img src='http://s0.wp.com/latex.php?latex=t+%3D+M&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t = M' title='t = M' class='latex' />
<ul>
<li>set <img src='http://s0.wp.com/latex.php?latex=t+%3D+t%2B1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t = t+1' title='t = t+1' class='latex' /></li>
<li>for each dimension <img src='http://s0.wp.com/latex.php?latex=i+%3D+1..D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i = 1..D' title='i = 1..D' class='latex' />
<ul>
<li>draw <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i' title='x_i' class='latex' /> from <img src='http://s0.wp.com/latex.php?latex=p%28x_i%26%23124%3Bx_1%2Cx_2%2C%5Cdots%2Cx_%7Bi-1%7D%2Cx_%7Bi%2B1%7D%2C%5Cdots%2Cx_D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_i&#124;x_1,x_2,&#92;dots,x_{i-1},x_{i+1},&#92;dots,x_D)' title='p(x_i&#124;x_1,x_2,&#92;dots,x_{i-1},x_{i+1},&#92;dots,x_D)' class='latex' /></li>
</ul>
</li>
</ul>
</li>
</ol>
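<p>The outline above can be sketched in MATLAB as a generic systematic-scan loop. This is a minimal illustration rather than code from this post: it assumes each full conditional is supplied as a function handle in a cell array (the names <code>condSamplers</code>, <code>x0</code> and <code>M</code> are hypothetical), where <code>condSamplers{i}(x)</code> receives the current state vector and returns one draw of <code>x(i)</code>. The two handles shown are the zero-mean, unit-variance bivariate Normal conditionals from the example below, with <code>rho = 0.8</code>. Because the state vector is overwritten in place, dimension <code>i</code> automatically conditions on the new values of dimensions <code>1..i-1</code> and the old values of dimensions <code>i+1..D</code>.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% GENERIC GIBBS SAMPLER (ILLUSTRATIVE SKETCH)
% condSamplers{i}(x) MUST RETURN ONE DRAW OF x_i FROM p(x_i | x_j, j ~= i)
rho = 0.8;                 % EXAMPLE TARGET: ZERO-MEAN BIVARIATE NORMAL
s   = sqrt(1 - rho^2);     % CONDITIONAL STANDARD DEVIATION
condSamplers = {@(x) rho*x(2) + s*randn, ...
                @(x) rho*x(1) + s*randn};

M  = 2000;                 % NUMBER OF ITERATIONS (FULL SWEEPS)
x0 = [0 0];                % INITIAL STATE x^(0)
D  = numel(x0);

X = zeros(M+1, D);         % ROW t+1 HOLDS THE STATE x^(t)
X(1,:) = x0;
x = x0;
for t = 1:M
    for i = 1:D
        % x(1:i-1) ALREADY HOLD ITERATION-t VALUES, x(i+1:D) HOLD t-1 VALUES
        x(i) = condSamplers{i}(x);
    end
    X(t+1,:) = x;
end
</pre>
<p>Targeting a different distribution only requires a different <code>condSamplers</code> cell array; the sweep loop itself does not change.</p>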
<p>To get a better understanding of the Gibbs sampler at work, let&#8217;s implement the Gibbs sampler to solve the same multivariate sampling problem addressed in the previous post.</p>
<h3>Example: Sampling from a bivariate Normal distribution</h3>
<p>This example parallels the examples in the previous post, where we sampled from a 2-D Normal distribution using block-wise and component-wise Metropolis-Hastings algorithms. Here, we show how to implement a Gibbs sampler to draw samples from the same target distribution. As a reminder, the target distribution <img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)&amp;s=-1' title='p(&#92;bold x)&amp;s=-1' class='latex' /> is a Normal distribution with the following parameterization:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29+%3D+%5Cmathcal+N+%28%5Cbold%7B%5Cmu%7D%2C+%5Cbold+%5CSigma%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x) = &#92;mathcal N (&#92;bold{&#92;mu}, &#92;bold &#92;Sigma)' title='p(&#92;bold x) = &#92;mathcal N (&#92;bold{&#92;mu}, &#92;bold &#92;Sigma)' class='latex' /></p>
<p>with mean</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cmu+%3D+%5B%5Cmu_1%2C%5Cmu_2%5D%3D+%5B0%2C+0%5D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;mu = [&#92;mu_1,&#92;mu_2]= [0, 0]' title='&#92;mu = [&#92;mu_1,&#92;mu_2]= [0, 0]' class='latex' /></p>
<p>and covariance</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cbold+%5CSigma+%3D+%5Cbegin%7Bbmatrix%7D+1+%26%2338%3B+%5Crho_%7B12%7D+%5C%5C+%5Crho_%7B21%7D+%26%2338%3B+1%5Cend%7Bbmatrix%7D+%3D+%5Cbegin%7Bbmatrix%7D+1+%26%2338%3B+0.8+%5C%5C+0.8+%26%2338%3B+1%5Cend%7Bbmatrix%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;bold &#92;Sigma = &#92;begin{bmatrix} 1 &amp; &#92;rho_{12} &#92;&#92; &#92;rho_{21} &amp; 1&#92;end{bmatrix} = &#92;begin{bmatrix} 1 &amp; 0.8 &#92;&#92; 0.8 &amp; 1&#92;end{bmatrix}' title='&#92;bold &#92;Sigma = &#92;begin{bmatrix} 1 &amp; &#92;rho_{12} &#92;&#92; &#92;rho_{21} &amp; 1&#92;end{bmatrix} = &#92;begin{bmatrix} 1 &amp; 0.8 &#92;&#92; 0.8 &amp; 1&#92;end{bmatrix}' class='latex' /></p>
<p>In order to sample from this distribution using a Gibbs sampler, we need to have in hand the conditional distributions for variables/dimensions <img src='http://s0.wp.com/latex.php?latex=x_1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1' title='x_1' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=x_2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2' title='x_2' class='latex' />:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x_1+%26%23124%3B+x_2%5E%7B%28t-1%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_1 &#124; x_2^{(t-1)})' title='p(x_1 &#124; x_2^{(t-1)})' class='latex' /> (i.e. the conditional for the first dimension, <img src='http://s0.wp.com/latex.php?latex=x_1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1' title='x_1' class='latex' />)</p>
<p>and</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x_2+%26%23124%3B+x_1%5E%7B%28t%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_2 &#124; x_1^{(t)})' title='p(x_2 &#124; x_1^{(t)})' class='latex' /> (the conditional for the second dimension, <img src='http://s0.wp.com/latex.php?latex=x_2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2' title='x_2' class='latex' />)</p>
<p>Here <img src='http://s0.wp.com/latex.php?latex=x_2%5E%7B%28t-1%29%7D%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2^{(t-1)}&amp;s=-1' title='x_2^{(t-1)}&amp;s=-1' class='latex' /> is the previous state of the second dimension, and <img src='http://s0.wp.com/latex.php?latex=x_1%5E%7B%28t%29%7D%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1^{(t)}&amp;s=-1' title='x_1^{(t)}&amp;s=-1' class='latex' /> is the state of the first dimension after drawing from <img src='http://s0.wp.com/latex.php?latex=p%28x_1+%26%23124%3B+x_2%5E%7B%28t-1%29%7D%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_1 &#124; x_2^{(t-1)})&amp;s=-1' title='p(x_1 &#124; x_2^{(t-1)})&amp;s=-1' class='latex' />. The reason for the discrepancy between updating <img src='http://s0.wp.com/latex.php?latex=x_1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1&amp;s=-1' title='x_1&amp;s=-1' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=x_2%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2&amp;s=-1' title='x_2&amp;s=-1' class='latex' /> using states <img src='http://s0.wp.com/latex.php?latex=%28t-1%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(t-1)&amp;s=-1' title='(t-1)&amp;s=-1' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%28t%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(t)&amp;s=-1' title='(t)&amp;s=-1' class='latex' />, respectively, can be seen in step 3 of the algorithm outlined in the previous section.
At iteration <img src='http://s0.wp.com/latex.php?latex=t%26%2338%3Bs%3D-1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t&amp;s=-1&amp;s=-1' title='t&amp;s=-1&amp;s=-1' class='latex' /> we first sample a new state for variable <img src='http://s0.wp.com/latex.php?latex=x_1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1&amp;s=-1' title='x_1&amp;s=-1' class='latex' /> conditioned on the most recent state of variable <img src='http://s0.wp.com/latex.php?latex=x_2%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2&amp;s=-1' title='x_2&amp;s=-1' class='latex' />, which is from iteration <img src='http://s0.wp.com/latex.php?latex=%28t-1%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(t-1)&amp;s=-1' title='(t-1)&amp;s=-1' class='latex' />. We then sample a new state for the variable <img src='http://s0.wp.com/latex.php?latex=x_2%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2&amp;s=-1' title='x_2&amp;s=-1' class='latex' /> conditioned on the most recent state of variable <img src='http://s0.wp.com/latex.php?latex=x_1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1&amp;s=-1' title='x_1&amp;s=-1' class='latex' />, which is now from the current iteration, <img src='http://s0.wp.com/latex.php?latex=t%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t&amp;s=-1' title='t&amp;s=-1' class='latex' />.</p>
<p>After some algebra (which I will skip for brevity; see the <a title="Derivation of Multivariate Normal Conditionals" href="http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html" target="_blank">following</a> for details), we find that the two conditional distributions for the target Normal distribution are:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x_1+%26%23124%3B+x_2%5E%7B%28t-1%29%7D%29+%3D+%5Cmathcal+N%28%5Cmu_1+%2B+%5Crho_%7B21%7D%28x_2%5E%7B%28t-1%29%7D+-+%5Cmu_2%29%2C%5Csqrt%7B1-%5Crho_%7B21%7D%5E2%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_1 &#124; x_2^{(t-1)}) = &#92;mathcal N(&#92;mu_1 + &#92;rho_{21}(x_2^{(t-1)} - &#92;mu_2),&#92;sqrt{1-&#92;rho_{21}^2})' title='p(x_1 &#124; x_2^{(t-1)}) = &#92;mathcal N(&#92;mu_1 + &#92;rho_{21}(x_2^{(t-1)} - &#92;mu_2),&#92;sqrt{1-&#92;rho_{21}^2})' class='latex' /></p>
<p>and</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x_2+%26%23124%3B+x_1%5E%7B%28t%29%7D%29%3D%5Cmathcal+N%28%5Cmu_2+%2B+%5Crho_%7B12%7D%28x_1%5E%7B%28t%29%7D-%5Cmu_1%29%2C%5Csqrt%7B1-%5Crho_%7B12%7D%5E2%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x_2 &#124; x_1^{(t)})=&#92;mathcal N(&#92;mu_2 + &#92;rho_{12}(x_1^{(t)}-&#92;mu_1),&#92;sqrt{1-&#92;rho_{12}^2})' title='p(x_2 &#124; x_1^{(t)})=&#92;mathcal N(&#92;mu_2 + &#92;rho_{12}(x_1^{(t)}-&#92;mu_1),&#92;sqrt{1-&#92;rho_{12}^2})' class='latex' />,</p>
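<p>which are both univariate Normal distributions, each with a mean that depends on the most recent state of the conditioning variable and a spread that depends on the correlation between the two variables. Note that the second argument of each Normal above is the standard deviation, not the variance, and that these simple forms hold because both marginal variances of the target are 1.</p>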
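<p>The two conditionals above are special cases of the general bivariate-Normal result, in which the conditional mean of the first variable is <code>mu(1) + Sigma(1,2)/Sigma(2,2)*(x2 - mu(2))</code> and the conditional variance is <code>Sigma(1,1) - Sigma(1,2)^2/Sigma(2,2)</code>. The short check below, a sketch with an arbitrarily chosen conditioning value <code>x2 = 1.5</code>, evaluates both the general form and the unit-variance form used in this post for the target covariance.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% GENERAL BIVARIATE-NORMAL CONDITIONAL p(x_1 | x_2) (ILLUSTRATIVE CHECK)
mu    = [0 0];
Sigma = [1 0.8; 0.8 1];
x2    = 1.5;   % ARBITRARY VALUE OF THE CONDITIONING VARIABLE

% GENERAL FORMULAS (VALID FOR ANY 2x2 COVARIANCE MATRIX)
condMean = mu(1) + Sigma(1,2)/Sigma(2,2)*(x2 - mu(2));
condStd  = sqrt(Sigma(1,1) - Sigma(1,2)^2/Sigma(2,2));

% UNIT-VARIANCE FORM USED IN THIS POST
rho         = Sigma(1,2);
condMeanRho = mu(1) + rho*(x2 - mu(2));
condStdRho  = sqrt(1 - rho^2);

fprintf('general form:       mean = %.4f, std = %.4f\n', condMean, condStd);
fprintf('unit-variance form: mean = %.4f, std = %.4f\n', condMeanRho, condStdRho);
</pre>
<p>Both lines print a mean of 1.2000 and a standard deviation of 0.6000. For a covariance whose diagonal entries are not 1, only the general form is correct.</p>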
<p>Using the above expressions for the conditional probabilities of variables <img src='http://s0.wp.com/latex.php?latex=x_1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1&amp;s=-1' title='x_1&amp;s=-1' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=x_2%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2&amp;s=-1' title='x_2&amp;s=-1' class='latex' />, we implement the Gibbs sampler using MATLAB below. The output of the sampler is shown here:</p>
<div id="attachment_2487" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/11/gibbssampler-2dnormal1.png"><img class="size-full wp-image-2487" title="gibbsSampler-2DNormal" alt="" src="http://theclevermachine.files.wordpress.com/2012/11/gibbssampler-2dnormal1.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Gibbs sampler Markov chain and samples for bivariate Normal target distribution</p></div>
<p>Inspecting the figure above, note how at each iteration the Markov chain for the Gibbs sampler first takes a step only along the <img src='http://s0.wp.com/latex.php?latex=x_1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1&amp;s=-1' title='x_1&amp;s=-1' class='latex' /> direction, then only along the <img src='http://s0.wp.com/latex.php?latex=x_2%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_2&amp;s=-1' title='x_2&amp;s=-1' class='latex' /> direction.  This shows how the Gibbs sampler sequentially samples the value of each variable separately, in a component-wise fashion.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% EXAMPLE: GIBBS SAMPLER FOR BIVARIATE NORMAL
rng(12345); % SEEDS rand AND randn, AND HENCE unifrnd AND normrnd
nSamples = 5000;

mu = [0 0]; % TARGET MEAN
rho(1) = 0.8; % rho_21
rho(2) = 0.8; % rho_12

% INITIALIZE THE GIBBS SAMPLER
% (NO PROPOSAL DISTRIBUTION IS NEEDED; minn/maxx BOUND THE RANDOM START)
minn = [-3 -3];
maxx = [3 3];

% INITIALIZE SAMPLES
x = zeros(nSamples,2);
x(1,1) = unifrnd(minn(1), maxx(1));
x(1,2) = unifrnd(minn(2), maxx(2));

dims = 1:2; % INDEX INTO EACH DIMENSION

% RUN GIBBS SAMPLER
t = 1;
while t &#60; nSamples
    t = t + 1;
    T = [t-1,t]; % x_1 CONDITIONS ON x_2 FROM t-1; x_2 CONDITIONS ON x_1 FROM t
    for iD = 1:2 % LOOP OVER DIMENSIONS
        % UPDATE SAMPLES
        nIx = dims~=iD; % *NOT* THE CURRENT DIMENSION
        % CONDITIONAL MEAN
        muCond = mu(iD) + rho(iD)*(x(T(iD),nIx)-mu(nIx));
        % CONDITIONAL STANDARD DEVIATION (normrnd TAKES A STD, NOT A VARIANCE)
        sigmaCond = sqrt(1-rho(iD)^2);
        % DRAW FROM CONDITIONAL
        x(t,iD) = normrnd(muCond,sigmaCond);
    end
end

% DISPLAY SAMPLING DYNAMICS
figure;
h1 = scatter(x(:,1),x(:,2),'r.');

% CONDITIONAL STEPS/SAMPLES
hold on;
for t = 1:50
    plot([x(t,1),x(t+1,1)],[x(t,2),x(t,2)],'k-');
    plot([x(t+1,1),x(t+1,1)],[x(t,2),x(t+1,2)],'k-');
    h2 = plot(x(t+1,1),x(t+1,2),'ko');
end

h3 = scatter(x(1,1),x(1,2),'go','Linewidth',3);
legend([h1,h2,h3],{'Samples','1st 50 Samples','x(t=0)'},'Location','Northwest')
hold off;
xlabel('x_1');
ylabel('x_2');
axis square
</pre>
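<p>A common next step, not part of the listing above, is to discard an initial burn-in period and compare the remaining samples with the target. The sketch below is self-contained: it re-runs the same bivariate sampler using base-MATLAB <code>rand</code>/<code>randn</code> in place of the Statistics Toolbox functions <code>unifrnd</code>/<code>normrnd</code>, and the burn-in length <code>nBurn = 500</code> is an arbitrary illustrative choice rather than a tuned value.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% BURN-IN AND SUMMARY STATISTICS FOR THE BIVARIATE GIBBS SAMPLER (SKETCH)
nSamples  = 5000;
nBurn     = 500;              % ILLUSTRATIVE BURN-IN LENGTH
mu        = [0 0];
rho       = 0.8;
sigmaCond = sqrt(1 - rho^2);  % CONDITIONAL STANDARD DEVIATION

x = zeros(nSamples, 2);
x(1,:) = -3 + 6*rand(1,2);    % UNIFORM START ON [-3, 3] x [-3, 3]
for t = 2:nSamples
    % x_1 GIVEN x_2 FROM ITERATION t-1, THEN x_2 GIVEN x_1 FROM ITERATION t
    x(t,1) = mu(1) + rho*(x(t-1,2) - mu(2)) + sigmaCond*randn;
    x(t,2) = mu(2) + rho*(x(t,1)   - mu(1)) + sigmaCond*randn;
end

kept = x(nBurn+1:end, :);     % DISCARD THE BURN-IN SAMPLES
disp('sample mean (target [0 0]):');
disp(mean(kept));
disp('sample covariance (target [1 0.8; 0.8 1]):');
disp(cov(kept));
</pre>
<p>The printed mean and covariance should be close to the target values; how close depends on the chain length and on how strongly successive samples are correlated.</p>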
<h2>Wrapping Up</h2>
<p>The Gibbs sampler is a popular MCMC method for sampling from complex, multivariate probability distributions. However, the Gibbs sampler cannot be applied to every sampling problem. For many target distributions, it may be difficult or impossible to obtain a closed-form expression for all the needed conditional distributions. In other scenarios, analytic expressions may exist for all conditionals, but it may be difficult to sample from some or all of them (in these scenarios it is common to use univariate sampling methods such as rejection sampling and (surprise!) Metropolis-type MCMC techniques to approximate samples from each conditional). Gibbs samplers are very popular in Bayesian methods, where models are often devised so that the conditional distributions of all model variables are easily obtained and take well-known forms that can be sampled from efficiently.</p>
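<p>When a conditional can be evaluated up to a constant but not sampled directly, one option mentioned above is to replace the exact draw with a single Metropolis step. The sketch below illustrates this "Metropolis-within-Gibbs" idea on the same bivariate Normal target, pretending we can only evaluate the unnormalized log conditionals. The random-walk step size <code>stepSize = 1</code> and the handle name <code>logCond</code> are illustrative assumptions, not tuned values or code from this post. Each component update leaves its own conditional invariant, so the chain still targets the joint distribution, but unlike exact Gibbs draws some proposals are now rejected.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% METROPOLIS-WITHIN-GIBBS FOR THE BIVARIATE NORMAL (ILLUSTRATIVE SKETCH)
rho      = 0.8;
nSamples = 5000;
stepSize = 1;                 % RANDOM-WALK PROPOSAL STD (ASSUMED, NOT TUNED)

% UNNORMALIZED LOG CONDITIONAL OF x_i GIVEN THE OTHER COMPONENT x_j
% (ZERO MEAN AND UNIT VARIANCES, SO ONE FORM SERVES BOTH DIMENSIONS)
logCond = @(xi, xj) -(xi - rho*xj)^2 / (2*(1 - rho^2));

x = zeros(nSamples, 2);
nAccept = 0;
for t = 2:nSamples
    x(t,:) = x(t-1,:);                 % START FROM THE PREVIOUS STATE
    for i = 1:2
        j = 3 - i;                     % INDEX OF THE OTHER COMPONENT
        prop = x(t,i) + stepSize*randn;
        logAlpha = logCond(prop, x(t,j)) - logCond(x(t,i), x(t,j));
        if logAlpha > log(rand)        % ACCEPT WITH PROB. min(1, exp(logAlpha))
            x(t,i) = prop;
            nAccept = nAccept + 1;
        end
    end
end
fprintf('component acceptance rate: %.2f\n', nAccept / (2*(nSamples-1)));
</pre>
<p>The reported acceptance rate is below 1, which is exactly the wasted computation that exact Gibbs draws avoid when the conditionals can be sampled directly.</p>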
<p>Gibbs sampling, like many MCMC techniques, suffers from what is often called &#8220;slow mixing.&#8221; Slow mixing occurs when the underlying Markov chain takes a long time to explore the values of <img src='http://s0.wp.com/latex.php?latex=%5Cbold+x%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;bold x&amp;s=-1' title='&#92;bold x&amp;s=-1' class='latex' /> thoroughly enough to give a good characterization of <img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)&amp;s=-1' title='p(&#92;bold x)&amp;s=-1' class='latex' />. Slow mixing is due to a number of factors, including the &#8220;random walk&#8221; nature of the Markov chain and the tendency of the Markov chain to get &#8220;stuck,&#8221; sampling only a single region of <img src='http://s0.wp.com/latex.php?latex=%5Cbold+x%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;bold x&amp;s=-1' title='&#92;bold x&amp;s=-1' class='latex' /> that has high probability under <img src='http://s0.wp.com/latex.php?latex=p%28%5Cbold+x%29%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(&#92;bold x)&amp;s=-1' title='p(&#92;bold x)&amp;s=-1' class='latex' />. Such behavior is especially problematic for distributions with multiple modes or heavy tails. More advanced techniques, such as Hybrid Monte Carlo, have been developed to incorporate additional dynamics that increase the efficiency of the Markov chain path. We will discuss Hybrid Monte Carlo in a future post.</p>
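<p>One way to see slow mixing in the bivariate example is to measure how strongly consecutive samples are correlated. For this particular target, substituting the update of x_2 at iteration t-1 into the update of x_1 at iteration t shows that the chain for x_1 is a first-order autoregressive process with coefficient rho^2, so its lag-1 autocorrelation approaches 1 as the correlation between the two variables grows. The sketch below checks this empirically; the three values in <code>rhos</code> and the chain length are arbitrary illustrative choices.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% LAG-1 AUTOCORRELATION OF THE x_1 CHAIN FOR INCREASING CORRELATION (SKETCH)
nSamples = 20000;
rhos = [0.5 0.8 0.99];
lag1 = zeros(size(rhos));
for k = 1:numel(rhos)
    rho = rhos(k);
    s   = sqrt(1 - rho^2);         % CONDITIONAL STANDARD DEVIATION
    x   = zeros(nSamples, 2);      % START AT THE TARGET MEAN [0 0]
    for t = 2:nSamples
        x(t,1) = rho*x(t-1,2) + s*randn;
        x(t,2) = rho*x(t,1)   + s*randn;
    end
    c = corrcoef(x(1:end-1,1), x(2:end,1));
    lag1(k) = c(1,2);
    fprintf('rho = %.2f: lag-1 autocorrelation = %.3f (theory rho^2 = %.3f)\n', ...
            rho, lag1(k), rho^2);
end
</pre>
<p>The printed autocorrelations should track rho^2 closely. At rho = 0.99 successive samples are nearly identical, so many more iterations are needed to obtain the same amount of information about the target.</p>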
]]></content:encoded>
</item>
<item>
<title><![CDATA[Introduction to Minimax Lower Bounds]]></title>
<link>http://maikolsolis.wordpress.com/2012/10/13/introduction-minimax-lower-bounds/</link>
<pubDate>Sat, 13 Oct 2012 19:32:28 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2012/10/13/introduction-minimax-lower-bounds/</guid>
<description><![CDATA[In my most recent research, I&#8217;m working on finding &#8220;Minimax Lower Bounds&#8221; for some]]></description>
<content:encoded><![CDATA[<p style="text-align:left;">In my most recent research, I&#8217;m working on finding <em><strong>&#8220;Minimax Lower Bounds&#8221;</strong></em> for some kind of estimators. Therefore,  to learn a little more and get my ideas clear, I&#8217;ll going to start a series of posts about the topic<em>.</em> I pretend to make some review in the general method and introduce some bounds depending on the divergence between two probability measures. Also, I want to study the classic results of <strong>Le Cam</strong>, <strong>Fano</strong> and <strong>Assouad</strong>. I hope that these publications are very educational for all of us.</p>
<div id="attachment_1250" class="wp-caption aligncenter" style="width: 438px"><a href="http://maikolsolis.files.wordpress.com/2012/10/starting_line.jpg"><img class=" wp-image-1250" title="starting_line" alt="" src="http://maikolsolis.files.wordpress.com/2012/10/starting_line.jpg?w=428&#038;h=230" height="230" width="428" /></a><p class="wp-caption-text">A journey of a thousand miles begins with a single step. Lao-tzu. Photo credit: <a href="http://www.lbecker.com/blog/?p=1533" target="_blank">Larry Becker</a></p></div>
<p style="text-align:center;"><!--more--></p>
<p>In older posts (<a title="Density Estimation by Histograms (Part II)" href="http://maikolsolis.wordpress.com/2011/10/23/density-estimation-by-histograms-part-ii/" target="_blank">here</a>, <a title="Kernel density estimation" href="http://maikolsolis.wordpress.com/2011/12/23/kernel-density-estimation/" target="_blank">here</a>, <a title="A global measure of risk for kernel estimators in Nikol’ski classes" href="http://maikolsolis.wordpress.com/2011/12/30/a-global-measure-of-risk-for-kernel-estimators-in-nikolski-classes/" target="_blank">here </a>and <a title="Rates of convergence for the MISE in Sobolev classes of densities" href="http://maikolsolis.wordpress.com/2012/01/09/rates-of-convergence-for-the-mise-in-sobolev-classes-of-densities/" target="_blank">here</a>), we have already analyzed upper bounds for the <a title="Density Estimation by Histograms (Part II)" href="http://maikolsolis.wordpress.com/2011/10/23/density-estimation-by-histograms-part-ii/" target="_blank">MSE</a> or <a title="A global measure of risk for kernel estimators in Nikol’ski classes" href="http://maikolsolis.wordpress.com/2011/12/30/a-global-measure-of-risk-for-kernel-estimators-in-nikolski-classes/" target="_blank">MISE</a> which have the form</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Csup_%7Bf%5Cin%5Cmathcal%7BF%7D%7D%5Cmathbb+E%5Cleft%5Bd%5E%7B2%7D%28%5Chat%7Bf%7D_%7Bn%7D%2Cf%29%5Cright%5D%5Cleq+C%5Cpsi_%7Bn%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;leq C&#92;psi_{n} &amp;fg=000000' title='&#92;displaystyle &#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;leq C&#92;psi_{n} &amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}}&amp;fg=000000' title='{&#92;psi_{n}}&amp;fg=000000' class='latex' /> goes to zero (for the density estimation case, <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%3Dn%5E%7B-4%2F5%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}=n^{-4/5}}&amp;fg=000000' title='{&#92;psi_{n}=n^{-4/5}}&amp;fg=000000' class='latex' />) and <img src='http://s0.wp.com/latex.php?latex=%7BC%26%2360%3B%5Cinfty%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C&lt;&#92;infty}&amp;fg=000000' title='{C&lt;&#92;infty}&amp;fg=000000' class='latex' />. We would like to know whether these rates are tight, in the sense that no other estimator can achieve a faster rate uniformly over the class. To this end, we seek a lower bound of the form</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cinf_%7B%5Chat%7Bf%7D%7D%5Csup_%7Bf%5Cin%5Cmathcal%7BF%7D%7D%5Cmathbb+E%5Cleft%5Bd%5E%7B2%7D%28%5Chat%7Bf%7D_%7Bn%7D%2Cf%29%5Cright%5D%5Cgeq+c%5Cpsi_%7Bn%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;inf_{&#92;hat{f}}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;geq c&#92;psi_{n}. &amp;fg=000000' title='&#92;displaystyle &#92;inf_{&#92;hat{f}}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;geq c&#92;psi_{n}. &amp;fg=000000' class='latex' /></p>
<p>We will assume the following three elements:</p>
<ul>
<li>A class of functions <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathcal%7BF%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathcal{F}}&amp;fg=000000' title='{&#92;mathcal{F}}&amp;fg=000000' class='latex' /> , containing the &#8220;true&#8221; function <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' />. For example, <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathcal%7BF%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathcal{F}}&amp;fg=000000' title='{&#92;mathcal{F}}&amp;fg=000000' class='latex' /> could be a <a title="Kernel density estimation" href="http://maikolsolis.wordpress.com/2011/12/23/kernel-density-estimation/" target="_blank">Hölder class</a> <img src='http://s0.wp.com/latex.php?latex=%7B%5CSigma%28%5Cbeta%2CL%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Sigma(&#92;beta,L)}&amp;fg=000000' title='{&#92;Sigma(&#92;beta,L)}&amp;fg=000000' class='latex' /> or a <a title="Rates of convergence for the MISE in Sobolev classes of densities" href="http://maikolsolis.wordpress.com/2012/01/09/rates-of-convergence-for-the-mise-in-sobolev-classes-of-densities/" target="_blank">Sobolev class</a> <img src='http://s0.wp.com/latex.php?latex=%7BW%28%5Cbeta%2CL%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{W(&#92;beta,L)}&amp;fg=000000' title='{W(&#92;beta,L)}&amp;fg=000000' class='latex' />.</li>
<li>A family of probability measures <img src='http://s0.wp.com/latex.php?latex=%7B%5C%7B%5Cmathbb%7BP%7D_%7Bf%7D%2Cf%5Cin%5Cmathcal%7BF%7D%5C%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;{&#92;mathbb{P}_{f},f&#92;in&#92;mathcal{F}&#92;}}&amp;fg=000000' title='{&#92;{&#92;mathbb{P}_{f},f&#92;in&#92;mathcal{F}&#92;}}&amp;fg=000000' class='latex' />, indexed by <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathcal%7BF%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathcal{F}}&amp;fg=000000' title='{&#92;mathcal{F}}&amp;fg=000000' class='latex' />, on a measurable space <img src='http://s0.wp.com/latex.php?latex=%7B%28%5Cmathcal%7BX%7D%2C%5Cmathcal%7BA%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{(&#92;mathcal{X},&#92;mathcal{A})}&amp;fg=000000' title='{(&#92;mathcal{X},&#92;mathcal{A})}&amp;fg=000000' class='latex' /> associated with the data. For example, in the density model, <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathbb%7BP%7D_%7Bf%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathbb{P}_{f}}&amp;fg=000000' title='{&#92;mathbb{P}_{f}}&amp;fg=000000' class='latex' /> is the probability measure associated with a sample <img src='http://s0.wp.com/latex.php?latex=%7B%28X_%7B1%7D%2C%5Cldots%2CX_%7Bn%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{(X_{1},&#92;ldots,X_{n})}&amp;fg=000000' title='{(X_{1},&#92;ldots,X_{n})}&amp;fg=000000' class='latex' /> when each <img src='http://s0.wp.com/latex.php?latex=%7BX_%7Bi%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_{i}}&amp;fg=000000' title='{X_{i}}&amp;fg=000000' class='latex' /> has density <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' />.</li>
<li>A distance <img src='http://s0.wp.com/latex.php?latex=%7Bd%28%5Ccdot%2C%5Ccdot%29%5Cgeq0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{d(&#92;cdot,&#92;cdot)&#92;geq0}&amp;fg=000000' title='{d(&#92;cdot,&#92;cdot)&#92;geq0}&amp;fg=000000' class='latex' /> which measures the error between the estimate <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{n}}&amp;fg=000000' title='{&#92;hat{f}_{n}}&amp;fg=000000' class='latex' /> and the true function <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' />. For example, <img src='http://s0.wp.com/latex.php?latex=%7Bd%28f%2Cg%29%3D%5CVert+f-g%5CVert_%7B2%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{d(f,g)=&#92;Vert f-g&#92;Vert_{2}}&amp;fg=000000' title='{d(f,g)=&#92;Vert f-g&#92;Vert_{2}}&amp;fg=000000' class='latex' />, or <img src='http://s0.wp.com/latex.php?latex=%7Bd%28f%2Cg%29%3D%5Cvert+f%28x_%7B0%7D%29-g%28x_%7B0%7D%29%5Cvert%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{d(f,g)=&#92;vert f(x_{0})-g(x_{0})&#92;vert}&amp;fg=000000' title='{d(f,g)=&#92;vert f(x_{0})-g(x_{0})&#92;vert}&amp;fg=000000' class='latex' /> for a fixed <img src='http://s0.wp.com/latex.php?latex=%7Bx_%7B0%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x_{0}}&amp;fg=000000' title='{x_{0}}&amp;fg=000000' class='latex' />.<br />
In fact, it is enough to use a semi-distance that satisfies the <a class="zem_slink" title="Triangle inequality" href="http://en.wikipedia.org/wiki/Triangle_inequality" target="_blank" rel="wikipedia">triangle inequality</a>.</li>
</ul>
<p>Define, for a stochastic model <img src='http://s0.wp.com/latex.php?latex=%7B%5C%7B%5Cmathbb%7BP%7D_%7Bf%7D%2Cf%5Cin%5Cmathcal%7BF%7D%5C%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;{&#92;mathbb{P}_{f},f&#92;in&#92;mathcal{F}&#92;}}&amp;fg=000000' title='{&#92;{&#92;mathbb{P}_{f},f&#92;in&#92;mathcal{F}&#92;}}&amp;fg=000000' class='latex' /> and a distance <img src='http://s0.wp.com/latex.php?latex=%7Bd%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{d}&amp;fg=000000' title='{d}&amp;fg=000000' class='latex' />, the <em>minimax risk</em> as</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathcal%7BR%7D%5E%7B%2A%7D%5Ctriangleq%5Cinf_%7B%5Chat%7Bf%7D%7D%5Csup_%7Bf%5Cin%5Cmathcal%7BF%7D%7D%5Cmathbb+E%5Cleft%5Bd%5E%7B2%7D%28%5Chat%7Bf%7D_%7Bn%7D%2Cf%29%5Cright%5D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathcal{R}^{*}&#92;triangleq&#92;inf_{&#92;hat{f}}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right] &amp;fg=000000' title='&#92;displaystyle &#92;mathcal{R}^{*}&#92;triangleq&#92;inf_{&#92;hat{f}}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right] &amp;fg=000000' class='latex' /></p>
<p style="text-align:left;">where the infimum <img src='http://s0.wp.com/latex.php?latex=%7B%5Cinf_%7B%5Chat%7Bf%7D%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;inf_{&#92;hat{f}}}&amp;fg=000000' title='{&#92;inf_{&#92;hat{f}}}&amp;fg=000000' class='latex' /> is taken over all estimators. The main goal is to obtain a lower bound of the form</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathcal%7BR%7D%5E%7B%2A%7D%5Cgeq+c%5Cpsi_%7Bn%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathcal{R}^{*}&#92;geq c&#92;psi_{n} &amp;fg=000000' title='&#92;displaystyle &#92;mathcal{R}^{*}&#92;geq c&#92;psi_{n} &amp;fg=000000' class='latex' /></p>
<p>for some constant <img src='http://s0.wp.com/latex.php?latex=%7Bc%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{c&gt;0}&amp;fg=000000' title='{c&gt;0}&amp;fg=000000' class='latex' /> and some sequence <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%5Crightarrow0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}&#92;rightarrow0}&amp;fg=000000' title='{&#92;psi_{n}&#92;rightarrow0}&amp;fg=000000' class='latex' />.</p>
<p>Suppose that we have proved that</p>
<p style="text-align:center;"><a><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Climinf_%7Bn%5Crightarrow%5Cinfty%7D%5Cpsi_%7Bn%7D%5E%7B-1%7D%5Cmathcal%7BR%7D%5E%7B%2A%7D%5Cgeq+c%26%2362%3B0+%5C+%5C+%5C+%5C+%5C+%281%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;liminf_{n&#92;rightarrow&#92;infty}&#92;psi_{n}^{-1}&#92;mathcal{R}^{*}&#92;geq c&gt;0 &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' title='&#92;displaystyle &#92;liminf_{n&#92;rightarrow&#92;infty}&#92;psi_{n}^{-1}&#92;mathcal{R}^{*}&#92;geq c&gt;0 &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' class='latex' /></a></p>
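<p>and that, for a particular estimator <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{n}}&amp;fg=000000' title='{&#92;hat{f}_{n}}&amp;fg=000000' class='latex' />,</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Climsup_%7Bn%5Crightarrow%5Cinfty%7D%5Csup_%7Bf%5Cin%5Cmathcal%7BF%7D%7D%5Cpsi_%7Bn%7D%5E%7B-1%7D%5Cmathbb+E%5Cleft%5Bd%5E%7B2%7D%28%5Chat%7Bf%7D_%7Bn%7D%2Cf%29%5Cright%5D%5Cleq+C.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;limsup_{n&#92;rightarrow&#92;infty}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;psi_{n}^{-1}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;leq C. &amp;fg=000000' title='&#92;displaystyle &#92;limsup_{n&#92;rightarrow&#92;infty}&#92;sup_{f&#92;in&#92;mathcal{F}}&#92;psi_{n}^{-1}&#92;mathbb E&#92;left[d^{2}(&#92;hat{f}_{n},f)&#92;right]&#92;leq C. &amp;fg=000000' class='latex' /></p>
<p>Since <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathcal%7BR%7D%5E%7B%2A%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathcal{R}^{*}}&amp;fg=000000' title='{&#92;mathcal{R}^{*}}&amp;fg=000000' class='latex' /> is an infimum over all estimators, it is bounded above by the worst-case risk of this particular <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{n}}&amp;fg=000000' title='{&#92;hat{f}_{n}}&amp;fg=000000' class='latex' />. This directly implies that</p>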
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Climsup_%7Bn%5Crightarrow%5Cinfty%7D%5Cpsi_%7Bn%7D%5E%7B-1%7D%5Cmathcal%7BR%7D%5E%7B%2A%7D%5Cleq+C.+%5C+%5C+%5C+%5C+%5C+%282%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;limsup_{n&#92;rightarrow&#92;infty}&#92;psi_{n}^{-1}&#92;mathcal{R}^{*}&#92;leq C. &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' title='&#92;displaystyle &#92;limsup_{n&#92;rightarrow&#92;infty}&#92;psi_{n}^{-1}&#92;mathcal{R}^{*}&#92;leq C. &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' class='latex' /></p>
<p>If inequalities <a>(1)</a> and <a>(2)</a> are satisfied simultaneously, we say that <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}}&amp;fg=000000' title='{&#92;psi_{n}}&amp;fg=000000' class='latex' /> is the optimal <a class="zem_slink" title="Rate of convergence" href="http://en.wikipedia.org/wiki/Rate_of_convergence" target="_blank" rel="wikipedia">rate of convergence</a> for this problem and that <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{n}}&amp;fg=000000' title='{&#92;hat{f}_{n}}&amp;fg=000000' class='latex' /> attains that rate.</p>
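<p>The following Python/NumPy sketch, which is not from the original post, illustrates the upper-bound half of this definition. It uses Monte Carlo to estimate the pointwise risk E[(f_hat_n(x0) - f(x0))^2] of one specific estimator: a Gaussian kernel density estimator with bandwidth h = n^(-1/5), evaluated at the standard normal density and x0 = 0. All of these choices are assumptions made for illustration. Because only a single f is evaluated, the sketch shows the rate psi_n = n^(-4/5), the classical rate for this kind of estimator when f is twice differentiable, but it does not bound the supremum over the class. As n grows, the ratio risk/psi_n should stay of roughly constant order.</p>

```python
import numpy as np

def normal_pdf(x):
    return np.exp(-0.5 * np.asarray(x) ** 2) / np.sqrt(2 * np.pi)

def kde_at_point(samples, x0, h):
    """Gaussian KDE evaluated at x0, one estimate per row of `samples` (reps x n)."""
    return normal_pdf((x0 - samples) / h).mean(axis=1) / h

def mc_risk(n, reps=2000, x0=0.0, seed=0):
    """Monte Carlo estimate of E[(f_hat_n(x0) - f(x0))^2] for f = N(0, 1)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))  # reps independent data sets of size n
    h = n ** (-1 / 5)                         # assumed bandwidth choice
    estimates = kde_at_point(samples, x0, h)
    return float(np.mean((estimates - normal_pdf(x0)) ** 2))

for n in (100, 400, 1600):
    risk = mc_risk(n)
    psi_n = n ** (-4 / 5)
    print(f"n={n:5d}  risk={risk:.2e}  risk/psi_n={risk / psi_n:.3f}")
```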
<p><b>Remark 1</b> <em>Two rates of convergence <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}}&amp;fg=000000' title='{&#92;psi_{n}}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%5E%7B%5Cprime%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}^{&#92;prime}}&amp;fg=000000' title='{&#92;psi_{n}^{&#92;prime}}&amp;fg=000000' class='latex' /> are equivalent (we write <img src='http://s0.wp.com/latex.php?latex=%7B%5Cpsi_%7Bn%7D%5Casymp%5Cpsi_%7Bn%7D%5E%7B%5Cprime%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;psi_{n}&#92;asymp&#92;psi_{n}^{&#92;prime}}&amp;fg=000000' title='{&#92;psi_{n}&#92;asymp&#92;psi_{n}^{&#92;prime}}&amp;fg=000000' class='latex' />) if</em></p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+0%26%2360%3B%5Climinf_%7Bn%5Crightarrow%5Cinfty%7D%5Cfrac%7B%5Cpsi_%7Bn%7D%7D%7B%5Cpsi_%7Bn%7D%5E%7B%5Cprime%7D%7D%5Cleq%5Climsup_%7Bn%5Crightarrow%5Cinfty%7D%5Cfrac%7B%5Cpsi_%7Bn%7D%7D%7B%5Cpsi_%7Bn%7D%5E%7B%5Cprime%7D%7D%26%2360%3B%5Cinfty.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle 0&lt;&#92;liminf_{n&#92;rightarrow&#92;infty}&#92;frac{&#92;psi_{n}}{&#92;psi_{n}^{&#92;prime}}&#92;leq&#92;limsup_{n&#92;rightarrow&#92;infty}&#92;frac{&#92;psi_{n}}{&#92;psi_{n}^{&#92;prime}}&lt;&#92;infty. &amp;fg=000000' title='&#92;displaystyle 0&lt;&#92;liminf_{n&#92;rightarrow&#92;infty}&#92;frac{&#92;psi_{n}}{&#92;psi_{n}^{&#92;prime}}&#92;leq&#92;limsup_{n&#92;rightarrow&#92;infty}&#92;frac{&#92;psi_{n}}{&#92;psi_{n}^{&#92;prime}}&lt;&#92;infty. &amp;fg=000000' class='latex' /></p>
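<p>Two pairs of rates, chosen for this illustration rather than taken from the post, show how Remark 1 works. Constant factors and lower-order terms do not change the rate, but a logarithmic factor does. In the first pair, the liminf and the limsup of the ratio are both equal to 1/3, which lies strictly between 0 and infinity.</p>

```latex
\psi_n = n^{-4/5}, \qquad \psi_n' = 3\,n^{-4/5} + n^{-1}
\quad\Longrightarrow\quad
\frac{\psi_n}{\psi_n'} = \frac{1}{3 + n^{-1/5}} \longrightarrow \frac{1}{3},
\qquad\text{so } \psi_n \asymp \psi_n' .

\psi_n = n^{-1}, \qquad \psi_n' = n^{-1}\log n
\quad\Longrightarrow\quad
\frac{\psi_n}{\psi_n'} = \frac{1}{\log n} \longrightarrow 0,
\qquad\text{so } \psi_n \not\asymp \psi_n' .
```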
<p>The big question here is: <em><strong>How can we prove inequality <a>(1)</a> when bounding <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathcal%7BR%7D%5E%7B%2A%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathcal{R}^{*}}&amp;fg=000000' title='{&#92;mathcal{R}^{*}}&amp;fg=000000' class='latex' /> from below requires controlling all possible estimators <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{n}}&amp;fg=000000' title='{&#92;hat{f}_{n}}&amp;fg=000000' class='latex' /> of <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' />?</strong></em></p>
<p>At first glance, this seems an impossible task: just imagine the enormous number of possible estimators of <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' />. Fortunately, we can apply a &#8220;reduction procedure&#8221; to simplify the problem and obtain the required bound. We will study it in the next post.</p>
<p>As always, any thoughts, suggestions or improvements are welcome in the comments.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Multi-Temporal Unmixing: an Analogy to Change Detection]]></title>
<link>http://andreasrabe.wordpress.com/2012/10/04/temporal-unmixing-an-analogy-to-change-detection/</link>
<pubDate>Thu, 04 Oct 2012 10:07:09 +0000</pubDate>
<dc:creator>Andreas Rabe</dc:creator>
<guid>http://andreasrabe.wordpress.com/2012/10/04/temporal-unmixing-an-analogy-to-change-detection/</guid>
<description><![CDATA[Last week, during a talk given by Ben Somers from the VITO, I heard something about multi-temporal]]></description>
<content:encoded><![CDATA[<p>Last week, during a talk given by Ben Somers from <a href="http://hyperspectral.vgt.vito.be/">VITO</a>, I heard about multi-temporal spectral unmixing. Basically, the features are given by a stack of hyperspectral images (acquired in different months of a single year) and the endmembers are image endmembers. The main assumption is that there is <strong>no change in pixel cover fraction</strong> throughout the year. This approach works very well and produces better results than single-date unmixing.</p>
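<p>The post does not spell out the unmixing model, so the following Python/NumPy sketch assumes the common linear mixing model. On each date t, a pixel spectrum is E_t a, where E_t holds that date's image endmembers (bands x endmembers) and a is the vector of cover fractions. The no-change assumption means that a is shared by all dates. The per-date spectra and endmember matrices can therefore be stacked along the band axis and solved as a single least-squares problem. The function names, the soft sum-to-one row and the synthetic data are illustrative assumptions, and the sketch does not enforce non-negativity of the fractions.</p>

```python
import numpy as np

def stack_dates(per_date):
    """Concatenate per-date arrays along the band axis into one multi-temporal feature axis."""
    return np.concatenate(per_date, axis=0)

def unmix_constant_fractions(pixel_per_date, endmembers_per_date, sum_weight=1e3):
    """Estimate one fraction vector shared by all dates (hypothetical helper).

    pixel_per_date:      list of T arrays, each of shape (bands,)
    endmembers_per_date: list of T arrays, each of shape (bands, n_endmembers)
    Sum-to-one is imposed softly via a heavily weighted extra row;
    non-negativity is NOT enforced in this sketch.
    """
    y = stack_dates(pixel_per_date)          # (T*bands,)
    E = stack_dates(endmembers_per_date)     # (T*bands, n_endmembers)
    n_em = E.shape[1]
    E_aug = np.vstack([E, sum_weight * np.ones((1, n_em))])
    y_aug = np.concatenate([y, [sum_weight]])
    fractions, *_ = np.linalg.lstsq(E_aug, y_aug, rcond=None)
    return fractions

# Synthetic, noise-free toy pixel: 3 dates, 5 bands, 3 endmembers
rng = np.random.default_rng(1)
n_dates, n_bands, n_em = 3, 5, 3
endmembers = [rng.uniform(0.05, 0.6, size=(n_bands, n_em)) for _ in range(n_dates)]
true_fractions = np.array([0.5, 0.3, 0.2])   # constant over the year (the key assumption)
pixel = [E_t @ true_fractions for E_t in endmembers]

print(unmix_constant_fractions(pixel, endmembers))  # approximately [0.5, 0.3, 0.2]
```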
<p>Now I am wondering whether this approach can be extended to situations where the <strong>pixel cover fractions are changing</strong> over time. By analogy with change detection for qualitative classes, where we might, for example, want to find the areas that have changed from Forest to Agriculture to Grassland, we could use multi-temporal unmixing to answer this question even more precisely.</p>
<p>In change detection there are two common ways to deal with multi-temporal data: a) classify each date individually and derive the change classes afterward, or b) use the stack of all images and work with the change classes directly. Both approaches have disadvantages: in a), each classification uses only single-date information, and in b), the number of change classes grows quickly with the number of classes and the number of dates. To overcome this, one could use the whole image stack to calculate each single-date classification (let us call this c)). Some tests show that this approach works well (to my knowledge, it is not yet published).</p>
<p>To come back to the multi-temporal unmixing problem with changing pixel cover fractions: approaches a) and b) transfer directly to unmixing problems. Approach c), however, is a bit tricky. We want to use the stack of all images to unmix a single-date image, but the single-date endmembers do not match the feature space of the image stack. I have to think about a solution for this!</p>
<p>While writing all this, it occurred to me that there might be an analogy to the multi-temporal setting even for the regression case. More generally, all function learning concepts, such as classification, regression and density estimation, might be put into a multi-temporal framework like the one described above.</p>
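<p>To see how quickly approach b) grows, suppose every possible sequence of class labels over the dates is treated as its own change class. Then K classes observed on T dates give K^T change classes, including the unchanged sequences. The short Python sketch below is an illustration that does not come from the post. It enumerates these sequences for the three classes in the Forest/Agriculture/Grassland example.</p>

```python
from itertools import product

def change_trajectories(classes, n_dates):
    """Every sequence of class labels over n_dates (the candidate change classes of approach b)."""
    return list(product(classes, repeat=n_dates))

classes = ["Forest", "Agriculture", "Grassland"]
for n_dates in (2, 3, 4):
    print(n_dates, "dates:", len(change_trajectories(classes, n_dates)), "change classes")

print(change_trajectories(classes, 3)[5])  # ('Forest', 'Agriculture', 'Grassland')
```

<p>With three classes, this gives 9, 27 and 81 change classes for 2, 3 and 4 dates. Approaches a) and c), by contrast, only ever need three classes per date.</p>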
<p><strong>Short summary:</strong></p>
<ul>
<li>New change detection approach: use the multi-temporal image stack to classify single-date images, then calculate the change classes afterward; advantage: uses all multi-temporal information without exploding class combinations; disadvantages: none (?)</li>
<li>Transfer this approach to multi-temporal unmixing; unresolved problem: single-date endmembers do not match multi-temporal stack</li>
<li>A general multi-temporal function learning framework might be worth thinking about</li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[A Brief Introduction to Markov Chains]]></title>
<link>http://theclevermachine.wordpress.com/2012/09/24/a-brief-introduction-to-markov-chains/</link>
<pubDate>Tue, 25 Sep 2012 06:43:36 +0000</pubDate>
<dc:creator>dustinstansbury</dc:creator>
<guid>http://theclevermachine.wordpress.com/2012/09/24/a-brief-introduction-to-markov-chains/</guid>
<description><![CDATA[Markov chains are an essential component of Markov chain Monte Carlo (MCMC) techniques. Under MCMC,]]></description>
<content:encoded><![CDATA[<p>Markov chains are an essential component of Markov chain Monte Carlo (MCMC) techniques. Under MCMC, the Markov chain is used to sample from some target distribution. To get a better understanding of what a Markov chain is, and further, how it can be used to sample from a distribution, this post introduces and applies a few basic concepts.</p>
<p>A Markov chain is a stochastic process that operates sequentially (e.g. temporally), transitioning from one state to another within an allowed set of states.<sup>†</sup></p>
<p><img src='http://s0.wp.com/latex.php?latex=x%5E%7B%280%29%7D+%5Crightarrow+x%5E%7B%281%29%7D+%5Crightarrow+x%5E%7B%282%29%7D+%5Cdots+%5Crightarrow+x%5E%7B%28t%29%7D+%5Crightarrow+%5Cdots&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x^{(0)} &#92;rightarrow x^{(1)} &#92;rightarrow x^{(2)} &#92;dots &#92;rightarrow x^{(t)} &#92;rightarrow &#92;dots' title='x^{(0)} &#92;rightarrow x^{(1)} &#92;rightarrow x^{(2)} &#92;dots &#92;rightarrow x^{(t)} &#92;rightarrow &#92;dots' class='latex' /></p>
<p>A Markov chain is defined by three elements:</p>
<ol>
<li>A <strong><em>state space</em></strong> <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />, which is a set of values that the chain is allowed to take</li>
<li>A <strong><em>transition operator</em></strong> <img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28t%2B1%29%7D+%26%23124%3B+x%5E%7B%28t%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(t+1)} &#124; x^{(t)})' title='p(x^{(t+1)} &#124; x^{(t)})' class='latex' /> that defines  the probability of moving from state <img src='http://s0.wp.com/latex.php?latex=x%5E%7B%28t%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x^{(t)}' title='x^{(t)}' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=x%5E%7B%28t%2B1%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x^{(t+1)}' title='x^{(t+1)}' class='latex' /> .</li>
<li>An <strong><em>initial condition distribution</em></strong> <img src='http://s0.wp.com/latex.php?latex=%5Cpi%5E%7B%280%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi^{(0)}' title='&#92;pi^{(0)}' class='latex' /> which defines the probability of being in any one of the possible states at the initial iteration <img src='http://s0.wp.com/latex.php?latex=t+%3D+0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t = 0' title='t = 0' class='latex' />.</li>
</ol>
<p>The Markov chain starts at some initial state, which is sampled from <img src='http://s0.wp.com/latex.php?latex=%5Cpi%5E%7B%280%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi^{(0)}' title='&#92;pi^{(0)}' class='latex' />, then transitions from one state to another according to the transition operator <img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28t%2B1%29%7D+%26%23124%3B+x%5E%7B%28t%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(t+1)} &#124; x^{(t)})' title='p(x^{(t+1)} &#124; x^{(t)})' class='latex' />.</p>
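<p>For a finite state space, the three elements above are enough to simulate a chain. The following Python/NumPy sketch is not part of the original post, whose code is MATLAB. It draws the initial state from pi0 and then draws each next state from the row of the transition matrix that belongs to the current state. The 2-state matrix <code>P_toy</code> is a hypothetical example.</p>

```python
import numpy as np

def sample_chain(pi0, P, n_steps, seed=0):
    """Draw a state trajectory x(0), ..., x(n_steps) from a finite Markov chain.

    pi0: initial distribution over the K states
    P:   K x K transition matrix; row i is p(x(t+1) | x(t) = i)
    """
    rng = np.random.default_rng(seed)
    pi0 = np.asarray(pi0, dtype=float)
    P = np.asarray(P, dtype=float)
    n_states = len(pi0)
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = rng.choice(n_states, p=pi0)                    # x(0) ~ pi0
    for t in range(n_steps):
        states[t + 1] = rng.choice(n_states, p=P[states[t]])   # x(t+1) ~ p(. | x(t))
    return states

# Hypothetical 2-state chain, used only for illustration
P_toy = [[0.9, 0.1],
         [0.5, 0.5]]
print(sample_chain([1.0, 0.0], P_toy, n_steps=10))
```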
<p>A Markov chain is called <strong><em>memoryless </em></strong>if the next state only depends on the current state and not on any of the states previous to the current:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28t%2B1%29%7D%26%23124%3Bx%5E%7B%28t%29%7D%2Cx%5E%7B%28t-1%29%7D%2C...x%5E%7B%280%29%7D%29+%3D+p%28x%5E%7B%28t%2B1%29%7D%26%23124%3Bx%5E%7B%28t%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(t+1)}&#124;x^{(t)},x^{(t-1)},...x^{(0)}) = p(x^{(t+1)}&#124;x^{(t)})' title='p(x^{(t+1)}&#124;x^{(t)},x^{(t-1)},...x^{(0)}) = p(x^{(t+1)}&#124;x^{(t)})' class='latex' /></p>
<p>(This memoryless property is formally known as the <em>Markov property</em>.)</p>
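<p>If the transition operator for a Markov chain does not change across transitions, that is, if</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28t%2B1%29%7D+%26%23124%3B+x%5E%7B%28t%29%7D%29+%3D+p%28x%5E%7B%28t%29%7D+%26%23124%3B+x%5E%7B%28t-1%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(t+1)} &#124; x^{(t)}) = p(x^{(t)} &#124; x^{(t-1)})' title='p(x^{(t+1)} &#124; x^{(t)}) = p(x^{(t)} &#124; x^{(t-1)})' class='latex' /></p>
<p>for all <img src='http://s0.wp.com/latex.php?latex=t&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t' title='t' class='latex' />, the Markov chain is called <strong><em>time homogeneous</em></strong>. Time-homogeneous Markov chains have a nice property under mild conditions; for a finite state space, it suffices that every state can be reached from every other state and that the chain does not cycle periodically. As the chain runs for a long time and <img src='http://s0.wp.com/latex.php?latex=t+%5Crightarrow+%5Cinfty&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t &#92;rightarrow &#92;infty' title='t &#92;rightarrow &#92;infty' class='latex' />, it reaches an equilibrium called the chain&#8217;s <strong><em>stationary distribution</em></strong>, which further transitions leave unchanged:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28t%2B1%29%7D%29+%3D+p%28x%5E%7B%28t%29%7D%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(t+1)}) = p(x^{(t)})' title='p(x^{(t+1)}) = p(x^{(t)})' class='latex' /></p>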
<p>We&#8217;ll see later how the stationary distribution of a Markov chain is important for sampling from probability distributions, a technique that is at the heart of Markov Chain Monte Carlo (MCMC) methods.</p>
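<p>For a finite state space, stationarity has a compact matrix form: a row vector pi is stationary for the transition matrix P when pi P = pi. The following Python/NumPy sketch, which is not from the original post, checks this for a hypothetical 2-state chain. Let a and b be the probabilities of leaving states 0 and 1. The chain's stationary distribution then has the closed form [b, a] / (a + b). The sketch also shows that repeated transitions from a different starting distribution approach pi.</p>

```python
import numpy as np

P = np.array([[0.9, 0.1],          # hypothetical 2-state chain
              [0.5, 0.5]])

a, b = P[0, 1], P[1, 0]            # probabilities of leaving state 0 and state 1
pi = np.array([b, a]) / (a + b)    # closed form for two states

print(pi)          # approximately [0.8333, 0.1667]
print(pi @ P)      # the same vector: one more transition leaves pi unchanged

# Starting from another distribution, repeated transitions approach pi
dist = np.array([0.0, 1.0])
for _ in range(50):
    dist = dist @ P
print(np.allclose(dist, pi))       # True
```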
<h2>Finite state-space (time homogeneous) Markov chain</h2>
<p>If the state space of a Markov chain takes on a finite number of distinct values and the chain is time homogeneous, then the transition operator can be defined by a matrix <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' />, where the entries of <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' /> are:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p_%7Bij%7D+%3D+p%28X%5E%7B%28t%2B1%29%7D+%3D+j+%26%23124%3B+x%5E%7B%28t%29%7D+%3D+i%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p_{ij} = p(X^{(t+1)} = j &#124; x^{(t)} = i)' title='p_{ij} = p(X^{(t+1)} = j &#124; x^{(t)} = i)' class='latex' /></p>
<p>This means that if the chain is currently in the <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i' title='i' class='latex' />-th state, the probability of moving to the <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='j' title='j' class='latex' />-th state is given by the <img src='http://s0.wp.com/latex.php?latex=j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='j' title='j' class='latex' />-th entry of the <img src='http://s0.wp.com/latex.php?latex=i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i' title='i' class='latex' />-th row of <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' />. In other words, each row of <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' /> defines a conditional probability distribution over the state space, so its entries are nonnegative and sum to one. Let&#8217;s take a look at a finite state-space Markov chain in action with a simple example.</p>
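<p>The following Python/NumPy sketch, an illustration not taken from the post, checks that a matrix is a valid transition operator (square, nonnegative, each row summing to one) and reads off the conditional distribution of the next state as a row of the matrix. Both matrices in the example are hypothetical.</p>

```python
import numpy as np

def is_transition_matrix(P):
    """True if P is square and every row is a probability distribution over next states."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    nonnegative = np.all(np.greater_equal(P, 0.0))
    rows_sum_to_one = np.allclose(P.sum(axis=1), 1.0)
    return bool(nonnegative and rows_sum_to_one)

def next_state_distribution(P, i):
    """Row i of P, i.e. p(x(t+1) = j | x(t) = i) for every state j."""
    return np.asarray(P, dtype=float)[i]

# Hypothetical matrices, for illustration only
P_ok = [[0.9, 0.1],
        [0.5, 0.5]]
P_bad = [[0.5, 0.4],    # first row sums to 0.9, so it is not a distribution
         [0.3, 0.7]]
print(is_transition_matrix(P_ok), is_transition_matrix(P_bad))  # True False
print(next_state_distribution(P_ok, 1))                         # [0.5 0.5]
```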
<h3>Example: Predicting the weather with a finite state-space Markov chain</h3>
<p>In Berkeley, CA, there are (literally) only 3 types of weather: sunny, foggy, or rainy (this is analogous to a state-space that takes on three discrete values). The weather patterns are very stable there, so a Berkeley weatherman can easily predict the weather next week based on the weather today with the following transition rules:</p>
<p>If it is <em>sunny</em> today, then</p>
<ul>
<li>it is highly likely that it will be<em> sunny </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dsunny+%26%23124%3B+X%5E%7B%28today%29%7D%3Dsunny%29%3D0.8%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=sunny &#124; X^{(today)}=sunny)=0.8&amp;s=-1' title='p(X^{(week)}=sunny &#124; X^{(today)}=sunny)=0.8&amp;s=-1' class='latex' />,</li>
</ul>
</li>
<li>it is very unlikely that it will be <em>raining </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Drainy+%26%23124%3B+X%5E%7B%28today%29%7D%3Dsunny%29%3D0.05%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=rainy &#124; X^{(today)}=sunny)=0.05&amp;s=-1' title='p(X^{(week)}=rainy &#124; X^{(today)}=sunny)=0.05&amp;s=-1' class='latex' /></li>
</ul>
</li>
<li>and somewhat likely that it will be <em>foggy</em> next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dfoggy+%26%23124%3B+X%5E%7B%28today%29%7D%3Dsunny%29%3D0.15%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=foggy &#124; X^{(today)}=sunny)=0.15&amp;s=-1' title='p(X^{(week)}=foggy &#124; X^{(today)}=sunny)=0.15&amp;s=-1' class='latex' /></li>
</ul>
</li>
</ul>
<p>If it is <em>foggy</em> today then</p>
<ul>
<li>it is somewhat likely that it will be <em>sunny </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dsunny+%26%23124%3B+X%5E%7B%28today%29%7D%3Dfoggy%29%3D0.4%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=sunny &#124; X^{(today)}=foggy)=0.4&amp;s=-1' title='p(X^{(week)}=sunny &#124; X^{(today)}=foggy)=0.4&amp;s=-1' class='latex' /></li>
</ul>
</li>
<li>but slightly more likely that it will be <em>foggy</em> next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dfoggy+%26%23124%3B+X%5E%7B%28today%29%7D%3Dfoggy%29%3D0.5%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=foggy &#124; X^{(today)}=foggy)=0.5&amp;s=-1' title='p(X^{(week)}=foggy &#124; X^{(today)}=foggy)=0.5&amp;s=-1' class='latex' />,</li>
</ul>
</li>
<li>and fairly unlikely that it will be <em>raining </em>next week.
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Drainy+%26%23124%3B+X%5E%7B%28today%29%7D%3Dfoggy%29%3D0.1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=rainy &#124; X^{(today)}=foggy)=0.1&amp;s=-1' title='p(X^{(week)}=rainy &#124; X^{(today)}=foggy)=0.1&amp;s=-1' class='latex' />,</li>
</ul>
</li>
</ul>
<p>If it is <em>rainy</em> today then</p>
<ul>
<li>it is unlikely that it will be <em>sunny </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dsunny+%26%23124%3B+X%5E%7B%28today%29%7D%3Drainy%29%3D0.1%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=sunny &#124; X^{(today)}=rainy)=0.1&amp;s=-1' title='p(X^{(week)}=sunny &#124; X^{(today)}=rainy)=0.1&amp;s=-1' class='latex' />,</li>
</ul>
</li>
<li>it is somewhat likely that it will be <em>foggy </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Dfoggy+%26%23124%3B+X%5E%7B%28today%29%7D%3Drainy%29%3D0.3%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=foggy &#124; X^{(today)}=rainy)=0.3&amp;s=-1' title='p(X^{(week)}=foggy &#124; X^{(today)}=rainy)=0.3&amp;s=-1' class='latex' />,</li>
</ul>
</li>
<li>and it is fairly likely that it will be <em>rainy </em>next week
<ul>
<li><img src='http://s0.wp.com/latex.php?latex=p%28X%5E%7B%28week%29%7D%3Drainy+%26%23124%3B+X%5E%7B%28today%29%7D%3Drainy%29%3D0.6%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X^{(week)}=rainy &#124; X^{(today)}=rainy)=0.6&amp;s=-1' title='p(X^{(week)}=rainy &#124; X^{(today)}=rainy)=0.6&amp;s=-1' class='latex' />,</li>
</ul>
</li>
</ul>
<p>All of these transition rules can be instantiated in a single 3 x 3 transition operator matrix:</p>
<p><img src='http://s0.wp.com/latex.php?latex=P+%3D+%5Cbegin%7Bbmatrix%7D+0.8+%26%2338%3B+0.15+%26%2338%3B+0.05%5C%5C+0.4+%26%2338%3B+0.5+%26%2338%3B+0.1%5C%5C+0.1%26%2338%3B+0.3+%26%2338%3B+0.6+%5Cend%7Bbmatrix%7D+&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P = &#92;begin{bmatrix} 0.8 &amp; 0.15 &amp; 0.05&#92;&#92; 0.4 &amp; 0.5 &amp; 0.1&#92;&#92; 0.1&amp; 0.3 &amp; 0.6 &#92;end{bmatrix} ' title='P = &#92;begin{bmatrix} 0.8 &amp; 0.15 &amp; 0.05&#92;&#92; 0.4 &amp; 0.5 &amp; 0.1&#92;&#92; 0.1&amp; 0.3 &amp; 0.6 &#92;end{bmatrix} ' class='latex' /></p>
<p>Each row of <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' /> corresponds to the weather at iteration <img src='http://s0.wp.com/latex.php?latex=t&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='t' title='t' class='latex' /> (<em>today</em>), and each column corresponds to the weather the next week. Let&#8217;s say that it is rainy today: what is the probability that it will be sunny next week, in two weeks, or in 6 months? We can answer these questions by running a Markov chain from the initial state of &#8220;rainy,&#8221; transitioning according to <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' />. The following chunk of MATLAB code propagates the probability distribution over the weather forward through the chain, one week at a time.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% FINITE STATE-SPACE MARKOV CHAIN EXAMPLE

% TRANSITION OPERATOR
%     S  F  R
%     U  O  A
%     N  G  I
%     N  G  N
%     Y  Y  Y
P = [.8 .15 .05;  % SUNNY
     .4 .5  .1;   % FOGGY
     .1 .3  .6];  % RAINY
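nWeeks = 25;            % WEEK 1 IS TODAY, SO WEEK 25 IS 24 TRANSITIONS (~6 MONTHS) AHEAD

% PREALLOCATE: ROW t HOLDS p(WEATHER) AT WEEK t
X = zeros(nWeeks,3);

% INITIAL STATE IS RAINY
X(1,:) = [0 0 1];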

% RUN MARKOV CHAIN
for iB = 2:nWeeks
    X(iB,:) = X(iB-1,:)*P; % TRANSITION
end

% DISPLAY
figure; hold on
h(1) = plot(1:nWeeks,X(:,1),'r','Linewidth',2);
h(2) = plot(1:nWeeks,X(:,2),'k','Linewidth',2);
h(3) = plot(1:nWeeks,X(:,3),'b','Linewidth',2);
h(4) = plot([15 15],[0 1],'g--','Linewidth',2);
hold off
legend(h, {'Sunny','Foggy','Rainy','Burn In'});
xlabel('Week')
ylabel('p(Weather)')
xlim([1,nWeeks]);
ylim([0 1]);

% PREDICTIONS
fprintf('\np(weather) in 1 week --&#62;'), disp(X(2,:))
fprintf('\np(weather) in 2 weeks --&#62;'), disp(X(3,:))
fprintf('\np(weather) in 6 months --&#62;'), disp(X(25,:))
</pre>
<div id="attachment_1630" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/markovchainintro6.png"><img class="size-full wp-image-1630" title="markovChainIntro" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/markovchainintro6.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Finite state-space Markov chain for predicting the weather</p></div>
<p>Here we see that one week from today the probability of sunny weather is 0.1, in two weeks it is 0.26, and in 6 months there is roughly a 60% chance that it will be sunny. Also note that after approximately 15 weeks the Markov chain has, for all practical purposes, reached its equilibrium/stationary distribution and, chances are, the weather will be sunny. This 15-week period is known as the <strong><em>burn in</em></strong> period of the Markov chain. It is the number of transitions the chain needs to move from the initial conditions to (approximately) the stationary distribution.</p>
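<p>The MATLAB code above propagates the whole probability distribution. The following Python/NumPy sketch, which is not from the original post, takes a complementary sampling-based view. It simulates one long weather trajectory starting from &#8220;rainy&#8221;, discards the first 15 transitions as burn in (the value read off the figure), and counts how often each state is visited afterwards. These visit frequencies should be close to the stationary distribution. This is the idea behind using the states a chain visits as samples, which the post develops for MCMC. The function name and the trajectory length are illustrative choices.</p>

```python
import numpy as np

# Weather chain from the post; state order: sunny, foggy, rainy
P = np.array([[0.8, 0.15, 0.05],
              [0.4, 0.5,  0.1 ],
              [0.1, 0.3,  0.6 ]])

def long_run_frequencies(P, start, n_keep, burn_in=15, seed=0):
    """Simulate one trajectory, discard `burn_in` transitions, return visit frequencies."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                        # guard against round-off in the row sums
    u = iter(rng.random(burn_in + n_keep))  # one uniform draw per transition

    def step(state):
        # inverse-CDF sampling of the next state from row `state` of P
        return int(np.searchsorted(cum[state], next(u), side="right"))

    state = start
    for _ in range(burn_in):                # burn in: these states are thrown away
        state = step(state)
    visits = np.zeros(len(P))
    for _ in range(n_keep):
        state = step(state)
        visits[state] += 1
    return visits / n_keep

freqs = long_run_frequencies(P, start=2, n_keep=100_000)   # start: rainy
print(freqs)   # should be close to the 6-month prediction [0.596, 0.263, 0.140]
```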
<p>A cool thing about finite state-space, time-homogeneous Markov chains is that it is not necessary to run the chain sequentially through all iterations in order to predict a state in the future. Instead, we can raise the transition operator to the <img src='http://s0.wp.com/latex.php?latex=T&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='T' title='T' class='latex' />-th power, where <img src='http://s0.wp.com/latex.php?latex=T&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='T' title='T' class='latex' /> is the iteration at which we want the prediction, and then multiply the distribution over the initial state, <img src='http://s0.wp.com/latex.php?latex=%5Cpi%5E%7B%280%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi^{(0)}' title='&#92;pi^{(0)}' class='latex' />, by the result. For instance, to predict the probability of the weather in 2 weeks, knowing that it is rainy today (i.e. <img src='http://s0.wp.com/latex.php?latex=%5Cpi%5E%7B%280%29%7D+%3D+%5B0%2C0%2C1%5D%26%2338%3Bs%3D-1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi^{(0)} = [0,0,1]&amp;s=-1' title='&#92;pi^{(0)} = [0,0,1]&amp;s=-1' class='latex' />):</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28week2%29%7D%29+%3D+%5Cpi%5E%7B%280%29%7DP%5E2+%3D+%5B0.26%2C+0.345%2C+0.395%5D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(week2)}) = &#92;pi^{(0)}P^2 = [0.26, 0.345, 0.395]' title='p(x^{(week2)}) = &#92;pi^{(0)}P^2 = [0.26, 0.345, 0.395]' class='latex' /></p>
<p>and in six months:</p>
<p><img src='http://s0.wp.com/latex.php?latex=p%28x%5E%7B%28week24%29%7D%29+%3D+%5Cpi%5E%7B%280%29%7DP%5E%7B24%7D+%3D+%5B0.596%2C+0.263%2C+0.140%5D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x^{(week24)}) = &#92;pi^{(0)}P^{24} = [0.596, 0.263, 0.140]' title='p(x^{(week24)}) = &#92;pi^{(0)}P^{24} = [0.596, 0.263, 0.140]' class='latex' /></p>
<p>These are the same results we get by running the Markov chain sequentially through each number of transitions. Therefore we can calculate an approximation to the stationary distribution from <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' /> by setting <img src='http://s0.wp.com/latex.php?latex=T&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='T' title='T' class='latex' /> to a large number. It turns out that it is also possible to analytically derive the stationary distribution from <img src='http://s0.wp.com/latex.php?latex=P&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='P' title='P' class='latex' /> (hint: think about the properties of eigenvectors).</p>
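<p>Both routes, the matrix power and the eigenvector, can be sketched in a few lines of MATLAB. The 3-state transition matrix P below is a hypothetical example chosen for illustration. It is not the weather matrix used earlier in this post. Each row gives the probabilities of moving from one state to each of the three states, so every row sums to one. The stationary distribution satisfies &#960; = &#960;P, which makes it a left eigenvector of P with eigenvalue 1. The function eig returns it as an ordinary eigenvector of the transpose P'.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% HYPOTHETICAL 3-STATE TRANSITION MATRIX (EACH ROW SUMS TO ONE)
P = [0.6 0.3 0.1;
     0.2 0.5 0.3;
     0.3 0.3 0.4];
pi0 = [0 0 1];                   % START IN STATE 3 WITH CERTAINTY

% PREDICT T STEPS AHEAD BY RAISING P TO THE T-TH POWER
piT2  = pi0 * P^2;               % DISTRIBUTION AFTER 2 TRANSITIONS
piT50 = pi0 * P^50;              % APPROXIMATELY STATIONARY FOR LARGE T

% STATIONARY DISTRIBUTION: LEFT EIGENVECTOR OF P WITH EIGENVALUE 1
[V,D] = eig(P');                 % EIGENVECTORS OF P' ARE LEFT EIGENVECTORS OF P
[~,k] = min(abs(diag(D) - 1));   % PICK THE EIGENVALUE CLOSEST TO 1
piStat = real(V(:,k))';
piStat = piStat / sum(piStat);   % RESCALE (AND FIX THE SIGN) TO SUM TO ONE

disp(piT2); disp(piT50); disp(piStat);
</pre>
<p>For this particular P, solving &#960; = &#960;P by hand gives [0.375, 0.375, 0.25]. Both piT50 and piStat agree with that to many decimal places, and piT2 is [0.36, 0.36, 0.28].</p>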
<h2>Continuous state-space Markov chains</h2>
<p>A Markov chain can also have a continuous state space in the real numbers, <img src='http://s0.wp.com/latex.php?latex=x+%5Cin+%5Cmathbb%7BR%7D%5EN&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x &#92;in &#92;mathbb{R}^N' title='x &#92;in &#92;mathbb{R}^N' class='latex' />. In this case the transition operator cannot be written down as a matrix. Instead it is a conditional probability density over the real numbers that gives the distribution of the next state given the current one. A continuous state-space Markov chain can also have a burn in period and a stationary distribution, but the stationary distribution is then a density over a continuous set of values. To get a better understanding of the workings of a continuous state-space Markov chain, let&#8217;s look at a simple example.</p>
<h3>Example: Sampling from a continuous distribution using continuous state-space Markov chains</h3>
<p>We can use the stationary distribution of a continuous state-space Markov chain to sample from a continuous probability distribution. We run the chain long enough for it to reach its stationary distribution, then keep the states that the chain visits as samples from that stationary distribution.</p>
<p>In the following example we define a continuous state-space Markov chain. The transition operator is a Normal distribution with unit variance whose mean is half the previous state (i.e. halfway between zero and the previous state). The distribution over initial conditions is a Normal distribution with zero mean and unit variance.</p>
<p>To ensure that the chain has moved sufficiently far from the initial conditions and that we are sampling from the chain&#8217;s stationary distribution, we throw away the first 50 burn in states of each chain. We can also run multiple chains simultaneously in order to sample the stationary distribution more densely; here we run 5 chains.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% EXAMPLE OF CONTINUOUS STATE-SPACE MARKOV CHAIN

% INITIALIZE
randn('seed',12345)
nBurnin = 50; % # BURNIN
nChains = 5;  % # MARKOV CHAINS

% DEFINE TRANSITION OPERATOR
P = @(x,nChains) normrnd(.5*x,1,1,nChains); % NEXT STATE ~ N(0.5*x, 1) FOR EACH CHAIN
nTransitions = 1000;
x = zeros(nTransitions,nChains);
x(1,:) = randn(1,nChains);

% RUN THE CHAINS
for iT = 2:nTransitions
    x(iT,:) = P(x(iT-1,:),nChains); % EACH CHAIN MOVES FROM ITS OWN PREVIOUS STATE
end

% DISPLAY BURNIN
figure
subplot(221); plot(x(1:100,:)); hold on;
minn = min(x(:));
maxx = max(x(:));
l = line([nBurnin nBurnin],[minn maxx],'color','k','Linewidth',2);
ylim([minn maxx])
legend(l,'~Burn-in','Location','SouthEast')
title('First 100 Samples'); hold off

% DISPLAY ENTIRE MARKOV CHAIN
subplot(223); plot(x);hold on;
l = line([nBurnin nBurnin],[minn maxx],'color','k','Linewidth',2);
legend(l,'~Burn-in','Location','SouthEast')
title('Entire Chain');

% DISPLAY SAMPLES FROM STATIONARY DISTRIBUTION
samples = x(nBurnin+1:end,:);
subplot(122);
[counts,bins] = hist(samples(:),100); colormap hot
b = bar(bins,counts);
legend(b,sprintf('Markov Chain\nSamples'));
title(['\mu=',num2str(mean(samples(:))),' \sigma^2=',num2str(var(samples(:)))])
</pre>
<div id="attachment_1655" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/markovchainintro22.png"><img class="size-full wp-image-1655" title="markovChainIntro2" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/markovchainintro22.png?w=510&#038;h=276" height="276" width="510" /></a><p class="wp-caption-text">Sampling from the stationary distribution of a continuous state-space Markov chain</p></div>
<p>In the upper left panel of the code output we see a close-up of the first 100 of the 1000 transitions made by the 5 simultaneous Markov chains; the black line marks the burn in cutoff. In the lower left panel we see the entire sequence of transitions for the Markov chains. In the right panel, the histogram of the post-burn-in samples indicates that the stationary distribution of this chain is a Normal distribution with mean zero and a variance of approximately 1.3.</p>
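<p>The variance can also be checked analytically. Under this transition operator, x(t+1) = 0.5x(t) + &#949;, where &#949; ~ N(0,1) is independent of x(t). If the chain is stationary with variance v, then v = 0.25v + 1, so v = 4/3 &#8776; 1.33. By the same argument the stationary mean m satisfies m = 0.5m, so m = 0. The minimal sketch below compares this value with a longer simulation. It uses base MATLAB only (randn instead of normrnd), and the chain length of 20000 is an arbitrary choice rather than a value from the post.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% CHECK THE STATIONARY VARIANCE OF x(t+1) = a*x(t) + N(0,1) WITH a = 0.5
randn('seed',12345)
nBurnin = 50; nChains = 5; nTransitions = 20000;
a = 0.5;                          % THE MEAN OF THE NEXT STATE IS a TIMES THE CURRENT STATE
x = zeros(nTransitions,nChains);
x(1,:) = randn(1,nChains);        % INITIAL STATES ~ N(0,1)
for iT = 2:nTransitions
    x(iT,:) = a*x(iT-1,:) + randn(1,nChains);   % SAME TRANSITION AS normrnd(a*x,1)
end
samples = x(nBurnin+1:end,:);     % DISCARD BURN IN
vEmpirical = var(samples(:));
vTheory = 1/(1 - a^2);            % SOLUTION OF v = a^2*v + 1
fprintf('empirical variance %.3f, theoretical %.3f\n', vEmpirical, vTheory);
</pre>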
<h2>Wrapping Up</h2>
<p>In the previous example we were able to deduce the stationary distribution of the Markov chain by looking at the samples generated from the chain after the burn in period. However, in order to use Markov chains to sample from a specific target distribution, we have to design the transition operator so that the resulting chain reaches a stationary distribution that matches the target distribution. This is where MCMC methods like the Metropolis sampler, the Metropolis-Hastings sampler, and the Gibbs sampler come to the rescue. We will discuss each of these Markov-chain-based sampling methods separately in later posts.</p>
<p><!--more--></p>
<p><sup>†</sup> On notation:</p>
<ul>
<li>Here we use the shorthand notation <img src='http://s0.wp.com/latex.php?latex=p%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(x)' title='p(x)' class='latex' /> to correspond to <img src='http://s0.wp.com/latex.php?latex=p%28X+%3D+x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(X = x)' title='p(X = x)' class='latex' />, for some random variable <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /></li>
<li>A superscript in parentheses is an index into an iteration or point in time, not a power</li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Rejection Sampling]]></title>
<link>http://theclevermachine.wordpress.com/2012/09/10/rejection-sampling/</link>
<pubDate>Tue, 11 Sep 2012 04:58:44 +0000</pubDate>
<dc:creator>dustinstansbury</dc:creator>
<guid>http://theclevermachine.wordpress.com/2012/09/10/rejection-sampling/</guid>
<description><![CDATA[Suppose that we want to sample from a distribution that is difficult or impossible to sample from di]]></description>
<content:encoded><![CDATA[<p>Suppose that we want to sample from a distribution <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> that is difficult or impossible to sample from directly, but we have a simpler distribution <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' /> from which sampling is easy. The idea behind rejection sampling (aka acceptance-rejection sampling) is to sample from <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' /> and apply an acceptance/rejection criterion such that the accepted samples are distributed according to <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />.</p>
<h2>Envelope distribution and rejection criterion</h2>
<p>For rejection to turn samples from <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' /> into samples from <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' /> must &#8220;cover&#8221; or envelop the distribution <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. This is generally done by choosing a constant <img src='http://s0.wp.com/latex.php?latex=c+%5Cge+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='c &#92;ge 1' title='c &#92;ge 1' class='latex' /> such that <img src='http://s0.wp.com/latex.php?latex=cq%28x%29+%5Cge+f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='cq(x) &#92;ge f(x)' title='cq(x) &#92;ge f(x)' class='latex' /> for all <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />. For this reason <img src='http://s0.wp.com/latex.php?latex=cq%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='cq(x)' title='cq(x)' class='latex' /> is often called the <em>envelope distribution</em>. A common criterion for accepting a sample <img src='http://s0.wp.com/latex.php?latex=x+%5Csim+q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x &#92;sim q(x)' title='x &#92;sim q(x)' class='latex' /> is based on the ratio of the target distribution to the envelope distribution. The sample is accepted if</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cfrac%7Bf%28x%29%7D%7Bcq%28x%29%7D+%26%2362%3B+u&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;frac{f(x)}{cq(x)} &gt; u' title='&#92;frac{f(x)}{cq(x)} &gt; u' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=u+%5Csim+Unif%280%2C1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='u &#92;sim Unif(0,1)' title='u &#92;sim Unif(0,1)' class='latex' />, and rejected otherwise. If the ratio is close to one, then <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> nearly fills the envelope at <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />, and the sample is very likely to be accepted. If the ratio is small, then <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> is small relative to the envelope at <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />, and the sample is likely to be rejected. This criterion is demonstrated in the chunk of MATLAB code and the resulting figure below:</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345);
x = -10:.1:10;
% CREATE A &#34;COMPLEX DISTRIBUTION&#34; f(x) AS A MIXTURE OF TWO NORMAL
% DISTRIBUTIONS (UNNORMALIZED: THE COMPONENTS ARE NOT WEIGHTED BY 1/2)
f = @(x) normpdf(x,3,2) + normpdf(x,-5,1);
t = plot(x,f(x),'b','linewidth',2); hold on;

% PROPOSAL IS A CENTERED NORMAL DISTRIBUTION
q = @(x) normpdf(x,0,4);

% DETERMINE SCALING CONSTANT
c = max(f(x)./q(x))

%PLOT SCALED PROPOSAL/ENVELOP DISTRIBUTION
p = plot(x,c*q(x),'k--');

% DRAW A SAMPLE FROM q(x);
qx = normrnd(0,4);
fx = f(qx);

% AT THE SAMPLED x, f(x) IS THE GREEN SEGMENT AND THE GAP UP TO cq(x) IS RED
a = plot([qx,qx],[0 fx],'g','Linewidth',2);
r = plot([qx,qx],[fx,c*q(qx)],'r','Linewidth',2);
legend([t,p,a,r],{'Target','Proposal','Accept','Reject'});
xlabel('x');
</pre>
<div id="attachment_902" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingcriterion.png"><img class="size-full wp-image-902" title="rejectionSamplingCriterion" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingcriterion.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Rejection Sampling with a Normal proposal distribution</p></div>
<p>Here a zero-mean Normal distribution is used as the proposal distribution. This distribution is scaled by a factor <img src='http://s0.wp.com/latex.php?latex=c+%3D+9.2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='c = 9.2' title='c = 9.2' class='latex' />, determined from <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' />, to ensure that the scaled proposal distribution covers <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. We then draw a sample from <img src='http://s0.wp.com/latex.php?latex=q%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='q(x)' title='q(x)' class='latex' /> and compute the proportion of <img src='http://s0.wp.com/latex.php?latex=cq%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='cq(x)' title='cq(x)' class='latex' /> occupied by <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> at that sample. We compare this proportion to a random number sampled from <img src='http://s0.wp.com/latex.php?latex=Unif%280%2C1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='Unif(0,1)' title='Unif(0,1)' class='latex' /> (i.e. the criterion outlined above). The sample is then accepted with probability equal to the length of the green line segment divided by the combined length of the green and red segments, and rejected otherwise.</p>
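<p>The code above draws a single proposal. Below is a minimal sketch of the full sampler for the same mixture target, vectorized over many proposals. It uses base MATLAB only: the Normal density is written out by hand as a hypothetical helper, npdf, instead of calling normpdf. It also uses lt, the function form of the less-than operator. The constant c is computed on the same grid as above. This f(x) is an unnormalized mixture that integrates to 2, so the expected acceptance rate is 2/c, roughly 0.22.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% FULL REJECTION SAMPLER FOR THE MIXTURE TARGET ABOVE (BASE MATLAB ONLY)
randn('seed',12345); rand('seed',12345);

% HYPOTHETICAL HELPER: NORMAL DENSITY WRITTEN OUT BY HAND (REPLACES normpdf)
npdf = @(x,mu,sigma) exp(-0.5*((x - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));

f = @(x) npdf(x,3,2) + npdf(x,-5,1);   % UNNORMALIZED TARGET (INTEGRATES TO 2)
q = @(x) npdf(x,0,4);                  % PROPOSAL: NORMAL, MEAN 0, STD 4

xGrid = -10:.1:10;
c = max(f(xGrid)./q(xGrid));           % ENVELOPE CONSTANT, COMPUTED AS ABOVE

nProposals = 100000;
xProp = 4*randn(1,nProposals);         % DRAW ALL PROPOSALS FROM q(x) AT ONCE
u = rand(1,nProposals);                % ONE UNIFORM NUMBER PER PROPOSAL
ratio = f(xProp)./(c*q(xProp));        % f(x)/(c q(x)) FOR EACH PROPOSAL
accept = lt(u, ratio);                 % ACCEPT WHEN u IS BELOW THE RATIO
samples = xProp(accept);               % ACCEPTED SAMPLES FOLLOW f(x), UP TO NORMALIZATION

accRate = mean(accept);
fprintf('c = %.2f, acceptance rate = %.3f, 2/c = %.3f\n', c, accRate, 2/c);
</pre>
<p>Normalized, this target has mean (3 + (-5))/2 = -1, so the mean of the accepted samples should land close to -1.</p>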
<h2>Rejection sampling of a random discrete distribution</h2>
<p>This next example shows how rejection sampling can be used to sample from an arbitrary distribution, continuous or not, and with or without an analytic probability density function, as long as we can evaluate it and bound it with an envelope.</p>
<div id="attachment_706" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingtargetproposal.png"><img class="size-full wp-image-706" title="rejectionSamplingTargetProposal" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingtargetproposal.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Random Discrete Target Distribution and Proposal that Bounds It.</p></div>
<p>The figure above shows a random <strong><em>discrete</em></strong> probability mass function <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> defined on the integers 1 to 15. We will use rejection sampling as described above to sample from <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. Our proposal/envelope distribution is the discrete uniform distribution on the same integers (i.e. each of the integers 1 to 15 is equally probable), multiplied by a constant <img src='http://s0.wp.com/latex.php?latex=c&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='c' title='c' class='latex' /> chosen so that the maximum value of <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> lies under (or equal to) <img src='http://s0.wp.com/latex.php?latex=cq%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='cq(x)' title='cq(x)' class='latex' />.</p>
<p><div id="attachment_702" class="wp-caption aligncenter" style="width: 528px"><a href="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingdiscrete.png"><img class="size-full wp-image-702" title="rejectionSamplingDiscrete" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingdiscrete.png?w=518&#038;h=388" height="388" width="518" /></a><p class="wp-caption-text">Rejection Samples For Discrete Distribution on interval [1 15]</p></div>Plotted above is the target distribution (in red) along with the discrete samples obtained using rejection sampling. The MATLAB code used to sample from the target distribution and display the plot above is here:</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)
randn('seed',12345)

fLength = 15;
% CREATE A RANDOM DISTRIBUTION ON THE INTERVAL [1 fLength]
f = rand(1,fLength); f = f/sum(f);

figure; h = plot(f,'r','Linewidth',2);
hold on;
l = plot([1 fLength],[max(f) max(f)],'k','Linewidth',2);

legend([h,l],{'f(x)','q(x)'},'Location','Southwest');
xlim([0 fLength + 1])
xlabel('x');
ylabel('p(x)');
title('Target (f(x)) and Proposal (q(x)) Distributions');

% OUR PROPOSAL IS THE DISCRETE UNIFORM ON THE INTERVAL [1 fLength]
% SO OUR CONSTANT IS
c = max(f/(1/fLength));

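nSamples = 10000;
samps = zeros(1,nSamples); % PREALLOCATE THE ACCEPTED SAMPLES
i = 1;
while i &#60;= nSamples
   proposal = unidrnd(fLength); % DRAW FROM THE DISCRETE UNIFORM q(x)
   q = c/fLength; % ENVELOPE DISTRIBUTION cq(x) (CONSTANT BECAUSE q IS UNIFORM)
   if rand &#60; f(proposal)/q
      samps(i) = proposal;
      i = i + 1;
   end
end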

% DISPLAY THE SAMPLES AND COMPARE TO THE TARGET DISTRIBUTION
bins = 1:fLength;
counts = histc(samps,bins);
figure
b = bar(1:fLength,counts/sum(counts),'FaceColor',[.8 .8 .8]);
hold on;
h = plot(f,'r','Linewidth',2);
legend([h,b],{'f(x)','samples'});
xlabel('x'); ylabel('p(x)');
xlim([0 fLength + 1]);
</pre>
<h2>Rejection sampling from the unit circle to estimate <img src='http://s0.wp.com/latex.php?latex=%5Cpi&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi' title='&#92;pi' class='latex' /></h2>
<p>Though the ratio-based acceptance-rejection criterion introduced above is a common choice for drawing samples from complex distributions, it is not the only criterion we could use. For instance, we could use a different set of criteria to generate samples from a geometrically bounded distribution. Suppose we want to generate points uniformly within the unit circle, i.e. a circle centered at <img src='http://s0.wp.com/latex.php?latex=%28x%2Cy%29+%3D+%280%2C0%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(x,y) = (0,0)' title='(x,y) = (0,0)' class='latex' /> with radius <img src='http://s0.wp.com/latex.php?latex=r+%3D+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='r = 1' title='r = 1' class='latex' />. We can sample the Cartesian coordinates <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=y&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='y' title='y' class='latex' /> uniformly from the interval (-1,1), which fills a square centered at (0,0). We then reject those points whose distance from the origin, <img src='http://s0.wp.com/latex.php?latex=%5Csqrt%7Bx%5E2+%2B+y%5E2%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;sqrt{x^2 + y^2}' title='&#92;sqrt{x^2 + y^2}' class='latex' />, is greater than <img src='http://s0.wp.com/latex.php?latex=r+%3D+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='r = 1' title='r = 1' class='latex' />.</p>
<div id="attachment_699" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/unitcircleinsquare2.png"><img class="size-full wp-image-699" title="unitCircleInSquare" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/unitcircleinsquare2.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Unit Circle Inscribed in Square</p></div>
<div>Something clever that we can do with such a set of samples is to approximate the value <img src='http://s0.wp.com/latex.php?latex=%5Cpi&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi' title='&#92;pi' class='latex' />. The square in which the unit circle is inscribed has area:</div>
<p><img src='http://s0.wp.com/latex.php?latex=A_%7Bsquare%7D+%3D+%282r%29%5E2+%3D+4r%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A_{square} = (2r)^2 = 4r^2' title='A_{square} = (2r)^2 = 4r^2' class='latex' /></p>
<p>and the unit circle has the area:</p>
<p><img src='http://s0.wp.com/latex.php?latex=A_%7Bcircle%7D+%3D+%5Cpi+r%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A_{circle} = &#92;pi r^2' title='A_{circle} = &#92;pi r^2' class='latex' /></p>
<p>We can use the ratio of their areas to approximate <img src='http://s0.wp.com/latex.php?latex=%5Cpi&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi' title='&#92;pi' class='latex' />:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cpi+%3D+4%5Cfrac%7BA_%7Bcircle%7D%7D%7BA_%7Bsquare%7D%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi = 4&#92;frac{A_{circle}}{A_{square}}' title='&#92;pi = 4&#92;frac{A_{circle}}{A_{square}}' class='latex' /></p>
<p>The figure below shows the rejection sampling process and the resulting estimate of <img src='http://s0.wp.com/latex.php?latex=%5Cpi&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi' title='&#92;pi' class='latex' /> from the samples. One hundred thousand 2D points are sampled uniformly from the square in which both coordinates lie in (-1,1). Points that lie within the unit circle are plotted as blue dots; points that lie outside of it are plotted as red x&#8217;s. The fraction of points that land inside the circle approximates the ratio of the two areas, so four times this fraction gives a close approximation of <img src='http://s0.wp.com/latex.php?latex=%5Cpi&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;pi' title='&#92;pi' class='latex' /> (about 3.14).</p>
<div id="attachment_697" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingpi.png"><img class="size-full wp-image-697" title="rejectionSamplingPi" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/rejectionsamplingpi.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Rejection Criterion</p></div>
<p>The MATLAB code used to generate the example figures is below:</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% DISPLAY A CIRCLE INSCRIBED IN A SQUARE

figure;
a = 0:.01:2*pi;
x = cos(a); y = sin(a);
hold on
plot(x,y,'k','Linewidth',2)

t = text(0.5, 0.05,'r');
l = line([0 1],[0 0],'Linewidth',2);
axis equal
box on
xlim([-1 1])
ylim([-1 1])
title('Unit Circle Inscribed in a Square')

pause;
rand('seed',12345)
randn('seed',12345)
delete(l); delete(t);

% DRAW SAMPLES FROM PROPOSAL DISTRIBUTION
samples = 2*rand(2,100000) - 1;

% REJECTION
reject = sum(samples.^2) &#62; 1;

% DISPLAY REJECTION CRITERION
scatter(samples(1,~reject),samples(2,~reject),'b.')
scatter(samples(1,reject),samples(2,reject),'rx')
hold off
xlim([-1 1])
ylim([-1 1])

piHat = mean(sum(samples.*samples)&#60;1)*4;

title(['Estimate of \pi = ',num2str(piHat)]);
</pre>
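<p>How close is the estimate? Each proposed point lands inside the circle with probability p = &#960;/4. The estimate is therefore 4 times a binomial proportion, with standard error 4&#183;sqrt(p(1 - p)/n), which is about 0.005 for n = 100,000. The minimal sketch below repeats the estimate for a few sample sizes chosen for illustration. It uses lt, the function form of the less-than operator.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)
nList = [100 1000 10000 100000];
for n = nList
    samples = 2*rand(2,n) - 1;                 % UNIFORM ON THE SQUARE (-1,1) x (-1,1)
    inside = lt(sum(samples.^2,1), 1);         % TRUE FOR POINTS INSIDE THE UNIT CIRCLE
    piHat = 4*mean(inside);
    seTheory = 4*sqrt((pi/4)*(1 - pi/4)/n);    % BINOMIAL STANDARD ERROR OF piHat
    fprintf('n = %6d  piHat = %.4f  |error| = %.4f  SE = %.4f\n', ...
        n, piHat, abs(piHat - pi), seTheory);
end
</pre>
<p>The error shrinks roughly like 1/sqrt(n), so each extra digit of accuracy costs about 100 times more samples.</p>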
<h2>Wrapping Up</h2>
<p>Rejection sampling is a simple way to generate samples from complex distributions. However, rejection sampling also has a number of weaknesses:</p>
<ul>
<li>Finding a proposal distribution that can cover the support of the target distribution is a non-trivial task.</li>
<li>Additionally, as the dimensionality of the target distribution increases, the proportion of points that are rejected also increases. This curse of dimensionality makes rejection sampling an inefficient technique for sampling multi-dimensional distributions, as the majority of the points proposed are not accepted as valid samples.</li>
<li>Some of these problems are solved by changing the form of the proposal distribution to &#8220;hug&#8221; the target distribution as we gain knowledge of the target from observing accepted samples. Such a process is called <em>Adaptive Rejection Sampling</em>, which will be covered in another post.</li>
</ul>
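<p>The unit-circle example makes the dimensionality problem concrete. Propose points uniformly from the cube (-1,1)<sup>d</sup> and accept those inside the unit ball. The accepted fraction equals the ball&#8217;s volume divided by the cube&#8217;s, &#960;<sup>d/2</sup>/(&#915;(d/2 + 1)&#183;2<sup>d</sup>). The minimal sketch below compares that formula with simulation; the dimensions and sample size are arbitrary choices.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)
dims = [2 3 5 10];
nProposals = 100000;
for d = dims
    samples = 2*rand(d,nProposals) - 1;          % UNIFORM PROPOSALS IN THE CUBE
    accRate = mean(lt(sum(samples.^2,1), 1));    % FRACTION INSIDE THE UNIT BALL
    accTheory = pi^(d/2)/gamma(d/2 + 1)/2^d;     % VOLUME OF BALL / VOLUME OF CUBE
    fprintf('d = %2d  simulated %.4f  theory %.4f\n', d, accRate, accTheory);
end
</pre>
<p>In 2 dimensions the acceptance rate is &#960;/4 &#8776; 0.785. In 10 dimensions it falls to about 0.0025, so roughly 400 proposals are needed per accepted sample.</p>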
]]></content:encoded>
</item>
<item>
<title><![CDATA[Inverse Transform Sampling]]></title>
<link>http://theclevermachine.wordpress.com/2012/09/09/inverse-transform-sampling/</link>
<pubDate>Mon, 10 Sep 2012 04:15:21 +0000</pubDate>
<dc:creator>dustinstansbury</dc:creator>
<guid>http://theclevermachine.wordpress.com/2012/09/09/inverse-transform-sampling/</guid>
<description><![CDATA[There are a number of sampling methods used in machine learning, each of which has various strengths]]></description>
<content:encoded><![CDATA[<p>There are a number of sampling methods used in machine learning, each of which has various strengths and/or weaknesses depending on the nature of the sampling task at hand. One simple method for generating samples from distributions with closed-form descriptions is Inverse Transform (IT) Sampling.</p>
<p>The idea behind IT sampling rests on the cumulative distribution function (CDF) <img src='http://s0.wp.com/latex.php?latex=C%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C(x)' title='C(x)' class='latex' /> of a random variable <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> with probability density function <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. Because the density integrates to one, the CDF maps values in the domain of <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /> to probabilities in the interval <img src='http://s0.wp.com/latex.php?latex=%280%2C1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(0,1)' title='(0,1)' class='latex' />. Its inverse <img src='http://s0.wp.com/latex.php?latex=C%5E%7B-1%7D%28z%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C^{-1}(z)' title='C^{-1}(z)' class='latex' /> maps those probabilities back to the domain of <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. If <img src='http://s0.wp.com/latex.php?latex=z&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='z' title='z' class='latex' /> is drawn uniformly from <img src='http://s0.wp.com/latex.php?latex=%280%2C1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='(0,1)' title='(0,1)' class='latex' />, then <img src='http://s0.wp.com/latex.php?latex=x+%3D+C%5E%7B-1%7D%28z%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x = C^{-1}(z)' title='x = C^{-1}(z)' class='latex' /> is distributed according to <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' />. Because uniform values are easy to sample, the inverse CDF lets us transform these sampled probabilities into samples <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />. The code below demonstrates this process at work in order to sample from a Student&#8217;s t distribution with 10 degrees of freedom.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)

% DEGREES OF FREEDOM
dF = 10;
x = -3:.1:3;
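Cx = cdf('t',x,dF);
z = rand;

% FIND THE FIRST GRID POINT WHERE C(x) EXCEEDS z (THE GRID ONLY COVERS
% [-3,3], SO FALL BACK TO THE LAST POINT IF z IS LARGER THAN C(3))
zIdx = find(Cx&#62;z, 1);
if isempty(zIdx), zIdx = numel(x); end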

% DRAW SAMPLE
sample = x(zIdx);

% DISPLAY
figure; hold on
plot(x,Cx,'k','Linewidth',2);
plot([x(1),x(zIdx)],[Cx(zIdx),Cx(zIdx)],'r','LineWidth',2);
plot([x(zIdx),x(zIdx)],[Cx(zIdx),0],'b','LineWidth',2);
plot(x(zIdx),z,'ko','LineWidth',2);
text(x(1)+.1,z + .05,'z','Color','r')
text(x(zIdx)+.05,.05,'x_{sampled}','Color','b')
ylabel('C(x)')
xlabel('x')
hold off
</pre>
<div id="attachment_807" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/inversetransformstudent3.png"><img class="size-full wp-image-807" title="inverseTransformStudent" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/inversetransformstudent3.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">IT Sampling from student&#8217;s-t(10)</p></div>
<p>However, the scheme used to create the plot above is inefficient, in that one must compare the current value of <img src='http://s0.wp.com/latex.php?latex=z&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='z' title='z' class='latex' /> with <img src='http://s0.wp.com/latex.php?latex=C%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C(x)' title='C(x)' class='latex' /> for all values of <img src='http://s0.wp.com/latex.php?latex=x&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x' title='x' class='latex' />. A much more efficient method is to evaluate <img src='http://s0.wp.com/latex.php?latex=C%5E%7B-1%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C^{-1}' title='C^{-1}' class='latex' /> directly:</p>
<ol>
<li>Derive <img src='http://s0.wp.com/latex.php?latex=C%5E%7B-1%7D%28z%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C^{-1}(z)' title='C^{-1}(z)' class='latex' /> (or a good approximation) from <img src='http://s0.wp.com/latex.php?latex=f%28x%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='f(x)' title='f(x)' class='latex' /></li>
<li>for <img src='http://s0.wp.com/latex.php?latex=i+%3D+1%3An&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i = 1:n' title='i = 1:n' class='latex' />
<ul>
<li>draw <img src='http://s0.wp.com/latex.php?latex=z_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='z_i' title='z_i' class='latex' /> from <img src='http://s0.wp.com/latex.php?latex=Unif%280%2C1%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='Unif(0,1)' title='Unif(0,1)' class='latex' /></li>
<li><img src='http://s0.wp.com/latex.php?latex=x_i+%3D+C%5E%7B-1%7D%28z_i%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i = C^{-1}(z_i)' title='x_i = C^{-1}(z_i)' class='latex' /></li>
</ul>
</li>
</ol>
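<p>The steps above can be illustrated with a distribution whose inverse CDF has a closed form. We use the exponential distribution with rate &#955; here as our own choice of example; it is not used elsewhere in the post. Its CDF is C(x) = 1 - e<sup>-&#955;x</sup> for x &#8805; 0, so C<sup>-1</sup>(z) = -log(1 - z)/&#955;. Below is a minimal vectorized sketch in base MATLAB; the rate and sample size are arbitrary.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)
lambda = 2;                          % RATE PARAMETER (MEAN = 1/lambda)
nSamples = 100000;

Cinv = @(z) -log(1 - z)/lambda;      % STEP 1: CLOSED-FORM INVERSE CDF OF Exp(lambda)

z = rand(1,nSamples);                % STEP 2: z_i ~ Unif(0,1), ALL i AT ONCE
samples = Cinv(z);                   % STEP 2: x_i = C^{-1}(z_i)

fprintf('sample mean %.4f (theory %.4f)\n', mean(samples), 1/lambda);
fprintf('sample var  %.4f (theory %.4f)\n', var(samples), 1/lambda^2);
</pre>
<p>Because C<sup>-1</sup> is evaluated directly, no comparison against a grid of C(x) values is needed, and the loop over i collapses into a single vectorized call.</p>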
<p>The IT sampling process is demonstrated in the next chunk of code to sample from the Beta distribution, a distribution for which <img src='http://s0.wp.com/latex.php?latex=C%5E%7B-1%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C^{-1}' title='C^{-1}' class='latex' /> is easy to approximate using Newton&#8217;s method (which we let MATLAB do for us within the function icdf.m).</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
rand('seed',12345)
nSamples = 1000;

% BETA PARAMETERS
% (NAMED TO AVOID SHADOWING THE BUILT-IN alpha AND beta FUNCTIONS)
aBeta = 2; bBeta = 10;

% DRAW UNIFORM SAMPLES z ~ Unif(0,1)
z = rand(1,nSamples);

% EVALUATE THE INVERSE CDF AT THE UNIFORM SAMPLES
samples = icdf('beta',z,aBeta,bBeta);
bins = linspace(0,1,50);
counts = histc(samples,bins);
probSampled = counts/sum(counts);
probTheory = betapdf(bins,aBeta,bBeta);

% DISPLAY
b = bar(bins,probSampled,'FaceColor',[.9 .9 .9]);
hold on;
t = plot(bins,probTheory/sum(probTheory),'r','LineWidth',2);
xlim([0 1])
xlabel('x')
ylabel('p(x)')
legend([t,b],{'Theory','IT Samples'})
hold off
</pre>
<div id="attachment_770" class="wp-caption aligncenter" style="width: 520px"><a href="http://theclevermachine.files.wordpress.com/2012/09/inversetransformbeta.png"><img class="size-full wp-image-770" title="inverseTransformBeta" alt="" src="http://theclevermachine.files.wordpress.com/2012/09/inversetransformbeta.png?w=510&#038;h=382" height="382" width="510" /></a><p class="wp-caption-text">Inverse Transform Sampling of Beta(2,10)</p></div>
<h2>Wrapping Up</h2>
<p>The IT sampling method is generally only used for univariate distributions where <img src='http://s0.wp.com/latex.php?latex=C%5E%7B-1%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='C^{-1}' title='C^{-1}' class='latex' /> can be computed in closed form, or approximated. However, it is a nice example of how uniform random variables can be used to sample from much more complicated distributions.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Multivariate kernel density estimation]]></title>
<link>http://maikolsolis.wordpress.com/2012/03/27/multivariate-kernel-density-estimation/</link>
<pubDate>Tue, 27 Mar 2012 13:46:19 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2012/03/27/multivariate-kernel-density-estimation/</guid>
<description><![CDATA[Briefly, we shall see the definition of a kernel density estimator in the multivariate case. Suppose]]></description>
<content:encoded><![CDATA[<p>Briefly, we shall see the definition of a <a class="zem_slink" title="Kernel density estimation" href="http://en.wikipedia.org/wiki/Kernel_density_estimation" target="_blank" rel="wikipedia">kernel density</a> estimator in the multivariate case.</p>
<p>Suppose that the data are d-dimensional, so that <img src='http://s0.wp.com/latex.php?latex=%7BX_%7Bi%7D%3D%28X_%7Bi1%7D%2C%5Cldots%2CX_%7Bid%7D%29%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_{i}=(X_{i1},&#92;ldots,X_{id})}' title='{X_{i}=(X_{i1},&#92;ldots,X_{id})}' class='latex' /> for i = 1, ..., n. We will use the product kernel estimator</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Chat%7Bf%7D_%7Bh%7D%28x%29%3D%5Cfrac%7B1%7D%7Bnh_%7B1%7D%5Ccdots+h_%7Bd%7D%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Cleft%5C%7B+%5Cprod_%7Bj%3D1%7D%5E%7Bd%7DK%5Cleft%28%5Cfrac%7Bx_%7Bj%7D-X_%7Bij%7D%7D%7Bh_%7Bj%7D%7D%5Cright%29%5Cright%5C%7D+.&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;hat{f}_{h}(x)=&#92;frac{1}{nh_{1}&#92;cdots h_{d}}&#92;sum_{i=1}^{n}&#92;left&#92;{ &#92;prod_{j=1}^{d}K&#92;left(&#92;frac{x_{j}-X_{ij}}{h_{j}}&#92;right)&#92;right&#92;} .' title='&#92;displaystyle &#92;hat{f}_{h}(x)=&#92;frac{1}{nh_{1}&#92;cdots h_{d}}&#92;sum_{i=1}^{n}&#92;left&#92;{ &#92;prod_{j=1}^{d}K&#92;left(&#92;frac{x_{j}-X_{ij}}{h_{j}}&#92;right)&#92;right&#92;} .' class='latex' /></p>
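<p>The estimator can be sketched directly in base MATLAB. This sketch assumes a Gaussian kernel, K(u) = exp(-u<sup>2</sup>/2)/sqrt(2&#960;); the post does not fix a particular K. The name productKDE, the simulated standard bivariate Normal data and the bandwidths are hypothetical choices for illustration. bsxfun is used so that the subtraction and division broadcast across the n data rows.</p>
<pre class="brush: matlabkey; title: ; notranslate" title="">
% GAUSSIAN KERNEL (ASSUMED CHOICE OF K)
K = @(u) exp(-0.5*u.^2) / sqrt(2*pi);

% HYPOTHETICAL HELPER: PRODUCT-KERNEL ESTIMATE AT ONE POINT x (1-BY-d)
% FROM DATA X (n-BY-d) WITH BANDWIDTHS h (1-BY-d):
% (1/(n*h_1*...*h_d)) * SUM OVER i OF PROD OVER j OF K((x_j - X_ij)/h_j)
productKDE = @(x, X, h) sum(prod(K(bsxfun(@rdivide, bsxfun(@minus, x, X), h)), 2)) ...
                        / (size(X,1)*prod(h));

randn('seed',12345)
n = 5000; d = 2;
X = randn(n,d);                  % HYPOTHETICAL DATA: STANDARD BIVARIATE NORMAL
h = [0.3 0.3];                   % HYPOTHETICAL BANDWIDTHS
fHat = productKDE([0 0], X, h);
fTrue = 1/(2*pi);                % TRUE DENSITY AT THE ORIGIN
fprintf('fHat(0,0) = %.4f, true value = %.4f\n', fHat, fTrue);
</pre>
<p>With Gaussian data and a Gaussian kernel, the expected value of the estimate at the origin is 1/(2&#960;(1 + h<sup>2</sup>)), about 0.146 for h = 0.3. The true value is about 0.159, and the gap is the smoothing bias that the h<sup>4</sup> terms of the risk below account for.</p>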
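<p>Writing <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmu_%7B2%7D%28K%29%3D%5Cint+u%5E%7B2%7DK%28u%29du%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mu_{2}(K)=&#92;int u^{2}K(u)du}&amp;fg=000000' title='{&#92;mu_{2}(K)=&#92;int u^{2}K(u)du}&amp;fg=000000' class='latex' /> for the second moment of the kernel, the mean integrated squared error (MISE) satisfies</p>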
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BMISE%7D%5Capprox%5Cfrac%7B%5Cleft%28%5Cmu_%7B2%7D%28K%29%5Cright%29%5E%7B2%7D%7D%7B4%7D%5Cleft%5B%5Csum_%7Bj%3D1%7D%5E%7Bd%7Dh_%7Bj%7D%5E%7B4%7D%5Cint+f_%7Bjj%7D%5E%7B2%7D%28x%29dx%2B%5Csum_%7Bj%5Cneq+k%7Dh_%7Bj%7D%5E%7B2%7Dh_%7Bk%7D%5E%7B2%7D%5Cint+f_%7Bjj%7Df_%7Bkk%7Ddx%5Cright%5D%2B%5Cfrac%7B%5Cleft%28%5Cint+K%5E%7B2%7D%28x%29dx%5Cright%29%5E%7Bd%7D%7D%7Bnh_%7B1%7D%5Ccdots+h_%7Bd%7D%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathrm{MISE}&#92;approx&#92;frac{&#92;left(&#92;mu_{2}(K)&#92;right)^{2}}{4}&#92;left[&#92;sum_{j=1}^{d}h_{j}^{4}&#92;int f_{jj}^{2}(x)dx+&#92;sum_{j&#92;neq k}h_{j}^{2}h_{k}^{2}&#92;int f_{jj}f_{kk}dx&#92;right]+&#92;frac{&#92;left(&#92;int K^{2}(x)dx&#92;right)^{d}}{nh_{1}&#92;cdots h_{d}} &amp;fg=000000' title='&#92;displaystyle &#92;mathrm{MISE}&#92;approx&#92;frac{&#92;left(&#92;mu_{2}(K)&#92;right)^{2}}{4}&#92;left[&#92;sum_{j=1}^{d}h_{j}^{4}&#92;int f_{jj}^{2}(x)dx+&#92;sum_{j&#92;neq k}h_{j}^{2}h_{k}^{2}&#92;int f_{jj}f_{kk}dx&#92;right]+&#92;frac{&#92;left(&#92;int K^{2}(x)dx&#92;right)^{d}}{nh_{1}&#92;cdots h_{d}} &amp;fg=000000' class='latex' /></p>
<div class="wp-caption aligncenter" style="width: 310px"><a href="http://commons.wikipedia.org/wiki/File:Bivariate_example.png" target="_blank"><img class="zemanta-img-inserted zemanta-img-configured" title="Diagram explaining mathematics of density esti..." alt="Diagram explaining mathematics of density esti..." src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/Bivariate_example.png/300px-Bivariate_example.png" height="236" width="300" /></a><p class="wp-caption-text">Multivariate density estimation.  (Photo credit: Wikipedia)</p></div>
<p><!--more-->where <img src='http://s0.wp.com/latex.php?latex=%7Bf_%7Bjj%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f_{jj}}&amp;fg=000000' title='{f_{jj}}&amp;fg=000000' class='latex' /> denotes the second <a class="zem_slink" title="Partial derivative" href="http://en.wikipedia.org/wiki/Partial_derivative" target="_blank" rel="wikipedia">partial derivative</a> of <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' /> with respect to its j-th coordinate. The optimal bandwidths satisfy <img src='http://s0.wp.com/latex.php?latex=%7Bh_%7Bj%7D%3DO%28n%5E%7B-1%2F%284%2Bd%29%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h_{j}=O(n^{-1/(4+d)})}&amp;fg=000000' title='{h_{j}=O(n^{-1/(4+d)})}&amp;fg=000000' class='latex' />, which leads to a risk of order <img src='http://s0.wp.com/latex.php?latex=%7BO%28n%5E%7B-4%2F%284%2Bd%29%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{O(n^{-4/(4+d)})}&amp;fg=000000' title='{O(n^{-4/(4+d)})}&amp;fg=000000' class='latex' /> (for further details, see Hardle (2004)).</p>
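<p>To see how this rate translates into sample size, one can solve <code>n^(-4/(4+d)) = eps</code> for <code>n</code>, which gives <code>n = eps^(-(4+d)/4)</code>. The hypothetical helper below ignores every constant in the MISE, so its numbers are only indicative and are not comparable with exact sample-size calculations; what it does show is that each additional four dimensions multiply the required <code>n</code> by <code>1/eps</code>.</p>

```python
def sample_size_for_risk(target_risk, d):
    """Sample size n at which n^(-4/(4+d)) equals target_risk.

    Solving n^(-4/(4+d)) = eps gives n = eps^(-(4+d)/4).
    All constants of the MISE are ignored: only the growth in d is meaningful.
    """
    return target_risk ** (-(4 + d) / 4)


if __name__ == "__main__":
    for d in range(1, 11):
        print(d, round(sample_size_for_risk(0.1, d)))
```
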
<p>The striking consequence of the rate <img src='http://s0.wp.com/latex.php?latex=%7BO%28n%5E%7B-4%2F%284%2Bd%29%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{O(n^{-4/(4+d)})}&amp;fg=000000' title='{O(n^{-4/(4+d)})}&amp;fg=000000' class='latex' /> is that convergence slows down as the dimension grows: to keep the risk at a fixed level, the required sample size grows exponentially with d. This behavior is called the <strong><a class="zem_slink" title="Curse of dimensionality" href="http://en.wikipedia.org/wiki/Curse_of_dimensionality" target="_blank" rel="wikipedia">curse of dimensionality</a></strong>: for a fixed sample size, the data become increasingly sparse as the dimension increases. The following table, from Silverman (1986), shows the sample size required to ensure a relative <a class="zem_slink" title="Mean squared error" href="http://en.wikipedia.org/wiki/Mean_squared_error" target="_blank" rel="wikipedia">mean squared error</a> below 0.1 at the origin when the density is <a class="zem_slink" title="Multivariate normal distribution" href="http://en.wikipedia.org/wiki/Multivariate_normal_distribution" target="_blank" rel="wikipedia">multivariate normal</a> and the optimal bandwidth is selected.</p>
<table id="TBL-3" class="tabularaligncenter">
<tbody>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">Dimension</td>
<td class="td11" style="white-space:nowrap;text-align:center;"><a class="zem_slink" title="Sample size determination" href="http://en.wikipedia.org/wiki/Sample_size_determination" target="_blank" rel="wikipedia">Sample size</a></td>
</tr>
<tr class="hline">
<td>
<hr />
</td>
<td>
<hr />
</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">1</td>
<td class="td11" style="white-space:nowrap;text-align:center;">4</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">2</td>
<td class="td11" style="white-space:nowrap;text-align:center;">19</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">3</td>
<td class="td11" style="white-space:nowrap;text-align:center;">67</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">4</td>
<td class="td11" style="white-space:nowrap;text-align:center;">223</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">5</td>
<td class="td11" style="white-space:nowrap;text-align:center;">768</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">6</td>
<td class="td11" style="white-space:nowrap;text-align:center;">2790</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">7</td>
<td class="td11" style="white-space:nowrap;text-align:center;">10,700</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">8</td>
<td class="td11" style="white-space:nowrap;text-align:center;">43,700</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">9</td>
<td class="td11" style="white-space:nowrap;text-align:center;">187,000</td>
</tr>
<tr style="vertical-align:baseline;">
<td class="td11" style="white-space:nowrap;text-align:center;">10</td>
<td class="td11" style="white-space:nowrap;text-align:center;">842,000</td>
</tr>
</tbody>
</table>
<p>For this reason, methods for dimension reduction are important. One such method, sliced inverse regression, was proposed by Li (1991) in the article <em>Sliced Inverse Regression for Dimension Reduction</em>. I used this method to construct another <a class="zem_slink" title="Efficient estimator" href="http://en.wikipedia.org/wiki/Efficient_estimator" target="_blank" rel="wikipedia">efficient estimator</a> based on a <a class="zem_slink" title="Taylor series" href="http://en.wikipedia.org/wiki/Taylor_series" target="_blank" rel="wikipedia">Taylor approximation</a> (see Solís Chacón et al. (2012)). In a future post I will discuss the details of these articles.</p>
<p><strong>Sources:</strong></p>
<ul>
<li>Hardle, W. (2004). <a class="zem_slink" title="Nonparametric and Semiparametric Models" href="http://www.amazon.com/Nonparametric-Semiparametric-Models-Wolfgang-H%C3%A4rdle/dp/3540207228%3FSubscriptionId%3D0G81C5DAZ03ZR9WH9X82%26tag%3Dzemanta-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3540207228" target="_blank" rel="amazon">Nonparametric and Semiparametric Models</a>. <a class="zem_slink" title="Springer Science+Business Media" href="http://www.springer.com" target="_blank" rel="homepage">Springer</a> Series in Statistics. Springer.</li>
<li>Li, K.-C. (1991). Sliced Inverse Regression for Dimension Reduction. <a class="zem_slink" title="Journal of the American Statistical Association" href="http://www.amstat.org/PUBLICATIONS/jasa" target="_blank" rel="homepage">Journal of the American Statistical Association</a>, 86(414), 316-327. Retrieved from <a title="http://www.jstor.org/stable/2290563" href="http://www.jstor.org/stable/2290563">http://www.jstor.org/stable/2290563</a></li>
<li>Tsybakov, A. (2009). <em>Introduction to <a class="zem_slink" title="Non-parametric statistics" href="http://en.wikipedia.org/wiki/Non-parametric_statistics" target="_blank" rel="wikipedia">nonparametric estimation</a></em>. Springer.</li>
<li>Silverman, B. W. (1986). <a class="zem_slink" title="Density estimation" href="http://en.wikipedia.org/wiki/Density_estimation" target="_blank" rel="wikipedia">Density Estimation</a> for Statistics and <a class="zem_slink" title="Data analysis" href="http://en.wikipedia.org/wiki/Data_analysis" target="_blank" rel="wikipedia">Data Analysis</a>, volume 26. Chapman &#38; Hall/CRC.</li>
<li>Solís Chacón, M., Loubes, J.-M., Clement, M. &#38; Da Veiga, S. (2012). Efficient estimation of conditional covariance matrices for dimension reduction. <a class="zem_slink" title="ArXiv" href="http://arXiv.org/" target="_blank" rel="homepage">arXiv</a> preprint arXiv:1110.3238. Retrieved from <a title="http://arxiv.org/abs/1110.3238" href="http://arxiv.org/abs/1110.3238">http://arxiv.org/abs/1110.3238</a></li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Choosing the smoothing parameter]]></title>
<link>http://maikolsolis.wordpress.com/2012/03/19/smoothing-parameter/</link>
<pubDate>Sun, 18 Mar 2012 23:58:20 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2012/03/19/smoothing-parameter/</guid>
<description><![CDATA[Two popular methods to find the bandwidth for the nonparametric density estimator are the plug-in me]]></description>
<content:encoded><![CDATA[<p>Two popular methods for choosing the bandwidth <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' /> of the nonparametric density estimator are the plug-in method and cross-validation. For the former, we focus on the &#8220;quick and dirty&#8221; plug-in rule introduced by Silverman (1986). For cross-validation, we minimize an estimate of a modified version of the quadratic risk of <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bh%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{h}}&amp;fg=000000' title='{&#92;hat{f}_{h}}&amp;fg=000000' class='latex' />.</p>
<h3><strong>The normal reference rule</strong></h3>
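<p>This rule works well only if the true density is very smooth. Assuming that <img src='http://s0.wp.com/latex.php?latex=%7Bf%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f}&amp;fg=000000' title='{f}&amp;fg=000000' class='latex' /> is normal with standard deviation <img src='http://s0.wp.com/latex.php?latex=%7B%5Csigma%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sigma}&amp;fg=000000' title='{&#92;sigma}&amp;fg=000000' class='latex' /> and that a Gaussian kernel is used, the bandwidth that minimizes the asymptotic MISE is</p>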
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+h_%7Bplug%7D%3D1.06%5Csigma+n%5E%7B-1%2F5%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle h_{plug}=1.06&#92;sigma n^{-1/5}. &amp;fg=000000' title='&#92;displaystyle h_{plug}=1.06&#92;sigma n^{-1/5}. &amp;fg=000000' class='latex' /><!--more--></p>
<p>Usually <img src='http://s0.wp.com/latex.php?latex=%7B%5Csigma%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sigma}&amp;fg=000000' title='{&#92;sigma}&amp;fg=000000' class='latex' /> is estimated by <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmin%5C%7Bs%2CQ%2F1.34%5C%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;min&#92;{s,Q/1.34&#92;}}&amp;fg=000000' title='{&#92;min&#92;{s,Q/1.34&#92;}}&amp;fg=000000' class='latex' /> where <img src='http://s0.wp.com/latex.php?latex=%7Bs%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{s}&amp;fg=000000' title='{s}&amp;fg=000000' class='latex' /> is the <a title="Standard deviation" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank" rel="wikipedia">sample standard deviation</a> and <img src='http://s0.wp.com/latex.php?latex=%7BQ%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Q}&amp;fg=000000' title='{Q}&amp;fg=000000' class='latex' /> is the <a title="Interquartile range" href="http://en.wikipedia.org/wiki/Interquartile_range" target="_blank" rel="wikipedia">interquartile range</a>. Recall that the interquartile range is the <img src='http://s0.wp.com/latex.php?latex=%7B75%5E%7B%5Ctext%7Bth%7D%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{75^{&#92;text{th}}}&amp;fg=000000' title='{75^{&#92;text{th}}}&amp;fg=000000' class='latex' /> percentile minus the <img src='http://s0.wp.com/latex.php?latex=%7B25%5E%7B%5Ctext%7Bth%7D%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{25^{&#92;text{th}}}&amp;fg=000000' title='{25^{&#92;text{th}}}&amp;fg=000000' class='latex' /> percentile. 
Here, <img src='http://s0.wp.com/latex.php?latex=%7BQ%2F1.34%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Q/1.34}&amp;fg=000000' title='{Q/1.34}&amp;fg=000000' class='latex' /> is a consistent estimate of <img src='http://s0.wp.com/latex.php?latex=%7B%5Csigma%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sigma}&amp;fg=000000' title='{&#92;sigma}&amp;fg=000000' class='latex' /> if the data come from a <img src='http://s0.wp.com/latex.php?latex=%7BN%28%5Cmu%2C%5Csigma%5E%7B2%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N(&#92;mu,&#92;sigma^{2})}&amp;fg=000000' title='{N(&#92;mu,&#92;sigma^{2})}&amp;fg=000000' class='latex' /> distribution, because the interquartile range of a normal distribution is approximately <img src='http://s0.wp.com/latex.php?latex=%7B1.34%5Csigma%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1.34&#92;sigma}&amp;fg=000000' title='{1.34&#92;sigma}&amp;fg=000000' class='latex' />.</p>
<p>In summary, the normal reference rule is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+h_%7Bplug%7D%3D1.06%5Cmin%5Cleft%5C%7B+%5Csqrt%7B%5Cfrac%7B1%7D%7Bn-1%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%28X_%7Bi%7D-%5Cbar%7BX%7D%29%5E%7B2%7D%7D%2C%5Cfrac%7BQ%7D%7B1.34%7D%5Cright%5C%7D+n%5E%7B-1%2F5%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle h_{plug}=1.06&#92;min&#92;left&#92;{ &#92;sqrt{&#92;frac{1}{n-1}&#92;sum_{i=1}^{n}(X_{i}-&#92;bar{X})^{2}},&#92;frac{Q}{1.34}&#92;right&#92;} n^{-1/5}. &amp;fg=000000' title='&#92;displaystyle h_{plug}=1.06&#92;min&#92;left&#92;{ &#92;sqrt{&#92;frac{1}{n-1}&#92;sum_{i=1}^{n}(X_{i}-&#92;bar{X})^{2}},&#92;frac{Q}{1.34}&#92;right&#92;} n^{-1/5}. &amp;fg=000000' class='latex' /></p>
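<p>A minimal Python sketch of this rule using the standard-library <code>statistics</code> module. The function name <code>silverman_bandwidth</code> is hypothetical, and <code>statistics.quantiles</code> uses one particular percentile convention (its default &#8220;exclusive&#8221; method), so other software may give a slightly different <code>Q</code>.</p>

```python
import random
import statistics


def silverman_bandwidth(data):
    """Normal reference rule: h = 1.06 * min(s, Q / 1.34) * n^(-1/5).

    s is the sample standard deviation (n - 1 denominator) and Q is the
    interquartile range (75th minus 25th percentile).
    """
    n = len(data)
    s = statistics.stdev(data)                    # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartile cut points
    sigma_hat = min(s, (q3 - q1) / 1.34)          # robust estimate of sigma
    return 1.06 * sigma_hat * n ** (-1 / 5)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 2.0) for _ in range(10_000)]
    # For N(0, 4) data we expect roughly 1.06 * 2 * 10000^(-1/5), about 0.34.
    print(silverman_bandwidth(sample))
```

<p>Taking the minimum of the two scale estimates protects against heavy tails or outliers, which inflate <code>s</code> much more than the interquartile range.</p>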
<h3><strong><a title="Cross-validation (statistics)" href="http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29" target="_blank" rel="wikipedia">Cross-validation</a></strong></h3>
<p>Define the <em>integrated squared error</em> as</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+%5Cdisplaystyle+%7B%5Crm+ISE%7D%28%5Chat%7Bf%7D_%7Bh%7D%29+%26%2338%3B+%3D%5Cdisplaystyle+%5Cint%5Cleft%28%5Chat%7Bf%7D_%7Bh%7D%28x%29-f%28x%29%5Cright%29%5E%7B2%7Ddx%5Cnonumber+%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle+%5Cint%5Chat%7Bf%7D_%7Bh%7D%5E%7B2%7D%28x%29dx-2%5Cint%5Chat%7Bf%7D_%7Bh%7D%28x%29f%28x%29dx%2B%5Cint+f%5E%7B2%7D%28x%29dx.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} &#92;displaystyle {&#92;rm ISE}(&#92;hat{f}_{h}) &amp; =&#92;displaystyle &#92;int&#92;left(&#92;hat{f}_{h}(x)-f(x)&#92;right)^{2}dx&#92;nonumber &#92;&#92; &amp; =&#92;displaystyle &#92;int&#92;hat{f}_{h}^{2}(x)dx-2&#92;int&#92;hat{f}_{h}(x)f(x)dx+&#92;int f^{2}(x)dx. &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} &#92;displaystyle {&#92;rm ISE}(&#92;hat{f}_{h}) &amp; =&#92;displaystyle &#92;int&#92;left(&#92;hat{f}_{h}(x)-f(x)&#92;right)^{2}dx&#92;nonumber &#92;&#92; &amp; =&#92;displaystyle &#92;int&#92;hat{f}_{h}^{2}(x)dx-2&#92;int&#92;hat{f}_{h}(x)f(x)dx+&#92;int f^{2}(x)dx. &#92;end{array} &amp;fg=000000' class='latex' /></p>
<p>Notice that the MISE is the expected value of the ISE. Our goal is to make the ISE as small as possible. Since the last term of the expansion above does not depend on <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' />, minimizing this risk is equivalent to minimizing the expected value of</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%7B%5Crm+ISE%7D%28%5Chat%7Bf%7D_%7Bh%7D%29-%5Cint+f%5E%7B2%7D%28x%29dx%3D%5Cint%5Chat%7Bf%7D_%7Bh%7D%5E%7B2%7D%28x%29dx-2%5Cint%5Chat%7Bf%7D_%7Bh%7D%28x%29f%28x%29dx+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle {&#92;rm ISE}(&#92;hat{f}_{h})-&#92;int f^{2}(x)dx=&#92;int&#92;hat{f}_{h}^{2}(x)dx-2&#92;int&#92;hat{f}_{h}(x)f(x)dx &amp;fg=000000' title='&#92;displaystyle {&#92;rm ISE}(&#92;hat{f}_{h})-&#92;int f^{2}(x)dx=&#92;int&#92;hat{f}_{h}^{2}(x)dx-2&#92;int&#92;hat{f}_{h}(x)f(x)dx &amp;fg=000000' class='latex' /></p>
<p>Looking more closely at the term <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint%5Chat%7Bf%7D_%7Bh%7D%28x%29f%28x%29dx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int&#92;hat{f}_{h}(x)f(x)dx}&amp;fg=000000' title='{&#92;int&#92;hat{f}_{h}(x)f(x)dx}&amp;fg=000000' class='latex' />, we notice that it equals <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathbb+E%28%5Chat%7Bf%7D_%7Bh%7D%28X%29%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathbb E(&#92;hat{f}_{h}(X))}&amp;fg=000000' title='{&#92;mathbb E(&#92;hat{f}_{h}(X))}&amp;fg=000000' class='latex' />, where the expectation is taken over a new observation X drawn from f, independent of the sample. The straightforward estimate of this expectation is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Chat%7Bf%7D_%7Bh%7D%28X_%7Bi%7D%29%3D%5Cfrac%7B1%7D%7Bn%5E%7B2%7Dh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Csum_%7Bj%3D1%7D%5E%7Bn%7DK%5Cleft%28%5Cfrac%7BX_%7Bj%7D-X_%7Bi%7D%7D%7Bh%7D%5Cright%29.+%5C+%5C+%5C+%5C+%5C+%281%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;frac{1}{n}&#92;sum_{i=1}^{n}&#92;hat{f}_{h}(X_{i})=&#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K&#92;left(&#92;frac{X_{j}-X_{i}}{h}&#92;right). &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' title='&#92;displaystyle &#92;frac{1}{n}&#92;sum_{i=1}^{n}&#92;hat{f}_{h}(X_{i})=&#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K&#92;left(&#92;frac{X_{j}-X_{i}}{h}&#92;right). &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' class='latex' /></p>
<p>The problem is that the same observations are used both to build <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bh%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{h}}&amp;fg=000000' title='{&#92;hat{f}_{h}}&amp;fg=000000' class='latex' /> and to estimate the expectation; in particular, the terms with j = i in (1) contribute <img src='http://s0.wp.com/latex.php?latex=%7BK%280%29%2F%28nh%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(0)/(nh)}&amp;fg=000000' title='{K(0)/(nh)}&amp;fg=000000' class='latex' /> regardless of the data. The remedy is to leave out the <img src='http://s0.wp.com/latex.php?latex=%7Bi%5E%7B%5Ctext%7Bth%7D%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{i^{&#92;text{th}}}&amp;fg=000000' title='{i^{&#92;text{th}}}&amp;fg=000000' class='latex' /> observation when computing <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bf%7D_%7Bh%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{f}_{h}}&amp;fg=000000' title='{&#92;hat{f}_{h}}&amp;fg=000000' class='latex' /> at that observation. We then define the <a href="http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#Leave-one-out_cross-validation" target="_blank">leave-one-out cross-validation estimator</a> of <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint%5Chat%7Bf%7D_%7Bh%7D%28x%29f%28x%29dx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int&#92;hat{f}_{h}(x)f(x)dx}&amp;fg=000000' title='{&#92;int&#92;hat{f}_{h}(x)f(x)dx}&amp;fg=000000' class='latex' /> as</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Chat%7Bf%7D_%7Bh%2C-i%7D%28X_%7Bi%7D%29%2C+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;frac{1}{n}&#92;sum_{i=1}^{n}&#92;hat{f}_{h,-i}(X_{i}), &amp;fg=000000' title='&#92;displaystyle &#92;frac{1}{n}&#92;sum_{i=1}^{n}&#92;hat{f}_{h,-i}(X_{i}), &amp;fg=000000' class='latex' /></p>
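<p>where, with <img src='http://s0.wp.com/latex.php?latex=%7BK_%7Bh%7D%28u%29%3DK%28u%2Fh%29%2Fh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K_{h}(u)=K(u/h)/h}&amp;fg=000000' title='{K_{h}(u)=K(u/h)/h}&amp;fg=000000' class='latex' />,</p>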
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Chat%7Bf%7D_%7Bh%2C-i%7D%28x%29%3D%5Cfrac%7B1%7D%7Bn-1%7D%5Cmathop%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%7D_%7Bj%5Cneq+i%7DK_%7Bh%7D%28x-X_%7Bj%7D%29.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;hat{f}_{h,-i}(x)=&#92;frac{1}{n-1}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(x-X_{j}). &amp;fg=000000' title='&#92;displaystyle &#92;hat{f}_{h,-i}(x)=&#92;frac{1}{n-1}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(x-X_{j}). &amp;fg=000000' class='latex' /></p>
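<p>A minimal Python sketch comparing the naive estimate (1) with the leave-one-out estimate, assuming a Gaussian kernel; all function names are hypothetical. Splitting the double sum in (1) into the diagonal terms j = i and the rest gives the identity <code>naive = ((n - 1)/n) * loo + K(0)/(n*h)</code>, which makes the data-independent term <code>K(0)/(nh)</code> explicit.</p>

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def k_h(u, h):
    """Rescaled kernel K_h(u) = K(u / h) / h."""
    return gaussian_kernel(u / h) / h


def naive_term(data, h):
    """Estimate (1): (1/n) sum_i f_hat_h(X_i), where X_i also enters f_hat_h."""
    n = len(data)
    return sum(k_h(xi - xj, h) for xi in data for xj in data) / n ** 2


def loo_term(data, h):
    """Leave-one-out estimate (1/n) sum_i f_hat_{h,-i}(X_i)."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        # f_hat_{h,-i}(X_i): average of K_h over the other n - 1 observations
        total += sum(k_h(xi - xj, h) for j, xj in enumerate(data) if j != i) / (n - 1)
    return total / n
```

<p>Since <code>K_h(0)</code> is the largest value the Gaussian <code>K_h</code> can take, the naive estimate is always larger than the leave-one-out one unless all observations coincide.</p>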
<p style="text-align:left;">The following figure illustrates the idea behind leave-one-out cross-validation: in each iteration, one data point is held out as test data and the remaining points are used as training data.</p>
<div class="wp-caption aligncenter" style="width: 537px"><img title="Leave-one-out cross-validation" alt="" src="http://upload.wikimedia.org/wikipedia/commons/2/2d/Leave-one-out.jpg" height="263" width="527" /><p class="wp-caption-text">Leave-one-out cross-validation</p></div>
<p>Turning to the term <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint%5Chat%7Bf%7D_%7Bh%7D%5E%7B2%7D%28x%29dx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int&#92;hat{f}_{h}^{2}(x)dx}&amp;fg=000000' title='{&#92;int&#92;hat{f}_{h}^{2}(x)dx}&amp;fg=000000' class='latex' />, we have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+%5Cdisplaystyle+%5Cint%5Chat%7Bf%7D_%7Bh%7D%5E%7B2%7D%28x%29dx+%26%2338%3B+%3D%5Cdisplaystyle+%5Cint%5Cleft%28%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7DK_%7Bh%7D%28x-X_%7Bi%7D%29%5Cright%29%5E%7B2%7Ddx%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle+%5Cfrac%7B1%7D%7Bn%5E%7B2%7Dh%5E%7B2%7D%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%5Cint+K%5Cleft%28%5Cfrac%7Bx-X_%7Bi%7D%7D%7Bh%7D%5Cright%29K%5Cleft%28%5Cfrac%7Bx-X_%7Bj%7D%7D%7Bh%7D%5Cright%29dx%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle+%5Cfrac%7B1%7D%7Bn%5E%7B2%7Dh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%5Cint+K%5Cleft%28u%5Cright%29K%5Cleft%28%5Cfrac%7BX_%7Bi%7D-X_%7Bj%7D%7D%7Bh%7D-u%5Cright%29du%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle+%5Cfrac%7B1%7D%7Bn%5E%7B2%7Dh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Csum_%7Bj%3D1%7D%5E%7Bn%7DK%2AK%5Cleft%28%5Cfrac%7BX_%7Bi%7D-X_%7Bj%7D%7D%7Bh%7D%5Cright%29.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} &#92;displaystyle &#92;int&#92;hat{f}_{h}^{2}(x)dx &amp; =&#92;displaystyle &#92;int&#92;left(&#92;frac{1}{n}&#92;sum_{i=1}^{n}K_{h}(x-X_{i})&#92;right)^{2}dx&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h^{2}}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}&#92;int K&#92;left(&#92;frac{x-X_{i}}{h}&#92;right)K&#92;left(&#92;frac{x-X_{j}}{h}&#92;right)dx&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}&#92;int K&#92;left(u&#92;right)K&#92;left(&#92;frac{X_{i}-X_{j}}{h}-u&#92;right)du&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K*K&#92;left(&#92;frac{X_{i}-X_{j}}{h}&#92;right). &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} &#92;displaystyle &#92;int&#92;hat{f}_{h}^{2}(x)dx &amp; =&#92;displaystyle &#92;int&#92;left(&#92;frac{1}{n}&#92;sum_{i=1}^{n}K_{h}(x-X_{i})&#92;right)^{2}dx&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h^{2}}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}&#92;int K&#92;left(&#92;frac{x-X_{i}}{h}&#92;right)K&#92;left(&#92;frac{x-X_{j}}{h}&#92;right)dx&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}&#92;int K&#92;left(u&#92;right)K&#92;left(&#92;frac{X_{i}-X_{j}}{h}-u&#92;right)du&#92;&#92; &amp; =&#92;displaystyle &#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K*K&#92;left(&#92;frac{X_{i}-X_{j}}{h}&#92;right). &#92;end{array} &amp;fg=000000' class='latex' /></p>
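<p>where the third equality uses the substitution <img src='http://s0.wp.com/latex.php?latex=%7Bu%3D%28x-X_%7Bi%7D%29%2Fh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{u=(x-X_{i})/h}&amp;fg=000000' title='{u=(x-X_{i})/h}&amp;fg=000000' class='latex' /> together with the symmetry of <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=%7BK%2AK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K*K}&amp;fg=000000' title='{K*K}&amp;fg=000000' class='latex' /> denotes the convolution of <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' /> with itself.</p>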
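<p>A minimal Python check of this identity, assuming a Gaussian kernel. In that case <code>K*K</code> is the N(0, 2) density, <code>exp(-u^2/4)/(2*sqrt(pi))</code>, so the double sum can be compared against a direct midpoint-rule integration of the squared estimate. The function names and the integration range of ten bandwidths beyond the data are our own choices.</p>

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_conv(u):
    """K*K for the Gaussian kernel: the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))


def int_fhat_sq_closed(data, h):
    """(1 / (n^2 h)) * sum_i sum_j (K*K)((X_i - X_j) / h)."""
    n = len(data)
    return sum(gauss_conv((xi - xj) / h) for xi in data for xj in data) / (n * n * h)


def fhat(x, data, h):
    """Kernel density estimate f_hat_h(x)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def int_fhat_sq_numeric(data, h, num=4000):
    """Midpoint-rule approximation of the integral of f_hat_h(x)^2."""
    lo = min(data) - 10 * h
    hi = max(data) + 10 * h
    dx = (hi - lo) / num
    return sum(fhat(lo + (k + 0.5) * dx, data, h) ** 2 for k in range(num)) * dx
```

<p>The closed form costs O(n^2) kernel evaluations and needs no integration grid, which is why it is convenient whenever <code>K*K</code> is known explicitly.</p>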
<p>Combining both terms, we obtain a reasonable criterion for choosing the bandwidth:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+CV%28h%29%3D%5Cfrac%7B1%7D%7Bn%5E%7B2%7Dh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Csum_%7Bj%3D1%7D%5E%7Bn%7DK%2AK%5Cleft%28%5Cfrac%7BX_%7Bi%7D-X_%7Bj%7D%7D%7Bh%7D%5Cright%29-%5Cfrac%7B2%7D%7Bn%28n-1%29%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Cmathop%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%7D_%7Bj%5Cneq+i%7DK_%7Bh%7D%28X_%7Bi%7D-X_%7Bj%7D%29.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle CV(h)=&#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K*K&#92;left(&#92;frac{X_{i}-X_{j}}{h}&#92;right)-&#92;frac{2}{n(n-1)}&#92;sum_{i=1}^{n}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(X_{i}-X_{j}). &amp;fg=000000' title='&#92;displaystyle CV(h)=&#92;frac{1}{n^{2}h}&#92;sum_{i=1}^{n}&#92;sum_{j=1}^{n}K*K&#92;left(&#92;frac{X_{i}-X_{j}}{h}&#92;right)-&#92;frac{2}{n(n-1)}&#92;sum_{i=1}^{n}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(X_{i}-X_{j}). &amp;fg=000000' class='latex' /></p>
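<p>A minimal Python sketch of this criterion, again assuming a Gaussian kernel so that <code>K*K</code> is the N(0, 2) density. Minimizing over a finite grid of candidate bandwidths is our own simplification; the function names and the grid are hypothetical.</p>

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_conv(u):
    """K*K for the Gaussian kernel is the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))


def cv_score(data, h):
    """Least-squares cross-validation criterion CV(h) with a Gaussian kernel."""
    n = len(data)
    first = sum(gauss_conv((xi - xj) / h) for xi in data for xj in data) / (n * n * h)
    second = sum(
        gaussian_kernel((xi - xj) / h) / h    # K_h(X_i - X_j)
        for i, xi in enumerate(data)
        for j, xj in enumerate(data)
        if i != j
    )
    return first - 2.0 * second / (n * (n - 1))


def select_bandwidth(data, grid):
    """Return the bandwidth in grid with the smallest CV(h)."""
    return min(grid, key=lambda h: cv_score(data, h))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    grid = [0.05 * k for k in range(1, 41)]   # h = 0.05, 0.10, ..., 2.00
    print(select_bandwidth(sample, grid))
```

<p>The CV curve can be flat or have several local minima, so in practice it is worth plotting <code>cv_score</code> over the grid rather than trusting the minimizer blindly.</p>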
<p><strong>Note:</strong> An alternative way to implement leave-one-out cross-validation is to use</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+CV%28h%29%3D%5Cint%5Chat%7Bf%7D_%7Bh%7D%5E%7B2%7D%28x%29dx-%5Cfrac%7B2%7D%7Bn%28n-1%29%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%5Cmathop%7B%5Csum_%7Bj%3D1%7D%5E%7Bn%7D%7D_%7Bj%5Cneq+i%7DK_%7Bh%7D%28X_%7Bi%7D-X_%7Bj%7D%29+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle CV(h)=&#92;int&#92;hat{f}_{h}^{2}(x)dx-&#92;frac{2}{n(n-1)}&#92;sum_{i=1}^{n}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(X_{i}-X_{j}) &amp;fg=000000' title='&#92;displaystyle CV(h)=&#92;int&#92;hat{f}_{h}^{2}(x)dx-&#92;frac{2}{n(n-1)}&#92;sum_{i=1}^{n}&#92;mathop{&#92;sum_{j=1}^{n}}_{j&#92;neq i}K_{h}(X_{i}-X_{j}) &amp;fg=000000' class='latex' /></p>
<p>and to compute the integral numerically.</p>
<p><strong>Sources:</strong></p>
<ul>
<li>Hardle, W. (2004). <a class="zem_slink" title="Nonparametric and Semiparametric Models" href="http://www.amazon.com/Nonparametric-Semiparametric-Models-Wolfgang-H%C3%A4rdle/dp/3540207228%3FSubscriptionId%3D0G81C5DAZ03ZR9WH9X82%26tag%3Dzemanta-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3540207228" target="_blank" rel="amazon">Nonparametric and Semiparametric Models</a>. Springer Series in Statistics. Springer.</li>
<li>Tsybakov, A. (2009). <em>Introduction to nonparametric estimation</em>. Springer.</li>
<li>Silverman, B. W. (1986). <a class="zem_slink" title="Density estimation" href="http://en.wikipedia.org/wiki/Density_estimation" target="_blank" rel="wikipedia">Density Estimation</a> for Statistics and Data Analysis, volume 26. Chapman &#38; Hall/CRC.</li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Kernel density estimation]]></title>
<link>http://maikolsolis.wordpress.com/2011/12/23/kernel-density-estimation/</link>
<pubDate>Fri, 23 Dec 2011 11:08:26 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2011/12/23/kernel-density-estimation/</guid>
<description><![CDATA[A 1D Kernel density estimation I will make a summary of ideas about nonparametric estimation, includ]]></description>
<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="http://commons.wikipedia.org/wiki/File:1D_kernel_density_estimate.svg"><img class="zemanta-img-inserted zemanta-img-configured" title="English: A 1D Kernel density estimation" alt="English: A 1D Kernel density estimation" src="http://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/1D_kernel_density_estimate.svg/300px-1D_kernel_density_estimate.svg.png" height="225" width="300" /></a><p class="wp-caption-text">A 1D Kernel density estimation</p></div>
<p>In this post I summarize some ideas about nonparametric estimation, including basic results that we will need to develop more advanced theory later.</p>
<p>In the <a title="Importance of nonparametric statistics in regression." href="http://maikolsolis.wordpress.com/2011/10/09/importance-of-nonparametric-statistics-in-regression/">first post</a> we discussed <a class="zem_slink" title="Density estimation" href="http://en.wikipedia.org/wiki/Density_estimation" rel="wikipedia">density estimation</a> and nonparametric regression. Later, in the posts about <a class="zem_slink" title="Histogram" href="http://en.wikipedia.org/wiki/Histogram" rel="wikipedia">histograms</a> (<a title="Density Estimation by Histograms (Part I)" href="http://maikolsolis.wordpress.com/2011/10/16/density-estimation-by-histograms-part-i/">I</a>, <a title="Density Estimation by Histograms (Part II)" href="http://maikolsolis.wordpress.com/2011/10/23/density-estimation-by-histograms-part-ii/">II</a>, <a title="Density Estimation by Histograms (Part III)" href="http://maikolsolis.wordpress.com/2011/11/01/density-estimation-by-histograms-part-iii/">III</a>, <a title="Density Estimation by Histograms (Part IV)" href="http://maikolsolis.wordpress.com/2011/11/07/density-estimation-by-histograms-part-iv/">IV</a>), we saw that the histogram is a <a class="zem_slink" title="Non-parametric statistics" href="http://en.wikipedia.org/wiki/Non-parametric_statistics" rel="wikipedia">nonparametric estimator</a> and studied its properties.</p>
<p>Now, we are ready to go one step further and study the kernel density estimators in general.</p>
<p><strong>1. <a class="zem_slink" title="Kernel density estimation" href="http://en.wikipedia.org/wiki/Kernel_density_estimation" rel="wikipedia">Kernel density estimation</a></strong></p>
<p>Let <img src='http://s0.wp.com/latex.php?latex=%7BX_%7B1%7D%2C%5Cldots%2CX_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_{1},&#92;ldots,X_{n}}&amp;fg=000000' title='{X_{1},&#92;ldots,X_{n}}&amp;fg=000000' class='latex' /> be i.i.d. random variables with common probability density <img src='http://s0.wp.com/latex.php?latex=%7Bp%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p}&amp;fg=000000' title='{p}&amp;fg=000000' class='latex' /> on <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;mathbb R}}&amp;fg=000000' title='{{&#92;mathbb R}}&amp;fg=000000' class='latex' />. The distribution function associated with <img src='http://s0.wp.com/latex.php?latex=%7Bp%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p}&amp;fg=000000' title='{p}&amp;fg=000000' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=%7BF%28x%29%3D%5Cint_%7B-%5Cinfty%7D%5E%7Bx%7Dp%28t%29dt%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F(x)=&#92;int_{-&#92;infty}^{x}p(t)dt}&amp;fg=000000' title='{F(x)=&#92;int_{-&#92;infty}^{x}p(t)dt}&amp;fg=000000' class='latex' />. Consider the <a class="zem_slink" title="Empirical distribution function" href="http://en.wikipedia.org/wiki/Empirical_distribution_function" rel="wikipedia">empirical distribution function</a></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+F_%7Bn%7D%28x%29%3D%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7DI%28X_%7Bi%7D%5Cleq+x%29.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle F_{n}(x)=&#92;frac{1}{n}&#92;sum_{i=1}^{n}I(X_{i}&#92;leq x). &amp;fg=000000' title='&#92;displaystyle F_{n}(x)=&#92;frac{1}{n}&#92;sum_{i=1}^{n}I(X_{i}&#92;leq x). &amp;fg=000000' class='latex' /><!--more--></p>
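<p>Here is a minimal Python sketch of <em>F<sub>n</sub></em> that uses only the standard library. I added it for this rewrite; it is not code from the original series, and the helper name <code>ecdf</code> is hypothetical.</p>

```python
import bisect


def ecdf(sample):
    """Return the empirical distribution function F_n of `sample`.

    F_n(x) = (1/n) * #{i : X_i <= x}. Sorting once lets every evaluation
    use a binary search instead of a scan over the whole sample.
    """
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns the number of sample points that are <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


F_n = ecdf([0.3, -1.2, 0.3, 2.0])
print(F_n(-2.0), F_n(0.3), F_n(5.0))  # 0.0 0.75 1.0
```

<p>Sorting costs O(<em>n</em> log <em>n</em>) once. After that, each evaluation is a binary search, and ties (the repeated 0.3 above) are counted correctly because the inequality is <em>X<sub>i</sub></em> &le; <em>x</em>.</p>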
<p>By the <a class="zem_slink" title="Law of large numbers" href="http://en.wikipedia.org/wiki/Law_of_large_numbers" rel="wikipedia">strong law of large numbers</a>, for every fixed <img src='http://s0.wp.com/latex.php?latex=%7Bx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x}&amp;fg=000000' title='{x}&amp;fg=000000' class='latex' /> in <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;mathbb R}}&amp;fg=000000' title='{{&#92;mathbb R}}&amp;fg=000000' class='latex' /> we have <img src='http://s0.wp.com/latex.php?latex=%7BF_%7Bn%7D%28x%29%5Crightarrow+F%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F_{n}(x)&#92;rightarrow F(x)}&amp;fg=000000' title='{F_{n}(x)&#92;rightarrow F(x)}&amp;fg=000000' class='latex' /> almost surely as <img src='http://s0.wp.com/latex.php?latex=%7Bn%5Crightarrow%5Cinfty%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n&#92;rightarrow&#92;infty}&amp;fg=000000' title='{n&#92;rightarrow&#92;infty}&amp;fg=000000' class='latex' />. Therefore, <img src='http://s0.wp.com/latex.php?latex=%7BF_%7Bn%7D%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F_{n}(x)}&amp;fg=000000' title='{F_{n}(x)}&amp;fg=000000' class='latex' /> is a consistent estimator of <img src='http://s0.wp.com/latex.php?latex=%7BF%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F(x)}&amp;fg=000000' title='{F(x)}&amp;fg=000000' class='latex' /> for every <img src='http://s0.wp.com/latex.php?latex=%7Bx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x}&amp;fg=000000' title='{x}&amp;fg=000000' class='latex' /> in <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;mathbb R}}&amp;fg=000000' title='{{&#92;mathbb R}}&amp;fg=000000' class='latex' />. The natural question is: how can we estimate <img src='http://s0.wp.com/latex.php?latex=%7Bp%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p}&amp;fg=000000' title='{p}&amp;fg=000000' class='latex' />?</p>
<p>One of the first answers was based on the following argument. For small <img src='http://s0.wp.com/latex.php?latex=%7Bh%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h&gt;0}&amp;fg=000000' title='{h&gt;0}&amp;fg=000000' class='latex' /> we have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+p%28x%29%5Capprox%5Cfrac%7BF%28x%2Bh%29-F%28x-h%29%7D%7B2h%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle p(x)&#92;approx&#92;frac{F(x+h)-F(x-h)}{2h}. &amp;fg=000000' title='&#92;displaystyle p(x)&#92;approx&#92;frac{F(x+h)-F(x-h)}{2h}. &amp;fg=000000' class='latex' /></p>
<p>Replacing <img src='http://s0.wp.com/latex.php?latex=%7BF%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F}&amp;fg=000000' title='{F}&amp;fg=000000' class='latex' /> by its estimator <img src='http://s0.wp.com/latex.php?latex=%7BF_%7Bn%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{F_{n}}&amp;fg=000000' title='{F_{n}}&amp;fg=000000' class='latex' />, we define</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Chat%7Bp%7D_%7Bn%7D%5E%7BR%7D%28x%29%3D%5Cfrac%7BF_%7Bn%7D%28x%2Bh%29-F_%7Bn%7D%28x-h%29%7D%7B2h%7D%2C+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;hat{p}_{n}^{R}(x)=&#92;frac{F_{n}(x+h)-F_{n}(x-h)}{2h}, &amp;fg=000000' title='&#92;displaystyle &#92;hat{p}_{n}^{R}(x)=&#92;frac{F_{n}(x+h)-F_{n}(x-h)}{2h}, &amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bp%7D_%7Bn%7D%5E%7BR%7D%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{p}_{n}^{R}(x)}&amp;fg=000000' title='{&#92;hat{p}_{n}^{R}(x)}&amp;fg=000000' class='latex' /> is called the <em>Rosenblatt estimator</em>. It can be rewritten in the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Chat%7Bp%7D_%7Bn%7D%5E%7BR%7D%28x%29%3D%5Cfrac%7B1%7D%7B2nh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7DI%28x-h%26%2360%3BX_%7Bi%7D%5Cleq+x%2Bh%29%3D%5Cfrac%7B1%7D%7Bnh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7DK_%7B0%7D%5Cleft%28%5Cfrac%7BX_%7Bi%7D-x%7D%7Bh%7D%5Cright%29+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;hat{p}_{n}^{R}(x)=&#92;frac{1}{2nh}&#92;sum_{i=1}^{n}I(x-h&lt;X_{i}&#92;leq x+h)=&#92;frac{1}{nh}&#92;sum_{i=1}^{n}K_{0}&#92;left(&#92;frac{X_{i}-x}{h}&#92;right) &amp;fg=000000' title='&#92;displaystyle &#92;hat{p}_{n}^{R}(x)=&#92;frac{1}{2nh}&#92;sum_{i=1}^{n}I(x-h&lt;X_{i}&#92;leq x+h)=&#92;frac{1}{nh}&#92;sum_{i=1}^{n}K_{0}&#92;left(&#92;frac{X_{i}-x}{h}&#92;right) &amp;fg=000000' class='latex' /></p>
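<p>where <img src='http://s0.wp.com/latex.php?latex=%7BK_%7B0%7D%28u%29%3D%5Cfrac%7B1%7D%7B2%7DI%28-1%26%2360%3Bu%5Cleq1%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K_{0}(u)=&#92;frac{1}{2}I(-1&lt;u&#92;leq1)}&amp;fg=000000' title='{K_{0}(u)=&#92;frac{1}{2}I(-1&lt;u&#92;leq1)}&amp;fg=000000' class='latex' />. The Rosenblatt estimator is therefore a histogram-type estimator. Its single bin has width <img src='http://s0.wp.com/latex.php?latex=%7B2h%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2h}&amp;fg=000000' title='{2h}&amp;fg=000000' class='latex' /> and is centred at the evaluation point <img src='http://s0.wp.com/latex.php?latex=%7Bx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x}&amp;fg=000000' title='{x}&amp;fg=000000' class='latex' />, so it is a moving-window histogram.</p>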
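<p>The two expressions for the Rosenblatt estimator can be compared numerically. The following Python sketch is an illustration under my own assumptions: the simulated N(0,1) data, the bandwidth 0.25 and the function names are all hypothetical. It computes the estimator once from the empirical CDF and once as a sum of the rectangular kernel <em>K</em><sub>0</sub> on the half-open interval (&minus;1, 1].</p>

```python
import bisect
import random


def rosenblatt_via_cdf(sample, x, h):
    """(F_n(x + h) - F_n(x - h)) / (2h), with F_n the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(t):
        return bisect.bisect_right(xs, t) / n

    return (F_n(x + h) - F_n(x - h)) / (2 * h)


def K0(u):
    """Rectangular kernel K_0(u) = 1/2 on the half-open interval (-1, 1]."""
    return 0.5 if -1 < u <= 1 else 0.0


def rosenblatt_via_kernel(sample, x, h):
    """(1 / (n h)) * sum_i K_0((X_i - x) / h)."""
    n = len(sample)
    return sum(K0((xi - x) / h) for xi in sample) / (n * h)


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
for x in (-1.0, 0.0, 0.5):
    # The two forms should agree up to floating-point rounding.
    print(x, rosenblatt_via_cdf(data, x, 0.25), rosenblatt_via_kernel(data, x, 0.25))
```

<p>The half-open interval matters. <em>K</em><sub>0</sub>((<em>X<sub>i</sub></em> &minus; <em>x</em>)/<em>h</em>) is nonzero exactly when <em>x</em> &minus; <em>h</em> &lt; <em>X<sub>i</sub></em> &le; <em>x</em> + <em>h</em>, and these are the same observations counted by <em>F<sub>n</sub></em>(<em>x</em>+<em>h</em>) &minus; <em>F<sub>n</sub></em>(<em>x</em>&minus;<em>h</em>).</p>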
<p>In general,</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Chat%7Bp%7D_%7Bn%7D%28x%29%3D%5Cfrac%7B1%7D%7Bnh%7D%5Csum_%7Bi%3D1%7D%5E%7Bn%7DK%5Cleft%28%5Cfrac%7BX_%7Bi%7D-x%7D%7Bh%7D%5Cright%29+&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;hat{p}_{n}(x)=&#92;frac{1}{nh}&#92;sum_{i=1}^{n}K&#92;left(&#92;frac{X_{i}-x}{h}&#92;right) ' title='&#92;displaystyle &#92;hat{p}_{n}(x)=&#92;frac{1}{nh}&#92;sum_{i=1}^{n}K&#92;left(&#92;frac{X_{i}-x}{h}&#92;right) ' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7BK%3A%7B%5Cmathbb+R%7D%5Crightarrow%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' title='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' class='latex' /> is a function satisfying <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint+K%28u%29du%3D1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int K(u)du=1}&amp;fg=000000' title='{&#92;int K(u)du=1}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7BK%28u%29%5Cgeq0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(u)&#92;geq0}&amp;fg=000000' title='{K(u)&#92;geq0}&amp;fg=000000' class='latex' />. The function <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' /> is called the kernel, and <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' /> is the bandwidth, which depends on <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' />. The function <img src='http://s0.wp.com/latex.php?latex=%7Bx%5Cmapsto%5Chat%7Bp%7D_%7Bn%7D%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x&#92;mapsto&#92;hat{p}_{n}(x)}&amp;fg=000000' title='{x&#92;mapsto&#92;hat{p}_{n}(x)}&amp;fg=000000' class='latex' /> is the kernel (<a href="http://en.wikipedia.org/wiki/Emanuel_Parzen">Parzen</a>-<a href="http://en.wikipedia.org/wiki/Murray_Rosenblatt">Rosenblatt</a>) estimator.</p>
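<p>The general estimator translates almost literally into code. Below is a minimal Python sketch, assuming a Gaussian kernel as the default. The names <code>kde</code> and <code>gaussian_kernel</code> and the four-point sample are hypothetical. The last lines check that the estimate integrates to about 1, which follows from the condition on <em>K</em>.</p>

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(sample, x, h, kernel=gaussian_kernel):
    """Kernel estimate p_n(x) = (1 / (n h)) * sum_i K((X_i - x) / h)."""
    if h <= 0:
        raise ValueError("the bandwidth h must be positive")
    n = len(sample)
    return sum(kernel((xi - x) / h) for xi in sample) / (n * h)


sample = [-0.8, 0.1, 0.3, 1.9]
grid = [-4 + 0.01 * k for k in range(801)]
values = [kde(sample, x, h=0.5) for x in grid]
# A Riemann sum of p_n over [-4, 4]; it should be very close to 1 because int K = 1.
print(round(0.01 * sum(values), 4))
```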
<p>Some classical examples of kernels are:</p>
<ul>
<li>Rectangular: <img src='http://s0.wp.com/latex.php?latex=%7BK%28u%29%3D%5Cfrac%7B1%7D%7B2%7DI%28+%26%23124%3Bu%26%23124%3B%5Cleq1%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(u)=&#92;frac{1}{2}I( &#124;u&#124;&#92;leq1)}&amp;fg=000000' title='{K(u)=&#92;frac{1}{2}I( &#124;u&#124;&#92;leq1)}&amp;fg=000000' class='latex' />.</li>
<li>Triangular: <img src='http://s0.wp.com/latex.php?latex=%7BK%28u%29%3D%281-%26%23124%3B+u%26%23124%3B%29I%28%26%23124%3Bu%26%23124%3B%5Cleq1%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(u)=(1-&#124; u&#124;)I(&#124;u&#124;&#92;leq1)}&amp;fg=000000' title='{K(u)=(1-&#124; u&#124;)I(&#124;u&#124;&#92;leq1)}&amp;fg=000000' class='latex' />.</li>
<li>Epanechnikov: <img src='http://s0.wp.com/latex.php?latex=%7BK%28u%29%3D%5Cfrac%7B3%7D%7B4%7D%281-u%5E%7B2%7D%29I%28%26%23124%3Bu%26%23124%3B%5Cleq1%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(u)=&#92;frac{3}{4}(1-u^{2})I(&#124;u&#124;&#92;leq1)}&amp;fg=000000' title='{K(u)=&#92;frac{3}{4}(1-u^{2})I(&#124;u&#124;&#92;leq1)}&amp;fg=000000' class='latex' />.</li>
<li>Gaussian: <img src='http://s0.wp.com/latex.php?latex=%7BK%28u%29%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%7D%7D%5Cexp%28-u%5E%7B2%7D%2F2%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K(u)=&#92;frac{1}{&#92;sqrt{2&#92;pi}}&#92;exp(-u^{2}/2)}&amp;fg=000000' title='{K(u)=&#92;frac{1}{&#92;sqrt{2&#92;pi}}&#92;exp(-u^{2}/2)}&amp;fg=000000' class='latex' />.</li>
</ul>
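<p>The four kernels above are short enough to write out directly. The following Python sketch (function names are hypothetical) defines them and uses a midpoint rule to check that each one integrates to 1. The interval [&minus;8, 8] covers the compact supports and all but about 10<sup>&minus;15</sup> of the Gaussian mass.</p>

```python
import math


def rectangular(u):
    return 0.5 if abs(u) <= 1 else 0.0


def triangular(u):
    return 1 - abs(u) if abs(u) <= 1 else 0.0


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def gaussian(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)


def integrate(f, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / m
    return step * sum(f(a + (k + 0.5) * step) for k in range(m))


KERNELS = {
    "rectangular": rectangular,
    "triangular": triangular,
    "epanechnikov": epanechnikov,
    "gaussian": gaussian,
}

for name, K in KERNELS.items():
    # Each kernel should integrate to 1, up to quadrature error.
    print(name, round(integrate(K, -8.0, 8.0), 6))
```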
<p><strong>2. <a class="zem_slink" title="Mean square error" href="http://en.wikipedia.org/wiki/Mean_square_error" rel="wikipedia">Mean Squared Error</a> (or Risk) of Kernel Estimators </strong></p>
<p>At an arbitrary point <img src='http://s0.wp.com/latex.php?latex=%7Bx_%7B0%7D%5Cin%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' title='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' class='latex' />, the mean squared error of the estimator is defined as</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BMSE%7D%3D%5Cmathrm%7BMSE%7D%28x_%7B0%7D%29%5Ctriangleq%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Cleft%28%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%29-p%28x_%7B0%7D%29%5Cright%29%5E%7B2%7D%7D+%5C%5C+%3D%5Cint%5Cleft%28%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%2Cx_%7B1%7D%2C%5Cldots%2Cx_%7Bn%7D%29-p%28x_%7B0%7D%29%5Cright%29%5E%7B2%7D%5Cprod_%7Bi%3D1%7D+%5E%7Bn%7D%5Cleft%5Bp%28x_%7Bi%7D%29dx_%7Bi%7D%5Cright%5D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathrm{MSE}=&#92;mathrm{MSE}(x_{0})&#92;triangleq&#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n}(x_{0})-p(x_{0})&#92;right)^{2}} &#92;&#92; =&#92;int&#92;left(&#92;hat{p}_{n}(x_{0},x_{1},&#92;ldots,x_{n})-p(x_{0})&#92;right)^{2}&#92;prod_{i=1} ^{n}&#92;left[p(x_{i})dx_{i}&#92;right]. &amp;fg=000000' title='&#92;displaystyle &#92;mathrm{MSE}=&#92;mathrm{MSE}(x_{0})&#92;triangleq&#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n}(x_{0})-p(x_{0})&#92;right)^{2}} &#92;&#92; =&#92;int&#92;left(&#92;hat{p}_{n}(x_{0},x_{1},&#92;ldots,x_{n})-p(x_{0})&#92;right)^{2}&#92;prod_{i=1} ^{n}&#92;left[p(x_{i})dx_{i}&#92;right]. &amp;fg=000000' class='latex' /></p>
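<p>Adding and subtracting <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%29%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n}(x_{0})}}&amp;fg=000000' title='{&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n}(x_{0})}}&amp;fg=000000' class='latex' /> inside the square, and noting that the cross term has zero expectation, gives the bias-variance decomposition</p>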
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BMSE%7D%28x_%7B0%7D%29%3Db%5E%7B2%7D%28x_%7B0%7D%29%2B%5Csigma%5E%7B2%7D%28x_%7B0%7D%29+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathrm{MSE}(x_{0})=b^{2}(x_{0})+&#92;sigma^{2}(x_{0}) &amp;fg=000000' title='&#92;displaystyle &#92;mathrm{MSE}(x_{0})=b^{2}(x_{0})+&#92;sigma^{2}(x_{0}) &amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7Bb%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{b(x_{0})}&amp;fg=000000' title='{b(x_{0})}&amp;fg=000000' class='latex' /> is the <strong>bias</strong> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Csigma%5E%7B2%7D%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sigma^{2}(x_{0})}&amp;fg=000000' title='{&#92;sigma^{2}(x_{0})}&amp;fg=000000' class='latex' /> is the <strong>variance</strong> of the estimator <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{p}_{n}(x_{0})}&amp;fg=000000' title='{&#92;hat{p}_{n}(x_{0})}&amp;fg=000000' class='latex' />, defined by</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brcl%7D+b%28x_%7B0%7D%29+%26%2338%3B+%3D+%26%2338%3B+%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Chat%7Bp%7D_%7Bn%7D+%28x_%7B0%7D%29%7D-p%28x_%7B0%7D%29%5C%5C+%5Csigma%5E%7B2%7D%28x_%7B0%7D%29+%26%2338%3B+%3D+%26%2338%3B+%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Cleft%28%5Chat%7Bp%7D_%7Bn%7D+%28x_%7B0%7D%29-%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Chat%7Bp%7D_%7Bn%7D+%28x_%7B0%7D%29%7D%5Cright%29%5E%7B2%7D%7D.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rcl} b(x_{0}) &amp; = &amp; &#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}-p(x_{0})&#92;&#92; &#92;sigma^{2}(x_{0}) &amp; = &amp; &#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n} (x_{0})-&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}&#92;right)^{2}}. &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rcl} b(x_{0}) &amp; = &amp; &#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}-p(x_{0})&#92;&#92; &#92;sigma^{2}(x_{0}) &amp; = &amp; &#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n} (x_{0})-&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}&#92;right)^{2}}. &#92;end{array} &amp;fg=000000' class='latex' /></p>
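<p>Bias and variance can also be estimated by simulation, which makes the decomposition concrete. The Python sketch below is an illustration under stated assumptions, not a method from the post: the data are i.i.d. N(0,1), so the true <em>p</em>(<em>x</em><sub>0</sub>) is known, and the Gaussian kernel, <em>n</em> = 100, <em>h</em> = 0.3 and 300 replications are arbitrary choices. The function name <code>monte_carlo_risk</code> is hypothetical. The empirical versions of MSE, bias and variance satisfy the same identity exactly, so the last two printed numbers agree up to rounding.</p>

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_at(sample, x0, h):
    n = len(sample)
    return sum(gaussian_kernel((xi - x0) / h) for xi in sample) / (n * h)


def monte_carlo_risk(x0, n, h, reps=300, seed=0):
    """Simulated b(x0), sigma^2(x0) and MSE(x0) of the kernel estimator.

    Assumption for illustration: the data are i.i.d. N(0, 1), so the true
    density p is the standard normal density and p(x0) is known exactly.
    """
    rng = random.Random(seed)
    p_x0 = gaussian_kernel(x0)  # the N(0, 1) density evaluated at x0
    estimates = [kde_at([rng.gauss(0.0, 1.0) for _ in range(n)], x0, h)
                 for _ in range(reps)]
    mean = sum(estimates) / reps
    bias = mean - p_x0
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    mse = sum((e - p_x0) ** 2 for e in estimates) / reps
    return bias, variance, mse


b, s2, mse = monte_carlo_risk(x0=0.0, n=100, h=0.3)
print(b, s2, mse, b ** 2 + s2)
```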
<p>We shall analyze both terms separately.</p>
<p><strong> 2.1. Variance of the estimator <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bp%7D_%7Bn%7D+%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{p}_{n} }&amp;fg=000000' title='{&#92;hat{p}_{n} }&amp;fg=000000' class='latex' /> </strong></p>
<blockquote><p><strong>Proposition 1</strong> <em> Suppose that the density <img src='http://s0.wp.com/latex.php?latex=%7Bp%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p}&amp;fg=000000' title='{p}&amp;fg=000000' class='latex' /> satisfies <img src='http://s0.wp.com/latex.php?latex=%7Bp%28x%29%5Cleq+p_%7B%5Cmax%7D%26%2360%3B%5Cinfty%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p(x)&#92;leq p_{&#92;max}&lt;&#92;infty}&amp;fg=000000' title='{p(x)&#92;leq p_{&#92;max}&lt;&#92;infty}&amp;fg=000000' class='latex' /> for all <img src='http://s0.wp.com/latex.php?latex=%7Bx%5Cin%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x&#92;in{&#92;mathbb R}}&amp;fg=000000' title='{x&#92;in{&#92;mathbb R}}&amp;fg=000000' class='latex' />. Let <img src='http://s0.wp.com/latex.php?latex=%7BK%3A%7B%5Cmathbb+R%7D%5Crightarrow%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' title='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' class='latex' /> be a function such that<br />
</em></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cint+K%5E%7B2%7D%28u%29du%26%2360%3B%5Cinfty.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;int K^{2}(u)du&lt;&#92;infty. &amp;fg=000000' title='&#92;displaystyle &#92;int K^{2}(u)du&lt;&#92;infty. &amp;fg=000000' class='latex' /></p>
<p>Then for any <img src='http://s0.wp.com/latex.php?latex=%7Bx_%7B0%7D%5Cin%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' title='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7Bh%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h&gt;0}&amp;fg=000000' title='{h&gt;0}&amp;fg=000000' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=%7Bn%5Cgeq1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n&#92;geq1}&amp;fg=000000' title='{n&#92;geq1}&amp;fg=000000' class='latex' /> we have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Csigma%5E%7B2%7D%28x_%7B0%7D%29%5Cleq%5Cfrac%7BC_%7B1%7D%7D%7Bnh%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;sigma^{2}(x_{0})&#92;leq&#92;frac{C_{1}}{nh} &amp;fg=000000' title='&#92;displaystyle &#92;sigma^{2}(x_{0})&#92;leq&#92;frac{C_{1}}{nh} &amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7BC_%7B1%7D%3Dp_%7B%5Cmax%7D%5Cint+K%5E%7B2%7D%28u%29du.%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C_{1}=p_{&#92;max}&#92;int K^{2}(u)du.}&amp;fg=000000' title='{C_{1}=p_{&#92;max}&#92;int K^{2}(u)du.}&amp;fg=000000' class='latex' /></p></blockquote>
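<p><em>Proof:</em> The estimator <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{p}_{n}(x_{0})}&amp;fg=000000' title='{&#92;hat{p}_{n}(x_{0})}&amp;fg=000000' class='latex' /> is the average of the <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' /> i.i.d. random variables <img src='http://s0.wp.com/latex.php?latex=%7B%5Ceta_%7Bi%7D%3D%5Cfrac%7B1%7D%7Bh%7DK%5Cleft%28%5Cfrac%7BX_%7Bi%7D-x_%7B0%7D%7D%7Bh%7D%5Cright%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;eta_{i}=&#92;frac{1}{h}K&#92;left(&#92;frac{X_{i}-x_{0}}{h}&#92;right)}&amp;fg=000000' title='{&#92;eta_{i}=&#92;frac{1}{h}K&#92;left(&#92;frac{X_{i}-x_{0}}{h}&#92;right)}&amp;fg=000000' class='latex' />, so its variance is the variance of a single term divided by <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' />. Bounding the variance by the second moment and changing variables in the resulting integral, we get</p>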
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+%5Cmathop%7B%5Cmathrm%7BVar%7D%7D%28%5Chat%7Bp%7D_%7Bn%7D+%28x_%7B0%7D%29%29%3D+%26%2338%3B+%5Cdisplaystyle%5Cfrac%7B1%7D%7Bnh%5E%7B2%7D%7D%5Cmathop%7B%5Cmathrm%7BVar%7D%7D%5Cleft%28K%5Cleft%28%5Cfrac%7BX_%7B1%7D-x_%7B0%7D%7D%7Bh%7D%5Cright%29%5Cright%29%5C%5C+%5Cleq+%26%2338%3B+%5Cdisplaystyle%5Cfrac%7B1%7D%7Bnh%5E%7B2%7D%7D%5Cmathop%7B%5Cmathbb+E%7D_p%7BK%5E%7B2%7D%5Cleft%28%5Cfrac%7BX_%7B1%7D-x_%7B0%7D%7D%7Bh%7D%5Cright%29%7D%5C%5C+%3D+%26%2338%3B+%5Cdisplaystyle%5Cfrac%7B1%7D%7Bnh%7D%5Cint+K%5E%7B2%7D%28u%29p%28uh%2Bx_%7B0%7D%29du%5C%5C+%5Cleq+%26%2338%3B+%5Cdisplaystyle%5Cfrac%7B1%7D%7Bnh%7Dp_%7B%5Cmax%7D%5Cint+K%5E%7B2%7D%28u%29du.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} &#92;mathop{&#92;mathrm{Var}}(&#92;hat{p}_{n} (x_{0}))= &amp; &#92;displaystyle&#92;frac{1}{nh^{2}}&#92;mathop{&#92;mathrm{Var}}&#92;left(K&#92;left(&#92;frac{X_{1}-x_{0}}{h}&#92;right)&#92;right)&#92;&#92; &#92;leq &amp; &#92;displaystyle&#92;frac{1}{nh^{2}}&#92;mathop{&#92;mathbb E}_p{K^{2}&#92;left(&#92;frac{X_{1}-x_{0}}{h}&#92;right)}&#92;&#92; = &amp; &#92;displaystyle&#92;frac{1}{nh}&#92;int K^{2}(u)p(uh+x_{0})du&#92;&#92; &#92;leq &amp; &#92;displaystyle&#92;frac{1}{nh}p_{&#92;max}&#92;int K^{2}(u)du. &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} &#92;mathop{&#92;mathrm{Var}}(&#92;hat{p}_{n} (x_{0}))= &amp; &#92;displaystyle&#92;frac{1}{nh^{2}}&#92;mathop{&#92;mathrm{Var}}&#92;left(K&#92;left(&#92;frac{X_{1}-x_{0}}{h}&#92;right)&#92;right)&#92;&#92; &#92;leq &amp; &#92;displaystyle&#92;frac{1}{nh^{2}}&#92;mathop{&#92;mathbb E}_p{K^{2}&#92;left(&#92;frac{X_{1}-x_{0}}{h}&#92;right)}&#92;&#92; = &amp; &#92;displaystyle&#92;frac{1}{nh}&#92;int K^{2}(u)p(uh+x_{0})du&#92;&#92; &#92;leq &amp; &#92;displaystyle&#92;frac{1}{nh}p_{&#92;max}&#92;int K^{2}(u)du. 
&#92;end{array} &amp;fg=000000' class='latex' /></p>
<p style="text-align:right;"><img src='http://s0.wp.com/latex.php?latex=%5CBox%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;Box&amp;fg=000000' title='&#92;Box&amp;fg=000000' class='latex' /></p>
<p>Consequently, if <img src='http://s0.wp.com/latex.php?latex=%7Bnh%5Crightarrow%5Cinfty%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{nh&#92;rightarrow&#92;infty}&amp;fg=000000' title='{nh&#92;rightarrow&#92;infty}&amp;fg=000000' class='latex' /> as <img src='http://s0.wp.com/latex.php?latex=%7Bn%5Crightarrow%5Cinfty%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n&#92;rightarrow&#92;infty}&amp;fg=000000' title='{n&#92;rightarrow&#92;infty}&amp;fg=000000' class='latex' />, then the variance <img src='http://s0.wp.com/latex.php?latex=%7B%5Csigma%5E%7B2%7D%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sigma^{2}(x_{0})}&amp;fg=000000' title='{&#92;sigma^{2}(x_{0})}&amp;fg=000000' class='latex' /> goes to 0.</p>
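<p>The bound of Proposition 1 can be checked numerically in one concrete case. The Python sketch below assumes that both the kernel and the true density are the standard normal density. Then <em>p</em><sub>max</sub> = 1/&radic;(2&pi;) and &int;<em>K</em><sup>2</sup> = 1/(2&radic;&pi;). The exact variance (E[&eta;<sub>1</sub><sup>2</sup>] &minus; (E&eta;<sub>1</sub>)<sup>2</sup>)/<em>n</em> is computed by midpoint quadrature. All function names are hypothetical.</p>

```python
import math


def phi(z):
    """Standard normal density: both the kernel K and the true density p here."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def midpoint(f, a=-12.0, b=12.0, m=20_000):
    step = (b - a) / m
    return step * sum(f(a + (k + 0.5) * step) for k in range(m))


def exact_variance(x0, n, h):
    """Var(p_n(x0)) = (E[eta^2] - E[eta]^2) / n with eta = K((X - x0) / h) / h,
    computed by quadrature for X ~ N(0, 1) and the Gaussian kernel."""
    e1 = midpoint(lambda z: phi((z - x0) / h) * phi(z)) / h
    e2 = midpoint(lambda z: phi((z - x0) / h) ** 2 * phi(z)) / h ** 2
    return (e2 - e1 ** 2) / n


def variance_bound(n, h):
    """C1 / (n h) with C1 = p_max * int K^2 = (1 / sqrt(2 pi)) / (2 sqrt(pi))."""
    c1 = (1 / math.sqrt(2 * math.pi)) / (2 * math.sqrt(math.pi))
    return c1 / (n * h)


for h in (0.1, 0.3, 1.0):
    print(h, exact_variance(0.0, 100, h), variance_bound(100, h))
```

<p>The bound is tightest near the mode, where <em>p</em>(<em>x</em><sub>0</sub>) = <em>p</em><sub>max</sub>. In the tails it is conservative.</p>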
<p><strong> 2.2. Bias of the estimator <img src='http://s0.wp.com/latex.php?latex=%7B%5Chat%7Bp%7D_%7Bn%7D+%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;hat{p}_{n} }&amp;fg=000000' title='{&#92;hat{p}_{n} }&amp;fg=000000' class='latex' /> </strong></p>
<p>The bias has the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+b%28x_%7B0%7D%29%3D%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Chat%7Bp%7D_%7Bn%7D+%28x_%7B0%7D%29%7D-p%28x_%7B0%7D%29%3D%5Cfrac%7B1%7D%7Bh%7D%5Cint+K%5Cleft%28%5Cfrac%7Bz-x_%7B0%7D%7D%7Bh%7D%5Cright%29p%28z%29dz-p%28x_%7B0%7D%29.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle b(x_{0})=&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}-p(x_{0})=&#92;frac{1}{h}&#92;int K&#92;left(&#92;frac{z-x_{0}}{h}&#92;right)p(z)dz-p(x_{0}). &amp;fg=000000' title='&#92;displaystyle b(x_{0})=&#92;mathop{&#92;mathbb E}_p{&#92;hat{p}_{n} (x_{0})}-p(x_{0})=&#92;frac{1}{h}&#92;int K&#92;left(&#92;frac{z-x_{0}}{h}&#92;right)p(z)dz-p(x_{0}). &amp;fg=000000' class='latex' /></p>
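<p>The bias formula can be evaluated directly by quadrature. In one special case it also has a closed form. The Python sketch below assumes that both the kernel and the true density are standard normal (an illustrative choice, with hypothetical function names). In that case (1/<em>h</em>)&int;<em>K</em>((<em>z</em>&minus;<em>x</em><sub>0</sub>)/<em>h</em>)<em>p</em>(<em>z</em>)<em>dz</em> is the N(0, 1+<em>h</em><sup>2</sup>) density at <em>x</em><sub>0</sub>. The bias comes out negative at the mode and positive far in the tails, because smoothing moves mass from the peak outwards.</p>

```python
import math


def normal_pdf(z, sd=1.0):
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bias_by_quadrature(x0, h, lo=-12.0, hi=12.0, m=20_000):
    """b(x0) = (1/h) * int K((z - x0)/h) p(z) dz - p(x0) by the midpoint rule,
    assuming K is the Gaussian kernel and p the N(0, 1) density."""
    step = (hi - lo) / m
    total = 0.0
    for k in range(m):
        z = lo + (k + 0.5) * step
        total += normal_pdf((z - x0) / h) * normal_pdf(z)
    return total * step / h - normal_pdf(x0)


def bias_closed_form(x0, h):
    """Same bias in closed form: convolving the N(0, 1) density with the
    rescaled Gaussian kernel gives the N(0, 1 + h^2) density."""
    return normal_pdf(x0, math.sqrt(1 + h * h)) - normal_pdf(x0)


for x0 in (0.0, 1.0, 2.5):
    print(x0, bias_by_quadrature(x0, 0.5), bias_closed_form(x0, 0.5))
```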
<p>To analyze the behavior of <img src='http://s0.wp.com/latex.php?latex=%7Bb%28x_%7B0%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{b(x_{0})}&amp;fg=000000' title='{b(x_{0})}&amp;fg=000000' class='latex' /> as a function of <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' />, we need some regularity conditions on the density <img src='http://s0.wp.com/latex.php?latex=%7Bp%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p}&amp;fg=000000' title='{p}&amp;fg=000000' class='latex' /> and the kernel <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' />. Denote by <img src='http://s0.wp.com/latex.php?latex=%7B%5Clfloor%5Cbeta%5Crfloor%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' title='{&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' class='latex' /> the greatest integer strictly less than the real number <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbeta%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;beta}&amp;fg=000000' title='{&#92;beta}&amp;fg=000000' class='latex' />. This coincides with the integer part unless <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbeta%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;beta}&amp;fg=000000' title='{&#92;beta}&amp;fg=000000' class='latex' /> is itself an integer.</p>
<blockquote><p><strong>Definition 2</strong> <em> Let <img src='http://s0.wp.com/latex.php?latex=%7BT%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{T}&amp;fg=000000' title='{T}&amp;fg=000000' class='latex' /> be an interval in <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;mathbb R}}&amp;fg=000000' title='{{&#92;mathbb R}}&amp;fg=000000' class='latex' /> and let <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbeta%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;beta}&amp;fg=000000' title='{&#92;beta}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7BL%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{L}&amp;fg=000000' title='{L}&amp;fg=000000' class='latex' /> be two positive numbers. The <strong>Hölder class</strong> <img src='http://s0.wp.com/latex.php?latex=%7B%5CSigma%28%5Cbeta%2CL%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Sigma(&#92;beta,L)}&amp;fg=000000' title='{&#92;Sigma(&#92;beta,L)}&amp;fg=000000' class='latex' /> on <img src='http://s0.wp.com/latex.php?latex=%7BT%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{T}&amp;fg=000000' title='{T}&amp;fg=000000' class='latex' /> is the set of <img src='http://s0.wp.com/latex.php?latex=%7Bl%3D%5Clfloor%5Cbeta%5Crfloor%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' title='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' class='latex' /> times differentiable functions <img src='http://s0.wp.com/latex.php?latex=%7Bf%3AT%5Crightarrow%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f:T&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' title='{f:T&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' class='latex' /> whose derivative <img src='http://s0.wp.com/latex.php?latex=%7Bf%5E%7B%28l%29%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f^{(l)}}&amp;fg=000000' title='{f^{(l)}}&amp;fg=000000' class='latex' /> satisfies<br />
</em></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%26%23124%3Bf%5E%7B%28l%29%7D%28x%29-f%5E%7B%28l%29%7D%28x%5E%7B%5Cprime%7D%29%26%23124%3B%5Cleq+L%26%23124%3Bx-x%5E%7B%5Cprime%7D%26%23124%3B%5E%7B%5Cbeta-l%7D%2C%5Cquad%5Cforall%5C+x%2Cx%5E%7B%5Cprime%7D%5Cin+T+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#124;f^{(l)}(x)-f^{(l)}(x^{&#92;prime})&#124;&#92;leq L&#124;x-x^{&#92;prime}&#124;^{&#92;beta-l},&#92;quad&#92;forall&#92; x,x^{&#92;prime}&#92;in T &amp;fg=000000' title='&#92;displaystyle &#124;f^{(l)}(x)-f^{(l)}(x^{&#92;prime})&#124;&#92;leq L&#124;x-x^{&#92;prime}&#124;^{&#92;beta-l},&#92;quad&#92;forall&#92; x,x^{&#92;prime}&#92;in T &amp;fg=000000' class='latex' /></p></blockquote>
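<p>One way to get a feel for the Hölder condition is to look at its difference quotients on a grid. The Python sketch below is a hypothetical helper, not part of the original post. It returns the largest quotient over pairs of grid points, which gives a lower bound for the smallest admissible <em>L</em>. The example <em>f</em>(<em>x</em>) = &radic;|<em>x</em>| on <em>T</em> = [&minus;1, 1] has &beta; = 1/2, so <em>l</em> = 0. This function belongs to &Sigma;(1/2, 1) but is not Lipschitz at 0.</p>

```python
import math


def holder_ratio(f_l, points, exponent):
    """Largest |f_l(x) - f_l(y)| / |x - y|**exponent over distinct pairs of points.

    Here f_l stands for the l-th derivative f^(l) and exponent = beta - l.
    A finite grid only yields a lower bound for the smallest admissible L.
    """
    values = [(x, f_l(x)) for x in points]
    best = 0.0
    for i, (x, fx) in enumerate(values):
        for y, fy in values[i + 1:]:
            best = max(best, abs(fx - fy) / abs(x - y) ** exponent)
    return best


# f(x) = sqrt(|x|) on T = [-1, 1] with beta = 1/2, so l = 0 and f^(0) = f.
grid = [-1 + k / 100 for k in range(201)]
print(holder_ratio(lambda x: math.sqrt(abs(x)), grid, 0.5))
```

<p>The printed value is 1 up to rounding. It is attained at pairs that include <em>x</em> = 0, and this matches the inequality |&radic;|<em>x</em>| &minus; &radic;|<em>y</em>|| &le; |<em>x</em> &minus; <em>y</em>|<sup>1/2</sup>.</p>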
<blockquote><p><strong>Definition 3</strong> <em> Let <img src='http://s0.wp.com/latex.php?latex=%7Bl%5Cgeq1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l&#92;geq1}&amp;fg=000000' title='{l&#92;geq1}&amp;fg=000000' class='latex' /> be an integer. We say that <img src='http://s0.wp.com/latex.php?latex=%7BK%3A%7B%5Cmathbb+R%7D%5Crightarrow%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' title='{K:{&#92;mathbb R}&#92;rightarrow{&#92;mathbb R}}&amp;fg=000000' class='latex' /> is a <strong>kernel of order <img src='http://s0.wp.com/latex.php?latex=%7Bl%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l}&amp;fg=000000' title='{l}&amp;fg=000000' class='latex' /></strong> if the functions <img src='http://s0.wp.com/latex.php?latex=%7Bu%5Cmapsto+u%5E%7Bj%7DK%28u%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{u&#92;mapsto u^{j}K(u)}&amp;fg=000000' title='{u&#92;mapsto u^{j}K(u)}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7Bj%3D0%2C1%2C%5Cldots%2Cl%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{j=0,1,&#92;ldots,l}&amp;fg=000000' title='{j=0,1,&#92;ldots,l}&amp;fg=000000' class='latex' />, are integrable and satisfy<br />
</em></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cint+K%28u%29du%3D1%2C%5Cqquad%5Cint+u%5E%7Bj%7DK%28u%29du%3D0%2C%5Cquad+j%3D1%2C%5Cldots%2Cl.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;int K(u)du=1,&#92;qquad&#92;int u^{j}K(u)du=0,&#92;quad j=1,&#92;ldots,l. &amp;fg=000000' title='&#92;displaystyle &#92;int K(u)du=1,&#92;qquad&#92;int u^{j}K(u)du=0,&#92;quad j=1,&#92;ldots,l. &amp;fg=000000' class='latex' /></p></blockquote>
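<p>The moment conditions of Definition 3 are easy to check numerically for compactly supported kernels. The Python sketch below (hypothetical helper names; midpoint quadrature with a tolerance of 10<sup>&minus;8</sup>) returns the largest <em>l</em> for which the first <em>l</em> moments vanish. It is applied to the Epanechnikov kernel and to <em>K</em>(<em>u</em>) = (3/8)(3 &minus; 5<em>u</em><sup>2</sup>) on [&minus;1, 1]. The second kernel is a standard example whose second moment vanishes, so it necessarily takes negative values.</p>

```python
def midpoint_integral(f, a, b, m=50_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / m
    return step * sum(f(a + (k + 0.5) * step) for k in range(m))


def kernel_order(K, support=(-1.0, 1.0), max_order=8, tol=1e-8):
    """Largest l <= max_order with int u^j K(u) du = 0 for j = 1, ..., l
    (Definition 3), checked numerically on the given support.
    A return value of 0 means the first moment is already nonzero."""
    a, b = support
    if abs(midpoint_integral(K, a, b) - 1) > tol:
        raise ValueError("K must integrate to 1")
    for j in range(1, max_order + 1):
        moment = midpoint_integral(lambda u: u ** j * K(u), a, b)
        if abs(moment) > tol:
            return j - 1
    return max_order


def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def k4(u):
    """(3/8)(3 - 5u^2) on [-1, 1]; negative for sqrt(3/5) < |u| <= 1."""
    return 0.375 * (3 - 5 * u * u) if abs(u) <= 1 else 0.0


print(kernel_order(epanechnikov), kernel_order(k4))  # 1 3
```

<p>Under Definition 3 the Epanechnikov kernel has order 1, because its second moment is 1/5. The kernel <code>k4</code> has order 3, with fourth moment &minus;3/35. Under the alternative convention discussed next, the two kernels would be of order 2 and 4 respectively.</p>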
<p>An alternative, more restrictive, definition of the order of a kernel is also common in the literature: a kernel is of order <img src='http://s0.wp.com/latex.php?latex=%7Bl%2B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l+1}&amp;fg=000000' title='{l+1}&amp;fg=000000' class='latex' /> if Definition <a href="#deforder_kernel">3</a> holds and, in addition, <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint+u%5E%7Bl%2B1%7DK%28u%29du%5Cneq0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int u^{l+1}K(u)du&#92;neq0}&amp;fg=000000' title='{&#92;int u^{l+1}K(u)du&#92;neq0}&amp;fg=000000' class='latex' />. Define the class of densities <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathop%7B%5Cmathbb+P%7D%3D%5Cmathop%7B%5Cmathbb+P%7D%28%5Cbeta%2CL%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathop{&#92;mathbb P}=&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&amp;fg=000000' title='{&#92;mathop{&#92;mathbb P}=&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&amp;fg=000000' class='latex' /> as follows:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathop%7B%5Cmathbb+P%7D%28%5Cbeta%2CL%29%3D%5Cleft%5C%7B+p%5Cleft%26%23124%3Bp%5Cgeq0%2C%5Cint+p%28x%29dx%3D1%2C%5Ctext%7B+and+%7Dp%5Cin%5CSigma%28%5Cbeta%2CL%29%5Ctext%7B+on+%7D%7B%5Cmathbb+R%7D%5Cright.%5Cright%5C%7D+.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathop{&#92;mathbb P}(&#92;beta,L)=&#92;left&#92;{ p&#92;left&#124;p&#92;geq0,&#92;int p(x)dx=1,&#92;text{ and }p&#92;in&#92;Sigma(&#92;beta,L)&#92;text{ on }{&#92;mathbb R}&#92;right.&#92;right&#92;} . &amp;fg=000000' title='&#92;displaystyle &#92;mathop{&#92;mathbb P}(&#92;beta,L)=&#92;left&#92;{ p&#92;left&#124;p&#92;geq0,&#92;int p(x)dx=1,&#92;text{ and }p&#92;in&#92;Sigma(&#92;beta,L)&#92;text{ on }{&#92;mathbb R}&#92;right.&#92;right&#92;} . &amp;fg=000000' class='latex' /></p>
<blockquote><p><strong>Proposition 2</strong> <em> Assume that <img src='http://s0.wp.com/latex.php?latex=%7Bp%5Cin%5Cmathop%7B%5Cmathbb+P%7D%28%5Cbeta%2CL%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p&#92;in&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&amp;fg=000000' title='{p&#92;in&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&amp;fg=000000' class='latex' /> and let <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' /> be a kernel of order <img src='http://s0.wp.com/latex.php?latex=%7Bl%3D%5Clfloor%5Cbeta%5Crfloor%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' title='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' class='latex' /> satisfying<br />
</em></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cint%26%23124%3B+u%26%23124%3B%5E%7B%5Cbeta%7D%26%23124%3BK%28u%29%26%23124%3Bdu%26%2360%3B%5Cinfty.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;int&#124; u&#124;^{&#92;beta}&#124;K(u)&#124;du&lt;&#92;infty. &amp;fg=000000' title='&#92;displaystyle &#92;int&#124; u&#124;^{&#92;beta}&#124;K(u)&#124;du&lt;&#92;infty. &amp;fg=000000' class='latex' /></p>
<p>Then for all <img src='http://s0.wp.com/latex.php?latex=%7Bx_%7B0%7D%5Cin%7B%5Cmathbb+R%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' title='{x_{0}&#92;in{&#92;mathbb R}}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7Bh%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h&gt;0}&amp;fg=000000' title='{h&gt;0}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7Bn%5Cgeq1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n&#92;geq1}&amp;fg=000000' title='{n&#92;geq1}&amp;fg=000000' class='latex' /> we have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%26%23124%3Bb%28x_%7B0%7D%29%26%23124%3B%5Cleq+C_%7B2%7Dh%5E%7B%5Cbeta%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#124;b(x_{0})&#124;&#92;leq C_{2}h^{&#92;beta} &amp;fg=000000' title='&#92;displaystyle &#124;b(x_{0})&#124;&#92;leq C_{2}h^{&#92;beta} &amp;fg=000000' class='latex' /></p>
<p>where</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+C_%7B2%7D%3D%5Cfrac%7BL%7D%7Bl%21%7D%5Cint%26%23124%3B+u%26%23124%3B%5E%7B%5Cbeta%7D%26%23124%3BK%28u%29%26%23124%3Bdu.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle C_{2}=&#92;frac{L}{l!}&#92;int&#124; u&#124;^{&#92;beta}&#124;K(u)&#124;du. &amp;fg=000000' title='&#92;displaystyle C_{2}=&#92;frac{L}{l!}&#92;int&#124; u&#124;^{&#92;beta}&#124;K(u)&#124;du. &amp;fg=000000' class='latex' /></p></blockquote>
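<p>Before the proof, here is a quick numerical look at how fast the bias shrinks with <em>h</em>. The Python sketch below assumes a Gaussian kernel and the N(0,1) density, for which the bias at <em>x</em><sub>0</sub> = 0 has a closed form. The function name is hypothetical. Halving <em>h</em> divides the bias by roughly four, so the bias behaves like a constant times <em>h</em><sup>2</sup>. This is the quadratic rate one expects for a smooth density and a symmetric kernel, and it is consistent with an <em>h</em><sup>&beta;</sup> rate with &beta; = 2.</p>

```python
import math


def normal_pdf(z, sd=1.0):
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def gaussian_kde_bias(x0, h):
    """Exact bias of the Gaussian-kernel estimator when p is the N(0, 1) density:
    E p_n(x0) is then the N(0, 1 + h^2) density evaluated at x0."""
    return normal_pdf(x0, math.sqrt(1 + h * h)) - normal_pdf(x0)


for h in (0.4, 0.2, 0.1, 0.05):
    b = gaussian_kde_bias(0.0, h)
    # b / h^2 approaches -p(0)/2 (about -0.1995) as h shrinks
    print(h, b, b / h ** 2)
```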
<p><em>Proof:</em> We have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+b%28x_%7B0%7D%29+%26%2338%3B+%3D%5Cdisplaystyle%5Cfrac%7B1%7D%7Bh%7D%5Cint+K%5Cleft%28%5Cfrac%7Bz-x_%7B0%7D%7D%7Bh%7D%5Cright%29p%28z%29dz-p%28x_%7B0%7D%29%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle%5Cint+K%28u%29%5Cleft%5Bp%28x_%7B0%7D%2Buh%29-p%28x_%7B0%7D%29%5Cright%5Ddu.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} b(x_{0}) &amp; =&#92;displaystyle&#92;frac{1}{h}&#92;int K&#92;left(&#92;frac{z-x_{0}}{h}&#92;right)p(z)dz-p(x_{0})&#92;&#92; &amp; =&#92;displaystyle&#92;int K(u)&#92;left[p(x_{0}+uh)-p(x_{0})&#92;right]du. &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} b(x_{0}) &amp; =&#92;displaystyle&#92;frac{1}{h}&#92;int K&#92;left(&#92;frac{z-x_{0}}{h}&#92;right)p(z)dz-p(x_{0})&#92;&#92; &amp; =&#92;displaystyle&#92;int K(u)&#92;left[p(x_{0}+uh)-p(x_{0})&#92;right]du. &#92;end{array} &amp;fg=000000' class='latex' /></p>
<p>Next, since <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' /> is a kernel of order <img src='http://s0.wp.com/latex.php?latex=%7Bl%3D%5Clfloor%5Cbeta%5Crfloor%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' title='{l=&#92;lfloor&#92;beta&#92;rfloor}&amp;fg=000000' class='latex' />, that is, <img src='http://s0.wp.com/latex.php?latex=%7B%5Cint+u%5E%7Bj%7DK%28u%29du%3D0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;int u^{j}K(u)du=0}&amp;fg=000000' title='{&#92;int u^{j}K(u)du=0}&amp;fg=000000' class='latex' /> for <img src='http://s0.wp.com/latex.php?latex=%7Bj%3D1%2C%5Cldots%2Cl%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{j=1,&#92;ldots,l}&amp;fg=000000' title='{j=1,&#92;ldots,l}&amp;fg=000000' class='latex' />, a Taylor expansion of <img src='http://s0.wp.com/latex.php?latex=%7Bp%28x_%7B0%7D%2Buh%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{p(x_{0}+uh)}&amp;fg=000000' title='{p(x_{0}+uh)}&amp;fg=000000' class='latex' /> with some <img src='http://s0.wp.com/latex.php?latex=%7B0%5Cleq%5Ctau%5Cleq1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{0&#92;leq&#92;tau&#92;leq1}&amp;fg=000000' title='{0&#92;leq&#92;tau&#92;leq1}&amp;fg=000000' class='latex' /> gives</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+b%28x_%7B0%7D%29+%26%2338%3B+%3D%5Cdisplaystyle%5Cint+K%28u%29%5Cleft%5Bp%28x_%7B0%7D%2Buh%29-p%28x_%7B0%7D%29%5Cright%5Ddu%5C%5C+%26%2338%3B+%3D+%5Cdisplaystyle%5Cint+K%28u%29%5Cleft%5Bp%28x_%7B0%7D%29%2Bp%5E%7B%5Cprime%7D%28x_%7B0%7D%29uh%2B%5Ccdots%2B%5Cfrac%7B%5Cleft%28uh%5Cright%29%5E%7Bl%7D%7D%7Bl%21%7Dp%5E%7B+%28l%29%7D%28x_%7B0%7D%2B%5Ctau+uh%29-p%28x_%7B0%7D%29%5Cright%5Ddu%5C%5C+%26%2338%3B+%3D%5Cdisplaystyle%5Cint+K%28u%29%5Cfrac%7B%5Cleft%28uh%5Cright%29%5E%7Bl%7D%7D%7Bl%21%7D%5Cleft%5Bp%5E%7B%28l%29%7D%28x_%7B0%7D%2B%5Ctau+uh%29-p%5E%7B%28l%29%7D%28x_%7B0%7D%29%5Cright%5Ddu+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} b(x_{0}) &amp; =&#92;displaystyle&#92;int K(u)&#92;left[p(x_{0}+uh)-p(x_{0})&#92;right]du&#92;&#92; &amp; = &#92;displaystyle&#92;int K(u)&#92;left[p(x_{0})+p^{&#92;prime}(x_{0})uh+&#92;cdots+&#92;frac{&#92;left(uh&#92;right)^{l}}{l!}p^{ (l)}(x_{0}+&#92;tau uh)-p(x_{0})&#92;right]du&#92;&#92; &amp; =&#92;displaystyle&#92;int K(u)&#92;frac{&#92;left(uh&#92;right)^{l}}{l!}&#92;left[p^{(l)}(x_{0}+&#92;tau uh)-p^{(l)}(x_{0})&#92;right]du &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} b(x_{0}) &amp; =&#92;displaystyle&#92;int K(u)&#92;left[p(x_{0}+uh)-p(x_{0})&#92;right]du&#92;&#92; &amp; = &#92;displaystyle&#92;int K(u)&#92;left[p(x_{0})+p^{&#92;prime}(x_{0})uh+&#92;cdots+&#92;frac{&#92;left(uh&#92;right)^{l}}{l!}p^{ (l)}(x_{0}+&#92;tau uh)-p(x_{0})&#92;right]du&#92;&#92; &amp; =&#92;displaystyle&#92;int K(u)&#92;frac{&#92;left(uh&#92;right)^{l}}{l!}&#92;left[p^{(l)}(x_{0}+&#92;tau uh)-p^{(l)}(x_{0})&#92;right]du &#92;end{array} &amp;fg=000000' class='latex' /></p>
<p>and</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cbegin%7Barray%7D%7Brl%7D+%26%23124%3B%7Bb%28x_%7B0%7D%29%7D%26%23124%3B+%26%2338%3B+%5Cdisplaystyle%5Cleq%5Cint%26%23124%3BK%28u%29%26%23124%3B%5Cfrac%7B%26%23124%3Buh%26%23124%3B%5E%7Bl%7D%7D%7Bl%21%7D%26%23124%3Bp%5E%7B%28l%29%7D%28x_%7B0%7D%2B%5Ctau+uh%29-p%5E%7B%28l%29%7D%28x_%7B0%7D%29%26%23124%3Bdu%5C%5C+%26%2338%3B+%5Cdisplaystyle%5Cleq+L%5Cint%26%23124%3B%7BK%28u%29%7D%26%23124%3B%5Cfrac%7B%26%23124%3B%7Buh%7D%26%23124%3B%5E%7Bl%7D%7D%7Bl%21%7D%26%23124%3B%7B%5Ctau+uh%7D%26%23124%3B%5E%7B%5Cbeta-l%7Ddu%5Cleq+C_%7B2%7Dh%5E%7B%5Cbeta%7D.+%5Cend%7Barray%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;begin{array}{rl} &#124;{b(x_{0})}&#124; &amp; &#92;displaystyle&#92;leq&#92;int&#124;K(u)&#124;&#92;frac{&#124;uh&#124;^{l}}{l!}&#124;p^{(l)}(x_{0}+&#92;tau uh)-p^{(l)}(x_{0})&#124;du&#92;&#92; &amp; &#92;displaystyle&#92;leq L&#92;int&#124;{K(u)}&#124;&#92;frac{&#124;{uh}&#124;^{l}}{l!}&#124;{&#92;tau uh}&#124;^{&#92;beta-l}du&#92;leq C_{2}h^{&#92;beta}. &#92;end{array} &amp;fg=000000' title='&#92;displaystyle &#92;begin{array}{rl} &#124;{b(x_{0})}&#124; &amp; &#92;displaystyle&#92;leq&#92;int&#124;K(u)&#124;&#92;frac{&#124;uh&#124;^{l}}{l!}&#124;p^{(l)}(x_{0}+&#92;tau uh)-p^{(l)}(x_{0})&#124;du&#92;&#92; &amp; &#92;displaystyle&#92;leq L&#92;int&#124;{K(u)}&#124;&#92;frac{&#124;{uh}&#124;^{l}}{l!}&#124;{&#92;tau uh}&#124;^{&#92;beta-l}du&#92;leq C_{2}h^{&#92;beta}. &#92;end{array} &amp;fg=000000' class='latex' /></p>
<p style="text-align:right;"><img src='http://s0.wp.com/latex.php?latex=%5CBox%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;Box&amp;fg=000000' title='&#92;Box&amp;fg=000000' class='latex' /></p>
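<p>As a numerical sanity check of the bias bound, here is a minimal R sketch under assumptions that are ours, not the post's: p is the standard normal density, K is the Gaussian kernel (which satisfies &#8747;uK(u)du = 0, so it is of order 1), &#946; = 2 and l = 1 (taking, as in Tsybakov's book, &#8970;&#946;&#8971; to be the largest integer strictly smaller than &#946;), and L = sup|p''| = 1/&#8730;(2&#960;), the Lipschitz constant of p'. With these choices C<sub>2</sub> = L&#8747;u&#178;K(u)du = L. The helper bias_at is hypothetical; the sketch also compares the numerical bias with its closed form in this Gaussian case.</p>

```r
# Sketch: check |b(x0)| <= C2 * h^beta for a Gaussian-kernel estimator.
# Assumptions (ours, not the source's): p = N(0,1) density, K = Gaussian kernel
# (order 1), beta = 2, l = 1, and L = sup|p''| = dnorm(0), the Lipschitz constant of p'.
beta <- 2
l <- 1
L <- dnorm(0)
# C2 = L / l! * integral of |u|^beta |K(u)| du
C2 <- L / factorial(l) *
  integrate(function(u) abs(u)^beta * dnorm(u), -Inf, Inf, rel.tol = 1e-8)$value

# Hypothetical helper: b(x0) = integral of K(u) [p(x0 + u h) - p(x0)] du
bias_at <- function(x0, h) {
  integrand <- function(u) dnorm(u) * (dnorm(x0 + u * h) - dnorm(x0))
  integrate(integrand, -Inf, Inf, rel.tol = 1e-8)$value
}

grid <- expand.grid(x0 = seq(-3, 3, by = 0.5), h = c(0.05, 0.1, 0.2, 0.5, 1))
grid$bias <- mapply(bias_at, grid$x0, grid$h)
# Closed form in this Gaussian case: E[p_hat(x0)] is the N(0, 1 + h^2) density at x0
grid$exact <- dnorm(grid$x0, sd = sqrt(1 + grid$h^2)) - dnorm(grid$x0)
grid$bound <- C2 * grid$h^beta

# Largest ratio |b(x0)| / (C2 h^beta); the proposition says it is at most 1
print(max(abs(grid$bias) / grid$bound))
```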
<p><strong> 2.3. Upper bound on the mean squared risk </strong></p>
<p>As seen above, the bias and the variance behave in opposite directions as <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' /> varies. If we choose a small <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' />, the variance is large and, as a consequence, the estimator is <em>undersmoothed</em>. On the other hand, if <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' /> is large, the bias cannot be controlled, which leads to an <em>oversmoothed</em> estimator.</p>
<p>If the assumptions of the propositions bounding the <a href="#propbound_var_MSE">variance</a> and the <a href="#propbound_bias_MSE">bias</a> hold, we get</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BMSE%7D%5Cleq%5Cfrac%7BC_%7B1%7D%7D%7Bnh%7D%2BC_%7B2%7D%5E%7B2%7Dh%5E%7B2%5Cbeta%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathrm{MSE}&#92;leq&#92;frac{C_{1}}{nh}+C_{2}^{2}h^{2&#92;beta}. &amp;fg=000000' title='&#92;displaystyle &#92;mathrm{MSE}&#92;leq&#92;frac{C_{1}}{nh}+C_{2}^{2}h^{2&#92;beta}. &amp;fg=000000' class='latex' /></p>
<p>The minimum with respect to <img src='http://s0.wp.com/latex.php?latex=%7Bh%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h}&amp;fg=000000' title='{h}&amp;fg=000000' class='latex' /> is attained at</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+h_%7Bn%7D%5E%7B%2A%7D%3D%5Cleft%28%5Cfrac%7BC_%7B1%7D%7D%7B2%5Cbeta+C_%7B2%7D%5E%7B2%7D%7D%5Cright%29%5E%7B%5Cfrac%7B1%7D%7B2%5Cbeta%2B1%7D%7Dn%5E%7B-%5Cfrac%7B1%7D%7B2%5Cbeta%2B1%7D%7D+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle h_{n}^{*}=&#92;left(&#92;frac{C_{1}}{2&#92;beta C_{2}^{2}}&#92;right)^{&#92;frac{1}{2&#92;beta+1}}n^{-&#92;frac{1}{2&#92;beta+1}} &amp;fg=000000' title='&#92;displaystyle h_{n}^{*}=&#92;left(&#92;frac{C_{1}}{2&#92;beta C_{2}^{2}}&#92;right)^{&#92;frac{1}{2&#92;beta+1}}n^{-&#92;frac{1}{2&#92;beta+1}} &amp;fg=000000' class='latex' /></p>
<p>and for <img src='http://s0.wp.com/latex.php?latex=%7Bh%3Dh_%7Bn%7D%5E%7B%2A%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h=h_{n}^{*}}&amp;fg=000000' title='{h=h_{n}^{*}}&amp;fg=000000' class='latex' /></p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathrm%7BMSE%7D%28x_%7B0%7D%29%3DO%5Cleft%28n%5E%7B-%5Cfrac%7B2%5Cbeta%7D%7B2%5Cbeta%2B1%7D%7D%5Cright%29%2C%5Cquad+n%5Crightarrow%5Cinfty%2C+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathrm{MSE}(x_{0})=O&#92;left(n^{-&#92;frac{2&#92;beta}{2&#92;beta+1}}&#92;right),&#92;quad n&#92;rightarrow&#92;infty, &amp;fg=000000' title='&#92;displaystyle &#92;mathrm{MSE}(x_{0})=O&#92;left(n^{-&#92;frac{2&#92;beta}{2&#92;beta+1}}&#92;right),&#92;quad n&#92;rightarrow&#92;infty, &amp;fg=000000' class='latex' /></p>
<p>uniformly in <img src='http://s0.wp.com/latex.php?latex=%7Bx_%7B0%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x_{0}}&amp;fg=000000' title='{x_{0}}&amp;fg=000000' class='latex' />. More generally, for any <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha&gt;0}&amp;fg=000000' title='{&#92;alpha&gt;0}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7Bh%3D%5Calpha+n%5E%7B-%5Cfrac%7B1%7D%7B2%5Cbeta%2B1%7D%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h=&#92;alpha n^{-&#92;frac{1}{2&#92;beta+1}}}&amp;fg=000000' title='{h=&#92;alpha n^{-&#92;frac{1}{2&#92;beta+1}}}&amp;fg=000000' class='latex' />, these bounds give</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Csup_%7Bx_%7B0%7D%5Cin%7B%5Cmathbb+R%7D%7D%5Csup_%7Bp%5Cin%5Cmathop%7B%5Cmathbb+P%7D%28%5Cbeta%2CL%29%7D%5Cmathop%7B%5Cmathbb+E%7D_p%7B%5Cleft%28%5Chat%7Bp%7D_%7Bn%7D%28x_%7B0%7D%29-p%28x_%7B0%7D%29%5Cright%29%5E%7B2%7D%7D%5Cleq+Cn%5E%7B-%5Cfrac%7B2%5Cbeta%7D%7B2%5Cbeta%2B1%7D%7D%2C+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;sup_{x_{0}&#92;in{&#92;mathbb R}}&#92;sup_{p&#92;in&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n}(x_{0})-p(x_{0})&#92;right)^{2}}&#92;leq Cn^{-&#92;frac{2&#92;beta}{2&#92;beta+1}}, &amp;fg=000000' title='&#92;displaystyle &#92;sup_{x_{0}&#92;in{&#92;mathbb R}}&#92;sup_{p&#92;in&#92;mathop{&#92;mathbb P}(&#92;beta,L)}&#92;mathop{&#92;mathbb E}_p{&#92;left(&#92;hat{p}_{n}(x_{0})-p(x_{0})&#92;right)^{2}}&#92;leq Cn^{-&#92;frac{2&#92;beta}{2&#92;beta+1}}, &amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7BC%26%2362%3B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C&gt;0}&amp;fg=000000' title='{C&gt;0}&amp;fg=000000' class='latex' /> is a constant depending only on <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbeta%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;beta}&amp;fg=000000' title='{&#92;beta}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7BL%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{L}&amp;fg=000000' title='{L}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> and the kernel <img src='http://s0.wp.com/latex.php?latex=%7BK%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{K}&amp;fg=000000' title='{K}&amp;fg=000000' class='latex' />.</p>
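<p>The bound on the MSE can be explored numerically. Below is a minimal R sketch with illustrative constants (C<sub>1</sub> = 1, C<sub>2</sub> = 0.4, &#946; = 2, n = 1000 and &#945; = 0.5 are our assumptions, not values from the text). It compares the closed-form h<sub>n</sub><sup>*</sup> with the minimizer returned by optimize(), and checks that for h = &#945;n<sup>-1/(2&#946;+1)</sup> the bound equals (C<sub>1</sub>/&#945; + C<sub>2</sub>&#178;&#945;<sup>2&#946;</sup>) n<sup>-2&#946;/(2&#946;+1)</sup>, which makes the constant in the last display explicit in terms of C<sub>1</sub>, C<sub>2</sub> and &#945;.</p>

```r
# Sketch: minimize the MSE upper bound C1/(n h) + C2^2 h^(2 beta) over h.
# C1, C2, beta, n and alpha are illustrative values, not taken from the source.
C1 <- 1; C2 <- 0.4; beta <- 2; n <- 1000; alpha <- 0.5

mse_bound <- function(h, n) C1 / (n * h) + C2^2 * h^(2 * beta)

# Closed-form minimizer h_n^* versus a numerical minimizer
h_star <- (C1 / (2 * beta * C2^2))^(1 / (2 * beta + 1)) * n^(-1 / (2 * beta + 1))
h_num <- optimize(mse_bound, interval = c(1e-4, 10), n = n, tol = 1e-10)$minimum
print(c(closed_form = h_star, numerical = h_num))

# With h = alpha * n^(-1/(2 beta + 1)) the bound equals
# (C1/alpha + C2^2 alpha^(2 beta)) * n^(-2 beta/(2 beta + 1)) for every n
C_alpha <- C1 / alpha + C2^2 * alpha^(2 * beta)
ns <- c(1e2, 1e3, 1e4, 1e5, 1e6)
hs <- alpha * ns^(-1 / (2 * beta + 1))
scaled <- mse_bound(hs, ns) / ns^(-2 * beta / (2 * beta + 1))
print(cbind(n = ns, h = hs, scaled_bound = scaled, C_alpha = C_alpha))
```

<p>The rescaled bound is the same for every n, which is exactly the statement that the choice h &#8733; n<sup>-1/(2&#946;+1)</sup> yields the rate n<sup>-2&#946;/(2&#946;+1)</sup>.</p>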
<p>Next time, we will formalize the MISE (mean integrated squared error) and study its properties.</p>
<p><strong>If you have any comments/suggestions/improvements please let me know below.</strong></p>
<hr />
<p><strong>Source</strong>: Tsybakov, A. (2009). <em>Introduction to nonparametric estimation</em>. Springer.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Density Estimation by Histograms (Part IV)]]></title>
<link>http://maikolsolis.wordpress.com/2011/11/07/histograms-iv/</link>
<pubDate>Mon, 07 Nov 2011 10:11:43 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2011/11/07/histograms-iv/</guid>
<description><![CDATA[Today we will apply the ideas of the previous posts to a simple example. First, we are going to answer]]></description>
<content:encoded><![CDATA[<p>Today we will apply the ideas of the previous posts to a simple example. First, we are going to answer last week's question.</p>
<p>What exactly is <img src='http://s0.wp.com/latex.php?latex=%7Bh_%7Bopt%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h_{opt}}&amp;fg=000000' title='{h_{opt}}&amp;fg=000000' class='latex' /> if we assume that</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cdisplaystyle+f%28x%29+%3D+%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%7D%7D+%5Ctext%7Bexp%7D%5Cleft%28%5Cfrac%7B-x%5E2%7D%7B2%7D%5Cright%29%3F+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;displaystyle f(x) = &#92;frac{1}{&#92;sqrt{2&#92;pi}} &#92;text{exp}&#92;left(&#92;frac{-x^2}{2}&#92;right)? &amp;fg=000000' title='&#92;displaystyle &#92;displaystyle f(x) = &#92;frac{1}{&#92;sqrt{2&#92;pi}} &#92;text{exp}&#92;left(&#92;frac{-x^2}{2}&#92;right)? &amp;fg=000000' class='latex' /></p>
<p>Since <img src='http://s0.wp.com/latex.php?latex=%7Bf%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f(x)}&amp;fg=000000' title='{f(x)}&amp;fg=000000' class='latex' /> is the density of the <a class="zem_slink" title="Normal distribution" href="http://en.wikipedia.org/wiki/Normal_distribution" rel="wikipedia">standard normal distribution</a>, it is easy to see that <img src='http://s0.wp.com/latex.php?latex=%7Bf%5E%5Cprime%28x%29%3D%28-x%29f%28x%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{f^&#92;prime(x)=(-x)f(x)}&amp;fg=000000' title='{f^&#92;prime(x)=(-x)f(x)}&amp;fg=000000' class='latex' />, so we have</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cdisplaystyle+%5C%26%23124%3Bf%5E%5Cprime+%5C%26%23124%3B_2%5E2%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B4%5Cpi%7D%7D%5Cint+x%5E2+%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%7D%7D+%5Cfrac%7B1%7D%7B%5Csqrt%7B%5Cfrac%7B1%7D%7B2%7D%7D%7D+%5Ctext%7Bexp%7D%28-x%5E2%29dx.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;displaystyle &#92;&#124;f^&#92;prime &#92;&#124;_2^2=&#92;frac{1}{&#92;sqrt{4&#92;pi}}&#92;int x^2 &#92;frac{1}{&#92;sqrt{2&#92;pi}} &#92;frac{1}{&#92;sqrt{&#92;frac{1}{2}}} &#92;text{exp}(-x^2)dx. &amp;fg=000000' title='&#92;displaystyle &#92;displaystyle &#92;&#124;f^&#92;prime &#92;&#124;_2^2=&#92;frac{1}{&#92;sqrt{4&#92;pi}}&#92;int x^2 &#92;frac{1}{&#92;sqrt{2&#92;pi}} &#92;frac{1}{&#92;sqrt{&#92;frac{1}{2}}} &#92;text{exp}(-x^2)dx. &amp;fg=000000' class='latex' /><!--more--></p>
<p>We recognize the integral as the variance of a normal random variable with mean 0 and standard deviation <img src='http://s0.wp.com/latex.php?latex=%7B%5Csqrt%7B1%2F2%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;sqrt{1/2}}&amp;fg=000000' title='{&#92;sqrt{1/2}}&amp;fg=000000' class='latex' />, that is, with variance 1/2. Therefore,</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cdisplaystyle+%5C%26%23124%3Bf%5E%5Cprime+%5C%26%23124%3B_2%5E2+%3D+%5Cfrac%7B1%7D%7B%5Csqrt%7B4%5Cpi%7D%7D+%5Cfrac%7B1%7D%7B2%7D+%3D+%5Cfrac%7B1%7D%7B4%5Csqrt%7B%5Cpi%7D%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;displaystyle &#92;&#124;f^&#92;prime &#92;&#124;_2^2 = &#92;frac{1}{&#92;sqrt{4&#92;pi}} &#92;frac{1}{2} = &#92;frac{1}{4&#92;sqrt{&#92;pi}}. &amp;fg=000000' title='&#92;displaystyle &#92;displaystyle &#92;&#124;f^&#92;prime &#92;&#124;_2^2 = &#92;frac{1}{&#92;sqrt{4&#92;pi}} &#92;frac{1}{2} = &#92;frac{1}{4&#92;sqrt{&#92;pi}}. &amp;fg=000000' class='latex' /></p>
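<p>Both steps can be checked numerically. This short R sketch (our own check, not part of the original post) compares f'(x) = -x f(x) with a central finite difference and evaluates &#8214;f'&#8214;&#8322;&#178; with integrate(), against the closed form 1/(4&#8730;&#960;).</p>

```r
# Sketch: numerical check of ||f'||_2^2 for the standard normal density.
f_prime <- function(x) -x * dnorm(x)   # f'(x) = -x f(x)

# Central finite differences agree with f'(x) = -x f(x) at a few points
x_chk <- c(-1.5, 0.3, 2)
fd <- (dnorm(x_chk + 1e-5) - dnorm(x_chk - 1e-5)) / 2e-5
print(cbind(x = x_chk, finite_diff = fd, formula = f_prime(x_chk)))

# ||f'||_2^2 = integral of f'(x)^2 dx, compared with 1 / (4 sqrt(pi))
norm_num <- integrate(function(x) f_prime(x)^2, -Inf, Inf, rel.tol = 1e-8)$value
print(c(numerical = norm_num, exact = 1 / (4 * sqrt(pi))))
```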
<p>Finally, using the results of <a title="Density Estimation by Histograms (Part III)" href="http://maikolsolis.wordpress.com/2011/11/01/density-estimation-by-histograms-part-iii/">Part III</a>,</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cdisplaystyle+h_%7Bopt%7D%3D+%5Cleft%28%5Cfrac%7B6%7D%7Bn%5C%26%23124%3Bf%5E%5Cprime+%5C%26%23124%3B_2%5E2%7D%5Cright%29%5E%7B1%2F3%7D+%3D+%5Cleft%28%5Cfrac%7B24%5Csqrt%7B%5Cpi%7D%7D%7Bn%7D%5Cright%29%5E%7B1%2F3%7D%5Capprox+3.5+n%5E%7B-1%2F3%7D.+%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;displaystyle h_{opt}= &#92;left(&#92;frac{6}{n&#92;&#124;f^&#92;prime &#92;&#124;_2^2}&#92;right)^{1/3} = &#92;left(&#92;frac{24&#92;sqrt{&#92;pi}}{n}&#92;right)^{1/3}&#92;approx 3.5 n^{-1/3}. &amp;fg=000000' title='&#92;displaystyle &#92;displaystyle h_{opt}= &#92;left(&#92;frac{6}{n&#92;&#124;f^&#92;prime &#92;&#124;_2^2}&#92;right)^{1/3} = &#92;left(&#92;frac{24&#92;sqrt{&#92;pi}}{n}&#92;right)^{1/3}&#92;approx 3.5 n^{-1/3}. &amp;fg=000000' class='latex' /></p>
<p>This <img src='http://s0.wp.com/latex.php?latex=%7Bh_%7Bopt%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{h_{opt}}&amp;fg=000000' title='{h_{opt}}&amp;fg=000000' class='latex' /> relies on the normality assumption, so it is not appropriate in many cases, but it can serve as a rule-of-thumb binwidth.</p>
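<p>For n = 1000 the rule of thumb can be evaluated directly. The R sketch below (our own illustration) computes h<sub>opt</sub> from the formula and, as a cross-check, minimizes the two leading MISE terms 1/(nh) + h&#178;&#8214;f'&#8214;&#8322;&#178;/12 with optimize(); these are the same two terms the script below stores in Var_MISE and Bias_MISE.</p>

```r
# Sketch: rule-of-thumb binwidth under the normal assumption, n = 1000.
n <- 1000
norm_f_prime <- 1 / (4 * sqrt(pi))          # ||f'||_2^2 for N(0, 1)
h_opt <- (6 / (n * norm_f_prime))^(1 / 3)   # = (24 sqrt(pi) / n)^(1/3)

# h_opt also minimizes the leading MISE terms 1/(n h) + h^2 ||f'||_2^2 / 12
amise <- function(h) 1 / (n * h) + h^2 * norm_f_prime / 12
h_num <- optimize(amise, interval = c(0.01, 2), tol = 1e-10)$minimum

# The last entry is the constant in h_opt = constant * n^(-1/3)
print(c(h_opt = h_opt, numerical = h_num, constant = (24 * sqrt(pi))^(1 / 3)))
```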
<p>For the example, we use a sample of 1000 normally distributed numbers and apply the theory seen before. I wrote this little script in <a href="http://en.wikipedia.org/wiki/R_(programming_language)">R</a> to compute the MSE and the MISE and to plot their bias-variance trade-offs (note: this is my very first script in R, so I would appreciate any comments).</p>
<pre class="brush: r; title: ; notranslate" title="">
# Function to get the breaks: multiples of h covering the range of x
breaks &#60;- function(x,h){
  b = floor(min(x)/h) : ceiling(max(x)/h)
  b = b*h
  return(b)
}
#Generate 1000 numbers with normal standard distribution
x=rnorm(1000)
n=length(x);
# Point to evaluate the MSE
x0 = 0
# Real value of f(x0)
f_x0 = dnorm(x0);
# &#124;&#124; f' &#124;&#124;_2^2 for a normal standard distribution
norm_f_prime = 1/(4*sqrt(pi));
#Sequence of bins
hvec = seq(0.1,0.7,by=0.0005)
Bias = numeric();
Var = numeric();
MSE = numeric();
Bias_MISE = numeric();
Var_MISE = numeric();
MISE = numeric();

par(mfrow=c(1,2))
hist(x,breaks=breaks(x,0.001),freq=F,xlab =&#34;Bandwidth with h=0.001&#34;)
hist(x,breaks=breaks(x,2),freq=F,xlab =&#34;Bandwidth with h=2&#34;)
par(mfrow=c(1,1))

for(h in hvec){
  xhist = hist(x,breaks=breaks(x,h),plot=FALSE)
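  # Average of bins near x0: bins whose midpoint lies within h of x0
  # (for x0 = 0 these are the two bins on either side of x0)
  bins_near_x0 = abs(xhist$mids - x0) &#60; h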
  p = mean(xhist$counts[bins_near_x0])/n;
  # Expectation of \hat{f} in x0
  E_fhat_x0 = p/h;
  #Compute of Bias, Var, MSE and MISE
  Bias = c(Bias,E_fhat_x0 - f_x0);
  Var = c(Var,(p*(1-p))/(n*h^2))
  MSE = c(MSE,tail(Var,1) + tail(Bias,1)^2)
  Var_MISE = c(Var_MISE,1/(n*h))
  Bias_MISE = c(Bias_MISE,h^2 * norm_f_prime /12)
  MISE = c(MISE,tail(Var_MISE,1) + tail(Bias_MISE,1))
}

#Tradeoff MSE in x0
max_range = range(Bias^2,Var,MSE)
plot(hvec,MSE,ylim=max_range,type=&#34;l&#34;,col=&#34;blue&#34;,lwd=3, xlab=&#34;Bandwidth h&#34;)
lines(hvec,Bias^2,type=&#34;l&#34;,lty=2,col=&#34;red&#34;,lwd=3)
lines(hvec,Var,type=&#34;S&#34;,lty=6,col=&#34;black&#34;,lwd=3)
legend(x=&#34;topleft&#34;,legend=c(&#34;MSE&#34;,&#34;Bias^2&#34;,&#34;Var&#34;),col=c(&#34;blue&#34;,&#34;red&#34;,&#34;black&#34;),lwd=3,lty=c(1,2,6))

# Tradeoff MISE
max_range = range(Bias_MISE,Var_MISE,MISE)
plot(hvec,MISE,ylim=max_range,type=&#34;l&#34;,col=&#34;blue&#34;,lwd=3,xlab=&#34;Bandwidth h&#34;)
lines(hvec,Bias_MISE,type=&#34;l&#34;,lty=5,col=&#34;red&#34;,lwd=3)
lines(hvec,Var_MISE,type=&#34;S&#34;,lty=6,col=&#34;black&#34;,lwd=3)
legend(x=&#34;topleft&#34;,legend=c(&#34;MISE&#34;,&#34;Bias^2_MISE&#34;,&#34;Var_MISE&#34;),col=c(&#34;blue&#34;,&#34;red&#34;,&#34;black&#34;),lwd=3,lty=c(1,5,6))

# h optimal for the point x0
h_x0 = hvec[which.min(MSE)]

# h optimal for any point using the minimal MISE
h_opt_MISE = hvec[which.min(MISE)]

# h optimal for any point using the rule-of-thumb
h_opt = (6/(n*norm_f_prime))^(1/3)
print(c(h_x0=h_x0, h_opt_MISE=h_opt_MISE, h_opt=h_opt))

# Histogram with h_opt (reusing the breaks() function instead of overwriting it)
hist(x,breaks=breaks(x,h_opt),freq=FALSE,ylim=c(0,0.5))
# Overlay the true standard normal density
lines(sort(x),dnorm(sort(x)),type=&#34;l&#34;)
</pre>
<p>To start, notice that if the binwidth is too small or too large, we get a very poor approximation. As the next plot shows, the histogram fits very badly for <img src='http://s0.wp.com/latex.php?latex=h%3D0.01&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h=0.01' title='h=0.01' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=h%3D2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h=2' title='h=2' class='latex' />:</p>
<div id="attachment_191" class="wp-caption aligncenter" style="width: 310px"><a href="http://maikolsolis.files.wordpress.com/2011/11/comparison_hist.png"><img class="size-medium wp-image-191 " title="Comparison of Histograms with different Binwidths" alt="Comparison of Histograms with different Binwidths" src="http://maikolsolis.files.wordpress.com/2011/11/comparison_hist.png?w=300&#038;h=189" width="300" height="189" /></a><p class="wp-caption-text">Comparison of Histograms with different Binwidths</p></div>
<p>The plots of the MSE and MISE trade-offs are, respectively:</p>
<div id="attachment_172" class="wp-caption aligncenter" style="width: 310px"><a href="http://maikolsolis.files.wordpress.com/2011/11/hist_mse.png"><img class="size-medium wp-image-172  " title="Trade-off Bias-Variance of Histogram in x0=0" alt="Trade-off Bias-Variance of Histogram in x0=0" src="http://maikolsolis.files.wordpress.com/2011/11/hist_mse.png?w=300&#038;h=189" width="300" height="189" /></a><p class="wp-caption-text">Trade-off Bias-Variance of Histogram in x0=0</p></div>
<div id="attachment_174" class="wp-caption aligncenter" style="width: 310px"><a href="http://maikolsolis.files.wordpress.com/2011/11/hist_mise.png"><img class="size-medium wp-image-174" title="Trade-off Bias-Variance for MISE" alt="Trade-off Bias-Variance for MISE" src="http://maikolsolis.files.wordpress.com/2011/11/hist_mise.png?w=300&#038;h=189" width="300" height="189" /></a><p class="wp-caption-text">Trade-off Bias-Variance for MISE</p></div>
<p>For the MSE at <img src='http://s0.wp.com/latex.php?latex=x_0%3D0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_0=0' title='x_0=0' class='latex' />, the optimal value is <img src='http://s0.wp.com/latex.php?latex=h%3D0.255&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h=0.255' title='h=0.255' class='latex' />. For the MISE, the optimal value found by minimization over the grid is <img src='http://s0.wp.com/latex.php?latex=h%3D0.349&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h=0.349' title='h=0.349' class='latex' />, and the exact value given by the formula seen before is <img src='http://s0.wp.com/latex.php?latex=h%3D0.349083021225025&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h=0.349083021225025' title='h=0.349083021225025' class='latex' />. Finally, we get the best-fitted histogram:</p>
<div id="attachment_173" class="wp-caption aligncenter" style="width: 310px"><a href="http://maikolsolis.files.wordpress.com/2011/11/hist_final.png"><img class="size-medium wp-image-173" title="Final Histogram" alt="Final Histogram" src="http://maikolsolis.files.wordpress.com/2011/11/hist_final.png?w=300&#038;h=189" width="300" height="189" /></a><p class="wp-caption-text">Histogram of 1000 numbers normal standard distributed with h=0.349083021225025</p></div>
<p>Next week, we will start with <a class="zem_slink" title="Density estimation" href="http://en.wikipedia.org/wiki/Density_estimation" rel="wikipedia">density estimation</a> using more general kernels, and we will see that the histogram is, in fact, a particular case of one of them.</p>
<h6 class="zemanta-related-title" style="font-size:1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://maikolsolis.wordpress.com/2011/10/23/density-estimation-by-histograms-part-ii/">Density Estimation by Histograms (Part II)</a> (maikolsolis.wordpress.com)</li>
<li class="zemanta-article-ul-li"><a href="http://maikolsolis.wordpress.com/2011/11/01/density-estimation-by-histograms-part-iii/">Density Estimation by Histograms (Part III)</a> (maikolsolis.wordpress.com)</li>
</ul>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Importance of nonparametric statistics in regression.]]></title>
<link>http://maikolsolis.wordpress.com/2011/10/09/nonparametric-regression/</link>
<pubDate>Sun, 09 Oct 2011 21:19:53 +0000</pubDate>
<dc:creator>Maikol Solís</dc:creator>
<guid>http://maikolsolis.wordpress.com/2011/10/09/nonparametric-regression/</guid>
<description><![CDATA[Example of linear regression I would like to start with some basic ideas about density estimation an]]></description>
<content:encoded><![CDATA[<div class="zemanta-img">
<div class="wp-caption alignleft" style="width: 250px"><a href="http://en.wikipedia.org/wiki/File:Linear_regression.png"><img class="zemanta-img-configured " title="Example of linear regression with one independ..." alt="Example of linear regression with one independ..." src="http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/300px-Linear_regression.png" height="166" width="240" /></a><p class="wp-caption-text">Example of linear regression</p></div>
</div>
<p>I would like to start with some basic ideas about <a class="zem_slink" title="Density estimation" href="http://en.wikipedia.org/wiki/Density_estimation" rel="wikipedia">density estimation</a> and nonparametric regression.</p>
<p>Estimating the <a class="zem_slink" title="Probability density function" href="http://en.wikipedia.org/wiki/Probability_density_function" rel="wikipedia">probability density function</a> (pdf) without assuming a parametric form is called <a class="zem_slink" title="Non-parametric statistics" href="http://en.wikipedia.org/wiki/Non-parametric_statistics" rel="wikipedia">nonparametric density estimation</a>. This kind of estimation can serve as a building block in nonparametric regression.<!--more--></p>
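<p>One standard way in which density estimation feeds into nonparametric regression is the Nadaraya-Watson estimator (our illustration; the post does not name it): m(x) is estimated by the ratio &#931;K((x-X<sub>i</sub>)/h)Y<sub>i</sub> / &#931;K((x-X<sub>i</sub>)/h), whose denominator, divided by nh, is a kernel density estimate of X at x. Below is a minimal R sketch with a Gaussian kernel; the function name nw_estimate and the bandwidth are hypothetical choices.</p>

```r
# Sketch: Nadaraya-Watson regression estimator with a Gaussian kernel.
# nw_estimate and the bandwidth h are illustrative choices.
nw_estimate <- function(x, X, Y, h) {
  sapply(x, function(x0) {
    w <- dnorm((x0 - X) / h)  # kernel weights K((x0 - X_i) / h)
    # sum(w) / (n h) is a kernel density estimate of X at x0; the common
    # factor 1 / (n h) cancels in the ratio, leaving a local weighted mean of Y
    sum(w * Y) / sum(w)
  })
}

# Noise-free illustration on a fine design: m(x) = x^2
X <- seq(-3, 3, length.out = 601)
Y <- X^2
print(nw_estimate(c(-1, 0, 1), X, Y, h = 0.1))
```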
<p>Let us consider the typical <a class="zem_slink" title="Linear regression" href="http://en.wikipedia.org/wiki/Linear_regression" rel="wikipedia">linear regression</a> problem. Assume that we have a set of explanatory variables <img src='http://s0.wp.com/latex.php?latex=%7BX_1%2C%5Cldots%2CX_d%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_1,&#92;ldots,X_d}&amp;fg=000000' title='{X_1,&#92;ldots,X_d}&amp;fg=000000' class='latex' /> and an <a class="zem_slink" title="Dependent and independent variables" href="http://en.wikipedia.org/wiki/Dependent_and_independent_variables" rel="wikipedia">explained variable</a> <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> related in the following way:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+Y%3DX%5E%7B%5Ctop%7D%5Cbeta%2B%5Cvarepsilon+%5C+%5C+%5C+%5C+%5C+%281%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle Y=X^{&#92;top}&#92;beta+&#92;varepsilon &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' title='&#92;displaystyle Y=X^{&#92;top}&#92;beta+&#92;varepsilon &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7B%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;varepsilon}&amp;fg=000000' title='{&#92;varepsilon}&amp;fg=000000' class='latex' /> is independent of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathop%7B%5Cmathbb+E%7D%5B%5Cvarepsilon%5D%3D0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathop{&#92;mathbb E}[&#92;varepsilon]=0}&amp;fg=000000' title='{&#92;mathop{&#92;mathbb E}[&#92;varepsilon]=0}&amp;fg=000000' class='latex' />. We can also write model (<a href="#eqlinear_model">1</a>) as</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathop%7B%5Cmathbb+E%7D%5BY%5Cvert+X%5D%3DX_%7B1%7D%5Cbeta_%7B1%7D%2B%5Ccdots%2BX_%7Bd%7D%5Cbeta_%7Bd%7D%3D%5Cmathbf%7BX%7D%5E%7B%5Ctop%7D%5Cbeta.+%5C+%5C+%5C+%5C+%5C+%282%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=X_{1}&#92;beta_{1}+&#92;cdots+X_{d}&#92;beta_{d}=&#92;mathbf{X}^{&#92;top}&#92;beta. &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' title='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=X_{1}&#92;beta_{1}+&#92;cdots+X_{d}&#92;beta_{d}=&#92;mathbf{X}^{&#92;top}&#92;beta. &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' class='latex' /></p>
<p>Here <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathop%7B%5Cmathbb+E%7D%5BY%5Cvert+X%5D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathop{&#92;mathbb E}[Y&#92;vert X]}&amp;fg=000000' title='{&#92;mathop{&#92;mathbb E}[Y&#92;vert X]}&amp;fg=000000' class='latex' /> is the conditional expectation of <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> given <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />. Model (<a href="#eqcond_model">2</a>) is clearly too restrictive for many complex problems. To tackle this, we generalize (<a href="#eqcond_model">2</a>) into the model:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathop%7B%5Cmathbb+E%7D%5BY%5Cvert+X%5D%3Dm%28%5Cmathbf%7BX%7D%29%2C+%5C+%5C+%5C+%5C+%5C+%283%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=m(&#92;mathbf{X}), &#92; &#92; &#92; &#92; &#92; (3)&amp;fg=000000' title='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=m(&#92;mathbf{X}), &#92; &#92; &#92; &#92; &#92; (3)&amp;fg=000000' class='latex' /></p>
<p>Here <img src='http://s0.wp.com/latex.php?latex=%7Bm%28%5Cmathbf%7BX%7D%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m(&#92;mathbf{X})}&amp;fg=000000' title='{m(&#92;mathbf{X})}&amp;fg=000000' class='latex' /> is the true, unknown <a class="zem_slink" title="Regression analysis" href="http://en.wikipedia.org/wiki/Regression_analysis" rel="wikipedia">regression function</a>.</p>
<p>To put things in perspective, let <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathbf%7BX%7D%3D%28X_1%2CX_2%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathbf{X}=(X_1,X_2)}&amp;fg=000000' title='{&#92;mathbf{X}=(X_1,X_2)}&amp;fg=000000' class='latex' /> and assume that the model is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%5Cmathop%7B%5Cmathbb+E%7D%5BY%5Cvert+X%5D%3D%5Cbeta_1+X_1+%2B+%5Cbeta_2+X_2+%2B%5Cbeta_3+X_2%5E2.+%5C+%5C+%5C+%5C+%5C+%284%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=&#92;beta_1 X_1 + &#92;beta_2 X_2 +&#92;beta_3 X_2^2. &#92; &#92; &#92; &#92; &#92; (4)&amp;fg=000000' title='&#92;displaystyle &#92;mathop{&#92;mathbb E}[Y&#92;vert X]=&#92;beta_1 X_1 + &#92;beta_2 X_2 +&#92;beta_3 X_2^2. &#92; &#92; &#92; &#92; &#92; (4)&amp;fg=000000' class='latex' /></p>
<p>Now, given a data sample, suppose that you have to estimate <img src='http://s0.wp.com/latex.php?latex=%7B%5Cmathop%7B%5Cmathbb+E%7D%5BY%5Cvert+X%5D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;mathop{&#92;mathbb E}[Y&#92;vert X]}&amp;fg=000000' title='{&#92;mathop{&#92;mathbb E}[Y&#92;vert X]}&amp;fg=000000' class='latex' /> as accurately as possible in <em><strong>one</strong></em> trial. That means you cannot change the model if it does not fit the data well.</p>
<p>Of course you can simply use <a href="#eqexample">4</a>, but in general you do not know any information about the function <img src='http://s0.wp.com/latex.php?latex=%7Bm%28%5Ccdot%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m(&#92;cdot)}&amp;fg=000000' title='{m(&#92;cdot)}&amp;fg=000000' class='latex' /> (except that it is a <a class="zem_slink" title="Smooth function" href="http://en.wikipedia.org/wiki/Smooth_function" rel="wikipedia">smooth function</a>). We call this type of regression like nonparametric and we will return to it later.</p>
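<p>When the form of the regression function is known, as in model (4), estimating it reduces to ordinary least squares, because (4) is linear in the coefficients even though it is quadratic in X<sub>2</sub>. The sketch below is a minimal illustration only. The coefficients, the noise level, and the helper name <code>fit_model_4</code> are hypothetical, and NumPy's <code>lstsq</code> is just one way to solve the least-squares problem.</p>

```python
import numpy as np

def fit_model_4(X, y):
    """Least-squares estimates of (beta_1, beta_2, beta_3) in
    E[Y | X] = beta_1*X_1 + beta_2*X_2 + beta_3*X_2**2 (no intercept, as in (4))."""
    X1, X2 = X[:, 0], X[:, 1]
    design = np.column_stack([X1, X2, X2 ** 2])  # the model is linear in the betas
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat

# Hypothetical simulated data: the true coefficients and noise level are made up.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([1.5, -2.0, 0.5])
y = (true_beta[0] * X[:, 0] + true_beta[1] * X[:, 1]
     + true_beta[2] * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n))

beta_hat = fit_model_4(X, y)
print(beta_hat)  # close to true_beta because the model is correctly specified
```

<p>If the true m(&#183;) were not of the form (4), this fit would stay biased no matter how much data we collected. That is the motivation for the nonparametric approach.</p>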
<p>In the next post we will study nonparametric density estimation and its importance.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Random Fourier Features for Kernel Density Estimation]]></title>
<link>http://mlstat.wordpress.com/2010/10/04/random-fourier-features-for-kernel-density-estimation/</link>
<pubDate>Mon, 04 Oct 2010 22:41:17 +0000</pubDate>
<dc:creator>mlstat</dc:creator>
<guid>http://mlstat.wordpress.com/2010/10/04/random-fourier-features-for-kernel-density-estimation/</guid>
<description><![CDATA[The NIPS paper Random Fourier Features for Large-scale Kernel Machines, by Rahimi and Recht presents]]></description>
<content:encoded><![CDATA[<p>The NIPS paper <a href="http://pages.cs.wisc.edu/~brecht/papers/07.rah.rec.nips.pdf" target="_blank">Random Fourier Features for Large-scale Kernel Machines</a>, by Rahimi and Recht, presents a method for randomized feature mapping in which dot products in the transformed feature space approximate (a certain class of) positive definite (p.d.) kernels in the original space.</p>
<p>We know that for any p.d. kernel there exists a <em>deterministic</em> map with the aforementioned property, but it may be infinite-dimensional. The paper presents results indicating that with the randomized map we can get away with only a &#8220;small&#8221; number of features (at least in a classification setting).</p>
<p>Before applying the method to density estimation let us review the relevant section of the paper briefly.</p>
<p><strong>Bochner&#8217;s Theorem and Random Fourier Features</strong></p>
<p>Assume that we have data in <img src='http://s0.wp.com/latex.php?latex=R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='R^d' title='R^d' class='latex' /> and a continuous p.d. kernel <img src='http://s0.wp.com/latex.php?latex=K%28x%2Cy%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(x,y)' title='K(x,y)' class='latex' /> defined for every pair of points <img src='http://s0.wp.com/latex.php?latex=x%2Cy+%5Cin+R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x,y &#92;in R^d' title='x,y &#92;in R^d' class='latex' />. Assume further that the kernel is shift-invariant, i.e., <img src='http://s0.wp.com/latex.php?latex=K%28x%2Cy%29+%3D+K%28x-y%29+%5Ctriangleq+K%28%5Cdelta%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(x,y) = K(x-y) &#92;triangleq K(&#92;delta)' title='K(x,y) = K(x-y) &#92;triangleq K(&#92;delta)' class='latex' /> and that the kernel is scaled so that <img src='http://s0.wp.com/latex.php?latex=K%280%29+%3D+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(0) = 1' title='K(0) = 1' class='latex' />.</p>
<p>Bochner&#8217;s theorem states that under the above conditions <img src='http://s0.wp.com/latex.php?latex=K%28%5Cdelta%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(&#92;delta)' title='K(&#92;delta)' class='latex' /> is the Fourier transform of a non-negative measure on <img src='http://s0.wp.com/latex.php?latex=R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='R^d' title='R^d' class='latex' />. Because <img src='http://s0.wp.com/latex.php?latex=K%280%29+%3D+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(0) = 1' title='K(0) = 1' class='latex' />, this measure is a probability distribution. In other words, when this distribution has a density <img src='http://s0.wp.com/latex.php?latex=p%28w%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(w)' title='p(w)' class='latex' /> over frequencies <img src='http://s0.wp.com/latex.php?latex=w+%5Cin+R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='w &#92;in R^d' title='w &#92;in R^d' class='latex' /> (as it does for the Gaussian kernel below), we have <img src='http://s0.wp.com/latex.php?latex=K%28%5Cdelta%29+%3D+%5Cint_%7BR%5Ed%7D+p%28w%29+e%5E%7Bj+w%5ET+%5Cdelta%7D+dw&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(&#92;delta) = &#92;int_{R^d} p(w) e^{j w^T &#92;delta} dw' title='K(&#92;delta) = &#92;int_{R^d} p(w) e^{j w^T &#92;delta} dw' class='latex' />, where j is the imaginary unit.</p>
<p style="text-align:center;"><a href="http://mlstat.files.wordpress.com/2010/10/fourier11.png"><img class="aligncenter size-full wp-image-477" title="fourier1" src="http://mlstat.files.wordpress.com/2010/10/fourier11.png?w=545&#038;h=198" alt="" width="545" height="198" /></a></p>
<p>where (1) is because <img src='http://s0.wp.com/latex.php?latex=K%28.%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(.)' title='K(.)' class='latex' /> is real. Equation (2) says that if we draw a random vector <img src='http://s0.wp.com/latex.php?latex=w&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='w' title='w' class='latex' /> according to <img src='http://s0.wp.com/latex.php?latex=p%28w%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(w)' title='p(w)' class='latex' /> and form two vectors <img src='http://s0.wp.com/latex.php?latex=%5Cphi%28x%29+%3D+%28cos%28w%5ET+x%29%2C+sin%28w%5ET+x%29%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;phi(x) = (cos(w^T x), sin(w^T x))' title='&#92;phi(x) = (cos(w^T x), sin(w^T x))' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%5Cphi%28y%29+%3D+%28cos%28w%5ET+y%29%2C+sin%28w%5ET+y%29%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;phi(y) = (cos(w^T y), sin(w^T y))' title='&#92;phi(y) = (cos(w^T y), sin(w^T y))' class='latex' />, then the expected value of <img src='http://s0.wp.com/latex.php?latex=%26%2360%3B%5Cphi%28x%29%2C%5Cphi%28y%29%26%2362%3B&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&lt;&#92;phi(x),&#92;phi(y)&gt;' title='&lt;&#92;phi(x),&#92;phi(y)&gt;' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=K%28x-y%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(x-y)' title='K(x-y)' class='latex' />.</p>
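<p>The derivation above is only available as an image, so here is one standard way to write the two steps in LaTeX. It assumes the convention K(x &#8722; y) = &#8747; p(w) exp(j w<sup>T</sup>(x &#8722; y)) dw, with j the imaginary unit, and its notation may differ from the figure's.</p>

```latex
\begin{aligned}
K(x-y) &= \int_{R^d} p(w)\, e^{j w^T (x-y)}\, dw
        = \mathbb{E}_{w \sim p}\!\left[ e^{j w^T (x-y)} \right] \\
       &= \mathbb{E}_{w \sim p}\!\left[ \cos\big(w^T (x-y)\big) \right]
          \qquad \text{(the imaginary part vanishes because } K \text{ is real)} \\
       &= \mathbb{E}_{w \sim p}\!\left[ \cos(w^T x)\cos(w^T y) + \sin(w^T x)\sin(w^T y) \right]
        = \mathbb{E}_{w \sim p}\!\left[ \langle \phi(x), \phi(y) \rangle \right].
\end{aligned}
```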
<p>Therefore, for <img src='http://s0.wp.com/latex.php?latex=x+%5Cin+R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x &#92;in R^d' title='x &#92;in R^d' class='latex' />, if we choose the transformation</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cphi%28x%29+%3D+%5Cfrac%7B1%7D%7B%5Csqrt%7BD%7D%7D+%28cos%28w_1%5ET+x%29%2C+sin%28w_1%5ET+x%29%2C+cos%28w_2%5ET+x%29%2C+sin%28w_2%5ET+x%29%2C+%5Cldots%2C+cos%28w_D%5ET+x%29%2C+sin%28w_D%5ET+x%29%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;phi(x) = &#92;frac{1}{&#92;sqrt{D}} (cos(w_1^T x), sin(w_1^T x), cos(w_2^T x), sin(w_2^T x), &#92;ldots, cos(w_D^T x), sin(w_D^T x))' title='&#92;phi(x) = &#92;frac{1}{&#92;sqrt{D}} (cos(w_1^T x), sin(w_1^T x), cos(w_2^T x), sin(w_2^T x), &#92;ldots, cos(w_D^T x), sin(w_D^T x))' class='latex' /></p>
<p>with <img src='http://s0.wp.com/latex.php?latex=w_1%2C%5Cldots%2C+w_D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='w_1,&#92;ldots, w_D' title='w_1,&#92;ldots, w_D' class='latex' /> drawn according to <img src='http://s0.wp.com/latex.php?latex=p%28w%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p(w)' title='p(w)' class='latex' />, linear inner products in this transformed space will approximate <img src='http://s0.wp.com/latex.php?latex=K%28.%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(.)' title='K(.)' class='latex' />.</p>
<p><strong>Gaussian RBF Kernel</strong></p>
<p>The Gaussian radial basis function kernel satisfies all the above conditions and we know that the Fourier transform of the Gaussian is another Gaussian (with the reciprocal variance). Therefore for &#8220;linearizing&#8221; the Gaussian r.b.f. kernel, we draw <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' /> samples from a Gaussian distribution for the transformation.</p>
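<p>A minimal NumPy sketch of the transformation for the Gaussian kernel follows. It assumes the parametrization K(&#948;) = exp(&#8722;&#947;&#8214;&#948;&#8214;&#178;), for which the frequency density is the Gaussian N(0, 2&#947;I). The post does not state exactly how &#947; enters its kernel, so treat this as an assumption. The helper names <code>sample_frequencies</code>, <code>rff_features</code>, and <code>gaussian_kernel</code> are hypothetical. The sketch compares the feature-space inner products against the exact kernel matrix.</p>

```python
import numpy as np

def sample_frequencies(D, d, gamma, rng):
    """Draw D frequency vectors w_k ~ N(0, 2*gamma*I_d).

    Assumption: K(delta) = exp(-gamma * ||delta||**2), whose Fourier
    transform (Bochner) is the Gaussian density N(0, 2*gamma*I).
    """
    return rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))

def rff_features(X, W):
    """Map each row x of X to (1/sqrt(D)) * (cos(w_k^T x), sin(w_k^T x)), k = 1..D.

    Features are ordered [all cosines, all sines] instead of interleaved;
    the ordering does not change inner products.
    """
    D = W.shape[0]
    proj = X @ W.T  # shape (n, D), entries w_k^T x
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

def gaussian_kernel(X, Y, gamma):
    """Exact K(x, y) = exp(-gamma * ||x - y||**2) for all pairs of rows."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
gamma = 1.0
W = sample_frequencies(D=10000, d=2, gamma=gamma, rng=rng)
Z = rff_features(X, W)
approx = Z @ Z.T  # inner products in the transformed space
exact = gaussian_kernel(X, X, gamma)
max_err = np.abs(approx - exact).max()
print(max_err)
```

<p>The diagonal is reproduced exactly, because cos&#178; + sin&#178; = 1. The off-diagonal error shrinks roughly like 1/&#8730;D.</p>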
<p><strong>Parzen Window Density Estimation</strong></p>
<p>Given a data set <img src='http://s0.wp.com/latex.php?latex=%5C%7Bx_1%2C+x_2%2C+%5Cldots%2C+x_N%5C%7D+%5Csubset+R%5Ed&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;{x_1, x_2, &#92;ldots, x_N&#92;} &#92;subset R^d' title='&#92;{x_1, x_2, &#92;ldots, x_N&#92;} &#92;subset R^d' class='latex' />, the so-called Parzen window probability density estimator is defined as follows:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7Bp%7D%28x%29+%5Cpropto+%5Cfrac%7B1%7D%7BN%7D+%5Csum_i+K%28%28x-x_i%29%2Fh%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;hat{p}(x) &#92;propto &#92;frac{1}{N} &#92;sum_i K((x-x_i)/h)' title='&#92;hat{p}(x) &#92;propto &#92;frac{1}{N} &#92;sum_i K((x-x_i)/h)' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=K%28.%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='K(.)' title='K(.)' class='latex' /> is often a positive, symmetric, shift-invariant kernel and <img src='http://s0.wp.com/latex.php?latex=h&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h' title='h' class='latex' /> is the bandwidth parameter that controls the scale of influence of the data points.</p>
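<p>For reference, here is a direct (non-linearized) Parzen window estimate that uses a Gaussian density as K. The kernel is normalized so that the estimate integrates to one, which is one common convention; the post only defines the estimator up to proportionality. The function name <code>parzen_gaussian</code> is hypothetical. Note that every query costs O(N), because all data points must be kept and visited.</p>

```python
import numpy as np

def parzen_gaussian(x_query, data, h):
    """Parzen window density estimate with a Gaussian kernel of bandwidth h:
        p_hat(x) = (1/N) * sum_i (2*pi*h**2)**(-d/2) * exp(-||x - x_i||**2 / (2*h**2))
    x_query: (m, d) or (d,) array; data: (N, d) array. Returns an (m,) array."""
    x_query = np.atleast_2d(x_query)
    d = data.shape[1]
    sq_dists = ((x_query[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (m, N)
    norm = (2.0 * np.pi * h ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq_dists / (2.0 * h ** 2)).mean(axis=1)

# Sanity check in 1D: the estimate should integrate to (approximately) one.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 1))
grid = np.linspace(-10.0, 10.0, 4001)[:, None]
dx = grid[1, 0] - grid[0, 0]
integral = parzen_gaussian(grid, data, h=0.5).sum() * dx
print(integral)
```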
<p>A common kernel that is used for Parzen window density estimation is the Gaussian density. If we make the same choice we can apply our feature transformation to linearize the procedure. We have</p>
<p><a href="http://mlstat.files.wordpress.com/2010/10/fourier2.png"><img class="aligncenter size-full wp-image-495" title="fourier2" src="http://mlstat.files.wordpress.com/2010/10/fourier2.png?w=294&#038;h=175" alt="" width="294" height="175" /></a></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=h&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='h' title='h' class='latex' /> has been absorbed into the kernel variance.</p>
<p>Therefore, all we need to do is compute the mean of the transformed data points once, and then estimate the pdf at a new point as (proportional to) the inner product of its transformed feature vector with that mean.</p>
<p>Of course, since the kernel value is only approximated by the inner product of the random Fourier features, we expect the estimated pdf to differ from a plain, unadorned Parzen window estimate. But different how?</p>
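<p>The linearized estimator can be compared with the exact (unnormalized) Parzen sum in a few lines. This sketch makes the same assumption as before, K(&#948;) = exp(&#8722;&#947;&#8214;&#948;&#8214;&#178;) with frequencies drawn from N(0, 2&#947;I). The two-cluster data set, the grid, and the helper names are made up for illustration and are not the post's experimental setup. After the mean is computed, only the mean feature vector (length 2&#183;D) is kept from the data.</p>

```python
import numpy as np

def rff_parzen(data, D, gamma, rng):
    """Return x -> <phi(x), mean_i phi(x_i)>: the Parzen sum with
    K(delta) = exp(-gamma*||delta||**2) linearized by D random Fourier
    features (unnormalized, i.e. only proportional to a density)."""
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, data.shape[1]))

    def phi(X):
        proj = np.atleast_2d(X) @ W.T
        return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

    mean_feature = phi(data).mean(axis=0)  # the only thing kept from the data
    return lambda X: phi(X) @ mean_feature

def exact_parzen_sum(X, data, gamma):
    """Exact (1/N) * sum_i exp(-gamma*||x - x_i||**2) for each row x of X."""
    sq = ((np.atleast_2d(X)[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq).mean(axis=1)

rng = np.random.default_rng(1)
# Hypothetical data: a few dozen points from a two-component Gaussian mixture.
data = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(20, 2)),
                  rng.normal([2.0, 1.0], 0.7, size=(20, 2))])
gx, gy = np.meshgrid(np.linspace(-4, 4, 21), np.linspace(-3, 4, 15))
grid = np.column_stack([gx.ravel(), gy.ravel()])

exact = exact_parzen_sum(grid, data, gamma=1.0)
errors = {}
for D in (100, 1000, 5000):
    estimate = rff_parzen(data, D, gamma=1.0, rng=rng)(grid)
    errors[D] = np.abs(estimate - exact).max()
print(errors)  # the maximum error over the grid typically shrinks as D grows
```

<p>The exact sum is always non-negative. The linearized estimate is only an unbiased approximation of it, so it can dip slightly below zero far from the data.</p>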
<p><strong>Experiments</strong></p>
<p>Below are some pictures showing how the method performs on some synthetic data. I generated a few dozen points from a mixture of Gaussians and plotted contours of the estimated pdf for the region around the points. I did this for several choices of <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%5Cgamma&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;gamma' title='&#92;gamma' class='latex' /> (the scale parameter for the Gaussian kernel).</p>
<p>First, let us check that the method performs as expected for large values of <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' />, where the kernel value is well approximated by the inner product of the Fourier features. The first three pictures are for <img src='http://s0.wp.com/latex.php?latex=D+%3D+10000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D = 10000' title='D = 10000' class='latex' /> with various values of <img src='http://s0.wp.com/latex.php?latex=%5Cgamma&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;gamma' title='&#92;gamma' class='latex' />.</p>
<div id="attachment_503" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/k2d10000.png"><img class="size-full wp-image-503" title="K2D10000" src="http://mlstat.files.wordpress.com/2010/10/k2d10000.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text">D = 10000 and gamma = 2.0</p></div>
<div id="attachment_501" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/k1d10000.png"><img class="size-full wp-image-501  " title="k1D10000" src="http://mlstat.files.wordpress.com/2010/10/k1d10000.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text"> D = 10000 and gamma = 1.0</p></div>
<div id="attachment_502" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/kp5d10000.png"><img class="size-full wp-image-502" title="Kp5D10000" src="http://mlstat.files.wordpress.com/2010/10/kp5d10000.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text">D = 10000  and gamma = 0.5</p></div>
<p>—————————————————————————</p>
<p>—————————————————————————</p>
<p>Now let us see what happens when we decrease <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' />. We expect the error in approximating the kernel to lead to a visibly erroneous pdf. This is clearly evident for <img src='http://s0.wp.com/latex.php?latex=D%3D100&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D=100' title='D=100' class='latex' />.</p>
<div id="attachment_506" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/k1d1000.png"><img class="size-full wp-image-506" title="k1D1000" src="http://mlstat.files.wordpress.com/2010/10/k1d1000.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text">D=1000 and gamma = 1.0</p></div>
<div id="attachment_508" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/k1d1001.png"><img class="size-full wp-image-508" title="k1D100" src="http://mlstat.files.wordpress.com/2010/10/k1d1001.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text">D=100 and gamma = 1.0</p></div>
<p>—————————————————————————</p>
<p>—————————————————————————</p>
<p>The following picture for  <img src='http://s0.wp.com/latex.php?latex=D+%3D+1000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D = 1000' title='D = 1000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%5Cgamma+%3D+2.0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;gamma = 2.0' title='&#92;gamma = 2.0' class='latex' /> is even stranger.</p>
<div id="attachment_509" class="wp-caption aligncenter" style="width: 460px"><a href="http://mlstat.files.wordpress.com/2010/10/k2d1000.png"><img class="size-full wp-image-509" title="K2D1000" src="http://mlstat.files.wordpress.com/2010/10/k2d1000.png?w=450&#038;h=339" alt="" width="450" height="339" /></a><p class="wp-caption-text">D = 1000 and gamma = 2.0</p></div>
<p>—————————————————————————</p>
<p>—————————————————————————</p>
<p><strong>Discussion</strong></p>
<p>It seems that even for a simple 2D example we need to compute a very large number of random Fourier features to make the estimated pdf accurate. (For this small example that is very wasteful, since a plain Parzen window estimate would require less memory and computation.)</p>
<p>However, the pictures do indicate that if the approach is to be used for outlier detection (aka novelty detection) <em>from a given data set, </em>we might be able to get away with a much smaller <img src='http://s0.wp.com/latex.php?latex=D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='D' title='D' class='latex' />. That is, even if the estimated pdf has large errors over the space as a whole, it seems to be reasonably accurate at the data points themselves.</p>
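<p>One possible way to use the linearized estimate for novelty detection is to score points by their estimated (unnormalized) density and flag any point that falls below a low quantile of the training-point scores. This is only a sketch, not the post's procedure. The 5% quantile threshold, D = 2000, &#947; = 1, the synthetic data, and the helper name <code>make_novelty_detector</code> are all assumptions for illustration.</p>

```python
import numpy as np

def make_novelty_detector(data, D, gamma, quantile, rng):
    """Hypothetical helper: score points with the RFF-linearized Parzen sum
    (K(delta) = exp(-gamma*||delta||**2)) and flag scores below the given
    quantile of the training-point scores."""
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, data.shape[1]))

    def phi(X):
        proj = np.atleast_2d(X) @ W.T
        return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

    train_features = phi(data)
    mean_feature = train_features.mean(axis=0)
    threshold = np.quantile(train_features @ mean_feature, quantile)
    return lambda X: phi(X) @ mean_feature < threshold  # True = flagged as novel

rng = np.random.default_rng(2)
data = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(20, 2)),
                  rng.normal([2.0, 1.0], 0.7, size=(20, 2))])
is_novel = make_novelty_detector(data, D=2000, gamma=1.0, quantile=0.05, rng=rng)
flags = is_novel(np.array([[10.0, 10.0], [-2.0, 0.0]]))
print(flags)  # a far-away point vs. a point at one cluster center
```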
]]></content:encoded>
</item>
<item>
<title><![CDATA[[ArXiv] Voronoi Tessellations]]></title>
<link>http://hyunsook.wordpress.com/2009/11/16/arxiv-voronoi-tessellations/</link>
<pubDate>Mon, 16 Nov 2009 16:51:19 +0000</pubDate>
<dc:creator>HLee</dc:creator>
<guid>http://hyunsook.wordpress.com/2009/11/16/arxiv-voronoi-tessellations/</guid>
<description><![CDATA[As a part of exploring spatial distribution of particles/objects, not to approximate via Poisson pro]]></description>
<content:encoded><![CDATA[<p>While exploring the spatial distribution of particles/objects, I wanted neither to approximate it with a (parametric) Poisson or Gaussian process nor to impose hypotheses such as homogeneity, isotropy, or uniformity. Various <b>nonparametric</b> methods therefore drew my attention for data exploration and preliminary analysis. Among them, the one I fell in love with is tessellation (state-space approaches are excluded here). In terms of computational speed, I believe <a href="http://en.wikipedia.org/wiki/Tessellation">tessellation</a> is faster than kernel density estimation for estimating level sets of multivariate data. Furthermore, constructing polygons from a tessellation is conceptually simple and intuitive. However, coding and improving the algorithms is beyond statistical research (see books whose titles or keywords include <b>computational geometry</b>). The good news is that freely available software, packages, and modules exist in various forms for doing the computation and getting results. <!--more--></p>
<p>As part of introducing nonparametric statistics, I wanted to write about applications of computational geometry from the perspective of nonparametric 2D/3D density estimation. Also, the following article came along just as I began collecting statistical applications in astronomy (my [ArXiv] series). In fact, this [arXiv] paper prompted me to investigate Voronoi tessellations in astronomy in general.</p>
<blockquote><p>
 <a href="http://arxiv.org/abs/0707.2877">[arxiv/astro-ph:0707.2877]</a><br />
<strong>Voronoi Tessellations and the Cosmic Web: Spatial Patterns and Clustering across the Universe</strong><br />
 by <a href="http://www.astro.rug.nl/~weygaert/">Rien van de Weygaert</a>
</p></blockquote>
<p>Since then, quite some time has passed. In the meantime, I found more publications in astronomy that use <b>tessellation</b> as a main tool for nonparametric density estimation and data analysis. Nonetheless, topics in spatial statistics generally tend to go unrecognized or be almost ignored in the analysis of astronomical spatial data (I mean data points with coordinate information). Many studies seem to use statistics only partially or not at all. Some might want to know how often <b>Voronoi tessellation</b> is applied in astronomy. Here are the results of my ADS search, restricted to papers with tessellation in the title:</p>
<ul>
<li><a href="http://arxiv.org/abs/astro-ph/0110259">[arxiv/astro-ph:0110259]</a><br />
Detecting Clusters of Galaxies in the Sloan Digital Sky Survey I : Monte Carlo Comparison of Cluster Detection Algorithms<br />
by Kim, R.S.J. et al.  (2002) AJ, 123, pp.20-36.
</li>
<li><a href="http://arxiv.org/abs/0906.1905">[arxiv/astro-ph:0906.1905]</a><br />
The VOISE Algorithm: a Versatile Tool for Automatic Segmentation of Astronomical Images<br />
by <em>Guio, P. and Achilleos, N.</em> (2009)
</li>
<li><a href="http://adsabs.harvard.edu/abs/1998IrAJ...25...37W">Using Voronoi Techniques to determine the shapes of photon sources</a><br />
by <em>Wilkinson and Meurs</em> Irish Astronomical Journal, 1998, 25(1), 37
</li>
<li><a href="http://adsabs.harvard.edu/abs/2009MNRAS.394.1409E">High-order 3D Voronoi tessellation for identifying isolated galaxies, pairs and triplets</a><br />
by <em>Elyiv, A.; Melnyk, O.; Vavilova, I. </em> 2009..MNRAS..394..1409E
</li>
<li><a href="http://adsabs.harvard.edu/abs/2007IAUS..235..223M">3-D Voronoi&#8217;s Tessellation as a Tool for Identifying Galaxy Groups</a><br />
by <em>Melnyk, Olga V.; Elyiv, Andrii A.; Vavilova, Iryna B.</em> 2007..IAUS..235..223M
</li>
<li><a href="http://adsabs.harvard.edu/abs/2006MNRAS.368..497D">Adaptive binning of X-ray data with weighted Voronoi tessellations</a><br />
by	<em>Diehl, Steven; Statler, Thomas S.</em> 2006..MNRAS..368..497D
</li>
<li><a href="http://adsabs.harvard.edu/abs/2003MNRAS.342..345C">Adaptive spatial binning of integral-field spectroscopic data using Voronoi tessellations</a><br />
by <em>Cappellari, M. and Copin, Y.</em> 2003..MNRAS..342..345C
</li>
<li><a href="http://adsabs.harvard.edu/abs/2002ASPC..282..515C">Adaptive Spatial Binning of 2D Spectra and Images Using Voronoi Tessellations</a><br />
by <em>Cappellari, M.; Copin, Y.</em> 2002..ASPC..282..515C
</li>
<li><a href="http://adsabs.harvard.edu/abs/2001A%26A...368..776R">Finding galaxy clusters using Voronoi tessellations</a><br />
by <em>Ramella, M.; Boschin, W.; Fadda, D.; Nonino, M.</em>  2001..A&#38;A&#8230;368..776R
</li>
<li><a href="http://adsabs.harvard.edu/cabs/1999ApJS..124....1Y">The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation</a><br />
by <em>Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru</em> 1999..ApJS..124..1</li>
<li><a href="http://adsabs.harvard.edu/abs/1999ASPC..176..108R">Cluster Identification via Voronoi Tesselation</a> ..1999..ASPC..176..108
</li>
<li><a href="http://adsabs.harvard.edu/abs/1997A%26AS..123..495D">The accuracy of parameters determined with the core-sampling method: Application to Voronoi tessellations</a> 1997..A&#38;AS..123..495
</li>
<li><a href="http://adsabs.harvard.edu/abs/1995A%26AS..109...71Z">Dynamical Voronoi tessellation. V. Thickness and incompleteness.</a><br />
by <i>Zaninetti, L</i> 1995..A&#38;AS..109..71
</li>
<li><a href="http://adsabs.harvard.edu/abs/1994A%26A...283..361V">Fragmenting the Universe. 3: The constructions and statistics of 3-D Voronoi tessellations</a><br />
by <em>van de Weygaert, Rien</em> 1994..A&#38;A..283..361
</li>
<li><a href="http://adsabs.harvard.edu/abs/1993A%26A...276..255Z"> Dynamical Voronoi tessellation. IV. The distribution of the asteroids</a><br />
by <i>Zaninetti, L</i> 1993..A&#38;A..276..255
</li>
<li><a href="http://adsabs.harvard.edu/abs/1991MNRAS.250..519I">Quasi-periodic structures in the large-scale galaxy distribution and three-dimensional Voronoi tessellation</a><br />
  1991..MNRAS..250..519
</li>
<li><a href="http://adsabs.harvard.edu/abs/1991A%26A...246..291Z">Dynamical Voronoi tessellation. III &#8211; The distribution of galaxies</a><br />
by <i>Zaninetti, L</i> 1991..A&#38;A..246..291
</li>
<li><a href="http://adsabs.harvard.edu/abs/1990A%26A...233..293Z">Dynamical Voronoi tessellation. II &#8211; The three-dimensional case</a><br />
by <i>Zaninetti, L</i> 1990..A&#38;A..233..293
</li>
<li><a href="http://adsabs.harvard.edu/abs/1989A%26A...224..345Z">Dynamical Voronoi tessellation. I &#8211; The two-dimensional case</a><br />
by <i>Zaninetti, L</i> 1989..A&#38;A..224..345
</li>
</ul>
<p>Then the topic was forgotten for a while, until this recent [arXiv] paper reminded me of my old intention to introduce <b>tessellation</b> for density estimation and for understanding large-scale structures or clusters (astronomers&#8217; jargon, not the term in machine or statistical learning).</p>
<blockquote><p>
<a href="http://arxiv.org/abs/0910.1473">[arxiv:stat.ME:0910.1473]</a> <strong>Moment Analysis of the Delaunay Tessellation Field Estimator</strong><br />
by <i>M.N.M van Lieshout</i>
</p></blockquote>
<p>Looking at the plots in the papers by van de Weygaert or van Lieshout, one can immediately understand what <b>Voronoi and Delaunay tessellations</b> are, without mathematical jargon and abstraction (the Delaunay tessellation is also called the <a href="http://en.wikipedia.org/wiki/Delaunay_triangulation">Delaunay triangulation (wiki)</a>; you may also want to check out <a href="http://en.wikipedia.org/wiki/Delaunay_tessellation_field_estimator">wiki:Delaunay Tessellation Field Estimator</a>). <a href="http://en.wikipedia.org/wiki/Voronoi_tessellation">Voronoi tessellations</a> have been adopted in many scientific and engineering fields to describe spatial distributions, and astronomy is no exception: Voronoi tessellation has been used, for example, for field interpolation.</p>
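<p>Each observation generates exactly one Voronoi cell, so a simple nonparametric density estimate at an observation is the inverse of its cell area scaled by the sample size: &#961;<sub>i</sub> = 1/(N&#183;|V<sub>i</sub>|). Below is a minimal 2D sketch using SciPy's Qhull wrappers. The function name is hypothetical. Cells on the outer edge of the sample are unbounded, and this sketch simply leaves them as NaN rather than clipping them to an observation window.</p>

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi

def voronoi_cell_density(points):
    """Per-point density estimate rho_i = 1 / (N * area(V_i)), where V_i is the
    Voronoi cell generated by point i. Unbounded cells are returned as NaN."""
    vor = Voronoi(points)
    N = len(points)
    density = np.full(N, np.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) == 0 or -1 in region:  # -1 marks a Voronoi vertex at infinity
            continue
        # Voronoi cells are convex; in 2D, ConvexHull(...).volume is the area.
        area = ConvexHull(vor.vertices[region]).volume
        density[i] = 1.0 / (N * area)
    return density

# A 3 x 3 grid: only the center point has a bounded cell (the unit square).
points = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
density = voronoi_cell_density(points)
print(density[4])  # about 1/9
```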
<p>van de Weygaert described Voronoi tessellations as follows:</p>
<ol>
<li>the asymptotic frame for the ultimate matter distribution,</li>
<li>the skeleton of the cosmic matter distribution,</li>
<li>a versatile and flexible mathematical model for weblike spatial pattern, and</li>
<li>a natural asymptotic result of an evolution in which low-density expanding void regions dictate the spatial organization of the Megaparsec universe, while matter assembles in high-density filamentary and wall-like interstices between the voids.</li>
</ol>
<p>van Lieshout derived explicit expressions for the mean and variance of the Delaunay Tessellation Field Estimator (DTFE) and showed that, for stationary Poisson processes, the DTFE is asymptotically unbiased, with a variance proportional to the squared intensity.</p>
<p>We have observed voids and filaments of cosmic matter in patterns for which no theory has yet been established. In general, those patterns are manifested via observed galaxies, both directly and indirectly. Individual observed objects, I believe, can be matched to the generating points of Voronoi polygons. Each object represents its polygon, and investigating the distributional properties of the polygons helps us understand the formation rules and theories behind those patterns. For that matter, various topics in stochastic geometry, not just Voronoi tessellation, could probably be adopted.</p>
<p>There is a plethora of information available on Voronoi tessellation, such as the website of the International Symposium on Voronoi Diagrams in Science and Engineering. Two recent meeting websites are <a href="http://www2.imm.dtu.dk/projects/ISVD/">ISVD09</a> and <a href="http://www.imath.kiev.ua/~voronoi/">ISVD08</a>. Also, the following review paper is interesting.</p>
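<p>The basic DTFE step described by van Lieshout can be sketched as follows, assuming unit mass per point. The density at sample point i is (d + 1) divided by the total volume of the Delaunay simplices that have i as a vertex (its contiguous Voronoi cell). The field is then interpolated linearly inside each simplex, which is not shown here. This is a minimal SciPy illustration, not the authors' code, and the function name is hypothetical. Points on the convex hull have truncated contiguous cells, so their densities are biased upward.</p>

```python
import numpy as np
from math import factorial
from scipy.spatial import ConvexHull, Delaunay

def dtfe_point_density(points):
    """DTFE density at each sample point: rho_i = (d + 1) / |W_i|, where W_i is
    the union of all Delaunay simplices having point i as a vertex."""
    tri = Delaunay(points)
    d = points.shape[1]
    # Volume of each simplex: |det(v_1 - v_0, ..., v_d - v_0)| / d!
    verts = points[tri.simplices]              # (n_simplices, d + 1, d)
    edges = verts[:, 1:, :] - verts[:, :1, :]  # (n_simplices, d, d)
    vols = np.abs(np.linalg.det(edges)) / factorial(d)
    contiguous = np.zeros(len(points))
    for k in range(d + 1):  # add each simplex's volume to each of its d + 1 vertices
        np.add.at(contiguous, tri.simplices[:, k], vols)
    return (d + 1) / contiguous

rng = np.random.default_rng(0)
points = rng.uniform(size=(500, 2))
rho = dtfe_point_density(points)
# Each simplex is counted d + 1 times, so sum_i 1/rho_i equals the convex hull area.
print(np.sum(1.0 / rho), ConvexHull(points).volume)
```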
<blockquote><p><i>Centroidal Voronoi Tessellations: Applications and Algorithms</i> (1999) Du, Faber, and Gunzburger in SIAM Review, vol. 41(4), pp. 637-676</p></blockquote>
<p>By the way, you may have noticed my preference for Voronoi tessellation over Delaunay. The reason is that each observation generates exactly one Voronoi cell, whereas in a Delaunay triangulation multiple simplices are associated with one observation/point. However, from the perspective of understanding the distribution of observations as a whole, both approaches offer summaries and insights in a nonparametric fashion, which is what I value most.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Concepts in Machine Learning (5) - Conditional Probability Density Estimation]]></title>
<link>http://csstudyfun.wordpress.com/2008/07/29/cac-khai-ni%e1%bb%87m-trong-h%e1%bb%8dc-may-machine-learning-5-%c6%b0%e1%bb%9bc-l%c6%b0%e1%bb%a3ng-m%e1%ba%adt-d%e1%bb%99-xac-su%e1%ba%a5t-co-di%e1%bb%81u-ki%e1%bb%87n/</link>
<pubDate>Wed, 30 Jul 2008 00:31:46 +0000</pubDate>
<dc:creator>tqlong</dc:creator>
<guid>http://csstudyfun.wordpress.com/2008/07/29/cac-khai-ni%e1%bb%87m-trong-h%e1%bb%8dc-may-machine-learning-5-%c6%b0%e1%bb%9bc-l%c6%b0%e1%bb%a3ng-m%e1%ba%adt-d%e1%bb%99-xac-su%e1%ba%a5t-co-di%e1%bb%81u-ki%e1%bb%87n/</guid>
<description><![CDATA[From the analysis of optimal decisions, we see that the classification problem can be solved by determining/est]]></description>
<content:encoded><![CDATA[From the analysis of optimal decisions, we see that the classification problem can be solved by determining/est]]></content:encoded>
</item>
<item>
<title><![CDATA[Concepts in Machine Learning (4) - The Classification Problem, Optimal Decisions, Classification Functions]]></title>
<link>http://csstudyfun.wordpress.com/2008/07/29/cac-khai-ni%e1%bb%87m-trong-h%e1%bb%8dc-may-machine-learning-4-bai-toan-phan-l%e1%bb%9bp-quy%e1%ba%bft-d%e1%bb%8bnh-t%e1%bb%91i-%c6%b0u-ham-phan-l%e1%bb%9bp/</link>
<pubDate>Tue, 29 Jul 2008 14:31:42 +0000</pubDate>
<dc:creator>tqlong</dc:creator>
<guid>http://csstudyfun.wordpress.com/2008/07/29/cac-khai-ni%e1%bb%87m-trong-h%e1%bb%8dc-may-machine-learning-4-bai-toan-phan-l%e1%bb%9bp-quy%e1%ba%bft-d%e1%bb%8bnh-t%e1%bb%91i-%c6%b0u-ham-phan-l%e1%bb%9bp/</guid>
<description><![CDATA[The classification problem: Example: On a fruit-sorting line, specifically for oranges,]]></description>
<content:encoded><![CDATA[The classification problem: Example: On a fruit-sorting line, specifically for oranges,]]></content:encoded>
</item>
<item>
<title><![CDATA[[ArXiv] SDSS DR6, July 23, 2007]]></title>
<link>http://hyunsook.wordpress.com/2007/07/25/arxiv-sdss-dr6/</link>
<pubDate>Wed, 25 Jul 2007 17:46:38 +0000</pubDate>
<dc:creator>HLee</dc:creator>
<guid>http://hyunsook.wordpress.com/2007/07/25/arxiv-sdss-dr6/</guid>
<description><![CDATA[From arxiv/astro-ph:0707.3413 The Sixth Data Release of the Sloan Digital Sky Survey by &#8230; many]]></description>
<content:encoded><![CDATA[<p>From <a href="http://arxiv.org/abs/0707.3413">arxiv/astro-ph:0707.3413</a><br />
<strong>The Sixth Data Release of the Sloan Digital Sky Survey</strong> by &#8230; many people &#8230;</p>
<p>The sixth data release of the Sloan Digital Sky Survey (SDSS DR6) is available at <a href="http://www.sdss.org/dr6">http://www.sdss.org/dr6</a>. Additionally, the <a href="http://cas.sdss.org/astrodr6/en/">Catalog Archive Service</a> (CAS) and the
<a href="http://cas.sdss.org/astrodr6/en/tools/search/sql.asp">SQL interface to access the catalog</a> should be useful to statisticians searching for data. Simple, well-documented SQL commands can narrow down the size and the spatial coverage of the data.<br />
<!--more--></p>
<p>Part of my dissertation was about creating nonparametric multivariate analysis tools based on convex hull peeling. I applied those tools to SDSS DR4 to explore celestial objects in the multidimensional color space without projections (dimension reduction). SDSS CAS might fulfill the needs of those who are looking for data sets to conduct</p>
<ul>
<li> massive multivariate data analysis,</li>
<li> streaming data analysis (strictly speaking, SDSS is not streaming, but the database is updated yearly with new observations, and depending on memory, streaming data analysis can easily be simulated), and</li>
<li> application of his/her new machine learning and statistical multivariate analysis tools for new discoveries.</li>
</ul>
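<p>The convex hull peeling mentioned above assigns each observation a depth. Points on the convex hull of the whole sample form layer 1, the hull of the remaining points forms layer 2, and so on, so central points receive the largest depths. Below is a minimal SciPy sketch. It assumes that only hull vertices (not points lying exactly on hull edges) are peeled at each step. The function name is hypothetical, and this is not the dissertation's code.</p>

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_peeling_depth(points):
    """Return the peeling layer (1 = outermost) for each row of `points`."""
    n, d = points.shape
    depth = np.zeros(n, dtype=int)
    remaining = np.arange(n)
    layer = 1
    while remaining.size > 0:
        if remaining.size <= d:
            # Too few points left for a full-dimensional hull: they form the last layer.
            depth[remaining] = layer
            break
        try:
            hull = ConvexHull(points[remaining])
        except Exception:
            # Degenerate remainder (e.g. all points collinear): Qhull cannot build a hull.
            depth[remaining] = layer
            break
        on_hull = remaining[hull.vertices]
        depth[on_hull] = layer
        remaining = np.setdiff1d(remaining, on_hull)
        layer += 1
    return depth

points = np.array([[0, 0], [0, 4], [4, 0], [4, 4],  # outer square
                   [1, 1], [1, 3], [3, 1], [3, 3],  # inner square
                   [2, 2]], dtype=float)            # center
print(convex_hull_peeling_depth(points))  # [1 1 1 1 2 2 2 2 3]
```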
<p>In particular, thanks to the survey&#8217;s wide coverage of the northern sky, interesting spatial statistics can be developed, such as Voronoi tessellation for spatial density estimation. SDSS also provides a vast image reservoir as well as a catalog of massive multivariate spatial data.</p>
<p>Oh, by the way, the paper discusses changes and improvements in the recent data release. SDSS DR6 includes the complete imaging of the Northern Galactic Cap and contains images and parameters of 287 million objects over 9583 deg&#178;, and 1.27 million spectra over 7425 deg&#178;. The photometric calibration has improved, with uncertainties of 1% in g, r, i and 2% in u, significantly better than in previous data releases. The method of spectrophotometric calibration has changed, resulting in a spectrophotometric scale that is 0.35 mag brighter. Two independent codes for spectral classification and redshifts are available as well.</p>
]]></content:encoded>
</item>

</channel>
</rss>
