<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>mathst &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/mathst/</link>
	<description>Feed of posts on WordPress.com tagged "mathst"</description>
	<pubDate>Tue, 21 May 2013 01:41:05 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Benford's law, Zipf's law, and the Pareto distribution]]></title>
<link>http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/</link>
<pubDate>Sat, 04 Jul 2009 03:27:18 +0000</pubDate>
<dc:creator>Terence Tao</dc:creator>
<guid>http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/</guid>
<description><![CDATA[A remarkable phenomenon in probability theory is that of universality &#8211; that many seemingly un]]></description>
<content:encoded><![CDATA[<p>
 A remarkable phenomenon in probability theory is that of <em>universality</em> &#8211; that many seemingly unrelated probability distributions, which ostensibly involve large numbers of unknown parameters, can end up converging to a universal law that may only depend on a small handful of parameters. One of the most famous examples of the universality phenomenon is the <a href="http://en.wikipedia.org/wiki/Central_limit_theorem">central limit theorem</a>; another rich source of examples comes from <a href="http://en.wikipedia.org/wiki/Random_matrix_theory">random matrix theory</a>, which is one of the areas of my own research.
</p>
<p>
Analogous universality phenomena also show up in <em>empirical</em> distributions &#8211; the distributions of a statistic <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> from a large population of &#8220;real-world&#8221; objects. Examples include <a href="http://en.wikipedia.org/wiki/Benford&#037;27s_law">Benford&#8217;s law</a>, <a href="http://en.wikipedia.org/wiki/Zipf&#037;27s_law">Zipf&#8217;s law</a>, and the <a href="http://en.wikipedia.org/wiki/Pareto_distribution">Pareto distribution</a> (of which the <a href="http://en.wikipedia.org/wiki/Pareto_principle">Pareto principle</a> or <em>80-20 law</em> is a special case). These laws govern the asymptotic distribution of many statistics <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> which </p>
<ul>
<li> (i) take values as positive numbers; </li>
<li> (ii) range over many different orders of magnitude; </li>
<li> (iii) arise from a complicated combination of largely independent factors (with different samples of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> arising from different independent factors); and </li>
<li> (iv) have not been artificially rounded, truncated, or otherwise constrained in size.
</li>
</ul>
<p>
Examples here include the population of countries or cities, the frequency of occurrence of words in a language, the mass of astronomical objects, or the net worth of individuals or corporations. The laws are then as follows:
</p>
<p><ul>
<li> <b>Benford&#8217;s law:</b> For <img src='http://s0.wp.com/latex.php?latex=%7Bk%3D1%2C%5Cldots%2C9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{k=1,&#92;ldots,9}&amp;fg=000000' title='{k=1,&#92;ldots,9}&amp;fg=000000' class='latex' />, the proportion of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> whose first digit is <img src='http://s0.wp.com/latex.php?latex=%7Bk%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{k}&amp;fg=000000' title='{k}&amp;fg=000000' class='latex' /> is approximately <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+%5Cfrac%7Bk%2B1%7D%7Bk%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} &#92;frac{k+1}{k}}&amp;fg=000000' title='{&#92;log_{10} &#92;frac{k+1}{k}}&amp;fg=000000' class='latex' />. Thus, for instance, <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> should have a first digit of <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> about <img src='http://s0.wp.com/latex.php?latex=%7B30%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{30&#92;%}&amp;fg=000000' title='{30&#92;%}&amp;fg=000000' class='latex' /> of the time, but a first digit of <img src='http://s0.wp.com/latex.php?latex=%7B9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{9}&amp;fg=000000' title='{9}&amp;fg=000000' class='latex' /> only about <img src='http://s0.wp.com/latex.php?latex=%7B5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{5&#92;%}&amp;fg=000000' title='{5&#92;%}&amp;fg=000000' class='latex' /> of the time. </li>
<li> <b>Zipf&#8217;s law:</b> The <img src='http://s0.wp.com/latex.php?latex=%7Bn%5E%7Bth%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n^{th}}&amp;fg=000000' title='{n^{th}}&amp;fg=000000' class='latex' /> largest value of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> should obey an approximate power law, i.e. it should be approximately <img src='http://s0.wp.com/latex.php?latex=%7BC+n%5E%7B-%5Calpha%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C n^{-&#92;alpha}}&amp;fg=000000' title='{C n^{-&#92;alpha}}&amp;fg=000000' class='latex' /> for the first few <img src='http://s0.wp.com/latex.php?latex=%7Bn%3D1%2C2%2C3%2C%5Cldots%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n=1,2,3,&#92;ldots}&amp;fg=000000' title='{n=1,2,3,&#92;ldots}&amp;fg=000000' class='latex' /> and some parameters <img src='http://s0.wp.com/latex.php?latex=%7BC%2C+%5Calpha+%26%2362%3B+0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C, &#92;alpha &gt; 0}&amp;fg=000000' title='{C, &#92;alpha &gt; 0}&amp;fg=000000' class='latex' />. In many cases, <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> is close to <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />. </li>
<li> <b>Pareto distribution:</b> The proportion of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> with at least <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> digits (before the decimal point), where <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> is above the median number of digits, should obey an approximate exponential law, i.e. be approximately of the form <img src='http://s0.wp.com/latex.php?latex=%7Bc+10%5E%7B-m%2F%5Calpha%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{c 10^{-m/&#92;alpha}}&amp;fg=000000' title='{c 10^{-m/&#92;alpha}}&amp;fg=000000' class='latex' /> for some <img src='http://s0.wp.com/latex.php?latex=%7Bc%2C+%5Calpha+%26%2362%3B+0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{c, &#92;alpha &gt; 0}&amp;fg=000000' title='{c, &#92;alpha &gt; 0}&amp;fg=000000' class='latex' />. Again, in many cases <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> is close to <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />.
</li>
</ul>
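<p>
As a quick numerical check of the Benford formula, here is a minimal Python sketch. The function name <code>benford_proportion</code> is a hypothetical helper, not part of any library. In base 10 it gives about 30.1% for a leading 1 and about 4.6% for a leading 9. The nine proportions sum to 1 because the logarithms telescope to log<sub>10</sub>(10/1). The optional <code>base</code> parameter implements the base-b version of the law, obtained by replacing 10 with b.
</p>

```python
import math


def benford_proportion(k: int, base: int = 10) -> float:
    """Benford's law: predicted proportion of values whose leading digit is k.

    In base b the law reads log_b((k + 1) / k) for k = 1, ..., b - 1.
    """
    if not 1 <= k < base:
        raise ValueError("leading digit must satisfy 1 <= k < base")
    return math.log((k + 1) / k, base)


# Print the base-10 proportions for k = 1, ..., 9.
for k in range(1, 10):
    print(k, f"{benford_proportion(k):.3f}")

# The sum telescopes to log_b(b) = 1 (up to floating-point rounding).
print(sum(benford_proportion(k) for k in range(1, 10)))
```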
<p>
Benford&#8217;s law and the Pareto distribution are stated here for base <img src='http://s0.wp.com/latex.php?latex=%7B10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{10}&amp;fg=000000' title='{10}&amp;fg=000000' class='latex' />, the base we are most familiar with, but the laws hold for any base (after replacing every occurrence of <img src='http://s0.wp.com/latex.php?latex=%7B10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{10}&amp;fg=000000' title='{10}&amp;fg=000000' class='latex' /> in the above laws with the new base, of course). The laws tend to break down if the hypotheses (i)-(iv) are dropped. For instance, if the statistic <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> concentrates around its mean (as opposed to being spread over many orders of magnitude), then the <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal distribution</a> tends to be a much better model (as indicated by such results as the central limit theorem). If instead the various samples of the statistic are highly correlated with each other, then other laws can arise. For instance, the eigenvalues of a random matrix, as well as those of many empirically observed matrices, are correlated with each other. The largest eigenvalues are then governed by laws such as the <em>Tracy-Widom law</em> rather than Zipf&#8217;s law, and the bulk distribution by laws such as the <a href="http://en.wikipedia.org/wiki/Wigner_semicircle_distribution">semicircular law</a> rather than the normal or Pareto distributions.
</p>
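<p>
To see hypotheses (ii) and (iii) at work, here is a small simulation sketch in Python. The helper names, sample size, seed, and the choice of factor distribution are all illustrative assumptions, not taken from this post. The first sample is a product of fifty independent factors, each drawn uniformly from [0.5, 2]. Its logarithm is a sum of independent terms that spreads over several orders of magnitude, so the fractional part of its base-10 logarithm is close to uniform, and that uniformity is exactly what Benford's law expresses. The second sample is normal with mean 500 and standard deviation 20. It concentrates around its mean, and its leading digits are almost all 4 or 5.
</p>

```python
import math
import random


def leading_digit(x: float) -> int:
    """Leading decimal digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    e = math.floor(math.log10(x))
    m = x / 10.0 ** e
    # Correct for floating-point error when x is very close to a power of 10.
    if m >= 10:
        m /= 10
    elif m < 1:
        m *= 10
    return int(m)


def digit_proportions(samples: list[float]) -> dict[int, float]:
    """Empirical proportion of each leading digit 1..9."""
    counts = {k: 0 for k in range(1, 10)}
    for x in samples:
        counts[leading_digit(x)] += 1
    return {k: c / len(samples) for k, c in counts.items()}


rng = random.Random(2009)  # fixed seed so the run is reproducible
N = 20_000

# Satisfies (i)-(iii): a product of many independent positive factors.
products = [math.prod(rng.uniform(0.5, 2.0) for _ in range(50)) for _ in range(N)]

# Violates (ii): a normal sample concentrated around its mean.
concentrated = [rng.gauss(500.0, 20.0) for _ in range(N)]

benford = {k: math.log10((k + 1) / k) for k in range(1, 10)}
for name, data in [("product", products), ("normal", concentrated)]:
    props = digit_proportions(data)
    print(name, {k: round(p, 3) for k, p in props.items()})
print("Benford", {k: round(p, 3) for k, p in benford.items()})
```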
<p>
To illustrate these laws, let us take as a data set the populations of 235 countries and regions of the world in 2007 (using the <a href="http://www.umsl.edu/services/govdocs/wofact2007/index.html">CIA World Factbook</a>); I have put the raw data <a href="http://spreadsheets.google.com/pub?key=rj_3TkLJrrVuvOXkijCHelQ&#38;output=html">here</a>. This is a relatively small sample (cf. <a href="http://terrytao.wordpress.com/2008/10/10/small-samples-and-the-margin-of-error/">my previous post</a>), but it is already enough to discern these laws in action. For instance, here is how the data set tracks with Benford&#8217;s law (rounded to three significant figures):
</p>
<p><table align="center">
<tr>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7Bk%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{k}&amp;fg=000000' title='{k}&amp;fg=000000' class='latex' /> </td>
<td align="left"> Countries </td>
<td align="left"> Number </td>
<td align="left"> Benford prediction </td>
</tr>
<tr>
<td align="left"> 1 </td>
<td align="left"> Angola, Anguilla, Aruba, Bangladesh, Belgium, Botswana, Brazil, Burkina Faso, Cambodia, Cameroon, Chad, Chile, China, Christmas Island, Cook Islands, Cuba, Czech Republic, Ecuador, Estonia, Gabon, (The) Gambia, Greece, Guam, Guatemala, Guinea-Bissau, India, Japan, Kazakhstan, Kiribati, Malawi, Mali, Mauritius, Mexico, (Federated States of) Micronesia, Nauru, Netherlands, Niger, Nigeria, Niue, Pakistan, Portugal, Russia, Rwanda, Saint Lucia, Saint Vincent and the Grenadines, Senegal, Serbia, Swaziland, Syria, Timor-Leste (East-Timor), Tokelau, Tonga, Trinidad and Tobago, Tunisia, Tuvalu, (U.S.) Virgin Islands, Wallis and Futuna, Zambia, Zimbabwe </td>
<td align="left"> 59 (<img src='http://s0.wp.com/latex.php?latex=%7B25.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{25.1&#92;%}&amp;fg=000000' title='{25.1&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 71 (<img src='http://s0.wp.com/latex.php?latex=%7B30.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{30.1&#92;%}&amp;fg=000000' title='{30.1&#92;%}&amp;fg=000000' class='latex' />) </td>
</tr>
<tr>
<td align="left"> 2 </td>
<td align="left"> Armenia, Australia, Barbados, British Virgin Islands, Cote d&#8217;Ivoire, French Polynesia, Ghana, Gibraltar, Indonesia, Iraq, Jamaica, (North) Korea, Kosovo, Kuwait, Latvia, Lesotho, Macedonia, Madagascar, Malaysia, Mayotte, Mongolia, Mozambique, Namibia, Nepal, Netherlands Antilles, New Caledonia, Norfolk Island, Palau, Peru, Romania, Saint Martin, Samoa, San Marino, Sao Tome and Principe, Saudi Arabia, Slovenia, Sri Lanka, Svalbard, Taiwan, Turks and Caicos Islands, Uzbekistan, Vanuatu, Venezuela, Yemen </td>
<td align="left"> 44 (<img src='http://s0.wp.com/latex.php?latex=%7B18.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{18.7&#92;%}&amp;fg=000000' title='{18.7&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 41 (<img src='http://s0.wp.com/latex.php?latex=%7B17.6%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{17.6&#92;%}&amp;fg=000000' title='{17.6&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 3 </td>
<td align="left"> Afghanistan, Albania, Algeria, (The) Bahamas, Belize, Brunei, Canada, (Rep. of the) Congo, Falkland Islands (Islas Malvinas), Iceland, Kenya, Lebanon, Liberia, Liechtenstein, Lithuania, Maldives, Mauritania, Monaco, Morocco, Oman, (Occupied) Palestinian Territory, Panama, Poland, Puerto Rico, Saint Kitts and Nevis, Uganda, United States of America, Uruguay, Western Sahara </td>
<td align="left"> 29 (<img src='http://s0.wp.com/latex.php?latex=%7B12.3%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{12.3&#92;%}&amp;fg=000000' title='{12.3&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 29 (<img src='http://s0.wp.com/latex.php?latex=%7B12.5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{12.5&#92;%}&amp;fg=000000' title='{12.5&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 4 </td>
<td align="left"> Argentina, Bosnia and Herzegovina, Burma (Myanmar), Cape Verde, Cayman Islands, Central African Republic, Colombia, Costa Rica, Croatia, Faroe Islands, Georgia, Ireland, (South) Korea, Luxembourg, Malta, Moldova, New Zealand, Norway, Pitcairn Islands, Singapore, South Africa, Spain, Sudan, Suriname, Tanzania, Ukraine, United Arab Emirates </td>
<td align="left"> 27 (<img src='http://s0.wp.com/latex.php?latex=%7B11.5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{11.5&#92;%}&amp;fg=000000' title='{11.5&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 22 (<img src='http://s0.wp.com/latex.php?latex=%7B9.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{9.7&#92;%}&amp;fg=000000' title='{9.7&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 5 </td>
<td align="left"> (Macao SAR) China, Cocos Islands, Denmark, Djibouti, Eritrea, Finland, Greenland, Italy, Kyrgyzstan, Montserrat, Nicaragua, Papua New Guinea, Slovakia, Solomon Islands, Togo, Turkmenistan </td>
<td align="left"> 16 (<img src='http://s0.wp.com/latex.php?latex=%7B6.8%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{6.8&#92;%}&amp;fg=000000' title='{6.8&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 19 (<img src='http://s0.wp.com/latex.php?latex=%7B7.9%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{7.9&#92;%}&amp;fg=000000' title='{7.9&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 6 </td>
<td align="left"> American Samoa, Bermuda, Bhutan, (Dem. Rep. of the) Congo, Equatorial Guinea, France, Guernsey, Iran, Jordan, Laos, Libya, Marshall Islands, Montenegro, Paraguay, Sierra Leone, Thailand, United Kingdom </td>
<td align="left"> 17 (<img src='http://s0.wp.com/latex.php?latex=%7B7.2%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{7.2&#92;%}&amp;fg=000000' title='{7.2&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 16 (<img src='http://s0.wp.com/latex.php?latex=%7B6.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{6.7&#92;%}&amp;fg=000000' title='{6.7&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 7 </td>
<td align="left"> Bahrain, Bulgaria, (Hong Kong SAR) China, Comoros, Cyprus, Dominica, El Salvador, Guyana, Honduras, Israel, (Isle of) Man, Saint Barthelemy, Saint Helena, Saint Pierre and Miquelon, Switzerland, Tajikistan, Turkey </td>
<td align="left"> 17 (<img src='http://s0.wp.com/latex.php?latex=%7B7.2%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{7.2&#92;%}&amp;fg=000000' title='{7.2&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 14 (<img src='http://s0.wp.com/latex.php?latex=%7B5.8%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{5.8&#92;%}&amp;fg=000000' title='{5.8&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 8 </td>
<td align="left"> Andorra, Antigua and Barbuda, Austria, Azerbaijan, Benin, Burundi, Egypt, Ethiopia, Germany, Haiti, Holy See (Vatican City), Northern Mariana Islands, Qatar, Seychelles, Vietnam </td>
<td align="left"> 15 (<img src='http://s0.wp.com/latex.php?latex=%7B6.4%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{6.4&#92;%}&amp;fg=000000' title='{6.4&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 12 (<img src='http://s0.wp.com/latex.php?latex=%7B5.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{5.1&#92;%}&amp;fg=000000' title='{5.1&#92;%}&amp;fg=000000' class='latex' />)</td>
</tr>
<tr>
<td align="left"> 9 </td>
<td align="left"> Belarus, Bolivia, Dominican Republic, Fiji, Grenada, Guinea, Hungary, Jersey, Philippines, Somalia, Sweden </td>
<td align="left"> 11 (<img src='http://s0.wp.com/latex.php?latex=%7B4.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{4.7&#92;%}&amp;fg=000000' title='{4.7&#92;%}&amp;fg=000000' class='latex' />) </td>
<td align="left"> 11 (<img src='http://s0.wp.com/latex.php?latex=%7B4.6%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{4.6&#92;%}&amp;fg=000000' title='{4.6&#92;%}&amp;fg=000000' class='latex' />) </td>
</tr>
</table>
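<p>
The Benford prediction column can be recomputed from the observed counts in the table. The Python sketch below does this and also computes Pearson's chi-squared goodness-of-fit statistic. That statistic is a standard way to quantify such a fit, but it is not used in this post. On these counts it comes out just under 5. This is well below the 5% critical value of about 15.51 for 8 degrees of freedom, so the data are consistent with Benford's law at that level.
</p>

```python
import math

# Observed leading-digit counts for k = 1, ..., 9, copied from the table above.
observed = [59, 44, 29, 27, 16, 17, 17, 15, 11]
total = sum(observed)  # 235 countries and regions

chi2 = 0.0
for k, count in enumerate(observed, start=1):
    p = math.log10((k + 1) / k)  # Benford proportion for leading digit k
    expected = total * p
    chi2 += (count - expected) ** 2 / expected
    print(f"{k}: observed {count} ({100 * count / total:.1f}%), "
          f"Benford {expected:.1f} ({100 * p:.1f}%)")

# Pearson chi-squared statistic with 9 - 1 = 8 degrees of freedom;
# the 5% critical value for 8 degrees of freedom is about 15.51.
print(f"chi-squared = {chi2:.2f}")
```

<p>
For the digit 4, plain rounding of 235 log<sub>10</sub>(5/4) &#8776; 22.8 gives 23 rather than the 22 listed in the table. With 23, the rounded predictions sum to 236 instead of 235.
</p>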
<p>
Here is how the same data tracks Zipf&#8217;s law for the first twenty values of <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' />, with the parameters <img src='http://s0.wp.com/latex.php?latex=%7BC+%5Capprox+1.28+%5Ctimes+10%5E9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C &#92;approx 1.28 &#92;times 10^9}&amp;fg=000000' title='{C &#92;approx 1.28 &#92;times 10^9}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha+%5Capprox+1.03%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha &#92;approx 1.03}&amp;fg=000000' title='{&#92;alpha &#92;approx 1.03}&amp;fg=000000' class='latex' /> (selected by log-linear regression), again rounding to three significant figures:
</p>
<p><table align="center">
<tr>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' /> </td>
<td align="left"> Country </td>
<td align="left"> Population </td>
<td align="left"> Zipf prediction </td>
<td align="left"> Deviation from prediction </td>
</tr>
<tr>
<td align="left"> 1 </td>
<td align="left"> China </td>
<td align="left"> 1,330,000,000	</td>
<td align="left"> 1,280,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B4.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+4.1&#92;%}&amp;fg=000000' title='{+4.1&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 2	</td>
<td align="left"> India </td>
<td align="left">	1,150,000,000	</td>
<td align="left"> 626,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B83.5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+83.5&#92;%}&amp;fg=000000' title='{+83.5&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 3	</td>
<td align="left"> USA </td>
<td align="left"> 304,000,000 </td>
<td align="left"> 412,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-26.3%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-26.3&#92;%}&amp;fg=000000' title='{-26.3&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 4	</td>
<td align="left"> Indonesia </td>
<td align="left">	238,000,000	</td>
<td align="left"> 307,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-22.5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-22.5&#92;%}&amp;fg=000000' title='{-22.5&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 5	</td>
<td align="left"> Brazil </td>
<td align="left"> 196,000,000 </td>
<td align="left"> 244,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-19.4%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-19.4&#92;%}&amp;fg=000000' title='{-19.4&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 6	</td>
<td align="left"> Pakistan </td>
<td align="left"> 173,000,000 </td>
<td align="left"> 202,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-14.4%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-14.4&#92;%}&amp;fg=000000' title='{-14.4&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 7 </td>
<td align="left">	Bangladesh </td>
<td align="left"> 154,000,000 </td>
<td align="left"> 172,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-10.9%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-10.9&#92;%}&amp;fg=000000' title='{-10.9&#92;%}&amp;fg=000000' class='latex' /></td>
</tr>
<tr>
<td align="left"> 8	</td>
<td align="left"> Nigeria </td>
<td align="left"> 146,000,000 </td>
<td align="left"> 150,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-2.6%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-2.6&#92;%}&amp;fg=000000' title='{-2.6&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 9	</td>
<td align="left"> Russia </td>
<td align="left"> 141,000,000 </td>
<td align="left"> 133,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B5.8%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+5.8&#92;%}&amp;fg=000000' title='{+5.8&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 10 </td>
<td align="left"> Japan </td>
<td align="left"> 128,000,000 </td>
<td align="left"> 120,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B6.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+6.7&#92;%}&amp;fg=000000' title='{+6.7&#92;%}&amp;fg=000000' class='latex' /></td>
</tr>
<tr>
<td align="left"> 11 </td>
<td align="left"> Mexico </td>
<td align="left"> 110,000,000 </td>
<td align="left"> 108,000,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B1.7%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+1.7&#92;%}&amp;fg=000000' title='{+1.7&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 12 </td>
<td align="left"> Philippines </td>
<td align="left"> 96,100,000 </td>
<td align="left">	98,900,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-2.9%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-2.9&#92;%}&amp;fg=000000' title='{-2.9&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 13 </td>
<td align="left"> Vietnam </td>
<td align="left"> 86,100,000 </td>
<td align="left"> 91,100,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-5.4%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-5.4&#92;%}&amp;fg=000000' title='{-5.4&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 14 </td>
<td align="left"> Ethiopia </td>
<td align="left">	82,600,000 </td>
<td align="left"> 84,400,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B-2.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-2.1&#92;%}&amp;fg=000000' title='{-2.1&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 15 </td>
<td align="left"> Germany </td>
<td align="left"> 82,400,000 </td>
<td align="left">	78,600,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B4.8%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+4.8&#92;%}&amp;fg=000000' title='{+4.8&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 16 </td>
<td align="left"> Egypt </td>
<td align="left"> 81,700,000 </td>
<td align="left">	73,500,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B11.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+11.1&#92;%}&amp;fg=000000' title='{+11.1&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 17 </td>
<td align="left"> Turkey </td>
<td align="left">	71,900,000 </td>
<td align="left"> 69,100,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B4.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+4.1&#92;%}&amp;fg=000000' title='{+4.1&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 18 </td>
<td align="left"> (Dem. Rep. of the) Congo </td>
<td align="left"> 66,500,000 </td>
<td align="left">	65,100,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B2.2%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+2.2&#92;%}&amp;fg=000000' title='{+2.2&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 19 </td>
<td align="left"> Iran </td>
<td align="left">	65,900,000 </td>
<td align="left"> 61,600,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B6.9%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+6.9&#92;%}&amp;fg=000000' title='{+6.9&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
<tr>
<td align="left"> 20 </td>
<td align="left"> Thailand </td>
<td align="left"> 65,500,000 </td>
<td align="left"> 58,400,000 </td>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7B%2B12.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+12.1&#92;%}&amp;fg=000000' title='{+12.1&#92;%}&amp;fg=000000' class='latex' /> </td>
</tr>
</table>
<p>
As one can see, Zipf&#8217;s law is not particularly precise at the extreme edge of the statistics (when <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' /> is very small). It becomes reasonably accurate for moderate sizes of <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' />, given the small sample size and the fact that we are fitting twenty data points using only two parameters.
</p>
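<p>
The log-linear regression used to select C and &#945; can be sketched in Python as follows. The sketch assumes an ordinary least-squares fit of log population against log rank, which is the natural reading of log-linear regression but is not spelled out in the post. <code>fit_zipf</code> is a hypothetical helper name. Run on the rounded populations from the table, it should recover values close to C &#8776; 1.28 &#215; 10<sup>9</sup> and &#945; &#8776; 1.03. Because the inputs are rounded, the printed deviations may differ from the table's in the last digit.
</p>

```python
import math

# Populations of the twenty most populous entries (ranks n = 1..20), from the Zipf table above.
populations = [
    1_330_000_000, 1_150_000_000, 304_000_000, 238_000_000, 196_000_000,
    173_000_000, 154_000_000, 146_000_000, 141_000_000, 128_000_000,
    110_000_000, 96_100_000, 86_100_000, 82_600_000, 82_400_000,
    81_700_000, 71_900_000, 66_500_000, 65_900_000, 65_500_000,
]


def fit_zipf(values: list[float]) -> tuple[float, float]:
    """Fit value_n ~ C * n**(-alpha) by least squares on log(value) vs log(n)."""
    xs = [math.log(n) for n in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope  # (C, alpha)


C, alpha = fit_zipf(populations)
print(f"C = {C:.3g}, alpha = {alpha:.3f}")
for n, pop in enumerate(populations, start=1):
    predicted = C * n ** (-alpha)
    print(f"{n:2d} {pop:>13,} {predicted:>13,.0f} {100 * (pop / predicted - 1):+.1f}%")
```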
<p>
This data set spans too few scales in base <img src='http://s0.wp.com/latex.php?latex=%7B10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{10}&amp;fg=000000' title='{10}&amp;fg=000000' class='latex' /> to illustrate the Pareto distribution effectively &#8211; over half of the country populations have either seven or eight digits in that base. But if we instead work in base <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' />, then country populations range over a decent number of scales (the majority of countries have populations between <img src='http://s0.wp.com/latex.php?latex=%7B2%5E%7B22%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2^{22}}&amp;fg=000000' title='{2^{22}}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B2%5E%7B31%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2^{31}}&amp;fg=000000' title='{2^{31}}&amp;fg=000000' class='latex' />, i.e. with between 23 and 31 binary digits), and we begin to see the law emerge. Taking <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> to be the number of digits in binary, the best-fit parameters are <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha+%5Capprox+1.18%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha &#92;approx 1.18}&amp;fg=000000' title='{&#92;alpha &#92;approx 1.18}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7Bc+%5Capprox+1.7+%5Ctimes+2%5E%7B26%7D+%2F+235%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{c &#92;approx 1.7 &#92;times 2^{26} / 235}&amp;fg=000000' title='{c &#92;approx 1.7 &#92;times 2^{26} / 235}&amp;fg=000000' class='latex' />:
</p>
<p><table align="center">
<tr>
<td align="left"> <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> </td>
<td align="left"> Countries whose population has <img src='http://s0.wp.com/latex.php?latex=%7B%5Cgeq+m%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;geq m}&amp;fg=000000' title='{&#92;geq m}&amp;fg=000000' class='latex' /> binary digits </td>
<td align="left"> Number </td>
<td align="left"> Pareto prediction </td>
</tr>
<tr>
<td align="left"> 31 </td>
<td align="left"> China, India </td>
<td align="left"> 2 </td>
<td align="left"> 1 </td>
</tr>
<tr>
<td align="left"> 30 </td>
<td align="left"> &#8221; </td>
<td align="left"> 2 </td>
<td align="left"> 2 </td>
</tr>
<tr>
<td align="left"> 29 </td>
<td align="left"> &#8220;, United States of America </td>
<td align="left"> 3 </td>
<td align="left"> 5 </td>
</tr>
<tr>
<td align="left"> 28 </td>
<td align="left"> &#8220;, Indonesia, Brazil, Pakistan, Bangladesh, Nigeria, Russia </td>
<td align="left"> 9 </td>
<td align="left"> 8 </td>
</tr>
<tr>
<td align="left"> 27 </td>
<td align="left"> &#8220;, Japan, Mexico, Philippines, Vietnam, Ethiopia, Germany, Egypt, Turkey </td>
<td align="left"> 17 </td>
<td align="left"> 15</td>
</tr>
<tr>
<td align="left"> 26 </td>
<td align="left"> &#8220;, (Dem. Rep. of the) Congo, Iran, Thailand, France, United Kingdom, Italy, South Africa, (South) Korea, Burma (Myanmar), Ukraine, Colombia, Spain, Argentina, Sudan, Tanzania, Poland, Kenya, Morocco, Algeria </td>
<td align="left"> 36 </td>
<td align="left"> 27 </td>
</tr>
<tr>
<td align="left"> 25 </td>
<td align="left"> &#8220;, Canada, Afghanistan, Uganda, Nepal, Peru, Iraq, Saudi Arabia, Uzbekistan, Venezuela, Malaysia, (North) Korea, Ghana, Yemen, Taiwan, Romania, Mozambique, Sri Lanka, Australia, Cote d&#8217;Ivoire, Madagascar, Syria, Cameroon </td>
<td align="left"> 58 </td>
<td align="left"> 49</td>
</tr>
<tr>
<td align="left"> 24 </td>
<td align="left"> &#8220;, Netherlands, Chile, Kazakhstan, Burkina Faso, Cambodia, Malawi, Ecuador, Niger, Guatemala, Senegal, Angola, Mali, Zambia, Cuba, Zimbabwe, Greece, Portugal, Belgium, Tunisia, Czech Republic, Rwanda, Serbia, Chad, Hungary, Guinea, Belarus, Somalia, Dominican Republic, Bolivia, Sweden, Haiti, Burundi, Benin </td>
<td align="left"> 91 </td>
<td align="left"> 88 </td>
</tr>
<tr>
<td align="left"> 23 </td>
<td align="left"> &#8220;, Austria, Azerbaijan, Honduras, Switzerland, Bulgaria, Tajikistan, Israel, El Salvador, (Hong Kong SAR) China, Paraguay, Laos, Sierra Leone, Jordan, Libya, Papua New Guinea, Togo, Nicaragua, Eritrea, Denmark, Slovakia, Kyrgyzstan, Finland, Turkmenistan, Norway, Georgia, United Arab Emirates, Singapore, Bosnia and Herzegovina, Croatia, Central African Republic, Moldova, Costa Rica </td>
<td align="left"> 123 </td>
<td align="left"> 159 </td>
</tr>
</table>
<p> Thus, with each new scale, the number of countries introduced increases by a factor of a little less than <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' />, on average. This approximate doubling of countries with each new scale begins to falter at about the population <img src='http://s0.wp.com/latex.php?latex=%7B2%5E%7B22%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2^{22}}&amp;fg=000000' title='{2^{22}}&amp;fg=000000' class='latex' /> (i.e. at around <img src='http://s0.wp.com/latex.php?latex=%7B4%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{4}&amp;fg=000000' title='{4}&amp;fg=000000' class='latex' /> million, the smallest population with <img src='http://s0.wp.com/latex.php?latex=%7B23%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{23}&amp;fg=000000' title='{23}&amp;fg=000000' class='latex' /> binary digits), for the simple reason that one has begun to run out of countries. (Note that the median-population country in this set, Singapore, has a population with <img src='http://s0.wp.com/latex.php?latex=%7B23%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{23}&amp;fg=000000' title='{23}&amp;fg=000000' class='latex' /> binary digits.)
</p>
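<p>As a rough check on the best-fit parameters quoted above, here is a minimal Python sketch that fits a straight line to the base-2 logarithm of the cumulative counts in the table, as a function of <em>m</em>. The post does not say how its fit was obtained. Ordinary least squares on the logarithms is an assumption made here, and the names <code>COUNTS</code> and <code>fit_pareto</code> are purely illustrative.</p>

```python
import math

# Cumulative counts from the table above: m -> number of countries whose 2007
# population has at least m binary digits (235 countries in the data set).
COUNTS = {31: 2, 30: 2, 29: 3, 28: 9, 27: 17, 26: 36, 25: 58, 24: 91, 23: 123}
TOTAL = 235


def fit_pareto(counts, total):
    """Fit log2(count_m) = log2(total * c) - m / alpha by ordinary least squares.

    Returns (alpha, c), so that count_m is roughly total * c * 2**(-m / alpha).
    """
    ms = list(counts)
    ys = [math.log2(counts[m]) for m in ms]
    m_bar = sum(ms) / len(ms)
    y_bar = sum(ys) / len(ys)
    sxy = sum((m - m_bar) * (y - y_bar) for m, y in zip(ms, ys))
    sxx = sum((m - m_bar) ** 2 for m in ms)
    slope = sxy / sxx  # negative; equals -1/alpha
    intercept = y_bar - slope * m_bar
    return -1.0 / slope, 2.0 ** intercept / total


alpha, c = fit_pareto(COUNTS, TOTAL)
print(f"alpha ~ {alpha:.3f}")
print(f"c ~ {TOTAL * c / 2**26:.2f} * 2^26 / {TOTAL}")
```

<p>Run as is, this gives an exponent of about 1.18 and a constant of about 1.74 &#215; 2^26 / 235, in line with the values quoted above. A different fitting method, such as weighting rows by their counts, could shift the parameters slightly.</p>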
<p>
These laws are not merely interesting statistical curiosities; for instance, Benford&#8217;s law is often used to help detect fraudulent statistics (such as those arising from accounting fraud), as many such statistics are invented by choosing digits at random, and will therefore deviate significantly from Benford&#8217;s law. (This is nicely discussed in Robert Matthews&#8217; New Scientist article &#8220;<a href="http://www.newscientist.com/article/mg16321944.600">The power of one</a>&#8221;; this article can also be found on the web at a number of other places.) In a somewhat analogous spirit, Zipf&#8217;s law and the Pareto distribution can be used to mathematically test various models of real-world systems (e.g. formation of astronomical objects, accumulation of wealth, population growth of countries, etc.), without necessarily having to fit all the parameters of that model with the actual data.
</p>
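<p>One common way to make &#8220;deviate significantly&#8221; concrete is Pearson&#8217;s chi-square statistic, which compares observed first-digit counts against Benford&#8217;s law. The Python function below is a minimal sketch of that test. The function name and both data sets are hypothetical, and the article cited above need not use this particular test.</p>

```python
import math

# Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1, ..., 9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_chi_square(counts):
    """Pearson chi-square statistic of observed first-digit counts against Benford.

    counts maps each digit d (1..9) to the number of figures with first digit d.
    Under Benford's law the statistic is approximately chi-square distributed
    with 8 degrees of freedom, so much larger values signal a poor fit.
    """
    n = sum(counts.values())
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


# Hypothetical "invented" figures: 9000 numbers whose first digits were chosen
# uniformly at random, i.e. 1000 of each digit.
invented = {d: 1000 for d in range(1, 10)}

# Hypothetical figures whose first digits follow Benford's law closely.
benford_like = {d: round(90_000 * p) for d, p in BENFORD.items()}

print(round(first_digit_chi_square(invented), 1))
print(round(first_digit_chi_square(benford_like), 4))
```

<p>The uniformly invented digits give a statistic in the thousands. This is far above the 5% critical value of about 15.5 for 8 degrees of freedom, whereas the Benford-proportional counts give a statistic close to zero.</p>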
<p>
Being empirically observed phenomena rather than abstract mathematical facts, Benford&#8217;s law, Zipf&#8217;s law, and the Pareto distribution cannot be &#8220;proved&#8221; the same way a mathematical theorem can be proved. However, one can still <em>support</em> these laws mathematically in a number of ways, for instance showing how these laws are compatible with each other, and with other plausible hypotheses on the source of the data. In this post I would like to describe a number of ways (both technical and non-technical) in which one can do this; these arguments do not fully explain these laws (in particular, the empirical fact that the exponent <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> in Zipf&#8217;s law or the Pareto distribution is often close to <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> is still quite a mysterious phenomenon), and do not always have the same universal range of applicability as these laws seem to have, but I hope that they do demonstrate that these laws are not completely arbitrary, and ought to have a satisfactory basis of mathematical support. <!--more-->
</p>
</p>
<p align="center"><b> &#8212;  1. Scale invariance  &#8212; </b></p>
<p>
One consistency check that is enjoyed by all of these laws is that of <em>scale invariance</em> &#8211; they are invariant under rescalings of the data (for instance, by changing the units).
</p>
<p>
For example, suppose for the sake of argument that the country populations <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> of the world in 2007 obey Benford&#8217;s law, thus for instance about <img src='http://s0.wp.com/latex.php?latex=%7B30.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{30.1&#92;%}&amp;fg=000000' title='{30.1&#92;%}&amp;fg=000000' class='latex' /> of the countries have population with first digit <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />, <img src='http://s0.wp.com/latex.php?latex=%7B17.6%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{17.6&#92;%}&amp;fg=000000' title='{17.6&#92;%}&amp;fg=000000' class='latex' /> have population with first digit <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' />, and so forth. Now, imagine that several decades in the future, say in 2067, all of the countries in the world double their population, from <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> to a new population <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X+%3A%3D+2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X := 2X}&amp;fg=000000' title='{&#92;tilde X := 2X}&amp;fg=000000' class='latex' />. (This makes the somewhat implausible assumption that growth rates are uniform across all countries; I will talk later about what happens when one omits this hypothesis.) To further simplify the experiment, suppose that no countries are created or dissolved during this time period.
What happens to Benford&#8217;s law when passing from <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />?
</p>
<p>
The key observation here, of course, is that the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is linked to the first digit of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X+%3D+2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X = 2X}&amp;fg=000000' title='{&#92;tilde X = 2X}&amp;fg=000000' class='latex' />. If, for instance, the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />, then the first digit of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> is either <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />; conversely, if the first digit of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' 
alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />, then the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />. As a consequence, the proportion of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />&#8217;s with first digit <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> is equal to the proportion of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />&#8217;s with first digit <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' />, plus the proportion of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />&#8217;s with first digit <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />.
This is consistent with Benford&#8217;s law holding for both <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />, since </p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%5Clog_%7B10%7D+%5Cfrac%7B2%7D%7B1%7D+%3D+%5Clog_%7B10%7D+%5Cfrac%7B3%7D%7B2%7D+%2B+%5Clog_%7B10%7D+%5Cfrac%7B4%7D%7B3%7D+%28+%3D+%5Clog_%7B10%7D+%5Cfrac%7B4%7D%7B2%7D+%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  &#92;log_{10} &#92;frac{2}{1} = &#92;log_{10} &#92;frac{3}{2} + &#92;log_{10} &#92;frac{4}{3} ( = &#92;log_{10} &#92;frac{4}{2} )&amp;fg=000000' title='&#92;displaystyle  &#92;log_{10} &#92;frac{2}{1} = &#92;log_{10} &#92;frac{3}{2} + &#92;log_{10} &#92;frac{4}{3} ( = &#92;log_{10} &#92;frac{4}{2} )&amp;fg=000000' class='latex' /></p>
<p> (or numerically, <img src='http://s0.wp.com/latex.php?latex=%7B30.1%5C%25+%3D+17.6%5C%25+%2B+12.5%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{30.1&#92;% = 17.6&#92;% + 12.5&#92;%}&amp;fg=000000' title='{30.1&#92;% = 17.6&#92;% + 12.5&#92;%}&amp;fg=000000' class='latex' /> after rounding). Indeed, one can check the other digit ranges as well and conclude that Benford&#8217;s law for <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is compatible with Benford&#8217;s law for <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />; to pick a contrasting example, a uniformly distributed model in which each digit from <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7B9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{9}&amp;fg=000000' title='{9}&amp;fg=000000' class='latex' /> occurs as the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> with probability <img src='http://s0.wp.com/latex.php?latex=%7B1%2F9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1/9}&amp;fg=000000' title='{1/9}&amp;fg=000000' class='latex' /> totally fails to be preserved under doubling.</p>
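<p>The doubling argument can also be checked numerically. In the minimal Python sketch below, a regular grid of values whose base-10 logarithms are evenly spread over one decade stands in for a large sample obeying the continuous Benford&#8217;s law. A second, hypothetical sample makes each first digit equally likely and spreads the remaining digits evenly. Both samples are doubled and their first-digit frequencies compared. The helper names are illustrative only.</p>

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading decimal digit of a positive real number x."""
    mantissa = x / 10 ** math.floor(math.log10(x))
    if mantissa >= 10:  # guard against floating-point rounding at the edges
        mantissa /= 10
    elif mantissa < 1:
        mantissa *= 10
    return int(mantissa)


def digit_frequencies(values):
    """Proportion of values having each leading digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


N = 100_000
# Continuous Benford's law: the fractional part of log10(X) is uniform on [0, 1).
# A regular grid of such values stands in for a large Benford-distributed sample.
benford_x = [10 ** ((k + 0.5) / N) for k in range(N)]

# Contrasting (hypothetical) model: each first digit d = 1..9 is equally likely,
# and within a digit the value is spread evenly over [d, d + 1).
M = 1000
uniform_x = [d + (k + 0.5) / M for d in range(1, 10) for k in range(M)]

results = {}
for name, xs in [("Benford", benford_x), ("uniform digits", uniform_x)]:
    before = digit_frequencies(xs)
    after = digit_frequencies([2 * x for x in xs])
    results[name] = (before, after)
    print(name)
    print("  X :", [round(p, 3) for p in before])
    print("  2X:", [round(p, 3) for p in after])
```

<p>The Benford frequencies survive doubling essentially unchanged, up to a grid error of order 1/N. By contrast, the doubled uniform model puts 5/9 of its mass on the first digit 1, since every value with first digit 5 to 9 doubles into the range from 10 to 20.</p>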
<p>
One can be even more precise. Observe (through telescoping series) that Benford&#8217;s law implies that <a name="benfo">
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%7B%5CBbb+P%7D%28+%5Calpha+10%5En+%5Cleq+X+%26%2360%3B+%5Cbeta+10%5En+%5Chbox%7B+for+some+integer+%7D+n+%29+%3D+%5Clog_%7B10%7D+%5Cfrac%7B%5Cbeta%7D%7B%5Calpha%7D+%5C+%5C+%5C+%5C+%5C+%281%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  {&#92;Bbb P}( &#92;alpha 10^n &#92;leq X &lt; &#92;beta 10^n &#92;hbox{ for some integer } n ) = &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' title='&#92;displaystyle  {&#92;Bbb P}( &#92;alpha 10^n &#92;leq X &lt; &#92;beta 10^n &#92;hbox{ for some integer } n ) = &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} &#92; &#92; &#92; &#92; &#92; (1)&amp;fg=000000' class='latex' /></p>
<p></a> for all integers <img src='http://s0.wp.com/latex.php?latex=%7B1+%5Cleq+%5Calpha+%5Cleq+%5Cbeta+%26%2360%3B+10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' title='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' class='latex' />, where the left-hand side denotes the proportion of data for which <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> lies between <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha+10%5En%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha 10^n}&amp;fg=000000' title='{&#92;alpha 10^n}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbeta+10%5En%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;beta 10^n}&amp;fg=000000' title='{&#92;beta 10^n}&amp;fg=000000' class='latex' /> for some integer <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' />. Suppose now that we generalise Benford&#8217;s law to the <em>continuous Benford&#8217;s law</em>, which asserts that <a href="#benfo">(1)</a> is true for all <em>real</em> numbers <img src='http://s0.wp.com/latex.php?latex=%7B1+%5Cleq+%5Calpha+%5Cleq+%5Cbeta+%26%2360%3B+10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' title='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' class='latex' />. 
Then it is not hard to show that a statistic <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> obeys the continuous Benford&#8217;s law if and only if its dilate <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X+%3D2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X =2X}&amp;fg=000000' title='{&#92;tilde X =2X}&amp;fg=000000' class='latex' /> does, and similarly with <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> replaced by any other constant growth factor. (This is easiest seen by observing that <a href="#benfo">(1)</a> is equivalent to asserting that the fractional part of <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} X}&amp;fg=000000' title='{&#92;log_{10} X}&amp;fg=000000' class='latex' /> is uniformly distributed.) In fact, the continuous Benford law is the <em>only</em> distribution for the quantities on the left-hand side of <a href="#benfo">(1)</a> with this scale-invariance property; this fact is a special case of the general fact that Haar measures are unique (see e.g. <a href="http://terrytao.wordpress.com/2009/04/06/the-fourier-transform/">these lecture notes</a>).
</p>
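<p>The parenthetical remark above can be spelled out in a few lines. Here {t} denotes the fractional part of a real number t and &#955; is an arbitrary positive constant; both notations are introduced only for this sketch. The first line holds because the only possible n is the integer part of log<sub>10</sub> X. The second follows by writing a = log<sub>10</sub> &#945; and b = log<sub>10</sub> &#946;.</p>

```latex
% {t} = fractional part of t;  1 \le \alpha \le \beta < 10;  \lambda > 0.
\begin{align*}
\alpha 10^n \le X < \beta 10^n \hbox{ for some integer } n
  &\iff \log_{10} \alpha \le \{ \log_{10} X \} < \log_{10} \beta, \\
\hbox{(1) for all real } 1 \le \alpha \le \beta < 10
  &\iff {\Bbb P}( a \le \{ \log_{10} X \} < b ) = b - a
        \hbox{ for all } 0 \le a \le b < 1, \\
\{ \log_{10} (\lambda X) \}
  &= \big\{ \{ \log_{10} X \} + \log_{10} \lambda \big\}.
\end{align*}
```

<p>The second line says precisely that the fractional part of log<sub>10</sub> X is uniformly distributed on [0, 1). The third line shows that multiplying X by &#955; rotates that fractional part around the circle by log<sub>10</sub> &#955;. Rotations preserve the uniform distribution, so &#955;X obeys the continuous Benford&#8217;s law whenever X does, and conversely by the same argument with 1/&#955;.</p>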
<p>
It is also easy to see that Zipf&#8217;s law and the Pareto distribution enjoy this sort of scale-invariance property, as long as one generalises the Pareto distribution <a name="pareto">
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%7B%5CBbb+P%7D%28+X+%5Cgeq+10%5Em+%29+%3D+c+10%5E%7B-m%2F%5Calpha%7D+%5C+%5C+%5C+%5C+%5C+%282%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  {&#92;Bbb P}( X &#92;geq 10^m ) = c 10^{-m/&#92;alpha} &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' title='&#92;displaystyle  {&#92;Bbb P}( X &#92;geq 10^m ) = c 10^{-m/&#92;alpha} &#92; &#92; &#92; &#92; &#92; (2)&amp;fg=000000' class='latex' /></p>
<p></a> from integer <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> to real <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' />, just as with Benford&#8217;s law. Once one does that, one can phrase the Pareto distribution law independently of any base as <a name="pareto">
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%7B%5CBbb+P%7D%28+X+%5Cgeq+x+%29+%3D+c+x%5E%7B-1%2F%5Calpha%7D+%5C+%5C+%5C+%5C+%5C+%283%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  {&#92;Bbb P}( X &#92;geq x ) = c x^{-1/&#92;alpha} &#92; &#92; &#92; &#92; &#92; (3)&amp;fg=000000' title='&#92;displaystyle  {&#92;Bbb P}( X &#92;geq x ) = c x^{-1/&#92;alpha} &#92; &#92; &#92; &#92; &#92; (3)&amp;fg=000000' class='latex' /></p>
<p></a> for any <img src='http://s0.wp.com/latex.php?latex=%7Bx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x}&amp;fg=000000' title='{x}&amp;fg=000000' class='latex' /> much larger than the median value of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />, at which point the scale-invariance is easily seen.
</p>
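<p>The scale invariance of (3) can be written out in one line. Here &#955; is a hypothetical positive rescaling factor, such as a change of units. The identity is only claimed for x much larger than the median of &#955;X, which is &#955; times the median of X.</p>

```latex
{\Bbb P}( \lambda X \geq x ) = {\Bbb P}( X \geq x / \lambda )
  = c \left( \frac{x}{\lambda} \right)^{-1/\alpha}
  = \left( c \, \lambda^{1/\alpha} \right) x^{-1/\alpha}.
```

<p>Thus &#955;X obeys a Pareto law with the same exponent &#945;, and only the constant changes, from c to c&#955;<sup>1/&#945;</sup>. Setting x = 10<sup>m</sup> in (3) recovers (2) for real m.</p>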
<p>
One may object that the above thought-experiment was too idealised, because it assumed uniform growth rates for all the statistics at once. What happens if there are non-uniform growth rates? To keep the computations simple, let us consider the following toy model, where we take the same 2007 population statistics <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> as before, and assume that half of the countries (the &#8220;high-growth&#8221; countries) will experience a population doubling by 2067, while the other half (the &#8220;zero-growth&#8221; countries) will keep their population constant, thus the 2067 population statistic <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> is equal to <img src='http://s0.wp.com/latex.php?latex=%7B2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2X}&amp;fg=000000' title='{2X}&amp;fg=000000' class='latex' /> half the time and <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> half the time. (We will assume that our sample sizes are large enough that the <a href="http://en.wikipedia.org/wiki/Law_of_large_numbers">law of large numbers</a> kicks in, and we will therefore ignore issues such as what happens to this &#8220;half the time&#8221; if the number of samples is odd.) 
Furthermore, we make the plausible but crucial assumption that the event that a country is a high-growth or a zero-growth country is <em>independent</em> of the first digit of the 2007 population; thus, for instance, a country whose population begins with <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' /> is assumed to be just as likely to be high-growth as one whose population begins with <img src='http://s0.wp.com/latex.php?latex=%7B7%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{7}&amp;fg=000000' title='{7}&amp;fg=000000' class='latex' />.
</p>
<p>
Now let&#8217;s have a look again at the proportion of countries whose 2067 population <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> begins with either <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />. There are exactly two ways in which a country can fall into this category: either it is a zero-growth country whose 2007 population <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> also began with either <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />, or it is a high-growth country whose population in 2007 began with <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />. Since all countries have a probability <img src='http://s0.wp.com/latex.php?latex=%7B1%2F2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1/2}&amp;fg=000000' title='{1/2}&amp;fg=000000' class='latex' /> of being high-growth regardless of the first digit of their population, we conclude the identity <a name="px">
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%7B%5CBbb+P%7D%28+%5Ctilde+X+%5Chbox%7B+has+first+digit+%7D+2%2C+3+%29+%3D+%5Cfrac%7B1%7D%7B2%7D+%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+2%2C+3+%29+%2B+%5Cfrac%7B1%7D%7B2%7D+%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+1+%29+%5C+%5C+%5C+%5C+%5C+%284%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  {&#92;Bbb P}( &#92;tilde X &#92;hbox{ has first digit } 2, 3 ) = &#92;frac{1}{2} {&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 ) + &#92;frac{1}{2} {&#92;Bbb P}( X &#92;hbox{ has first digit } 1 ) &#92; &#92; &#92; &#92; &#92; (4)&amp;fg=000000' title='&#92;displaystyle  {&#92;Bbb P}( &#92;tilde X &#92;hbox{ has first digit } 2, 3 ) = &#92;frac{1}{2} {&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 ) + &#92;frac{1}{2} {&#92;Bbb P}( X &#92;hbox{ has first digit } 1 ) &#92; &#92; &#92; &#92; &#92; (4)&amp;fg=000000' class='latex' /></p>
<p></a> </p>
<p> which is once again compatible with Benford&#8217;s law for <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> since
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%5Clog_%7B10%7D+%5Cfrac%7B4%7D%7B2%7D+%3D+%5Cfrac%7B1%7D%7B2%7D+%5Clog_%7B10%7D+%5Cfrac%7B4%7D%7B2%7D+%2B+%5Cfrac%7B1%7D%7B2%7D+%5Clog_%7B10%7D+%5Cfrac%7B2%7D%7B1%7D.%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  &#92;log_{10} &#92;frac{4}{2} = &#92;frac{1}{2} &#92;log_{10} &#92;frac{4}{2} + &#92;frac{1}{2} &#92;log_{10} &#92;frac{2}{1}.&amp;fg=000000' title='&#92;displaystyle  &#92;log_{10} &#92;frac{4}{2} = &#92;frac{1}{2} &#92;log_{10} &#92;frac{4}{2} + &#92;frac{1}{2} &#92;log_{10} &#92;frac{2}{1}.&amp;fg=000000' class='latex' /></p>
<p> More generally, it is not hard to show that if <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> obeys the continuous Benford&#8217;s law <a href="#benfo">(1)</a>, and one multiplies <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> by some positive multiplier <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> which is independent of the fractional part of <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} X}&amp;fg=000000' title='{&#92;log_{10} X}&amp;fg=000000' class='latex' /> (and hence, <em>a fortiori</em>, is independent of the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />), one obtains another quantity <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%3DXY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X=XY}&amp;fg=000000' title='{&#92;tilde X=XY}&amp;fg=000000' class='latex' /> which also obeys the continuous Benford&#8217;s law.
(Indeed, we have already seen this to be the case when <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> is a deterministic constant, and the case when <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> is random then follows simply by conditioning <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> to be fixed.)</p>
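<p>The toy model and the general multiplier statement can both be checked with a short numerical sketch. Below, a regular grid again stands in for a continuous-Benford statistic X. It is multiplied first by Y equal to 1 or 2 with probability 1/2 each, as in the toy model above. It is then multiplied separately by a hypothetical multiplier Y spread evenly over [1, 3]. That second Y only ever has first digit 1 or 2, so it is far from Benford on its own. In both cases Y is independent of X by construction.</p>

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading decimal digit of a positive real number x."""
    mantissa = x / 10 ** math.floor(math.log10(x))
    if mantissa >= 10:  # guard against floating-point rounding at the edges
        mantissa /= 10
    elif mantissa < 1:
        mantissa *= 10
    return int(mantissa)


def digit_frequencies(values):
    """Proportion of values having each leading digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]


N = 10_000
# X: a regular grid with log10(X) evenly spread over [0, 1), standing in for a
# large sample obeying the continuous Benford's law.
xs = [10 ** ((k + 0.5) / N) for k in range(N)]

# Toy growth model: Y = 1 (zero growth) or Y = 2 (high growth), each paired
# with every X, so exactly half of the values are doubled, independently of X.
toy = [x * y for x in xs for y in (1, 2)]

# A hypothetical multiplier spread evenly over [1, 3]: its own first digit is
# always 1 or 2, so Y is far from Benford on its own.
ys = [1 + 2 * (j + 0.5) / 20 for j in range(20)]
product = [x * y for x in xs for y in ys]

for name, values in [("toy model", toy), ("X * Y", product)]:
    print(name, [round(p, 3) for p in digit_frequencies(values)])
print("Benford  ", [round(p, 3) for p in BENFORD])
```

<p>Both products reproduce Benford&#8217;s frequencies up to a small grid error. In particular, the toy model gives about log<sub>10</sub> 2 &#8776; 0.301 for the first digits 2 and 3 combined, as (4) predicts.</p>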
<p>
In particular, we see an absorptive property of Benford&#8217;s law: if <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> obeys Benford&#8217;s law, and <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> is any positive statistic independent of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />, then the product <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%3DXY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X=XY}&amp;fg=000000' title='{&#92;tilde X=XY}&amp;fg=000000' class='latex' /> also obeys Benford&#8217;s law &#8211; <em>even if <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> did not obey this law</em>. Thus, if a statistic is the product of many independent factors, then it only requires a single factor to obey Benford&#8217;s law for the whole product to obey the law as well. For instance, the population of a country is the product of its area and its population density. Assuming that the population density of a country is independent of the size of that country (which is not a completely reasonable assumption, but let us take it for the sake of argument), we see that Benford&#8217;s law for the population would follow if just one of the two factors, the area or the population density, obeyed this law.
It is also clear that Benford&#8217;s law is the only distribution with this absorptive property (if there were another law with this property, what would happen if one multiplied a statistic obeying that law by an independent statistic obeying Benford&#8217;s law?). Thus we begin to get a glimpse of why Benford&#8217;s law is universal for quantities which are the product of many separate factors, in a manner that no other law could be.
</p>
<p>
As an example: for any given number <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' />, the uniform distribution from <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' /> does not obey Benford&#8217;s law; for instance, if one picks a random number from <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7B999%2C999%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{999,999}&amp;fg=000000' title='{999,999}&amp;fg=000000' class='latex' /> then each digit from <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7B9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{9}&amp;fg=000000' title='{9}&amp;fg=000000' class='latex' /> appears as the first digit with an equal probability of <img src='http://s0.wp.com/latex.php?latex=%7B1%2F9%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1/9}&amp;fg=000000' title='{1/9}&amp;fg=000000' class='latex' /> each. 
However, if <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' /> is not fixed, but instead obeys Benford&#8217;s law, then a random number selected from <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> to <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' /> also obeys Benford&#8217;s law (ignoring for now the distinction between continuous and discrete distributions), as it can be viewed as the product of <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' /> with an independent random number selected uniformly between <img src='http://s0.wp.com/latex.php?latex=%7B0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{0}&amp;fg=000000' title='{0}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />.
</p>
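<p>The contrast between a fixed and a Benford-distributed upper limit <em>N</em> can be checked numerically. The Python sketch below is only an illustration under stated assumptions. It counts leading digits exactly for the uniform distribution on 1 to 999,999. Then, as a hypothetical simulation setup with an arbitrary seed and sample size, it draws <em>N</em> as the integer part of 10<sup><em>U</em></sup> with <em>U</em> uniform on [3, 7), which makes <em>N</em> Benford-distributed up to rounding, and finally draws a number uniformly from 1 to <em>N</em>. The helper names <code>first_digit</code> and <code>benford</code> are ours.</p>

```python
import math
import random
from collections import Counter


def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def benford(d: int) -> float:
    """Benford's law prediction log10((d + 1) / d) for leading digit d."""
    return math.log10((d + 1) / d)


# Fixed N: exact leading-digit frequencies of the uniform distribution on 1..999,999.
N_FIXED = 999_999
fixed_counts = Counter(first_digit(k) for k in range(1, N_FIXED + 1))
fixed_freq = {d: fixed_counts[d] / N_FIXED for d in range(1, 10)}

# Random N: N = int(10**U) with U uniform on [3, 7) is Benford-distributed up to
# rounding; a number is then drawn uniformly from 1..N.
rng = random.Random(0)
SAMPLES = 200_000
mixed_counts = Counter()
for _ in range(SAMPLES):
    n_max = int(10 ** rng.uniform(3, 7))
    mixed_counts[first_digit(rng.randint(1, n_max))] += 1
mixed_freq = {d: mixed_counts[d] / SAMPLES for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: fixed N {fixed_freq[d]:.4f}  random N {mixed_freq[d]:.4f}  Benford {benford(d):.4f}")
```

<p>The first column should be exactly 1/9 for every digit. The second should agree with Benford&#8217;s law up to sampling noise and small discreteness effects.</p>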
<p>
Actually, one can say something even stronger than the absorption property. Suppose that the continuous Benford&#8217;s law <a href="#benfo">(1)</a> for a statistic <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> did not hold exactly, but instead held with some accuracy <img src='http://s0.wp.com/latex.php?latex=%7B%5Cvarepsilon+%26%2362%3B+0%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;varepsilon &gt; 0}&amp;fg=000000' title='{&#92;varepsilon &gt; 0}&amp;fg=000000' class='latex' />, thus <a name="bof">
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%5Clog_%7B10%7D+%5Cfrac%7B%5Cbeta%7D%7B%5Calpha%7D+-+%5Cvarepsilon+%5Cleq+%7B%5CBbb+P%7D%28+%5Calpha+10%5En+%5Cleq+X+%26%2360%3B+%5Cbeta+10%5En+%5Chbox%7B+for+some+integer+%7D+n+%29+%5Cleq+%5Clog_%7B10%7D+%5Cfrac%7B%5Cbeta%7D%7B%5Calpha%7D+%2B+%5Cvarepsilon+%5C+%5C+%5C+%5C+%5C+%285%29%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} - &#92;varepsilon &#92;leq {&#92;Bbb P}( &#92;alpha 10^n &#92;leq X &lt; &#92;beta 10^n &#92;hbox{ for some integer } n ) &#92;leq &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} + &#92;varepsilon &#92; &#92; &#92; &#92; &#92; (5)&amp;fg=000000' title='&#92;displaystyle  &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} - &#92;varepsilon &#92;leq {&#92;Bbb P}( &#92;alpha 10^n &#92;leq X &lt; &#92;beta 10^n &#92;hbox{ for some integer } n ) &#92;leq &#92;log_{10} &#92;frac{&#92;beta}{&#92;alpha} + &#92;varepsilon &#92; &#92; &#92; &#92; &#92; (5)&amp;fg=000000' class='latex' /></p>
<p></a> for all <img src='http://s0.wp.com/latex.php?latex=%7B1+%5Cleq+%5Calpha+%5Cleq+%5Cbeta+%26%2360%3B+10%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' title='{1 &#92;leq &#92;alpha &#92;leq &#92;beta &lt; 10}&amp;fg=000000' class='latex' />. Then it is not hard to see that any dilated statistic, such as <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X+%3D+2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X = 2X}&amp;fg=000000' title='{&#92;tilde X = 2X}&amp;fg=000000' class='latex' />, or more generally <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%3DXY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X=XY}&amp;fg=000000' title='{&#92;tilde X=XY}&amp;fg=000000' class='latex' /> for any fixed deterministic <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' />, also obeys <a href="#bof">(5)</a> with exactly the same accuracy <img src='http://s0.wp.com/latex.php?latex=%7B%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;varepsilon}&amp;fg=000000' title='{&#92;varepsilon}&amp;fg=000000' class='latex' />. 
But now suppose one uses a variable multiplier; for instance, suppose one uses the model discussed earlier in which <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> is equal to <img src='http://s0.wp.com/latex.php?latex=%7B2X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2X}&amp;fg=000000' title='{2X}&amp;fg=000000' class='latex' /> half the time and <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> half the time. Then the relationship between the distribution of the first digit of <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' /> and the first digit of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is given by formulae such as <a href="#px">(4)</a>. 
Now, in the right-hand side of <a href="#px">(4)</a>, each of the two terms <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+2%2C+3+%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 )}&amp;fg=000000' title='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 )}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+1+%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 1 )}&amp;fg=000000' title='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 1 )}&amp;fg=000000' class='latex' /> differs from the Benford&#8217;s law predictions of <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+%5Cfrac%7B4%7D%7B2%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} &#92;frac{4}{2}}&amp;fg=000000' title='{&#92;log_{10} &#92;frac{4}{2}}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+%5Cfrac%7B2%7D%7B1%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} &#92;frac{2}{1}}&amp;fg=000000' title='{&#92;log_{10} &#92;frac{2}{1}}&amp;fg=000000' class='latex' /> respectively by at most <img src='http://s0.wp.com/latex.php?latex=%7B%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;varepsilon}&amp;fg=000000' title='{&#92;varepsilon}&amp;fg=000000' class='latex' />. Since the left-hand side of <a href="#px">(4)</a> is the average of these two terms, it also differs from the Benford law prediction by at most <img src='http://s0.wp.com/latex.php?latex=%7B%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;varepsilon}&amp;fg=000000' title='{&#92;varepsilon}&amp;fg=000000' class='latex' />. 
But the averaging opens up an opportunity for cancelling; for instance, an overestimate of <img src='http://s0.wp.com/latex.php?latex=%7B%2B%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{+&#92;varepsilon}&amp;fg=000000' title='{+&#92;varepsilon}&amp;fg=000000' class='latex' /> for <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+2%2C+3+%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 )}&amp;fg=000000' title='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 2, 3 )}&amp;fg=000000' class='latex' /> could cancel an underestimate of <img src='http://s0.wp.com/latex.php?latex=%7B-%5Cvarepsilon%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{-&#92;varepsilon}&amp;fg=000000' title='{-&#92;varepsilon}&amp;fg=000000' class='latex' /> for <img src='http://s0.wp.com/latex.php?latex=%7B%7B%5CBbb+P%7D%28+X+%5Chbox%7B+has+first+digit+%7D+1+%29%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 1 )}&amp;fg=000000' title='{{&#92;Bbb P}( X &#92;hbox{ has first digit } 1 )}&amp;fg=000000' class='latex' /> to produce a spot-on prediction for <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X}&amp;fg=000000' title='{&#92;tilde X}&amp;fg=000000' class='latex' />. Thus we see that variable multipliers (or variable growth rates) not only preserve Benford&#8217;s law, but in fact <em>stabilise</em> it by averaging out the errors. Indeed, if one starts with a distribution which does not initially obey Benford&#8217;s law, and then applies some variable (and independent) growth rates to the various samples in the distribution, then under reasonable assumptions one can show that the resulting distribution will converge to Benford&#8217;s law over time. 
This helps explain the universality of Benford&#8217;s law for statistics such as populations, for which the independent variable growth law is not so unreasonable (at least, until the population hits some maximum capacity threshold).
</p>
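<p>The stabilising effect of independent variable growth rates can be seen in a small simulation. This is a hypothetical sketch rather than a model of any real population. Every sample starts at 5, so every leading digit is initially 5. At each step, every sample is multiplied by its own independent log-normal growth factor. The choice of log-normal factors with parameter 0.3, 40 steps and 20,000 samples is arbitrary. After a few dozen steps the leading-digit frequencies should be close to log<sub>10</sub>((<em>d</em>+1)/<em>d</em>).</p>

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Leading decimal digit of a positive float, read off its scientific notation."""
    return int(f"{x:.12e}"[0])


def benford(d: int) -> float:
    return math.log10((d + 1) / d)


rng = random.Random(1)
SAMPLES = 20_000
STEPS = 40

# Deliberately non-Benford start: every sample equals 5, so every leading digit is 5.
values = [5.0] * SAMPLES

for _ in range(STEPS):
    # Each sample is multiplied by its own independent random growth factor.
    # Log-normal factors are a hypothetical choice made for this sketch.
    values = [x * rng.lognormvariate(0.0, 0.3) for x in values]

counts = Counter(first_digit(x) for x in values)
freq = {d: counts[d] / SAMPLES for d in range(1, 10)}
for d in range(1, 10):
    print(f"{d}: observed {freq[d]:.3f}  Benford {benford(d):.3f}")
```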
<p>
Note that the independence property is crucial; if, for instance, population growth always slowed to a crawl, for some inexplicable reason, whenever the first digit of the population was <img src='http://s0.wp.com/latex.php?latex=%7B6%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{6}&amp;fg=000000' title='{6}&amp;fg=000000' class='latex' />, then there would be a noticeable deviation from Benford&#8217;s law, particularly in digits <img src='http://s0.wp.com/latex.php?latex=%7B6%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{6}&amp;fg=000000' title='{6}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B7%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{7}&amp;fg=000000' title='{7}&amp;fg=000000' class='latex' />, due to this growth bottleneck. But this is not a particularly plausible scenario (being somewhat analogous to <a href="http://en.wikipedia.org/wiki/Maxwell&#037;27s_demon">Maxwell&#8217;s demon</a> in thermodynamics).
</p>
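<p>The role of independence can be probed by letting the growth rate depend on the current leading digit. In this hypothetical sketch the samples start out exactly Benford-distributed and receive independent log-normal growth factors. The exception is that a sample whose leading digit is 6 gets a factor with one tenth of the usual spread, a crude stand-in for growth &#8220;slowing to a crawl&#8221;. The parameters are arbitrary. The point is only that samples accumulate in the slow region, so the frequency of the digit 6 ends up well above Benford&#8217;s prediction of about 0.067.</p>

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Leading decimal digit of a positive float, read off its scientific notation."""
    return int(f"{x:.12e}"[0])


def benford(d: int) -> float:
    return math.log10((d + 1) / d)


rng = random.Random(2)
SAMPLES = 10_000
STEPS = 40

# Start from an exactly Benford-distributed population: log10(x) uniform on [0, 1).
values = [10 ** rng.random() for _ in range(SAMPLES)]

for _ in range(STEPS):
    new_values = []
    for x in values:
        # Hypothetical bottleneck: growth slows to a crawl (sigma 0.03 instead of 0.3)
        # whenever the leading digit is 6, so the growth rate now depends on x.
        sigma = 0.03 if first_digit(x) == 6 else 0.3
        new_values.append(x * rng.lognormvariate(0.0, sigma))
    values = new_values

counts = Counter(first_digit(x) for x in values)
freq = {d: counts[d] / SAMPLES for d in range(1, 10)}
for d in range(1, 10):
    print(f"{d}: observed {freq[d]:.3f}  Benford {benford(d):.3f}")
```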
<p>
The above analysis can also be carried over to some extent to the Pareto distribution and Zipf&#8217;s law; if a statistic <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> obeys these laws approximately, then after multiplying by an independent variable <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' />, the product <img src='http://s0.wp.com/latex.php?latex=%7B%5Ctilde+X%3DXY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;tilde X=XY}&amp;fg=000000' title='{&#92;tilde X=XY}&amp;fg=000000' class='latex' /> will obey the same laws with equal or higher accuracy, so long as <img src='http://s0.wp.com/latex.php?latex=%7BY%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{Y}&amp;fg=000000' title='{Y}&amp;fg=000000' class='latex' /> is small compared to the number of scales that <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> typically ranges over. (One needs a restriction such as this because the Pareto distribution and Zipf&#8217;s law must break down below the median.) 
These laws are also stable under other multiplicative processes, for instance if some fraction of the samples in <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> spontaneously split into two smaller pieces, or conversely if two samples in <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> spontaneously merge into one; as before, the key is that the occurrence of these events should be independent of the actual size of the objects being split or merged. If one considers a generalisation of the Pareto or Zipf law in which the exponent <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> is not fixed, but varies with <img src='http://s0.wp.com/latex.php?latex=%7Bn%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n}&amp;fg=000000' title='{n}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7Bk%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{k}&amp;fg=000000' title='{k}&amp;fg=000000' class='latex' />, then the effect of these sorts of multiplicative changes is to blur and average together the various values of <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' />, thus &#8220;flattening&#8221; the <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> curve over time and making the distribution approach Zipf&#8217;s law and/or the Pareto 
distribution. This helps explain why <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> eventually becomes constant; however, I do not have a good explanation as to why <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> is often close to <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' />.
</p>
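<p>The stability of the Pareto tail under an independent multiplier can be illustrated in the same spirit. In this sketch, with hypothetical parameters, <em>X</em> is drawn exactly from the Pareto law with P(<em>X</em> &#8805; <em>x</em>) = <em>x</em><sup>&#8722;1/&#945;</sup> and &#945; = 1. <em>Y</em> is an independent multiplier uniform on [1, 2], which is small compared to the range of scales of <em>X</em>. For <em>x</em> &#8805; 2 one can check directly that P(<em>XY</em> &#8805; <em>x</em>) = E[<em>Y</em>]/<em>x</em> = 1.5/<em>x</em>, so the product obeys the same law with the same exponent. The code estimates the decay rate of both tails between 10 and 100.</p>

```python
import math
import random

rng = random.Random(3)
SAMPLES = 200_000
ALPHA = 1.0  # exponent in the convention P(X >= x) = c * x**(-1/ALPHA) of (3), with c = 1

# X is exactly Pareto: P(X >= x) = x**(-1/ALPHA) for x >= 1 (inverse-transform sampling).
xs = [(1.0 - rng.random()) ** (-ALPHA) for _ in range(SAMPLES)]
# Y is an independent, bounded multiplier; uniform on [1, 2] is a hypothetical choice.
xy = [x * rng.uniform(1.0, 2.0) for x in xs]


def tail(sample, x):
    """Empirical P(sample >= x)."""
    return sum(1 for v in sample if v >= x) / len(sample)


def tail_slope(sample, lo, hi):
    """Estimate 1/ALPHA from how fast the empirical tail decays between lo and hi."""
    return math.log(tail(sample, lo) / tail(sample, hi)) / math.log(hi / lo)


print("tail slope of X :", round(tail_slope(xs, 10, 100), 3))
print("tail slope of XY:", round(tail_slope(xy, 10, 100), 3))
print("P(XY >= 10):", tail(xy, 10), "vs E[Y] / 10 =", 1.5 / 10)
```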
</p>
<p align="center"><b> &#8212;  2. Compatibility between laws  &#8212; </b></p>
<p>
Another mathematical line of support for Benford&#8217;s law, Zipf&#8217;s law, and the Pareto distribution is that the laws are highly compatible with each other. For instance, Zipf&#8217;s law and the Pareto distribution are formally equivalent: if there are <img src='http://s0.wp.com/latex.php?latex=%7BN%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{N}&amp;fg=000000' title='{N}&amp;fg=000000' class='latex' /> samples of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />, then applying <a href="#pareto">(3)</a> with <img src='http://s0.wp.com/latex.php?latex=%7Bx%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{x}&amp;fg=000000' title='{x}&amp;fg=000000' class='latex' /> equal to the <img src='http://s0.wp.com/latex.php?latex=%7Bn%5E%7Bth%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{n^{th}}&amp;fg=000000' title='{n^{th}}&amp;fg=000000' class='latex' /> largest value <img src='http://s0.wp.com/latex.php?latex=%7BX_n%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_n}&amp;fg=000000' title='{X_n}&amp;fg=000000' class='latex' /> of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> gives </p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle++%5Cfrac%7Bn%7D%7BN%7D+%3D+%7B%5CBbb+P%7D%28+X+%5Cgeq+X_n+%29+%3D+c+X_n%5E%7B-1%2F%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle  &#92;frac{n}{N} = {&#92;Bbb P}( X &#92;geq X_n ) = c X_n^{-1/&#92;alpha}&amp;fg=000000' title='&#92;displaystyle  &#92;frac{n}{N} = {&#92;Bbb P}( X &#92;geq X_n ) = c X_n^{-1/&#92;alpha}&amp;fg=000000' class='latex' /></p>
<p> which implies Zipf&#8217;s law <img src='http://s0.wp.com/latex.php?latex=%7BX_n+%3D+C+n%5E%7B-%5Calpha%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X_n = C n^{-&#92;alpha}}&amp;fg=000000' title='{X_n = C n^{-&#92;alpha}}&amp;fg=000000' class='latex' /> with <img src='http://s0.wp.com/latex.php?latex=%7BC+%3A%3D+%28Nc%29%5E%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{C := (Nc)^&#92;alpha}&amp;fg=000000' title='{C := (Nc)^&#92;alpha}&amp;fg=000000' class='latex' />. Conversely one can deduce the Pareto distribution from Zipf&#8217;s law. These deductions are only formal in nature, because the Pareto distribution can only hold exactly for continuous distributions, whereas Zipf&#8217;s law only makes sense for discrete distributions, but one can generate more rigorous variants of these deductions without much difficulty. </p>
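<p>The formal equivalence can also be seen empirically. If one draws <em>N</em> samples from the Pareto law (3) and ranks them, the <em>n</em><sup>th</sup> largest value should track <em>C</em> <em>n</em><sup>&#8722;&#945;</sup> with <em>C</em> = (<em>Nc</em>)<sup>&#945;</sup>. The relative fluctuations are of order 1/&#8730;<em>n</em>, so the agreement is loosest at the very top ranks. The Python sketch below assumes the hypothetical values <em>c</em> = 1, &#945; = 1 and <em>N</em> = 100,000, and samples (3) by inverse-transform sampling.</p>

```python
import random

rng = random.Random(4)
N = 100_000   # number of samples of X
c = 1.0       # Pareto constant in P(X >= x) = c * x**(-1/alpha)
alpha = 1.0   # exponent; both values are hypothetical

# Inverse-transform sampling: if U is uniform on (0, 1], then X = (c / U)**alpha
# satisfies P(X >= x) = c * x**(-1/alpha) for x >= c**alpha.
samples = ((c / (1.0 - rng.random())) ** alpha for _ in range(N))
ranked = sorted(samples, reverse=True)  # ranked[n - 1] is the n-th largest value X_n

C = (N * c) ** alpha  # Zipf constant from the formal deduction
for n in (10, 100, 1000, 10_000):
    predicted = C * n ** (-alpha)
    print(f"n={n:>6}  X_n={ranked[n - 1]:>12.2f}  C n^(-alpha)={predicted:>12.2f}  ratio={ranked[n - 1] / predicted:.3f}")
```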
<p>
In some literature, Zipf&#8217;s law is applied primarily near the extreme edge of the distribution (e.g. the top <img src='http://s0.wp.com/latex.php?latex=%7B0.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{0.1&#92;%}&amp;fg=000000' title='{0.1&#92;%}&amp;fg=000000' class='latex' /> of the sample space), whereas the Pareto distribution is applied in regions closer to the bulk (e.g. between the top <img src='http://s0.wp.com/latex.php?latex=%7B0.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{0.1&#92;%}&amp;fg=000000' title='{0.1&#92;%}&amp;fg=000000' class='latex' /> and the top <img src='http://s0.wp.com/latex.php?latex=%7B50%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{50&#92;%}&amp;fg=000000' title='{50&#92;%}&amp;fg=000000' class='latex' />). But this is mostly a difference of degree rather than of kind, though in some cases (such as with the example of the 2007 country populations data set) the exponent <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> for the Pareto distribution in the bulk can differ slightly from the exponent for Zipf&#8217;s law at the extreme edge.
</p>
<p>
The relationship between Zipf&#8217;s law or the Pareto distribution and Benford&#8217;s law is more subtle. For instance Benford&#8217;s law predicts that the proportion of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> with initial digit <img src='http://s0.wp.com/latex.php?latex=%7B1%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{1}&amp;fg=000000' title='{1}&amp;fg=000000' class='latex' /> should equal the proportion of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> with initial digit <img src='http://s0.wp.com/latex.php?latex=%7B2%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2}&amp;fg=000000' title='{2}&amp;fg=000000' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=%7B3%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{3}&amp;fg=000000' title='{3}&amp;fg=000000' class='latex' />. 
But if one formally uses the Pareto distribution <a href="#pareto">(3)</a> to compare those <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> between <img src='http://s0.wp.com/latex.php?latex=%7B10%5Em%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{10^m}&amp;fg=000000' title='{10^m}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B2+%5Ctimes+10%5Em%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2 &#92;times 10^m}&amp;fg=000000' title='{2 &#92;times 10^m}&amp;fg=000000' class='latex' />, and those <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> between <img src='http://s0.wp.com/latex.php?latex=%7B2+%5Ctimes+10%5Em%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2 &#92;times 10^m}&amp;fg=000000' title='{2 &#92;times 10^m}&amp;fg=000000' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%7B4+%5Ctimes+10%5Em%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{4 &#92;times 10^m}&amp;fg=000000' title='{4 &#92;times 10^m}&amp;fg=000000' class='latex' />, it seems that the former is larger by a factor of <img src='http://s0.wp.com/latex.php?latex=%7B2%5E%7B1%2F%5Calpha%7D%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{2^{1/&#92;alpha}}&amp;fg=000000' title='{2^{1/&#92;alpha}}&amp;fg=000000' class='latex' />, which upon summing over <img src='http://s0.wp.com/latex.php?latex=%7Bm%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{m}&amp;fg=000000' title='{m}&amp;fg=000000' class='latex' /> appears inconsistent with Benford&#8217;s law (unless <img src='http://s0.wp.com/latex.php?latex=%7B%5Calpha%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;alpha}&amp;fg=000000' title='{&#92;alpha}&amp;fg=000000' class='latex' /> is extremely large). A similar inconsistency is revealed if one uses Zipf&#8217;s law instead.
</p>
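<p>The apparent inconsistency can be made concrete with a few lines of arithmetic. Under the Pareto law (3), the mass in [10<sup><em>m</em></sup>, 2&#215;10<sup><em>m</em></sup>) divided by the mass in [2&#215;10<sup><em>m</em></sup>, 4&#215;10<sup><em>m</em></sup>) equals 2<sup>1/&#945;</sup> for every <em>m</em> and every <em>c</em>. Benford&#8217;s law instead predicts the ratio log<sub>10</sub>2 / log<sub>10</sub>(4/2) = 1. The sketch below, whose function names are ours, simply evaluates the tail formula.</p>

```python
def pareto_tail(x: float, c: float, alpha: float) -> float:
    """Tail probability P(X >= x) = c * x**(-1/alpha) from the Pareto law (3)."""
    return c * x ** (-1.0 / alpha)


def band_ratio(m: int, c: float, alpha: float) -> float:
    """Mass of [10^m, 2*10^m) divided by mass of [2*10^m, 4*10^m) under (3)."""
    lo, mid, hi = 10.0 ** m, 2 * 10.0 ** m, 4 * 10.0 ** m
    first = pareto_tail(lo, c, alpha) - pareto_tail(mid, c, alpha)
    second = pareto_tail(mid, c, alpha) - pareto_tail(hi, c, alpha)
    return first / second


# Benford's law would require this ratio to be 1 in every decade.
for alpha in (1.0, 2.0, 10.0, 100.0):
    ratios = [round(band_ratio(m, 1.0, alpha), 4) for m in (1, 2, 3)]
    print(f"alpha={alpha:>5}: ratios {ratios}, 2**(1/alpha) = {2 ** (1 / alpha):.4f}")
```

<p>The ratio approaches 1 only as &#945; becomes very large, in line with the text.</p>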
<p>
However, the fallacy here is that the Pareto distribution (or Zipf&#8217;s law) does not apply to the entire range of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />, but only to the upper tail region where <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> is significantly higher than the median; it is a law for the <em>outliers</em> of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' /> only. In contrast, Benford&#8217;s law concerns the behaviour of <em>typical</em> values of <img src='http://s0.wp.com/latex.php?latex=%7BX%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{X}&amp;fg=000000' title='{X}&amp;fg=000000' class='latex' />; the behaviour of the top <img src='http://s0.wp.com/latex.php?latex=%7B0.1%5C%25%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{0.1&#92;%}&amp;fg=000000' title='{0.1&#92;%}&amp;fg=000000' class='latex' /> is of negligible significance to Benford&#8217;s law, though it is of prime importance for Zipf&#8217;s law and the Pareto distribution. Thus the two laws describe different components of the distribution and so complement each other. 
Roughly speaking, Benford&#8217;s law asserts that the bulk distribution of <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} X}&amp;fg=000000' title='{&#92;log_{10} X}&amp;fg=000000' class='latex' /> is locally uniform at unit scales, while the Pareto distribution (or Zipf&#8217;s law) asserts that the tail distribution of <img src='http://s0.wp.com/latex.php?latex=%7B%5Clog_%7B10%7D+X%7D%26%2338%3Bfg%3D000000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;log_{10} X}&amp;fg=000000' title='{&#92;log_{10} X}&amp;fg=000000' class='latex' /> decays exponentially. Note that Benford&#8217;s law only describes the fine-scale behaviour of the bulk distribution; the coarse-scale distribution can be a variety of distributions (e.g. log-gaussian).
</p></p>
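<p>The last remark can be checked exactly for a log-gaussian statistic. If log<sub>10</sub> <em>X</em> is normally distributed with mean &#956; and standard deviation &#963;, the leading-digit probabilities are sums of differences of the normal distribution function, one term per decade. The sketch below uses hypothetical parameters &#956; = 3 and &#963; = 1 or 0.1, and evaluates these sums with <code>math.erf</code>. When &#963; is about one decade or more, log<sub>10</sub> <em>X</em> is locally uniform at unit scales and the deviation from Benford&#8217;s law is tiny. A distribution concentrated within a single decade violates Benford&#8217;s law badly.</p>

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def leading_digit_probs(mu: float, sigma: float) -> dict:
    """Exact leading-digit law of X = 10**G with G ~ Normal(mu, sigma)."""
    k_lo = math.floor(mu - 12 * sigma) - 1
    k_hi = math.ceil(mu + 12 * sigma) + 1
    probs = {}
    for d in range(1, 10):
        # X has leading digit d iff k + log10(d) <= G < k + log10(d + 1) for some integer k.
        probs[d] = sum(
            normal_cdf((k + math.log10(d + 1) - mu) / sigma)
            - normal_cdf((k + math.log10(d) - mu) / sigma)
            for k in range(k_lo, k_hi + 1)
        )
    return probs


def max_deviation_from_benford(probs: dict) -> float:
    return max(abs(probs[d] - math.log10((d + 1) / d)) for d in range(1, 10))


wide = leading_digit_probs(mu=3.0, sigma=1.0)    # spread over several decades
narrow = leading_digit_probs(mu=3.0, sigma=0.1)  # concentrated inside one decade
print("sigma = 1.0: max deviation", max_deviation_from_benford(wide))
print("sigma = 0.1: max deviation", max_deviation_from_benford(narrow))
```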
]]></content:encoded>
</item>
<item>
<title><![CDATA[Small samples, and the margin of error]]></title>
<link>http://terrytao.wordpress.com/2008/10/10/small-samples-and-the-margin-of-error/</link>
<pubDate>Fri, 10 Oct 2008 20:26:30 +0000</pubDate>
<dc:creator>Terence Tao</dc:creator>
<guid>http://terrytao.wordpress.com/2008/10/10/small-samples-and-the-margin-of-error/</guid>
<description><![CDATA[The U.S. presidential election is now only a few weeks away.  The politics of this election are of c]]></description>
<content:encoded><![CDATA[<p>The U.S. presidential election is now only a few weeks away.  The politics of this election are of course interesting and important, but I do <em>not</em> want to discuss these topics here (there is not exactly a shortage of other venues for such a discussion), and would request that readers refrain from doing so in the comments to this post.  However, I thought it would be apropos to talk about some of the basic mathematics underlying electoral polling, and specifically to explain the fact, which can be highly unintuitive to those not well versed in statistics, that polls can be accurate even when sampling only a tiny fraction of the entire population.</p>
<p>Take for instance a nationwide poll of U.S. voters on which presidential candidate they intend to vote for.  A typical poll will ask a number <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> of randomly selected voters for their opinion; a typical value here is <img src='http://s0.wp.com/latex.php?latex=n+%3D+1000&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n = 1000' title='n = 1000' class='latex' />.  In contrast, the total voting-eligible population of the U.S. &#8211; let&#8217;s call this set <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> &#8211; is about 200 million.  (The actual turnout in the election is likely to be closer to 100 million, but let&#8217;s ignore this fact for the sake of discussion.)  Thus, such a poll would sample about 0.0005% of the total population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> &#8211; an incredibly tiny fraction.  Nevertheless, the <a href="http://en.wikipedia.org/wiki/Margin_of_error">margin of error</a> (at the 95% <a href="http://en.wikipedia.org/wiki/Confidence_interval">confidence level</a>) for such a poll, if conducted under idealised conditions (see below), is about 3%.  
In other words, if we let <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> denote the proportion of the entire population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> that will vote for a given candidate <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' />, and let <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> denote the proportion of the polled voters that will vote for <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' />, then the event <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D%2B0.03&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p}+0.03' title='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p}+0.03' class='latex' /> will occur with probability at least 0.95.  Thus, for instance (and oversimplifying a little &#8211; see below), if the poll reports that 55% of respondents would vote for A, then the true percentage of the electorate that would vote for A has at least a 95% chance of lying between 52% and 58%.  Larger polls will of course give a smaller margin of error; for instance the margin of error for an (idealised) poll of 2,000 voters is about 2%.</p>
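<p>The figures of 3% and 2% can be reproduced with the standard normal-approximation formula for the margin of error, 1.96&#8730;(<em>p</em>(1&#8722;<em>p</em>)/<em>n</em>) in the worst case <em>p</em> = 1/2. Here it is multiplied by the finite population correction &#8730;((|<em>X</em>|&#8722;<em>n</em>)/(|<em>X</em>|&#8722;1)) for sampling without replacement. This formula is not taken from the post, which defers to an online calculator, and the sketch below is only an illustration. It also shows that the population size barely matters once it is much larger than <em>n</em>.</p>

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% point of the standard normal distribution


def margin_of_error(n: int, population: int, p: float = 0.5) -> float:
    """Normal-approximation 95% margin of error for a simple random sample of size n.

    p = 0.5 maximises p * (1 - p), giving the worst case; the finite population
    correction sqrt((population - n) / (population - 1)) accounts for sampling
    without replacement.
    """
    fpc = math.sqrt((population - n) / (population - 1))
    return Z_95 * math.sqrt(p * (1 - p) / n) * fpc


for n in (1000, 2000):
    for population in (10_000, 1_000_000, 200_000_000):
        print(f"n={n}  population={population:>11,}  margin={margin_of_error(n, population):.4f}")
```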
<p>I&#8217;ll give a rigorous proof of a weaker version of the above statement (giving a margin of error of about 7%, rather than 3%) in an appendix at the end of this post.  But the main point of my post here is a little different, namely to address the common misconception that the accuracy of a poll is a function of the <em>relative</em> sample size rather than the <em>absolute</em> sample size, which would suggest that a poll involving only 0.0005% of the population could not possibly have a margin of error as low as 3%.  I also want to point out some limitations of the mathematical analysis; depending on the methodology and the context, some polls involving 1000 respondents may have a much higher margin of error than the idealised rate of 3%.</p>
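<p>The appendix with the rigorous argument is not part of this excerpt. One standard way to obtain a rigorous bound of roughly 7%, offered here as an assumption rather than as a description of that appendix, is Chebyshev&#8217;s inequality. It is combined with the bound Var(sample proportion) &#8804; 1/(4<em>n</em>), and one then asks how large <em>t</em> must be for 1/(4<em>nt</em><sup>2</sup>) to fall to 0.05.</p>

```python
import math


def chebyshev_margin(n: int, failure_prob: float = 0.05) -> float:
    """Smallest t with 1 / (4 n t**2) <= failure_prob.

    Chebyshev's inequality gives P(|p_bar - p| >= t) <= Var(p_bar) / t**2, and
    Var(p_bar) <= p (1 - p) / n <= 1 / (4 n) for simple random sampling
    (sampling without replacement only makes the variance smaller).
    """
    return math.sqrt(1.0 / (4.0 * n * failure_prob))


print(f"{chebyshev_margin(1000):.4f}")  # 0.0707
print(f"{chebyshev_margin(2000):.4f}")  # 0.0500
```

<p>The bound is cruder than the normal approximation because Chebyshev&#8217;s inequality uses only the variance, but it holds without any appeal to the central limit theorem.</p>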
<p><!--more--></p>
<p style="text-align:center;">&#8211; Assumptions and conclusion &#8211;</p>
<p>Not all polls are created equal; there are a certain number of hypotheses on the methodology and effectiveness of the poll that we have to assume in order to make our mathematical conclusions valid.  We will make the following idealised assumptions:</p>
<ol>
<li><strong>Simple question. </strong>Voters polled can only offer one of two responses, which I will call A and not-A; thus we ignore the effect of third-party candidates, undecided voters, or refusals to respond.  In particular, we do not try to combine this data with other questions about the polled voters, such as demographic data.  We also assume that the question is unambiguous and cannot be misinterpreted by respondents (see Hypothesis 3 below).</li>
<li><strong>Perfect response rate.</strong> All voters polled offer a response; there are no refusals to respond to the poll, or failures to make contact with the voter being polled.  (This is a special case of 1., but deserves to be emphasised.)  In particular, this excludes polls that are <a href="http://en.wikipedia.org/wiki/Self-selection">self-selected</a>, such as internet polls (since in most cases, a large fraction of viewers of a web page with a poll will refuse to respond to that poll).</li>
<li><strong>Honest responses.</strong>  The response given by a voter to the poll is an accurate representation of whether that voter intends to vote for <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' /> or not; thus we ignore response-distorting effects such as the <a href="http://en.wikipedia.org/wiki/Bradley_effect">Bradley effect</a> or <a href="http://en.wikipedia.org/wiki/Push_poll">push-polling</a>, as well as <a href="http://en.wikipedia.org/wiki/Tactical_voting">tactical voting</a>, frivolous responses, misunderstanding of the question, or attempts to &#8220;game&#8221; a poll by the respondents.</li>
<li><strong>Fixed poll size.</strong> The number <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> of polled voters is fixed in advance; in particular, one cannot keep polling until one has achieved some desired outcome, and then stop.</li>
<li><a href="http://en.wikipedia.org/wiki/Simple_random_sample"><strong>Simple random sampling</strong></a> (without replacement).  Each one of the <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> voters polled is selected <a href="http://en.wikipedia.org/wiki/Uniform_distribution_(discrete)">uniformly at random</a> among the entire population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' />, thus each voter is equally likely to be selected by the poll, and no non-voter can be selected by the poll.  (In particular, we make the important assumption that there is no <a href="http://en.wikipedia.org/wiki/Selection_bias">selection bias</a>.)  Furthermore, each polled voter is chosen <a href="http://en.wikipedia.org/wiki/Statistical_independence">independently</a> of all the others, except for the one condition that we do not poll any given voter more than once.  (Thus, once a voter is polled, that voter is &#8220;crossed off the list&#8221; of the pool <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> of voters that one randomly selects from to determine the next voter polled.)   In particular, we assume that the poll is not <a href="http://en.wikipedia.org/wiki/Cluster_sampling">clustered</a>.</li>
<li><strong>Honest reporting.</strong> The results of the poll are always reported, with no inaccuracies; one cannot cancel, modify, or ignore a poll once it has begun.  In particular, one cannot conduct multiple polls and only report the &#8220;best&#8221; results (thus running the risk of <a href="http://en.wikipedia.org/wiki/Confirmation_bias">confirmation bias</a>).</li>
</ol>
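<p>To make the simple random sampling hypothesis concrete, here is a minimal Python sketch (not from the original post) that draws a poll without replacement, assuming voters are labelled by hypothetical integer indices 0, ..., N-1. It uses only the standard <code>random</code> module; <code>random.sample</code> accepts a <code>range</code>, so a list of 200 million voters is never built in memory. The seed is arbitrary.</p>

```python
import random

def simple_random_sample(population_size, n, rng):
    """Draw n distinct voter indices uniformly from range(population_size),
    i.e. sampling without replacement ("crossing voters off the list")."""
    # random.sample indexes into the range directly; nothing is materialised.
    return rng.sample(range(population_size), n)

def sample_with_replacement(population_size, n, rng):
    """Draw n indices independently and uniformly; repeats are allowed."""
    return [rng.randrange(population_size) for _ in range(n)]

rng = random.Random(2008)  # fixed seed so the sketch is reproducible
poll = simple_random_sample(200_000_000, 1000, rng)
print(len(poll), len(set(poll)))  # 1000 1000: no voter is polled twice
```

<p>Sampling with replacement is included only for comparison; the appendix below analyses that variant first.</p>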
<p>Polls which deviate significantly from these hypotheses (e.g. due to complex questions, self-selection or other selection bias, confirmation bias, inaccurate responses, a high refusal rate, variable poll size, or clustering) will generally be less accurate than an idealised poll with the same sample size.  Of course, there is a substantial literature in statistics (and polling methodology) devoted to measuring, mitigating, avoiding, or compensating for these less ideal situations, but we will not discuss those (important) issues here.  We will remark though that in practice it is difficult to make the poll selection truly uniform.  For instance, if one is conducting a telephone poll, then the sample will of course be heavily biased towards those voters who actually own phones; a little more subtly, it will also be biased toward those voters who are near their phones when the poll is conducted, and who have the time and inclination to answer phone calls.  As long as these factors are not strongly <a href="http://en.wikipedia.org/wiki/Correlation">correlated</a> with the poll question (i.e. whether the voter will vote for A), this is not a major concern, but in some cases, the poll methodology will need to be adjusted (e.g. by reweighting the sample) to compensate for the non-uniformity.</p>
<p>As stated in the introduction, we let <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> be the proportion of the entire population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> that will vote for <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> be the proportion of the polled voters that will vote for <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' /> (which, by Hypotheses 2 and 3, is exactly equal to the proportion of polled voters that say that they will vote for <img src='http://s0.wp.com/latex.php?latex=A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='A' title='A' class='latex' />).  Under the above idealised conditions, if the number <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> of polled voters is 1,000, and the size of the population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> is 200 million, then the margin of error is about 3%, thus <img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+P%7D%28+%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D+%2B+0.03+%29+%5Cgeq+0.95&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb P}( &#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03 ) &#92;geq 0.95' title='{&#92;Bbb P}( &#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03 ) &#92;geq 0.95' class='latex' />.  (See this <a href="http://www.americanresearchgroup.com/moe.html">margin of error calculator</a> for what happens with different choices of parameters.)</p>
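<p>The figure of about 3% can be reproduced with the standard normal approximation. The Python sketch below assumes that method (the linked calculator's exact method is not specified here), a z-value of 1.96 for 95% confidence, and the worst case p = 1/2. It also includes the finite population correction for sampling without replacement, to show how little the population size matters.</p>

```python
import math

def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    Normal approximation to the sampling distribution of p_bar, times the
    finite population correction for sampling without replacement.
    p = 0.5 is the worst case, since p(1 - p) is largest there.
    """
    std_err = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * std_err * fpc

for N in (10_000, 1_000_000, 200_000_000):
    print(N, round(margin_of_error(1000, N), 4))
```

<p>For n = 1000 this gives roughly 0.029 for a population of 10,000 and about 0.031 for populations of a million or 200 million. Once the population is much larger than the sample, its size barely enters.</p>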
<p>There is an important subtlety here: it is only the <em>unconditional</em> probability of the event <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D+%2B+0.03&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' title='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' class='latex' /> that is guaranteed to be at least 0.95.  If one has additional <a href="http://en.wikipedia.org/wiki/Prior_probability">prior</a> information about <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' />, then the <a href="http://en.wikipedia.org/wiki/Conditional_probability"><em>conditional</em> probability</a> of this event, relative to this information, may be very different.  For instance, if one had, prior to the poll, a very good reason to believe that <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> is almost certainly between 0.4 and 0.6, and then the poll reports <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> to be 0.1, then the conditional probability that <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D%2B0.03&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p}+0.03' title='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p}+0.03' class='latex' /> holds should be lower than the unconditional probability.  
[Note though that having prior information just about <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' />, and not <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' />, will not cause the probability to drop below 95%, as this bound on the confidence level is uniform in <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' />.] The question of how to account for prior information is a very delicate one in <a href="http://en.wikipedia.org/wiki/Bayesian_probability">Bayesian probability</a>, and will not be discussed here.</p>
<p>One special case of the above point  is worth emphasising: the statement that <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D+%2B+0.03&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' title='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' class='latex' /> is true with at least 95% probability is only valid <em>before</em> one actually conducts the poll and finds out the value of <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' />.  Once <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> is computed, the statement <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-0.03+%5Cleq+p+%5Cleq+%5Coverline%7Bp%7D+%2B+0.03&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' title='&#92;overline{p}-0.03 &#92;leq p &#92;leq &#92;overline{p} + 0.03' class='latex' /> is either true or false, i.e. occurs with probability 1 or 0 (unless one takes a Bayesian approach, as mentioned above).  [This phenomenon of course occurs all the time in probability.  For instance, if x denotes the outcome of rolling a fair six-sided die, then before one performs this roll, the probability that x equals 1 will be 1/6, but after one has seen what the value of this die is, the probability that x equals 1 will be either 1 or 0.]</p>
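<p>A small Monte Carlo sketch can illustrate the before/after distinction. The 95% describes how often the interval from p-bar - 0.03 to p-bar + 0.03 covers p across many repetitions of the poll; each realised interval, once computed, either contains p or does not. The true proportion 0.52 below is a hypothetical value. Respondents are drawn with replacement, which for a population of 200 million is practically the same as simple random sampling.</p>

```python
import random

def run_poll(p, n, rng):
    """Return p_bar for one idealised poll of n voters (drawn with replacement)."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(1)
p_true = 0.52                 # hypothetical true proportion, unknown to the pollster
n, margin, trials = 1000, 0.03, 3000

hits = []
for _ in range(trials):
    p_bar = run_poll(p_true, n, rng)
    # Once this poll is done, the event below is simply True or False.
    hits.append(p_bar - margin <= p_true <= p_bar + margin)

coverage = sum(hits) / trials
print(coverage)               # long-run frequency of a correct interval, roughly 0.95
```

<p>The 95% figure is a property of the procedure (the average of <code>hits</code>), not of any single entry in it.</p>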
<p style="text-align:center;">&#8211; Nobody asked for my opinion! &#8211;</p>
<p>One intuitive argument against a poll of small relative size being accurate goes something like this: a poll of just 1,000 people among a population of 200,000,000 is almost certainly not going to poll me, or any of my friends or acquaintances.  If my opinions, and those of everyone that I know, are not being considered at all in this poll, how could this poll possibly be accurate?</p>
<p>It is true that if you know, say, 5,000 voting-eligible people, then chances are that none of them (or maybe one of them, at best) will be contacted by the above poll.  However, even though the opinions of all these people are not being directly polled, there will be many other people with <em>equivalent</em> opinions that <em>will</em> be contacted by the poll.  Through those people, the views of yourself and your friends are being represented.  [This may seem like a very weak form of representation, but recall that you and your 5,000 friends and acquaintances still only represent 0.0025% of the total electorate.]</p>
<p>Now one may argue that no two voters are identical, and that each voter decides whom to vote for based on their own unique reasons.  True enough &#8211; but recall that this poll is asking only a <em>simple</em> question: whether one is going to vote for A or not.  Once one narrowly focuses on this question alone, any two voters who both decide to vote for A, or both decide not to vote for A, are considered equivalent, even if they arrive at this decision for totally different reasons.  So, for the purposes of this poll, there are only two types of voters in the world, A-voters and not-A-voters, with all voters of the same type considered equivalent.  In particular, any given voter is going to have millions of other equivalent voters distributed throughout the population <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' />, and a representative fraction of those equivalent voters is likely to be picked up by the poll.</p>
<p>As mentioned before, polls which offer complex questions (for instance, trying to discern the motivation behind one&#8217;s voting choices) will inherently be less accurate; there are now fewer equivalent voters for each individual, and it is harder for a poll to pick up each equivalence class in a representative manner.  (In particular, the more questions that are asked, the more likely it becomes that the responses to at least one of these questions will be inaccurate by an amount exceeding its margin of error.  This provides a limit as to how much information one can confidently extract from <a href="http://en.wikipedia.org/wiki/Data_mining">data mining</a> any given data set.)</p>
<p style="text-align:center;">&#8211; Is there enough information? &#8211;</p>
<p>Another common objection to the accuracy of polls argues that there is not enough information (or &#8220;degrees of freedom&#8221;) present in the poll sample to accurately describe the much larger amount of data present in the full population; 1,000 bits of data cannot possibly contain 200,000,000 bits of information.  However, we are not asking to find out so much information; the purpose of the poll is to estimate just a <em>single</em> piece of information, namely the number <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' />.  If one is willing to accept an error of up to 3%, then one can represent this piece of information in about five bits rather than 200,000,000.  So, in principle at least, there is more than enough information present in the poll to recover this information; one does not need to sample the entire population to get a good reading.  (The same general philosophy underlies <a href="http://terrytao.wordpress.com/2007/04/13/compressed-sensing-and-single-pixel-cameras/">compressed sensing</a>, but that&#8217;s another story.)</p>
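<p>The &#8220;about five bits&#8221; estimate can be checked in a few lines of Python. The idea (an illustrative encoding, not the only one) is to cut [0, 1] into cells of width 6%, so that reporting the centre of the cell containing p is accurate to within 3%, and then count the bits needed to name a cell.</p>

```python
import math

def bits_to_store(tolerance):
    """Bits needed to pin down a proportion in [0, 1] to within +/- tolerance.

    Split [0, 1] into cells of width 2 * tolerance; naming the cell that
    contains p (and reporting its centre) is then accurate to the tolerance.
    """
    cells = math.ceil(1 / (2 * tolerance))
    return math.ceil(math.log2(cells))

print(bits_to_store(0.03))   # 5 (17 cells, and log2(17) is about 4.09)
```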
<p>As before, the accuracy degrades as one asks more and more complicated questions.  For instance, if one were to poll 1,000 voters for their opinions on two unrelated questions A and B, each of the answers to A and B would be accurate to within 3% with probability 95%, but the probability that the answers to A and B were simultaneously accurate to within 3% would be lower (around 90% or so), and so any data analysis that relies on the responses to both A and B may not have as high a confidence level as data analysis that relies on A and B separately.  This is consistent with the information-theoretic perspective: we are demanding more and more bits of information on our population, and it is harder for our fixed data set to supply so much information accurately and confidently.</p>
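<p>The &#8220;around 90%&#8221; figure follows if the answers to the two unrelated questions are treated as independent (an assumption), so that the joint accuracy is the product of the individual ones. The Python sketch below computes the single-question probability exactly from the binomial distribution for n = 1000 at the worst case p = 1/2, using log-gamma to avoid overflow.</p>

```python
import math

def prob_within(n, p, margin):
    """Exact P(|p_bar - p| <= margin) when n * p_bar ~ Binomial(n, p), 0 < p < 1."""
    total = 0.0
    for k in range(n + 1):
        # small tolerance so that boundary cases are not lost to float rounding
        if abs(k - n * p) <= margin * n + 1e-9:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total

single = prob_within(1000, 0.5, 0.03)  # one question, worst case p = 1/2
joint = single ** 2                    # two independent questions (assumption)
print(round(single, 3), round(joint, 3))
```

<p>Here <code>single</code> comes out at roughly 0.95 and <code>joint</code> at roughly 0.90, matching the estimate in the text.</p>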
<p style="text-align:center;">&#8211; Swings &#8211;</p>
<p>One intuitive way to gauge the margin of error of a poll is to see how likely such a poll is to accurately detect a swing in the electorate.  Suppose for instance that over the course of a given time period (e.g. a week), 7% of the voters switch their vote from not-A to A, while another 2% of the voters switch their vote from A to not-A, leading to a net increase of 5% in the proportion <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> of voters voting for A. How would this swing in the vote affect the proportion <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> of polled voters who vote for A, if one imagines the same voters being polled at both the start of the week and at the end of the week?</p>
<p>If the poll was conducted by simple random sampling, then each of the 1,000 voters polled would have a 7% probability of switching from not-A to A, and a 2% probability of switching from A to not-A.  Thus, one would expect about 70 of the 1,000 voters polled to switch to A, and about 20 to switch to not-A, leading to a net swing of 50 voters, which would increase <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> by 5%, thus matching the increase in p.  Now, in practice, there will be some variability here; due to the luck of the draw, the poll may pick up more or fewer than 70 of the voters switching to A, and more or fewer than 20 of the voters switching to not-A.  But having 1,000 voters to sample is just about large enough for the <a href="http://en.wikipedia.org/wiki/Law_of_large_numbers">law of large numbers</a> to kick in and make it very likely that the number of voters switching to A picked up by the poll will be significantly larger than the number of voters switching to not-A.  Thus, this poll will have a good chance of detecting a swing of size 5% or more, which is consistent with the assertion of a margin of error of about 3%. [In appealing to the law of large numbers, we are implicitly exploiting the uniformity and independence assumptions in Hypothesis 5.]</p>
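<p>A minimal simulation of this thought experiment, assuming each polled voter independently falls into the 7% not-A-to-A group, the 2% A-to-not-A group, or neither. The number of simulated polls and the seed are arbitrary choices.</p>

```python
import random

def simulate_swing(n, to_a, from_a, rng):
    """Net change in the number of polled A-voters over the week.

    Each of the n polled voters independently switches not-A -> A with
    probability to_a, switches A -> not-A with probability from_a, or
    stays put (the two kinds of switch are mutually exclusive).
    """
    net = 0
    for _ in range(n):
        u = rng.random()
        if u < to_a:
            net += 1                    # switched to A
        elif u < to_a + from_a:
            net -= 1                    # switched away from A
    return net

rng = random.Random(7)                  # arbitrary seed, for reproducibility
swings = [simulate_swing(1000, 0.07, 0.02, rng) for _ in range(2000)]

mean_swing = sum(swings) / len(swings)                 # expected: 70 - 20 = 50
toward_a = sum(s > 0 for s in swings) / len(swings)    # any net movement toward A
at_least_3_points = sum(s >= 30 for s in swings) / len(swings)
print(mean_swing, toward_a, at_least_3_points)
```

<p>The average net swing comes out close to 50 votes, essentially every simulated poll registers some net movement toward A, and the large majority register a swing of at least 30 votes (3 percentage points).</p>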
<p>It is worth noting that this swing of 5% in an electorate of 200,000,000 voters represents quite a large shift in absolute terms: fourteen million voters switching to A and four million switching away from A.  Quite a few of these shifting voters will be picked up by the poll (in contrast to one&#8217;s sphere of friends and acquaintances, which is likely to be missed completely).</p>
<p style="text-align:center;">&#8211; Irregularity &#8211;</p>
<p>Another intuitive objection to polling accuracy is that the voting population is far from homogeneous.  For instance, it is clear that voting preferences for the U.S. presidential election vary widely among the 50 states &#8211; shouldn&#8217;t one need to multiply the poll size by 50 just to accommodate this fact?  Similarly for distinctions in voting patterns based on gender, race, party affiliation, etc.</p>
<p>Again, these irregularities in voter distribution do not affect the final accuracy of the poll, for two reasons.  Firstly, we are asking only the simple question of whether a voter votes for A or not-A, and are not breaking down the answers to this question by state, gender, race, or any other factor; as stated before, two voters are considered equivalent as long as they have the same preference for A, even if they are in different states, have different genders, etc.  Secondly, while it is conceivable that the poll will cluster its sample in one particular state (or one particular gender, etc.), thus potentially skewing the poll, the fact that the voters are selected uniformly <em>and</em> independently of each other prevents this from happening very often.  (And in any event, clustering in a demographic or geographic category is not what is of direct importance to the accuracy of the poll; the only thing that really matters in the end is whether there is clustering in the category of A-voters or not-A-voters.)  The independence hypothesis is rather important.  If for instance one were to poll by picking one particular location in the U.S. at random, and polling 1,000 people from that location, then the responses would be highly correlated (as one could have picked a location which happens to highly favour A, or highly favour not-A) and would have a much larger margin of error than if one polled 1,000 people at random across the U.S.</p>
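<p>The cost of dropping independence can be seen in a toy simulation. The electorate below is hypothetical: 50 equally sized regions whose support for A ranges evenly from 30% to 70%, so overall support is 50%. One design polls 1,000 voters independently across all regions; the other picks a single region at random and polls 1,000 voters there. The spread (standard deviation) of the resulting estimates is compared over repeated polls.</p>

```python
import random
import statistics

# Hypothetical electorate: 50 equally sized regions, support for A spread
# evenly from 30% to 70% (overall support 50%).
REGION_SHARES = [0.30 + 0.40 * i / 49 for i in range(50)]

def independent_poll(n, rng):
    """Each respondent: a uniformly random region, then an independent vote."""
    votes = 0
    for _ in range(n):
        share = rng.choice(REGION_SHARES)
        votes += rng.random() < share
    return votes / n

def cluster_poll(n, rng):
    """Pick ONE random region and poll all n respondents there."""
    share = rng.choice(REGION_SHARES)
    return sum(rng.random() < share for _ in range(n)) / n

rng = random.Random(3)
sd_independent = statistics.stdev(independent_poll(1000, rng) for _ in range(500))
sd_cluster = statistics.stdev(cluster_poll(1000, rng) for _ in range(500))
print(sd_independent, sd_cluster)
```

<p>The clustered estimates vary on the order of 0.1 around the truth, several times more than the roughly 0.016 spread of the independent sample, even though both polls contact 1,000 people.</p>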
<p>[Incidentally, in the specific case of the U.S. presidential election, statewide polls are in fact more relevant to the outcome of the election than nationwide polls, due to the mechanics of the <a href="http://en.wikipedia.org/wiki/Electoral_College_(United_States)">U.S. Electoral College</a>, but this does not detract from the above points.]</p>
<p style="text-align:center;">&#8211; Analogies &#8211;</p>
<p>Some analogies may help explain why the relative size of a sample is largely irrelevant to the accuracy of a poll.</p>
<p>Suppose one is in front of a large body of water (e.g. a sea or ocean), and wants to determine whether it is a freshwater or saltwater body.  This can be done very easily: dip one&#8217;s finger into the body of water and taste a single drop.  This gives an extremely accurate result, even though the ratio of the sample size to the population size is, literally, a drop in the ocean; the quintillions of water molecules and salt molecules present in that drop are more than sufficient to give a good reading of the salinity of the water body.</p>
<p>[To be fair, in order for this reading to be accurate, one needs to assume that the salinity is uniformly distributed across the body of water; if for instance the body happened to be nearly fresh on one side and much saltier on the other, then dipping one's finger in just one of these two sides would lead to an inaccurate measurement of average salinity.  But if one were to stir the body of water vigorously, this irregularity of distribution disappears.  The procedure of taking a random sample, with each sample point being independent of all the others, is analogous to this stirring procedure.]</p>
<p>Another analogy comes from digital imaging.  As we all know, a digital camera takes a picture of a real-world object (e.g. a human face) and converts it into an array of pixels; an image with a larger number of pixels will generally be more accurate than one with fewer.  But even with just a handful of pixels, say 1,000 pixels, one is already able to make crude distinctions between different images, for instance to distinguish a light-skinned face from a dark-skinned face (despite the fact that skin colour is determined by millions of cells and quintillions of pigment molecules).  See for instance this well-known (and <em>very</em> low resolution) image of a US president, by <a href="http://en.wikipedia.org/wiki/Leon_Harmon">Leon Harmon</a>:</p>
<p style="text-align:center;"><a href="http://terrytao.files.wordpress.com/2008/10/lincoln.jpg"><img class="alignnone size-full wp-image-780" title="lincoln" src="http://terrytao.files.wordpress.com/2008/10/lincoln.jpg?w=111&#038;h=167" alt="" width="111" height="167" /></a></p>
<p style="text-align:center;">&#8211; Appendix: Mathematical justification &#8211;</p>
<p>One can compute the margin of error for this simple sampling problem very precisely using the <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>; however I would like to present here a cruder but more robust estimate, based on the <a href="http://en.wikipedia.org/wiki/Second_moment_method">second moment method</a>, that works in much greater generality than the setting discussed here.  (It is closely related to the arguments in my <a class="snap_noshots" href="http://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/">previous post on the law of large numbers</a>.)  The main mathematical result we need is</p>
<blockquote><p><strong>Theorem.</strong> Let X be a finite set, let A be a subset of X, and let <img src='http://s0.wp.com/latex.php?latex=p+%3A%3D+%26%23124%3BA%26%23124%3B%2F%26%23124%3BX%26%23124%3B&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p := &#124;A&#124;/&#124;X&#124;' title='p := &#124;A&#124;/&#124;X&#124;' class='latex' /> be the proportion of elements of X that lie in A.  Let <img src='http://s0.wp.com/latex.php?latex=x_1%2C+%5Cldots%2C+x_n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1, &#92;ldots, x_n' title='x_1, &#92;ldots, x_n' class='latex' /> be sampled <a href="http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables">independently and uniformly</a> at random from X (in particular, we allow repetitions).  Let <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D+%3A%3D+%26%23124%3B%5C%7B1+%5Cleq+i+%5Cleq+n%3A+x_i+%5Cin+A+%5C%7D%26%23124%3B%2Fn&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p} := &#124;&#92;{1 &#92;leq i &#92;leq n: x_i &#92;in A &#92;}&#124;/n' title='&#92;overline{p} := &#124;&#92;{1 &#92;leq i &#92;leq n: x_i &#92;in A &#92;}&#124;/n' class='latex' /> be the proportion of the <img src='http://s0.wp.com/latex.php?latex=x_1%2C%5Cldots%2Cx_n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1,&#92;ldots,x_n' title='x_1,&#92;ldots,x_n' class='latex' /> (counting repetition) that lie in A.  Then for any <img src='http://s0.wp.com/latex.php?latex=r+%26%2362%3B+0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='r &gt; 0' title='r &gt; 0' class='latex' />, one has</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle+%7B%5CBbb+P%7D%28+%26%23124%3B%5Coverline%7Bp%7D-p%26%23124%3B+%5Cleq+r+%29+%5Cgeq+1+-+%5Cfrac%7B1%7D%7B4+n+r%5E2%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;displaystyle {&#92;Bbb P}( &#124;&#92;overline{p}-p&#124; &#92;leq r ) &#92;geq 1 - &#92;frac{1}{4 n r^2}' title='&#92;displaystyle {&#92;Bbb P}( &#124;&#92;overline{p}-p&#124; &#92;leq r ) &#92;geq 1 - &#92;frac{1}{4 n r^2}' class='latex' />. (1)</p>
</blockquote>
<p><strong>Proof.</strong> We use the second moment method.  For each <img src='http://s0.wp.com/latex.php?latex=1+%5Cleq+i+%5Cleq+n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='1 &#92;leq i &#92;leq n' title='1 &#92;leq i &#92;leq n' class='latex' />, let <img src='http://s0.wp.com/latex.php?latex=I_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i' title='I_i' class='latex' /> be the <a href="http://en.wikipedia.org/wiki/Indicator_function">indicator</a> of the event <img src='http://s0.wp.com/latex.php?latex=x_i+%5Cin+A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i &#92;in A' title='x_i &#92;in A' class='latex' />, thus <img src='http://s0.wp.com/latex.php?latex=I_i+%3A%3D+1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i := 1' title='I_i := 1' class='latex' /> when <img src='http://s0.wp.com/latex.php?latex=x_i+%5Cin+A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i &#92;in A' title='x_i &#92;in A' class='latex' /> and <img src='http://s0.wp.com/latex.php?latex=I_i+%3D+0&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i = 0' title='I_i = 0' class='latex' /> otherwise.  Observe that each <img src='http://s0.wp.com/latex.php?latex=I_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i' title='I_i' class='latex' /> has a probability of p of equaling 1, thus</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=p+%3D+%7B%5CBbb+E%7D+I_i.&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p = {&#92;Bbb E} I_i.' title='p = {&#92;Bbb E} I_i.' class='latex' /></p>
<p>On the other hand, we have</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D+%3D+%5Cfrac%7B1%7D%7Bn%7D+%5Csum_%7Bi%3D1%7D%5En+I_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p} = &#92;frac{1}{n} &#92;sum_{i=1}^n I_i' title='&#92;overline{p} = &#92;frac{1}{n} &#92;sum_{i=1}^n I_i' class='latex' />.</p>
<p>Thus</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D-p+%3D+%5Cfrac%7B1%7D%7Bn%7D+%5Csum_%7Bi%3D1%7D%5En+%28I_i+-+%7B%5CBbb+E%7D%28I_i%29%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}-p = &#92;frac{1}{n} &#92;sum_{i=1}^n (I_i - {&#92;Bbb E}(I_i))' title='&#92;overline{p}-p = &#92;frac{1}{n} &#92;sum_{i=1}^n (I_i - {&#92;Bbb E}(I_i))' class='latex' />;</p>
<p>squaring this and taking expectations, we obtain</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+E%7D+%26%23124%3B%5Coverline%7Bp%7D-p%26%23124%3B%5E2+%3D+%5Cfrac%7B1%7D%7Bn%5E2%7D+%5Csum_%7Bi%3D1%7D%5En+%7B%5Cbf+Var%7D%28I_i%29+%2B+%5Cfrac%7B2%7D%7Bn%5E2%7D+%5Csum_%7B1+%5Cleq+i+%26%2360%3B+j+%5Cleq+n%7D+%7B%5Cbf+Cov%7D%28I_i%2CI_j%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb E} &#124;&#92;overline{p}-p&#124;^2 = &#92;frac{1}{n^2} &#92;sum_{i=1}^n {&#92;bf Var}(I_i) + &#92;frac{2}{n^2} &#92;sum_{1 &#92;leq i &lt; j &#92;leq n} {&#92;bf Cov}(I_i,I_j)' title='{&#92;Bbb E} &#124;&#92;overline{p}-p&#124;^2 = &#92;frac{1}{n^2} &#92;sum_{i=1}^n {&#92;bf Var}(I_i) + &#92;frac{2}{n^2} &#92;sum_{1 &#92;leq i &lt; j &#92;leq n} {&#92;bf Cov}(I_i,I_j)' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbf+Var%7D%28I_i%29+%3A%3D+%7B%5CBbb+E%7D+%28I_i-%7B%5CBbb+E%7D+I_i%29%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;bf Var}(I_i) := {&#92;Bbb E} (I_i-{&#92;Bbb E} I_i)^2' title='{&#92;bf Var}(I_i) := {&#92;Bbb E} (I_i-{&#92;Bbb E} I_i)^2' class='latex' /> is the <a href="http://en.wikipedia.org/wiki/Variance">variance</a> of <img src='http://s0.wp.com/latex.php?latex=I_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i' title='I_i' class='latex' />, and <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbf+Cov%7D%28I_i%2CI_j%29+%3A%3D+%7B%5CBbb+E%7D%28+%28I_i-p%29+%28I_j-p%29%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;bf Cov}(I_i,I_j) := {&#92;Bbb E}( (I_i-p) (I_j-p))' title='{&#92;bf Cov}(I_i,I_j) := {&#92;Bbb E}( (I_i-p) (I_j-p))' class='latex' /> is the <a href="http://en.wikipedia.org/wiki/Covariance">covariance</a> of <img src='http://s0.wp.com/latex.php?latex=I_i%2C+I_j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i, I_j' title='I_i, I_j' class='latex' />.</p>
<p>By assumption, the random variables <img src='http://s0.wp.com/latex.php?latex=I_i%2C+I_j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i, I_j' title='I_i, I_j' class='latex' /> for <img src='http://s0.wp.com/latex.php?latex=i+%5Cneq+j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='i &#92;neq j' title='i &#92;neq j' class='latex' /> are <a href="http://en.wikipedia.org/wiki/Statistical_independence">independent</a>, and so the covariances <img src='http://s0.wp.com/latex.php?latex=%7B%5Cbf+Cov%7D%28I_i%2C+I_j%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;bf Cov}(I_i, I_j)' title='{&#92;bf Cov}(I_i, I_j)' class='latex' /> vanish.  On the other hand, a direct computation shows that</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%7B%5Cbf+Var%7D%28I_i%29+%3D+p+-+p%5E2+%3D+%5Cfrac%7B1%7D%7B4%7D+-+%28p-%5Cfrac%7B1%7D%7B2%7D%29%5E2+%5Cleq+%5Cfrac%7B1%7D%7B4%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;bf Var}(I_i) = p - p^2 = &#92;frac{1}{4} - (p-&#92;frac{1}{2})^2 &#92;leq &#92;frac{1}{4}' title='{&#92;bf Var}(I_i) = p - p^2 = &#92;frac{1}{4} - (p-&#92;frac{1}{2})^2 &#92;leq &#92;frac{1}{4}' class='latex' /></p>
<p>for each i.  Putting all this together we conclude that</p>
<p style="text-align:center;"><img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+E%7D+%26%23124%3B%5Coverline%7Bp%7D-p%26%23124%3B%5E2+%5Cleq+%5Cfrac%7B1%7D%7B4n%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb E} &#124;&#92;overline{p}-p&#124;^2 &#92;leq &#92;frac{1}{4n}' title='{&#92;Bbb E} &#124;&#92;overline{p}-p&#124;^2 &#92;leq &#92;frac{1}{4n}' class='latex' /></p>
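<p>and the claim (1) follows by applying <a href="http://en.wikipedia.org/wiki/Markov%27s_inequality">Markov&#8217;s inequality</a> to the nonnegative random variable <img src='http://s0.wp.com/latex.php?latex=%26%23124%3B%5Coverline%7Bp%7D-p%26%23124%3B%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#124;&#92;overline{p}-p&#124;^2' title='&#124;&#92;overline{p}-p&#124;^2' class='latex' /> (in other words, from Chebyshev&#8217;s inequality). <img src='http://s0.wp.com/latex.php?latex=%5CBox&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;Box' title='&#92;Box' class='latex' /></p>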
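<p>The second moment computation can be checked numerically. Under independent uniform sampling, n times p-bar has a binomial distribution, so the expected squared error can be summed exactly. The Python sketch below (using log-gamma to avoid overflow) compares that sum with p(1-p)/n and with the worst-case bound 1/(4n) from the proof.</p>

```python
import math

def binom_pmf(n, k, p):
    """Binomial(n, p) probability of exactly k successes, via log-gamma."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))

def mean_square_error(n, p):
    """Exact E|p_bar - p|^2 for n independent uniform samples (0 < p < 1)."""
    return sum(binom_pmf(n, k, p) * (k / n - p) ** 2 for k in range(n + 1))

n = 1000
for p in (0.1, 0.3, 0.5):
    # exact second moment, the formula p(1-p)/n, and the worst-case bound 1/(4n)
    print(p, mean_square_error(n, p), p * (1 - p) / n, 1 / (4 * n))
```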
<p>Applying this theorem with n=1000 and <img src='http://s0.wp.com/latex.php?latex=r%3D1%2F%5Csqrt%7B200%7D+%5Capprox+0.07&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='r=1/&#92;sqrt{200} &#92;approx 0.07' title='r=1/&#92;sqrt{200} &#92;approx 0.07' class='latex' /> (so that 1/(4nr<sup>2</sup>) = 200/4000 = 0.05), we conclude that p and <img src='http://s0.wp.com/latex.php?latex=%5Coverline%7Bp%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;overline{p}' title='&#92;overline{p}' class='latex' /> lie within about 7% of each other with probability at least 95%, regardless of how large the population X is.  In the context of an election poll, this means that if one asks 1000 voters, sampled independently at random (with replacement), whether they would vote for A, then the margin of error for the answer is at most 7% at the 95% confidence level.</p>
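<p>Solving 1/(4nr<sup>2</sup>) = 0.05 for r (or for n) gives the margin certified by (1), and the sample size that (1) alone would demand for a 3% margin. The helper names are illustrative. The population size appears nowhere in either formula.</p>

```python
import math

def chebyshev_margin(n, delta=0.05):
    """Margin r with 1/(4 n r^2) = delta, so bound (1) gives confidence 1 - delta."""
    return 1 / (2 * math.sqrt(n * delta))

def chebyshev_sample_size(r, delta=0.05):
    """Smallest n for which bound (1) certifies margin r at confidence 1 - delta."""
    return math.ceil(1 / (4 * delta * r * r))

print(round(chebyshev_margin(1000), 4))   # 0.0707, i.e. 1/sqrt(200)
print(chebyshev_sample_size(0.03))        # 5556
```

<p>By this crude bound alone one would need 5,556 voters to certify a 3% margin at 95% confidence; as Remark 2 notes, sharper tools show that about 1,000 suffice.</p>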
<p><strong>Remark 1.</strong> Observe that the proof of the above theorem did not really need the <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i' title='x_i' class='latex' /> to be fully independent of each other; the key thing was that each <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i' title='x_i' class='latex' /> was close to uniformly distributed, and that the covariances between the indicators <img src='http://s0.wp.com/latex.php?latex=I_i%2C+I_j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i, I_j' title='I_i, I_j' class='latex' /> were small.  (Thus one only needs <a href="http://en.wikipedia.org/wiki/Pairwise_independence">pairwise independence</a> rather than joint independence for the theorem to hold.)  Because of this, one can also obtain variants of the above theorem when one selects <img src='http://s0.wp.com/latex.php?latex=x_1%2C%5Cldots%2Cx_n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_1,&#92;ldots,x_n' title='x_1,&#92;ldots,x_n' class='latex' /> by random sampling without replacement (known as <a href="http://en.wikipedia.org/wiki/Simple_random_sample">simple random sampling</a>); now there is a slight correlation between <img src='http://s0.wp.com/latex.php?latex=I_i%2C+I_j&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='I_i, I_j' title='I_i, I_j' class='latex' />, but it turns out to be negligible when X is large, for instance when n=1000 and <img src='http://s0.wp.com/latex.php?latex=%26%23124%3BX%26%23124%3B+%5Csim+10%5E8&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#124;X&#124; &#92;sim 10^8' title='&#124;X&#124; &#92;sim 10^8' class='latex' />.  (For this range of parameters, there is a non-trivial probability of a <a href="http://en.wikipedia.org/wiki/Birthday_paradox">birthday paradox</a> occurring, so the two sampling methods are genuinely different from each other; but they turn out to have almost the same margin of error anyway.) 
<img src='http://s0.wp.com/latex.php?latex=%5Cdiamond&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;diamond' title='&#92;diamond' class='latex' /></p>
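<p>For sampling without replacement, the slight correlation in Remark 1 can be written down exactly: for i &#8800; j, Cov(I<sub>i</sub>, I<sub>j</sub>) = -p(1-p)/(|X|-1), so the variance of p-bar becomes (p(1-p)/n)(|X|-n)/(|X|-1), the usual finite population correction. A short Python check with n = 1000 and |X| = 10<sup>8</sup> (p = 1/2 chosen for illustration):</p>

```python
def without_replacement_stats(p, n, N):
    """Simple random sampling without replacement of n voters from N,
    a fraction p of whom lie in A.

    Returns the covariance of two distinct indicators I_i, I_j, the variance
    of p_bar, and that variance divided by its with-replacement value p(1-p)/n.
    """
    var_one = p * (1 - p)                # Var(I_i), same as with replacement
    cov = -var_one / (N - 1)             # slight negative correlation
    var_pbar = (n * var_one + n * (n - 1) * cov) / n ** 2
    return cov, var_pbar, var_pbar / (var_one / n)

cov, var_pbar, ratio = without_replacement_stats(0.5, 1000, 10 ** 8)
print(cov, ratio)   # ratio = (N - n)/(N - 1), about 0.99999
```

<p>The variance, and hence the margin of error, differs from the with-replacement case by about one part in 100,000.</p>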
<p><strong>Remark 2.</strong> If one assumes joint independence instead of pairwise independence, one can obtain slightly sharper inequalities than (1) (e.g. by using the <a href="http://en.wikipedia.org/wiki/Chernoff_inequality">Chernoff inequality</a>), but at the 95% confidence level, this gives only a relatively modest improvement in the margin of error (in our specific example, the optimal margin of error is about 3% rather than 7%).  <img src='http://s0.wp.com/latex.php?latex=%5Cdiamond&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;diamond' title='&#92;diamond' class='latex' /></p>
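<p>To put numbers on Remark 2, the sketch below compares three margins for n = 1000 at 95% confidence: the one certified by (1); one from Hoeffding&#8217;s inequality, a Chernoff-type bound that uses joint independence (swapped in here as a representative, not necessarily the inequality the author had in mind); and the normal approximation, which is an approximation rather than a rigorous bound.</p>

```python
import math

n, delta = 1000, 0.05

# Bound (1), second moment / Chebyshev: needs only pairwise independence.
chebyshev = 1 / (2 * math.sqrt(n * delta))
# Hoeffding's inequality (joint independence): P(|p_bar - p| >= r) <= 2 exp(-2 n r^2).
hoeffding = math.sqrt(math.log(2 / delta) / (2 * n))
# Normal (central limit) approximation at the worst case p = 1/2; not a rigorous bound.
normal = 1.96 * math.sqrt(0.25 / n)

print(f"{chebyshev:.3f} {hoeffding:.3f} {normal:.3f}")   # 0.071 0.043 0.031
```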
<p><strong>Remark 3.</strong> An inspection of the argument shows that if p is known to be very small or very large, then the margin of error is better than what (1) predicts.  (In the most extreme case, if p=0 or p=1, then it is easy to see that the margin of error is zero.)  But in the case of election polls, p is generally expected to be close to 1/2, and so one does not expect to be able to improve the margin of error much from this effect. And in any case, we don&#8217;t know the value of p exactly in practice (otherwise why would we be doing the poll in the first place?).  <img src='http://s0.wp.com/latex.php?latex=%5Cdiamond&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;diamond' title='&#92;diamond' class='latex' /></p>
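<p>A quick way to see the effect described in Remark 3 is through the normal approximation, under which the sample proportion has standard deviation sqrt(p(1-p)/n). The short sketch below (an illustration using the normal approximation rather than inequality (1), with n = 1000 as in the running example; the helper name is hypothetical) tabulates the resulting 95% margin of error for a few values of p.</p>

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def margin(p: float, n: int = 1000) -> float:
    """Normal-approximation 95% margin of error for a sample proportion."""
    return Z95 * math.sqrt(p * (1.0 - p) / n)


for p in (0.5, 0.4, 0.1, 0.01, 0.0):
    print(f"p = {p:<4}: margin = {100 * margin(p):.2f}%")
```

<p>The margin is essentially flat near p = 1/2 (about 3.10% at p = 0.5 versus 3.04% at p = 0.4), and only drops substantially when p is close to 0 or 1, vanishing at p = 0.</p>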
<p><strong>Remark 4.</strong> In real-world situations, it can be difficult or impractical to get the <img src='http://s0.wp.com/latex.php?latex=x_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='x_i' title='x_i' class='latex' /> to be close to uniformly distributed (because of <a href="http://en.wikipedia.org/wiki/Sampling_bias">sampling bias</a>), and to keep the correlations low (because of effects such as <a href="http://en.wikipedia.org/wiki/Cluster_sampling">clustering</a>).  Because of this, one often needs to perform a more complicated sampling procedure than simple random sampling, which requires more sophisticated statistical analysis than that provided by the above theorem.  This is beyond the scope of this post, though. <img src='http://s0.wp.com/latex.php?latex=%5Cdiamond&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;diamond' title='&#92;diamond' class='latex' /></p>
<p>[<em>Updated</em>, October 13: added emphasis that the confidence level only applies before one performs the poll, not afterwards.]</p>
<p>[<em>Updated</em>, October 17: Minor corrections; thanks to Tom Verhoeff for pointing them out.]</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[The Dantzig selector: Statistical estimation when p is much larger than n]]></title>
<link>http://terrytao.wordpress.com/2008/03/22/the-dantzig-selector-statistical-estimation-when-p-is-much-larger-than-n/</link>
<pubDate>Sat, 22 Mar 2008 19:20:47 +0000</pubDate>
<dc:creator>Terence Tao</dc:creator>
<guid>http://terrytao.wordpress.com/2008/03/22/the-dantzig-selector-statistical-estimation-when-p-is-much-larger-than-n/</guid>
<description><![CDATA[Over two years ago, Emmanuel Candès and I submitted the paper &#8220;The Dantzig selector: Statistic]]></description>
<content:encoded><![CDATA[<p>Over two years ago, <a class="snap_noshots" href="http://www.acm.caltech.edu/~emmanuel/">Emmanuel Candès</a> and I submitted the paper &#8220;<a class="snap_noshots" href="http://arxiv.org/abs/math.ST/0506081">The Dantzig selector: Statistical estimation when <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> is much larger than <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /></a>&#8221; to the <a class="snap_noshots" href="http://www.imstat.org/aos/">Annals of Statistics</a>.  This paper, which <a href="http://projecteuclid.org/euclid.aos/1201012958">appeared last year</a>, proposed a new type of selector (which we called the <em>Dantzig selector</em>, due to its reliance on the linear programming methods to which <a href="http://en.wikipedia.org/wiki/George_Dantzig">George Dantzig</a>, who had died as we were finishing our paper, had contributed so much) for <a href="http://en.wikipedia.org/wiki/Estimation_theory">statistical estimation</a>, in the case when the number <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='p' title='p' class='latex' /> of unknown parameters is much larger than the number <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n' title='n' class='latex' /> of observations.  
More precisely, we considered the problem of obtaining a reasonable estimate <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*' title='&#92;beta^*' class='latex' /> for an unknown vector <img src='http://s0.wp.com/latex.php?latex=%5Cbeta+%5Cin+%7B%5CBbb+R%7D%5Ep&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta &#92;in {&#92;Bbb R}^p' title='&#92;beta &#92;in {&#92;Bbb R}^p' class='latex' /> of parameters given a vector <img src='http://s0.wp.com/latex.php?latex=y+%3D+X+%5Cbeta+%2B+z+%5Cin+%7B%5CBbb+R%7D%5En&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='y = X &#92;beta + z &#92;in {&#92;Bbb R}^n' title='y = X &#92;beta + z &#92;in {&#92;Bbb R}^n' class='latex' /> of measurements, where <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X' title='X' class='latex' /> is a known <img src='http://s0.wp.com/latex.php?latex=n+%5Ctimes+p&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='n &#92;times p' title='n &#92;times p' class='latex' /> predictor matrix and <img src='http://s0.wp.com/latex.php?latex=z&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='z' title='z' class='latex' /> is a (Gaussian) noise vector with some variance <img src='http://s0.wp.com/latex.php?latex=%5Csigma%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;sigma^2' title='&#92;sigma^2' class='latex' />.  We assumed that the predictor matrix X obeyed the <em>restricted isometry property</em> (RIP, also known as UUP), which roughly speaking asserts that <img src='http://s0.wp.com/latex.php?latex=X%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X&#92;beta' title='X&#92;beta' class='latex' /> has norm comparable to that of <img src='http://s0.wp.com/latex.php?latex=%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta' title='&#92;beta' class='latex' /> whenever the vector <img src='http://s0.wp.com/latex.php?latex=%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta' title='&#92;beta' class='latex' /> is sparse.  
This RIP property is known to hold for various ensembles of random matrices of interest; see my <a class="snap_noshots" href="http://terrytao.wordpress.com/2007/07/02/open-question-deterministic-uup-matrices/">earlier blog post on this topic</a>.</p>
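<p>As a rough empirical illustration of the RIP (not a verification of it), the Python sketch below draws a random Gaussian predictor matrix with entries of variance 1/n, one of the random ensembles for which the RIP is known to hold with high probability, and computes the ratio ||X&#946;||/||&#946;|| for a few hundred random sparse vectors &#946;. The sizes n = 200, p = 1000 and sparsity s = 5, and the helper names, are arbitrary choices for illustration. Note that the RIP is a uniform statement over <em>all</em> sparse vectors, so sampling random ones cannot certify it for a given matrix.</p>

```python
import math
import random


def gaussian_matrix(n: int, p: int, rng: random.Random) -> list[list[float]]:
    """n x p matrix with i.i.d. N(0, 1/n) entries, so E||X beta||^2 = ||beta||^2."""
    scale = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, scale) for _ in range(p)] for _ in range(n)]


def norm_ratio(X: list[list[float]], support: list[int], values: list[float]) -> float:
    """||X beta|| / ||beta|| for the sparse vector beta given by (support, values)."""
    # Only the columns of X in the support of beta contribute to X beta.
    x_beta = [sum(row[j] * v for j, v in zip(support, values)) for row in X]
    return math.sqrt(sum(t * t for t in x_beta)) / math.sqrt(sum(v * v for v in values))


rng = random.Random(0)
n, p, s = 200, 1000, 5  # hypothetical sizes: observations, parameters, sparsity
X = gaussian_matrix(n, p, rng)
ratios = []
for _ in range(200):  # random s-sparse test vectors
    support = rng.sample(range(p), s)
    values = [rng.gauss(0.0, 1.0) for _ in range(s)]
    ratios.append(norm_ratio(X, support, values))
print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")
```

<p>The ratios cluster tightly around 1, which is the behaviour the RIP formalises; a matrix with nearly collinear columns would instead produce some ratios close to 0.</p>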
<p>Our selection algorithm, inspired by our previous work on compressed sensing, chooses the estimated parameters <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*' title='&#92;beta^*' class='latex' /> to have minimal <img src='http://s0.wp.com/latex.php?latex=l%5E1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='l^1' title='l^1' class='latex' /> norm amongst all vectors which are consistent with the data in the sense that the residual vector <img src='http://s0.wp.com/latex.php?latex=r+%3A%3D+y+-+X+%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='r := y - X &#92;beta^*' title='r := y - X &#92;beta^*' class='latex' /> obeys the condition</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5C%26%23124%3B+X%5E%2A+r+%5C%26%23124%3B_%5Cinfty+%5Cleq+%5Clambda&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;&#124; X^* r &#92;&#124;_&#92;infty &#92;leq &#92;lambda' title='&#92;&#124; X^* r &#92;&#124;_&#92;infty &#92;leq &#92;lambda' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=%5Clambda+%3A%3D+C+%5Csqrt%7B%5Clog+p%7D+%5Csigma&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;lambda := C &#92;sqrt{&#92;log p} &#92;sigma' title='&#92;lambda := C &#92;sqrt{&#92;log p} &#92;sigma' class='latex' /> (1)</p>
<p>(one can check that such a condition is obeyed with high probability in the case that <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A+%3D+%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^* = &#92;beta' title='&#92;beta^* = &#92;beta' class='latex' />, so that the true vector of parameters is <em>feasible</em> for this selection algorithm).  This selector is similar, though not identical, to the better-studied <em>lasso selector</em> in the literature, which minimises the <img src='http://s0.wp.com/latex.php?latex=l%5E1&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='l^1' title='l^1' class='latex' /> norm of <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*' title='&#92;beta^*' class='latex' /> penalised by the <img src='http://s0.wp.com/latex.php?latex=l%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='l^2' title='l^2' class='latex' /> norm of the residual.</p>
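<p>For readers who want to see the linear program explicitly, here is one standard way (our reformulation, not quoted from the paper) to write the Dantzig selector as a linear program, by introducing auxiliary variables u_j that bound |&#946;_j|; at an optimum u_j = |&#946;_j|, so the objective is exactly the l^1 norm. We also write the lasso in its usual Lagrangian form, with a separate tuning parameter &#956; (notation assumed here), together with the bound on the residual implied by its optimality conditions.</p>

```latex
% Dantzig selector as a linear program in the 2p variables (beta, u):
\begin{aligned}
\min_{\beta,\,u \in \mathbb{R}^p} \quad & \sum_{j=1}^{p} u_j \\
\text{subject to} \quad & -u_j \le \beta_j \le u_j, \qquad j = 1,\dots,p, \\
& -\lambda \le \bigl(X^*(y - X\beta)\bigr)_j \le \lambda, \qquad j = 1,\dots,p.
\end{aligned}

% Lasso (Lagrangian form) with its own tuning parameter mu > 0:
\hat\beta^{\mathrm{lasso}} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \mu\,\|\beta\|_1 ,
\qquad\text{whose optimality conditions give}\qquad
\|X^*(y - X\hat\beta^{\mathrm{lasso}})\|_\infty \le \mu .
```

<p>In particular, the lasso solution with &#956; = &#955; is feasible for the constraint (1), so the Dantzig selector's l^1 norm is never larger than that of this lasso solution; this is one concrete sense in which the two selectors are similar but not identical.</p>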
<p>A simple model case arises when n=p and X is the identity matrix, thus the observations are given by a simple additive noise model <img src='http://s0.wp.com/latex.php?latex=y_i+%3D+%5Cbeta_i+%2B+z_i&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='y_i = &#92;beta_i + z_i' title='y_i = &#92;beta_i + z_i' class='latex' />.  In this case, the Dantzig selector <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*' title='&#92;beta^*' class='latex' /> is given by the <span style="text-decoration:line-through;">hard</span> soft thresholding formula</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A_i+%3D+%5Cmax%28%26%23124%3By_i%26%23124%3B+-+%5Clambda%2C+0+%29++%5Chbox%7Bsgn%7D%28y_i%29.&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*_i = &#92;max(&#124;y_i&#124; - &#92;lambda, 0 )  &#92;hbox{sgn}(y_i).' title='&#92;beta^*_i = &#92;max(&#124;y_i&#124; - &#92;lambda, 0 )  &#92;hbox{sgn}(y_i).' class='latex' /></p>
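<p>Why soft thresholding: when X is the identity, the constraint (1) decouples into |y_i - &#946;*_i| &#8804; &#955; for each i, and the entry of smallest absolute value in the interval [y_i - &#955;, y_i + &#955;] is exactly the soft-thresholded value above. A minimal Python sketch of this special case (the function name and sample data are hypothetical) that also checks feasibility of the residual:</p>

```python
def soft_threshold(y: list[float], lam: float) -> list[float]:
    """Dantzig selector when X is the identity: shrink each y_i towards 0 by lam."""
    out = []
    for yi in y:
        shrunk = abs(yi) - lam
        if shrunk <= 0.0:
            out.append(0.0)  # 0 lies in [y_i - lam, y_i + lam]
        else:
            out.append(shrunk if yi > 0 else -shrunk)  # endpoint nearest to 0
    return out


y = [3.0, -0.5, 1.2, -2.5, 0.1]
lam = 1.0
beta_star = soft_threshold(y, lam)
residual = [yi - bi for yi, bi in zip(y, beta_star)]
print("beta* =", [round(b, 3) for b in beta_star])
print("max |residual| =", max(abs(r) for r in residual))  # at most lam: feasible
```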
<p>The <em>mean square error</em> <img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+E%7D%28+%5C%26%23124%3B+%5Cbeta+-+%5Cbeta%5E%2A+%5C%26%23124%3B%5E2+%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb E}( &#92;&#124; &#92;beta - &#92;beta^* &#92;&#124;^2 )' title='{&#92;Bbb E}( &#92;&#124; &#92;beta - &#92;beta^* &#92;&#124;^2 )' class='latex' /> for this selector can be computed to be roughly</p>
<p align="center"><img src='http://s0.wp.com/latex.php?latex=%5Clambda%5E2+%2B+%5Csum_%7Bi%3D1%7D%5En++%5Cmin%28+%26%23124%3B%5Cbeta_i%26%23124%3B%5E2%2C+%5Clambda%5E2%29&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;lambda^2 + &#92;sum_{i=1}^n  &#92;min( &#124;&#92;beta_i&#124;^2, &#92;lambda^2)' title='&#92;lambda^2 + &#92;sum_{i=1}^n  &#92;min( &#124;&#92;beta_i&#124;^2, &#92;lambda^2)' class='latex' /> (2)</p>
<p>and one can show that this is basically best possible (except for constants and logarithmic factors) amongst all selectors in this model.  More generally, the main result of our paper was that under the assumption that the predictor matrix obeys the RIP, the mean square error of the Dantzig selector is essentially equal to (2) and thus close to best possible.</p>
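<p>The "roughly" here can be checked numerically in the identity model. The Python sketch below estimates the mean square error of soft thresholding by Monte Carlo and compares it with the quantity &#955;&#178; + &#931; min(&#946;_i&#178;, &#955;&#178;) computed from the true parameters. It assumes, purely for illustration, &#963; = 1, the choice C = sqrt(2) in (1), and a hypothetical 10-sparse signal with p = 1000; the helper names are also hypothetical.</p>

```python
import math
import random


def soft_threshold(y, lam):
    """Entrywise soft thresholding (the Dantzig selector when X = I)."""
    out = []
    for yi in y:
        shrunk = abs(yi) - lam
        out.append(0.0 if shrunk <= 0.0 else math.copysign(shrunk, yi))
    return out


def monte_carlo_mse(beta, sigma, lam, trials, rng):
    """Average of ||beta - beta*||^2 over `trials` draws of y = beta + z."""
    total = 0.0
    for _ in range(trials):
        y = [b + rng.gauss(0.0, sigma) for b in beta]
        est = soft_threshold(y, lam)
        total += sum((b - e) ** 2 for b, e in zip(beta, est))
    return total / trials


rng = random.Random(1)
p, sigma = 1000, 1.0
lam = math.sqrt(2.0 * math.log(p)) * sigma  # assumes C = sqrt(2) in (1)
beta = [10.0] * 10 + [0.0] * (p - 10)       # hypothetical 10-sparse signal
mse = monte_carlo_mse(beta, sigma, lam, trials=200, rng=rng)
oracle = lam ** 2 + sum(min(b * b, lam ** 2) for b in beta)
print(f"Monte Carlo MSE ~ {mse:.1f}; lambda^2 + sum min(beta_i^2, lambda^2) = {oracle:.1f}")
```

<p>Here the reference quantity is about 152, and the Monte Carlo estimate should come out of the same size: almost all of the error comes from the ten large coefficients, each of which is shrunk by &#955;.</p>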
<p>After accepting our paper, the Annals of Statistics took the (somewhat uncommon) step of soliciting responses to the paper from various experts in the field, and then soliciting a rejoinder to these responses from Emmanuel and me.  Recently, the Annals posted these responses and the rejoinder on the <a class="snap_noshots" href="http://arxiv.org/">arXiv</a>:</p>
<p><!--more--></p>
<ol>
<li> <a class="snap_noshots" href="http://arxiv.org/abs/0803.3124">Bickel</a> compared these results with recent results on the lasso selector by <a class="snap_noshots" href="http://www.ams.org/mathscinet-getitem?mr=2351101">Bunea-Tsybakov-Wegkamp</a> and <a class="snap_noshots" href="http://www.stats.ox.ac.uk/~meinshau/lasso_recovery.pdf">Meinshausen-Yu</a>, commented on the naturality of the constraint (1), raised the issue of <a href="http://en.wikipedia.org/wiki/Multicollinearity">collinearity</a> (which is precluded by the RIP hypothesis, but can of course occur in practice), and proposed the use of tools such as <a href="http://en.wikipedia.org/wiki/Cross-validation">cross-validation</a> to estimate the quantity <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> appearing in the constraint (1).</li>
<li><a class="snap_noshots" href="http://arxiv.org/abs/0803.3126">Efron, Hastie, and Tibshirani</a> performed numerical experiments comparing the accuracy of the Dantzig selector and the lasso selector; the performance was broadly similar, but the Dantzig selector appeared to exhibit some artefacts arising from the constraint (1).</li>
<li><a class="snap_noshots" href="http://arxiv.org/abs/0803.3127">Cai and Lv</a> asked whether the <img src='http://s0.wp.com/latex.php?latex=%5Csqrt%7B%5Clog+p%7D&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;sqrt{&#92;log p}' title='&#92;sqrt{&#92;log p}' class='latex' /> factor in (1) was too conservative in the case when p was extremely large compared to n, leading to an overly sparse model <img src='http://s0.wp.com/latex.php?latex=%5Cbeta%5E%2A&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta^*' title='&#92;beta^*' class='latex' />; similarly, they raised the question of whether the logarithmic losses in (2) were sharp.  Concerns were also raised about the verifiability of the RIP condition in this case (which is also the case in which collinearity becomes likely).  There were also some issues raised concerning speed and robustness of the implementation of the Dantzig selector.</li>
<li><a class="snap_noshots" href="http://arxiv.org/abs/0803.3130">Ritov</a> raised a more philosophical point, as to whether the prediction error (which is essentially <img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+E%7D+%5C%26%23124%3B+X+%5Cbeta+-+X+%5Cbeta%5E%2A+%5C%26%23124%3B%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb E} &#92;&#124; X &#92;beta - X &#92;beta^* &#92;&#124;^2' title='{&#92;Bbb E} &#92;&#124; X &#92;beta - X &#92;beta^* &#92;&#124;^2' class='latex' /> in this model) is a better indicator of accuracy than the loss <img src='http://s0.wp.com/latex.php?latex=%7B%5CBbb+E%7D+%5C%26%23124%3B+%5Cbeta+-+%5Cbeta%5E%2A+%5C%26%23124%3B%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='{&#92;Bbb E} &#92;&#124; &#92;beta - &#92;beta^* &#92;&#124;^2' title='{&#92;Bbb E} &#92;&#124; &#92;beta - &#92;beta^* &#92;&#124;^2' class='latex' />.</li>
<li><a class="snap_noshots" href="http://arxiv.org/abs/0803.3134">Meinshausen, Rocha, and Yu</a> compared our results with similar (though not perfectly analogous) existing results for the lasso selector, and also gave a comparative analysis in low-dimensional situations.  Their numerical experiments also suggest that the <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> parameter in (1) needs to be tuned by cross-validation, and that the Dantzig selector has particular difficulties with collinearity.</li>
<li><a class="snap_noshots" href="http://arxiv.org/abs/0803.3135">Friedlander and Saunders</a> focused on the speed of implementation of the Dantzig selector, concluding that using general-purpose linear programming solvers (e.g. the simplex method) was moderately computationally intensive in practice.</li>
</ol>
<p>Finally, <a class="snap_noshots" href="http://arxiv.org/abs/0803.3136">Candès and I</a> gave a rejoinder to these responses.  Our main points were:</p>
<ol>
<li>Regarding collinearity (which in particular implies breakdown of the RIP), accurate estimation of the parameters <img src='http://s0.wp.com/latex.php?latex=%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta' title='&#92;beta' class='latex' /> is essentially hopeless no matter what selector one uses, and it is indeed more profitable to focus on estimating <img src='http://s0.wp.com/latex.php?latex=X+%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X &#92;beta' title='X &#92;beta' class='latex' /> instead; but our selector is aimed at applications (such as imaging, analog-to-digital conversion, or genomics) in which <img src='http://s0.wp.com/latex.php?latex=%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;beta' title='&#92;beta' class='latex' /> is the variable of interest rather than <img src='http://s0.wp.com/latex.php?latex=X+%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X &#92;beta' title='X &#92;beta' class='latex' />.  It is certainly of interest, however, to relax the RIP hypothesis and to see whether one can still obtain good estimates for <img src='http://s0.wp.com/latex.php?latex=X+%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X &#92;beta' title='X &#92;beta' class='latex' /> in this case.  (The <em>canonical selector</em> always gives a near-optimal estimate for <img src='http://s0.wp.com/latex.php?latex=X+%5Cbeta&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='X &#92;beta' title='X &#92;beta' class='latex' />, but it is NP-hard to compute, and thus infeasible in practice.)</li>
<li>We agree with the points made that the parameters in (1) need to be tuned further to optimise performance, and in particular that cross-validation is an eminently sensible idea for this purpose.  Note that in many applications (e.g. imaging), the variance <img src='http://s0.wp.com/latex.php?latex=%5Csigma%5E2&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='&#92;sigma^2' title='&#92;sigma^2' class='latex' /> can be specified from the design of the application.</li>
<li>The mean square error of the Dantzig selector does tend to be slightly worse than that of the lasso selector in many cases, but (as noted in our original paper) this can be compensated for by using a slightly more complicated <em>Gauss-Dantzig selector</em>, in which the Dantzig selector is used to locate the &#8220;active&#8221; parameters, to which one then applies a classical least-squares regression method.</li>
<li>The running times of off-the-shelf implementations of the Dantzig selector are indeed somewhat of a concern for large data sets (e.g. <img src='http://s0.wp.com/latex.php?latex=10%5E6&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='10^6' title='10^6' class='latex' /> measurements).  But the same could have been said of the lasso selector when it was first introduced, and we expect to see faster ways to implement the algorithm in the future.</li>
</ol>
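<p>The Gauss-Dantzig idea in the third point is easiest to see in the identity model X = I discussed above. There, a least-squares refit on the active set simply returns y_i on that set, so the two-stage procedure reduces to hard thresholding and removes the shrinkage by &#955; that soft thresholding applies to large coefficients. The Python sketch below is restricted to this special case (for a general X one would instead solve a least-squares problem on the selected columns) and compares the two estimators by Monte Carlo; the function names, &#963; = 1, the choice C = sqrt(2) in (1), and the sparse signal are all illustrative assumptions.</p>

```python
import math
import random


def dantzig_identity(y, lam):
    """Dantzig selector for X = I: soft thresholding at level lam."""
    return [0.0 if abs(yi) <= lam else math.copysign(abs(yi) - lam, yi) for yi in y]


def gauss_dantzig_identity(y, lam):
    """Two-stage sketch for X = I: use the Dantzig selector only to find the
    active set, then refit by least squares on that set.  With X = I the
    least-squares refit on the active set simply returns y_i there."""
    active = {i for i, b in enumerate(dantzig_identity(y, lam)) if b != 0.0}
    return [yi if i in active else 0.0 for i, yi in enumerate(y)]


def squared_error(beta, est):
    return sum((b - e) ** 2 for b, e in zip(beta, est))


rng = random.Random(2)
p, sigma, trials = 1000, 1.0, 100
lam = math.sqrt(2.0 * math.log(p)) * sigma  # assumed choice C = sqrt(2) in (1)
beta = [10.0] * 10 + [0.0] * (p - 10)       # hypothetical sparse signal
mse_d = mse_gd = 0.0
for _ in range(trials):
    y = [b + rng.gauss(0.0, sigma) for b in beta]
    mse_d += squared_error(beta, dantzig_identity(y, lam)) / trials
    mse_gd += squared_error(beta, gauss_dantzig_identity(y, lam)) / trials
print(f"Dantzig MSE ~ {mse_d:.1f}, Gauss-Dantzig MSE ~ {mse_gd:.1f}")
```

<p>With strong signals, each active coefficient costs about &#963;&#178; + &#955;&#178; under soft thresholding but only about &#963;&#178; after the refit, which is where the improvement comes from; the refit does not help with false positives, which are rare at this threshold.</p>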
<p>It was an interesting and challenging new experience for Emmanuel and me to engage in a formal discussion over one of our papers in a journal venue; I suppose this sort of thing is more common in applied sciences such as statistics, but it seems to be rather rare in pure mathematics.</p>
<p>Incidentally, after this rejoinder was submitted, more recent work has appeared showing that the lasso selector enjoys similar performance guarantees to the Dantzig selector: see this paper of  <a href="http://arxiv.org/abs/0801.1095">Bickel, Ritov, and Tsybakov</a>.  Also, a nice way to perform cross-validation for general compressed sensing problems via the Johnson-Lindenstrauss lemma was noted very recently <a class="snap_noshots" href="http://arxiv.org/abs/0803.1845">by Ward</a>.</p>
<p>[<em>Update</em>, Mar 22: The journal issue (Vol. 35, No. 6, 2007) in which all these articles appear <a href="http://projecteuclid.org/DPubS?service=UI&#38;version=1.0&#38;verb=Display&#38;handle=euclid.aos/1201012955">can be found here</a>.]</p>
<p>[<em>Update</em>, Mar 29: I have learned also of the recent paper of <a href="http://www-rcf.usc.edu/~gareth/research/DASSO.pdf">James, Radchenko, and Lv</a>, which introduces a new "DASSO" algorithm for computing the Dantzig selector in time comparable to that of the best known Lasso algorithms, and provides further theoretical connections between the two selectors.]</p>
]]></content:encoded>
</item>

</channel>
</rss>
