<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>clustering &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/clustering/</link>
	<description>Feed of posts on WordPress.com tagged "clustering"</description>
	<pubDate>Sun, 29 Nov 2009 06:19:36 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Cluster Disk With Identifier (identifier) has a Persistent Reservation on it]]></title>
<link>http://fawzi.wordpress.com/2009/11/27/cluster-disk-with-identifier-identifier-has-a-persistent-reservation-on-it/</link>
<pubDate>Fri, 27 Nov 2009 08:38:47 +0000</pubDate>
<dc:creator>Mohamed Fawzi</dc:creator>
<guid>http://fawzi.wordpress.com/2009/11/27/cluster-disk-with-identifier-identifier-has-a-persistent-reservation-on-it/</guid>
<description><![CDATA[A member of one of my customer&#8217;s teams destroyed the Hyper-V cluster by mistake. He formatted the ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>A member of one of my customer&#8217;s teams destroyed the Hyper-V cluster by mistake: he formatted the cluster nodes without evicting them first. The nodes I was working with had therefore been part of the old, destroyed cluster.</p>
<p>I rebuilt the nodes from scratch and tried to create a new cluster. When I ran the validation wizard, I got this error:</p>
<p><strong>Cluster disk with identifier (identifier) has a persistent reservation on it, the disk might be part of other cluster. Removing the disk from other validation set.</strong></p>
<p>My SAN is an HP EVA. The cluster could not see any of my LUNs, although I could see them in Disk Management, so I could not create the cluster.</p>
<p>The error occurs because the LUNs still hold the persistent reservation left by the old cluster. You have to clear the reservation from the cluster command line with this command:</p>
<p><strong>cluster.exe node %nodename% /clearpr:disknumber</strong></p>
<p>Everything should now work, and you should be able to pass the validation wizard <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[NextBio: Expertise of an author]]></title>
<link>http://scienceintelligence.wordpress.com/2009/11/25/expertise-of-an-author/</link>
<pubDate>Wed, 25 Nov 2009 20:12:23 +0000</pubDate>
<dc:creator>hbasset</dc:creator>
<guid>http://scienceintelligence.wordpress.com/2009/11/25/expertise-of-an-author/</guid>
<description><![CDATA[Filter terms in NextBio offer a fast way to know the expertise of an author at a glance. Tags size a]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Filter terms in NextBio offer a fast way to know the expertise of an author at a glance.</p>
<p>Tag sizes are proportional to the relevance of the keywords&#8230;</p>
<p>For example:</p>
<p><a href="http://www.nextbio.com/b/search/author/A%20Yonath">http://www.nextbio.com/b/search/author/A%20Yonath</a></p>
<p>A. Yonath is one of the three most recent Nobel laureates in Chemistry, honored for mapping<br />
the ribosome at the atomic level.</p>
<p>P.S.: NextBio clustering features are also used in ScienceDirect&#8230;</p>
<p><a href="http://www.info.sciencedirect.com/using/searching-linking/nextbio/">http://www.info.sciencedirect.com/using/searching-linking/nextbio/</a></p>
<p><img class="alignnone" src="http://www.nextbio.com/b/s/img3/nextbio_basic.png" alt="" width="160" height="30" /></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Week #2 – Distributed Computing with Hadoop]]></title>
<link>http://cmpe49eproject.wordpress.com/2009/11/25/week-2-%e2%80%93-distributed-computing-with-hadoop/</link>
<pubDate>Wed, 25 Nov 2009 14:20:03 +0000</pubDate>
<dc:creator>serdaryumlu</dc:creator>
<guid>http://cmpe49eproject.wordpress.com/2009/11/25/week-2-%e2%80%93-distributed-computing-with-hadoop/</guid>
<description><![CDATA[This distributed computing project aims to develop a distributed machine learning approach. In machi]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:left;">This distributed computing project aims to develop a distributed machine learning approach. Machine learning has two main types of learning: supervised and unsupervised. In supervised learning, we train on labeled data and then try to assign a label or class to newly arriving data; classification is such a problem. Unsupervised learning tries to find structure in the data and provide valuable information about it without any labels. Clustering is one example of unsupervised learning: the data points carry no labels.</p>
<p style="text-align:left;">K-means clustering first selects K data points as the initial cluster centroids and assigns every data point to the closest centroid according to a distance measure such as Euclidean distance. We then recompute the centroid of each cluster and repeat the assignment step. Eventually the clusters converge and the centroids stop changing. We will use that final state as our clusters and colour each data point according to the cluster whose centroid it is closest to.</p>
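<p style="text-align:left;">The procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the project&#8217;s code: the function name <code>kmeans</code>, the <code>max_iter</code> and <code>seed</code> parameters and the choice to leave an empty cluster&#8217;s centroid in place are all assumptions made for the example.</p>

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups.

    Hypothetical helper for illustration; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, labels
```

<p style="text-align:left;">Calling <code>kmeans(points, 2)</code> on a list of coordinate tuples returns the final centroids and one cluster index per point, which is the value that would be mapped to a colour.</p>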
<p style="text-align:left;">The MapReduce version of the clustering algorithm will follow the map-and-reduce style. In Hadoop, a TaskTracker controls the map and reduce operations that run on each processing node. Computing the clusters and centroids will therefore be done by map operations, and combining the results will be done by reduce operations. After running this clustering approach successfully on a standalone Hadoop installation, we will extend it to a three-VM cluster. Once that works on a dataset such as Wikipedia, we will, if possible, try to run it on the department&#8217;s 100-node cluster within two to three weeks.</p>
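<p style="text-align:left;">One common way to split a single K-means iteration into map and reduce steps is sketched below. It is plain Python that imitates the map, shuffle and reduce phases in one process; it does not use the Hadoop API, and the function names <code>map_point</code>, <code>reduce_cluster</code> and <code>kmeans_iteration</code> are hypothetical. The exact division of work in the project may differ.</p>

```python
import math
from collections import defaultdict


def map_point(point, centroids):
    """Map step (hypothetical): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda c: math.dist(point, centroids[c]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step (hypothetical): average the points sharing one key."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce pass; returns the updated centroid list."""
    groups = defaultdict(list)
    for key, value in (map_point(p, centroids) for p in points):
        groups[key].append(value)  # shuffle: group mapper output by key
    # Assumption: a centroid that attracted no points is left unchanged.
    return [
        reduce_cluster(groups[c]) if c in groups else centroids[c]
        for c in range(len(centroids))
    ]
```

<p style="text-align:left;">A driver would call <code>kmeans_iteration</code> repeatedly, feeding each result back in as the next set of centroids, until the centroids stop changing.</p>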
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Coding Collective Intelligence]]></title>
<link>http://chemoton.wordpress.com/2009/11/24/1366/</link>
<pubDate>Mon, 23 Nov 2009 23:04:59 +0000</pubDate>
<dc:creator>Vitorino Ramos</dc:creator>
<guid>http://chemoton.wordpress.com/2009/11/24/1366/</guid>
<description><![CDATA[Figure &#8211; Book cover of Toby Segaran&#8217;s, &#8220;Programming Collective Intelligence ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:center;"><a href="http://chemoton.wordpress.com/files/2009/11/pci-book.jpg"><img class="aligncenter size-full wp-image-1367" title="PCI Book" src="http://chemoton.wordpress.com/files/2009/11/pci-book.jpg" alt="" width="500" height="655" /></a>Figure &#8211; Book cover of Toby Segaran&#8217;s &#8220;<a href="http://oreilly.com/catalog/9780596529321" target="_blank">Programming Collective Intelligence &#8211; Building Smart Web 2.0 Applications</a>&#8221;, O&#8217;Reilly Media, 368 pp., August 2007.</p>
<p>{<strong>scopus online description</strong>} Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting data-sets from other web sites, collect data from users of your own applications, and analyze and understand the data once you&#8217;ve found it.  <em>Programming Collective Intelligence</em> takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general — all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application.</p>
<p style="text-align:justify;">{<strong>even if I don&#8217;t totally agree, here&#8217;s an &#8220;over-rated&#8221; description &#8211; especially on the scientific side &#8211; by someone called &#8220;dwa&#8221; &#8211; link above</strong>} <em>Programming Collective Intelligence</em> is a new book from O&#8217;Reilly, written by Toby Segaran. The author graduated from MIT and currently works at Metaweb Technologies, where he develops ways to put large public data-sets into Freebase, a free online semantic database. You can find more information about him on his blog: http://blog.kiwitobes.com/. Web 2.0 cannot exist without collective intelligence. The &#8220;giants&#8221; use it everywhere: YouTube recommends similar movies, Last.fm knows what you would like to listen to, Flickr knows which photos are your favorites, and so on. This technology powers <em>intelligent search</em>, <em>clustering</em>, <em>building price models</em> and <em>ranking on the web</em>. I cannot imagine a modern service without <em>data analysis</em>, which is why it is worth starting to read about it. There are many titles about <em>collective intelligence</em>, but I have recently read two: this one and &#8220;<em>Collective Intelligence in Action</em>&#8221;. Both are very pragmatic, but the O&#8217;Reilly one is more focused on the substance of CI. Its code listings are much shorter (the examples are written in <em>Python</em>, so that was easy). In general, comparing these books is like comparing <em>Java</em> with <em>Python</em>. If you want to build a recommendation engine the &#8220;in Action&#8221;/Java way, you have to read the whole book, attach extra JARs and design dozens of classes. The rapid <em>Python</em> way requires reading only 15 pages and voila, you have your first recommendations. It is awesome!</p>
<p style="text-align:justify;">So what about the rest of the book? There are still 319 pages! Further chapters cover <em>discovering groups</em>, <em>searching</em>, <em>ranking</em>, <em>optimization</em>, <em>document filtering</em>, <em>decision trees</em>, <em>price models</em> and <em>genetic algorithms</em>. The book explains how to implement <em>Simulated Annealing</em>, <em>k-Nearest Neighbors</em>, a <em>Bayesian Classifier</em> and many more. Take a look at the table of contents (here: http://oreilly.com/catalog/9780596529321/preview.html); it does not list all the algorithms, but you can find more information there. Each chapter is about 20-30 pages long. You do not have to read them all; you can pick the most important ones and still follow what is going on. Every chapter contains only a minimal theoretical introduction, which might not be enough for total beginners. I recommend this book to students who have taken a statistics course (not only in IT or computing science): it will show you how to use your knowledge in practice, and there are many inspiring examples. For those who do not know <em>Python</em>, do not be afraid: at the beginning you will find a short introduction to the language syntax. All listings are very short and well described by the author, sometimes line by line. The book also contains the necessary information about the basic standard libraries for XML processing and downloading web pages. If you would like to start learning about <em>collective intelligence</em>, I would strongly recommend reading &#8220;<em>Programming Collective Intelligence</em>&#8221; first, then &#8220;Collective Intelligence in Action&#8221;. The first one shows how easy it is to implement basic algorithms; the second shows you how to use existing open source projects related to <em>machine learning</em>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering in JBoss Application Server]]></title>
<link>http://vinaytech.wordpress.com/2009/11/23/clustering-in-jboss-application-server/</link>
<pubDate>Mon, 23 Nov 2009 08:42:55 +0000</pubDate>
<dc:creator>Vinay</dc:creator>
<guid>http://vinaytech.wordpress.com/2009/11/23/clustering-in-jboss-application-server/</guid>
<description><![CDATA[Whenever business wants to run their IT applications in a scalable and reliable way, they need to ha]]></description>
<content:encoded><![CDATA[Whenever business wants to run their IT applications in a scalable and reliable way, they need to ha]]></content:encoded>
</item>
<item>
<title><![CDATA[AstriCon 2009 Presentation -- Building a Distributed Call Center]]></title>
<link>http://leifmadsen.wordpress.com/2009/11/18/astricon-2009-presentation-building-a-distributed-call-center/</link>
<pubDate>Wed, 18 Nov 2009 20:40:15 +0000</pubDate>
<dc:creator>Leif Madsen</dc:creator>
<guid>http://leifmadsen.wordpress.com/2009/11/18/astricon-2009-presentation-building-a-distributed-call-center/</guid>
<description><![CDATA[For those who missed it, you&#8217;re able to get a PDF of my AstriCon 2009 presentation from the ht]]></description>
<content:encoded><![CDATA[For those who missed it, you&#8217;re able to get a PDF of my AstriCon 2009 presentation from the ht]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering with Shallow Trees]]></title>
<link>http://healthyalgorithms.wordpress.com/2009/11/14/clustering-with-shallow-trees/</link>
<pubDate>Sat, 14 Nov 2009 20:21:36 +0000</pubDate>
<dc:creator>Abraham Flaxman</dc:creator>
<guid>http://healthyalgorithms.wordpress.com/2009/11/14/clustering-with-shallow-trees/</guid>
<description><![CDATA[I&#8217;m updating my CV, and that reminded me that I meant to promote this cool clustering techniqu]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignleft size-full wp-image-710" src="http://healthyalgorithms.wordpress.com/files/2009/11/shallow_trees.png" alt="" width="338" height="353" />I&#8217;m updating my CV, and that reminded me that I meant to promote this cool clustering technique that I was a little bit involved in, <a href="http://arxiv.org/abs/0910.0767">Clustering With Shallow Trees</a>.</p>
<p>This goes way back to about half-way through my post-doc at MSR, when statistical physicist Riccardo Zecchina was visiting for a semester, and was teaching me about all of the &#8220;intractable&#8221; optimization problems that he can solve using his panoply of propagation algorithms.  In particular, he was working on algorithms for certain types of Steiner tree optimization, and he had discovered that adding an extra constraint on the depth of the tree didn&#8217;t make the problem harder.  (All variants of the problem he considers are NP-hard, but some are NP-harder than others.)</p>
<p>On the bus to work the next day, this depth constraint clicked with some complaints I had heard recently about the failures of single-linkage clustering in practice: the algorithm produces long, stringy clusters, which are very sensitive to noise. Could a knob to tune the depth of the spanning tree be a way to address this? Riccardo worked hard on it for a long time, and brought a bunch of collaborators into the mix, and eventually they figured out how to make it work really well.  They also proved that this approach interpolates between single-linkage (when the depth is unbounded) and the popular new affinity propagation technique (when the depth bound is 2).</p>
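<p>To make the &#8220;stringy clusters&#8221; complaint concrete, here is a minimal sketch of plain single-linkage clustering in Python. It is not the shallow-tree algorithm from the paper and has no depth bound; it simply merges the closest pair of clusters, in minimum-spanning-tree order, until <code>k</code> clusters remain. The function name <code>single_linkage</code> and the union-find bookkeeping are choices made for this example.</p>

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Merge the two closest clusters until k remain (illustrative helper).

    Returns one cluster id per point; ids are arbitrary representatives.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def gap(pair):
        a, b = pair
        return math.dist(points[a], points[b])

    # Visit all pairs by increasing distance, as Kruskal's algorithm does.
    pairs = sorted(combinations(range(len(points)), 2), key=gap)
    clusters = len(points)
    for i, j in pairs:
        if clusters == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # link the two clusters
            clusters -= 1
    return [find(i) for i in range(len(points))]
```

<p>Given five points spaced one unit apart along a line plus a separate pair elsewhere, asking for two clusters puts the whole line into one cluster, even though its end points are four units apart. That chaining is exactly what a bound on tree depth is meant to rein in.</p>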
<p>It turns out that something between SL and AP is the thing to do in many instances.  Here is the home-run example from the paper, clustering people based on their SNPs:</p>
<p><a href="http://healthyalgorithms.wordpress.com/files/2009/11/shallow_trees.png"><img class="aligncenter size-full wp-image-710" src="http://healthyalgorithms.wordpress.com/files/2009/11/shallow_trees.png" alt="" width="338" height="353" /></a></p>
<p>Compare with the results of single linkage and affinity prop:</p>
<p><a href="http://healthyalgorithms.wordpress.com/files/2009/11/sl_and_ap.png"><img class="aligncenter size-full wp-image-711" src="http://healthyalgorithms.wordpress.com/files/2009/11/sl_and_ap.png" alt="" width="500" height="180" /></a></p>
<p>(What use is clustering people based on their genetic information?  It&#8217;s important and scary to think about that&#8230;)</p>
<p>I got them to try applying it to a public health dataset as well, and the results are promising, but it needs more careful attention to be useful.</p>
<p>That reminds me: Riccardo and Team Survey Propagation, is the code for this available?  We need to let other researchers try it on their own data if we want to give the technique a chance to take off.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[New level of parallelism in CloverETL]]></title>
<link>http://blog.cloveretl.com/2009/11/04/new-level-of-parallelism-in-cloveretl/</link>
<pubDate>Wed, 04 Nov 2009 12:28:07 +0000</pubDate>
<dc:creator>mvarecha</dc:creator>
<guid>http://blog.cloveretl.com/2009/11/04/new-level-of-parallelism-in-cloveretl/</guid>
<description><![CDATA[For the upcoming release of CloverETL 2.9, we are working on improvements in CloverETL Server which ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>For the upcoming release of CloverETL 2.9, we are working on improvements in CloverETL Server which will allow transformations to run in parallel on multiple cluster nodes.</p>
<p>CloverETL Server already supports clustering, so multiple instances can cooperate with each other. The current stable version already implements the common cluster features: fail-over/high-availability and scalability for large numbers of requests, which are load-balanced across the available cluster nodes. These features have been implemented since version 1.3.</p>
<p><strong>The basic concept of new parallelism</strong><br />
A transformation may be automatically executed in parallel on several cluster nodes according to the configuration, and each of these &#8220;worker&#8221; transformations processes just its part of the data. Because there is one &#8220;master&#8221; transformation, which manages the other transformations and gathers tracking data from the &#8220;worker&#8221; transformations, the parallelism is transparent to the CloverETL Server client. By default the client &#8220;sees&#8221; just one (master) execution and aggregated tracking data. However, there are still logs and tracking data for each &#8220;worker&#8221; transformation, so it&#8217;s still possible to inspect the details of the parallel execution. The outputs of the &#8220;worker&#8221; transformations are gathered by the &#8220;master&#8221;, so the client gets one single transformation output, which may be processed further.</p>
<p><strong>So how to get parts of input data?</strong><br />
Basically, there are two options. A transformation can process data which is already partitioned; this is the best case, as there is no partitioning overhead. Alternatively, CloverETL Server itself can partition input data from one single source and distribute it on the fly (during the transformation) to several cluster nodes over the network. The overhead of this operation depends on the speed of network communication and other conditions.</p>
<p><strong>Design changes in the graph</strong><br />
We aim to keep the transformation graph almost the same as it would be for &#8220;standalone&#8221; execution. Thus there will be just a couple of extra components in the graph which is intended to run in parallel. These components will handle partitioning/departitioning of data in case it&#8217;s not already partitioned.</p>
<p><strong>Scalability</strong><br />
The new parallelism in CloverETL Server is a giant leap for the scalability of transformations. Once a graph is designed for parallel execution, the number of computers that run the transformation depends only on the cluster configuration; the graph itself stays the same. Configuration of the parallelism includes:</p>
<ul>
<li>a working CloverETL Server cluster; standalone server instances won&#8217;t be able to handle such execution</li>
<li>a &#8220;partitioned&#8221; sandbox (see below) with a list of locations</li>
</ul>
<p><strong>New sandbox types</strong><br />
On server side, graphs and related files are organized in so-called sandboxes. Until version 2.8, there was just one type: &#8220;shared&#8221; sandbox. It means that it contains the same files and directory structure on all cluster nodes. Since version 2.9 there will be two more types:</p>
<ul>
<li>&#8220;local&#8221; sandbox &#8211; (locally) accessible on just one cluster node. It&#8217;s intended for huge input/output data which should not be shared or replicated among multiple cluster nodes.</li>
<li>&#8220;partitioned&#8221; sandbox &#8211; each of its physical locations contains just part of the data. It&#8217;s intended as storage for the partitioned input/output data of transformations which are supposed to run in parallel. The list of physical locations actually specifies the nodes which will run &#8220;worker&#8221; transformations.</li>
</ul>
<p><strong>Master &#8211; worker responsibilities</strong><br />
The master observes all related workers, and when a transformation phase has finished on all workers, it&#8217;s the master&#8217;s responsibility to allow the workers to process the next phase. When any of the workers fails for any reason, it&#8217;s the master&#8217;s responsibility to abort all the other workers and mark the whole execution as failed. The terms master and worker have meaning only in the scope of one transformation. As of 2.9 there is no privileged node configured as &#8220;master&#8221; in the cluster, but that doesn&#8217;t mean that all the nodes are equal: nodes may differ in their access to physical sources, and the configuration of sandboxes should reflect that.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Thesis Design Development ~ Mapping + Diagramming in Regent’s Park ]]></title>
<link>http://marialardi.com/2009/11/02/thesis-design-development-mapping-diagramming-in-regent%e2%80%99s-park/</link>
<pubDate>Mon, 02 Nov 2009 12:52:41 +0000</pubDate>
<dc:creator>Maria Lardi</dc:creator>
<guid>http://marialardi.com/2009/11/02/thesis-design-development-mapping-diagramming-in-regent%e2%80%99s-park/</guid>
<description><![CDATA[&nbsp;]]></description>
<content:encoded><![CDATA[&nbsp;]]></content:encoded>
</item>
<item>
<title><![CDATA[Latent Class Analysis]]></title>
<link>http://yudiagusta.wordpress.com/2009/11/02/latent-class-analysis/</link>
<pubDate>Mon, 02 Nov 2009 01:01:25 +0000</pubDate>
<dc:creator>Yudi Agusta</dc:creator>
<guid>http://yudiagusta.wordpress.com/2009/11/02/latent-class-analysis/</guid>
<description><![CDATA[Latent Class Analysis is a derivative of Latent Variable Analysis that seeks to model data cat]]></description>
<content:encoded><![CDATA[Latent Class Analysis is a derivative of Latent Variable Analysis that seeks to model data cat]]></content:encoded>
</item>
<item>
<title><![CDATA[Summary - Content Free Clustering for Search Engine Query Log]]></title>
<link>http://querylog.wordpress.com/2009/10/16/sammanfattning-content-free-clustering-for-search-engine-query-log/</link>
<pubDate>Fri, 16 Oct 2009 09:43:18 +0000</pubDate>
<dc:creator>Frej</dc:creator>
<guid>http://querylog.wordpress.com/2009/10/16/sammanfattning-content-free-clustering-for-search-engine-query-log/</guid>
<description><![CDATA[Hosseini, M., Abolhassani, H., and Harikandeh, 2007 The authors try to cluster search logs from AOL ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Hosseini, M., Abolhassani, H., and Harikandeh, 2007</p>
<p>The authors try to cluster AOL search logs using a bipartite graph between query strings and URLs, followed by K-means clustering within the resulting components.</p>
<h2>Method</h2>
<p>The method consists of four steps:</p>
<ol>
<li>Build a bipartite graph of query strings and URLs, with an edge wherever a query led to a click on the URL.</li>
<li>Eliminate junk edges so that components can be extracted.</li>
<li>Reduce the dimensionality of the adjacency matrix in each component.</li>
<li>Cluster with K-means within the components.</li>
</ol>
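<p>The first step of the method, building the bipartite click graph and extracting its connected components, can be sketched in Python as follows. This is a minimal illustration rather than the authors&#8217; code: the function name <code>click_components</code> and the <code>(query, url)</code> input format are assumptions, and the junk-edge elimination of step 2 is left out.</p>

```python
from collections import defaultdict


def click_components(clicks):
    """Return the connected components of a query/URL click graph.

    clicks: iterable of (query, url) pairs, one per observed click
    (an assumed input format for this sketch).
    """
    neighbours = defaultdict(set)
    for query, url in clicks:
        # Tag each node so a query string can never collide with a URL.
        neighbours[("q", query)].add(("u", url))
        neighbours[("u", url)].add(("q", query))

    seen, components = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components
```

<p>Each returned component is a set of tagged query and URL nodes; comparing component sizes shows at a glance whether one giant component dominates the log.</p>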
<h2>Results</h2>
<p>The test data was produced by picking 40k random queries, which were manually classified into 7 categories.</p>
<p>After the first steps with the bipartite graph, only one component is large enough to be worth pursuing, and it contains 55% of the data. That component is clustered into four parts with K-means.</p>
<p>The results are evaluated by checking the precision for different topics in each cluster. Three of the four clusters had some topic with considerably higher precision than the others.</p>
<h2>Relevance for us</h2>
<p>Clustering the AOL logs is exactly what we are working on, so it is interesting to see how well they succeed. However, they settled for trying to fit a few giant clusters into predetermined categories, which we consider uninteresting. What is interesting is that they put a lot of work into manually classifying 40k queries into seven categories, which gives us a picture of what the proportions between these should be in our clusters; that is possibly something we could use for evaluation.</p>
<p>That their initial step with components in a bipartite graph yielded only one large component agrees well with our experience of the logs: the data is noisy and hard to pull apart.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Almost available; virtually virtual?]]></title>
<link>http://availabilityadvisor.com/2009/10/15/almost-available-virtually-virtual-what%e2%80%99s-the-deal/</link>
<pubDate>Thu, 15 Oct 2009 12:58:26 +0000</pubDate>
<dc:creator>Andy Bailey</dc:creator>
<guid>http://availabilityadvisor.com/2009/10/15/almost-available-virtually-virtual-what%e2%80%99s-the-deal/</guid>
<description><![CDATA[At the risk of sounding like a public service announcement, this week’s post comes with a health war]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:left;"><strong>At the risk of sounding like a public service announcement, this week’s post comes with a health warning: read the small print when choosing your virtual availability provider.</strong></p>
<p style="text-align:left;"><img class="alignleft size-medium wp-image-22" title="virtual-reality-8" src="http://availabilityadvisor.wordpress.com/files/2009/10/virtual-reality-8.jpg?w=300" alt="virtual-reality-8" width="168" height="150" />Finding a fault tolerant virtualization solution to guarantee availability of your critical applications can be rather like choosing whether to die by the knife or by the sword.</p>
<p style="text-align:left;">Of course, at Stratus, we’d prefer that you did not choose to die at all. We’d prefer that you choose to live. This post aims to help you do just that.</p>
<p style="text-align:left;"><strong><br />
So what’s the deal?</strong></p>
<p style="text-align:left;">Every availability solution has its strengths and limitations. In the case of virtualized availability the considerations are</p>
<p style="text-align:left;">1.    Actual fault tolerance vs. claimed fault tolerance<br />
2.    Operational simplicity<br />
3.    Performance and financial advantages</p>
<p style="text-align:left;">Here are some key questions for vendors:</p>
<ul style="text-align:left;">
<li>How scalable is the solution?</li>
<li>Is it restricted to one core?</li>
<li>If so, how many systems need to be available to ensure continuous availability?</li>
<li>What impact does this have on costs?</li>
<li>What impact does this have on latency?</li>
</ul>
<p style="text-align:left;">It’s also worth investigating the benefits of fault tolerance versus clustering. Clustering solutions, such as VMware HA, can scale but introduce additional complexity and do not protect against data loss.</p>
<p style="text-align:left;">As VMware states when describing its VMware FT:</p>
<p style="text-align:left;">“This is different from HA, which restarts any virtual machines that fail. Such a restart requires the virtual machines to complete the process of rebooting, and information about the state of the virtual machine, such as applications or unsaved user-entered information, might be lost”</p>
<p style="text-align:left;">VMware FT may be different from VMware HA, but it is still a clustering solution. It still requires separate physical servers. It still does not scale beyond a single core. It still incurs additional overhead as a function of its architecture.</p>
<p style="text-align:left;">Sure, there is no “best”, “one size fits all”  solution for virtualized availability and there are situations where one virtualized availability solution is better suited than another, yet Stratus FT Server is always worth considering. It guarantees:</p>
<ul style="text-align:left;">
<li>Industry leading fault tolerance</li>
<li>Operational simplicity</li>
<li>Absolute security</li>
<li>Total scalability</li>
<li>No hidden costs</li>
</ul>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Windows Server 2003 (R2) System Level Fault Tolerance (Clustering/NLB) Best Practices]]></title>
<link>http://erickoo.wordpress.com/2009/10/14/windows-server-2003-r2-system-level-fault-tolerance-clusteringnlb-best-practices/</link>
<pubDate>Wed, 14 Oct 2009 19:04:11 +0000</pubDate>
<dc:creator>Eric</dc:creator>
<guid>http://erickoo.wordpress.com/2009/10/14/windows-server-2003-r2-system-level-fault-tolerance-clusteringnlb-best-practices/</guid>
<description><![CDATA[Always use quality server &amp; networking hardware for fault-tolerant systems. Use RAID to create d]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><ul>
<li>Always use quality server &#38; networking hardware for fault-tolerant systems.</li>
<li>Use RAID to create disk subsystem redundancy.</li>
<li>Don’t run MSCS and NLB on the same computer, as it’s not supported by Microsoft.</li>
<li>When possible, use cluster-aware applications, so that the cluster service can monitor the application. A cluster-unaware application can run on a cluster, but it is not monitored by the cluster service.</li>
<li>Use active/passive clustering mode when performance is not critical. It is easier to administer and licensing costs are lower.</li>
<li>If you have TCP/IP-based services such as Terminal Services, Web sites, VPN services or streaming media services, use NLB.</li>
<li>For mission critical applications (enterprise messaging, databases, file and print services) use Windows Server 2003 Cluster Services to provide server failover functionality.</li>
<li>Disable power management on each cluster node, both in the BIOS and in the operating system’s Control Panel, to avoid unwanted failovers.</li>
<li>Choose carefully whether to use a nonshared or shared disk approach to clustering.</li>
<li>When you plan to use an MNS (Majority Node Set) cluster, always purchase one additional node.</li>
<li>Be sure that both Microsoft and the software manufacturer certify that third-party software for Cluster Service works on a Windows Server 2003 cluster, or you might be faced with limited support when troubleshooting is needed.</li>
<li>Use multiple network cards in each node. For example, one card can be dedicated to the private network (internal cluster communications) and another to the public network (client connectivity), or both can be used for a mixed network (public and private communication).</li>
<li>Configure failback schedule to allow failback only during non-peak times or after hours to reduce the chance of having a group failing back to a node during regular business hours after a failure. </li>
<li>Test failover and failback mechanism thoroughly.</li>
<li>If you are logged in with Cluster Service account, don’t use AD Users &#38; Computers or Windows security box to change the password.</li>
<li>If you’re removing a node from an MNS cluster, make sure that a majority of the nodes remain running to keep the cluster in a working state.</li>
<li>Carefully consider how to back up and restore a cluster.</li>
<li>Perform ASR backups periodically and immediately after any hardware changes to a cluster node including changes on a shared storage device or local disk configuration. </li>
<li>Before deciding which clustering technology to use, make sure you thoroughly understand the application that will be used.</li>
<li>Create a rule that allows only specific ports to the clustered IP address and blocks all others.</li>
<li>Use tools like robocopy.exe to replicate data between NLB nodes.</li>
</ul>
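<p>As a rough illustration of the majority rule behind the MNS items above: a Majority Node Set cluster keeps running only while more than half of its configured nodes are up, that is, at least floor(N/2) + 1 of N nodes. The node counts below are examples only, not a sizing recommendation:</p>
<pre> Configured nodes (N)   Nodes needed: floor(N/2) + 1   Node failures tolerated
 2                      2                              0
 3                      2                              1
 4                      3                              1
 5                      3                              2</pre>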
<p>-Eric</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering ]]></title>
<link>http://santacruzconcepts.wordpress.com/2009/10/13/clustering/</link>
<pubDate>Tue, 13 Oct 2009 00:42:42 +0000</pubDate>
<dc:creator>vivaglobal</dc:creator>
<guid>http://santacruzconcepts.wordpress.com/2009/10/13/clustering/</guid>
<description><![CDATA[Big banks cluster...so do we The Economist  magazine recently expounded on the phenomenon whereby fi]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><div class="mceTemp">
<div id="attachment_159" class="wp-caption alignleft" style="width: 168px"><a rel="attachment wp-att-159" href="http://santacruzconcepts.wordpress.com/2009/10/13/clustering/banks-2/"><img class="size-full wp-image-159" title="Banks" src="http://santacruzconcepts.wordpress.com/files/2009/10/banks1.jpg" alt="Big banks cluster...so do we" width="158" height="93" /></a><p class="wp-caption-text">Big banks cluster...so do we</p></div>
<p><em>The Economist</em>  magazine recently expounded on the phenomenon whereby firms from the same industry gather together in close proximity.  &#8220;Clustering&#8221; is also one of the Top Ten reasons listed by <em>Expansion Management  </em>magazine for company relocations.</p>
</div>
<p>Companies cluster mainly because they enjoy the expertise and camaraderie of their industry colleagues, not to mention the competitive vibe.  Banks have long clustered in London or the Cayman Islands for that reason.  </p>
<p>There is also the factor of service support.  Providers come to these areas precisely to serve these companies, and thus a boon is born.  The adage: &#8220;fish where the fish are&#8221;  is an apt one here.</p>
<p>Here in Nogales, AZ, we have clustering aplenty in the produce industry.  Approximately 130 fresh produce distributors, shippers, brokers or affiliated businesses call Santa Cruz County their headquarters.  Most are located on the Hwy I-19 corridor that connects Tucson to the Mexico border&#8230;and beyond, but they can also be found in mini-clusters throughout the entire area.</p>
<p>These companies are here because the Port of Mariposa (slated for a huge government expansion program &#8211; more on that soon!) is responsible for crossing over 200,000 trucks of fresh produce per year from Mexico.</p>
<p>These Mexican trucks typically unload their winter produce at the distributors&#8217; warehouses, where it is reloaded onto U.S. trucks that criss-cross the nation.  All told, fresh produce imports from Mexico are an $8B concern, of which Nogales represents about half.</p>
<p>Another fine example of clustering is right across the U.S. border in Nogales, Sonora, home to over 500,000 people. There, manufacturers like Motorola, Ford, Intel, etc. have parts factories (<em>maquiladoras</em>) which employ thousands in a business worth double-digit billions.</p>
<p>In both countries, on either side of the border, minutes from each other: this is arguably the only place in North America where you will see such an extensive and dynamic example of clustering.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[IBM WebSphere CE (WAS CE) clustering with WADI]]></title>
<link>http://sreebodapati.wordpress.com/2009/10/12/ibm-websphere-ce-was-ce-clustering-with-wadi/</link>
<pubDate>Mon, 12 Oct 2009 17:17:30 +0000</pubDate>
<dc:creator>sreebodapati</dc:creator>
<guid>http://sreebodapati.wordpress.com/2009/10/12/ibm-websphere-ce-was-ce-clustering-with-wadi/</guid>
<description><![CDATA[Here is my quick steps on setting up a WASCE cluster using WADI. I am using IBM WebSphere CE 2.1.1.2]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Here are my quick steps for setting up a WASCE cluster using WADI. I am using IBM WebSphere CE 2.1.1.2 to be specific. My impression was that WASCE would use WADI by default, that WADI would use unicast, and that I would have to make very few changes to get this to work. Well, it was not that simple, nor were my assumptions true. IBM WebSphere CE 2.1.1.2 does not yet support WADI with unicast; that is planned for a future release. Some key points I noticed: WADI automatically selects an available port, from 4000 upward, for multicast communication. There seems to be no way to reconfigure this port or the multicast IP used by WADI. The multicast communication between two WADI nodes is not secured, so it travels in clear text.</p>
<p><strong>Here are the setup steps:</strong></p>
<p><strong>Step 1:</strong> Create two WAS CE instances, App1Node1Server and App1Node2Server, and ensure both servers are shut down. I assume that the base port is 21371 for node1 and 21471 for node2; I am using a secure port, so my script configures the servers to listen on 21374 and 21474 respectively. The RMI ports are 21372 and 21472 respectively.</p>
<p><strong>Step 2:</strong> On both servers, update config-substitutions.properties under the &#60;server&#62;/var/config folder and set clusterNodeName to node1 and node2 respectively. I set RemoteDeployHostname to localhost in my test, but this will probably change when I need to test across multiple VMs/machines.</p>
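<p>As a minimal sketch of this step, the relevant lines for App1Node1Server might look like the following. Only the two property names mentioned above are shown, with the values from my test; every other line of the file is assumed to stay as generated:</p>
<pre> # &#60;server&#62;/var/config/config-substitutions.properties on App1Node1Server
 # (on App1Node2Server, set clusterNodeName=node2 instead)
 clusterNodeName=node1
 RemoteDeployHostname=localhost</pre>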
<p><strong>Step 3: </strong>Update config.xml for App1Node1Server as below by adding the WADI modules (tomcat6-clustering-wadi, wadi-clustering, farming) and the gbean configuration for farming. It is important to include the NodeInfo and ClusterInfo gbeans as well. In the farming gbean you need to provide the username/password, host/port and urlPath information for the other node (node2 in this case) that will be part of the cluster.</p>
<pre> &#60;module name="org.apache.geronimo.configs/tomcat6-clustering-wadi/2.1.4/car"
   load="true"/&#62;
 &#60;module name="org.apache.geronimo.configs/wadi-clustering/2.1.4/car" load="true"/&#62;
 &#60;module name="org.apache.geronimo.configs/farming/2.1.4/car" load="true"&#62;
   &#60;gbean name="NodeInfo"&#62;
     &#60;attribute name="name"&#62;${clusterNodeName}&#60;/attribute&#62; 
   &#60;/gbean&#62; 
   &#60;gbean name="ClusterInfo"&#62;    
     &#60;attribute name="name"&#62;${clusterName}&#60;/attribute&#62; 
   &#60;/gbean&#62; 
   &#60;gbean name="org.apache.geronimo.configs/farming/2.1.4/car?ServiceModule=
       org.apache.geronimo.configs/farming/2.1.4/car,
       j2eeType=NodeInfo,name=NodeInfo2"
     gbeanInfo="org.apache.geronimo.farm.config.BasicNodeInfo"&#62;
     &#60;attribute name="name"&#62;node2&#60;/attribute&#62;
     &#60;attribute
       propertyEditor="org.apache.geronimo.farm.config.BasicExtendedJMXConnectorInfoEditor"
       name="extendedJMXConnectorInfo"&#62;          
       &#60;ns:javabean
          xmlns=""
          xmlns:ns4="http://geronimo.apache.org/xml/ns/attributes-1.2"
          xmlns:ns="http://geronimo.apache.org/xml/ns/deployment/javabean-1.0"&#62;
            &#60;ns:property name="username"&#62;node2User&#60;/ns:property&#62;
            &#60;ns:property name="password"&#62;node2Passwd&#60;/ns:property&#62;
            &#60;ns:property name="protocol"&#62;rmi&#60;/ns:property&#62;
            &#60;ns:property name="host"&#62;localhost&#60;/ns:property&#62;
            &#60;ns:property name="port"&#62;21474&#60;/ns:property&#62;
            &#60;ns:property name="urlPath"&#62;/jndi/rmi://localhost:21472&#60;/ns:property&#62;
            &#60;ns:property name="local"&#62;true&#60;/ns:property&#62;
       &#60;/ns:javabean&#62;    
     &#60;/attribute&#62; 
   &#60;/gbean&#62;
 &#60;/module&#62;</pre>
<p><strong>Step 4:</strong> Update config.xml for App1Node2Server as below by adding the WADI modules and the gbean for farming. This is the same as above, except that the username/password, host/port and urlPath information now point to the other node (node1) that will be part of the cluster.</p>
<pre> &#60;module name="org.apache.geronimo.configs/tomcat6-clustering-wadi/2.1.4/car"
   load="true"/&#62;
 &#60;module name="org.apache.geronimo.configs/wadi-clustering/2.1.4/car" load="true"/&#62;
 &#60;module name="org.apache.geronimo.configs/farming/2.1.4/car" load="true"&#62;
   &#60;gbean name="NodeInfo"&#62;
     &#60;attribute name="name"&#62;${clusterNodeName}&#60;/attribute&#62; 
   &#60;/gbean&#62; 
   &#60;gbean name="ClusterInfo"&#62;    
     &#60;attribute name="name"&#62;${clusterName}&#60;/attribute&#62; 
   &#60;/gbean&#62; 
   &#60;gbean name="org.apache.geronimo.configs/farming/2.1.4/car?ServiceModule=
     org.apache.geronimo.configs/farming/2.1.4/car,j2eeType=NodeInfo,name=NodeInfo1"
     gbeanInfo="org.apache.geronimo.farm.config.BasicNodeInfo"&#62;
     &#60;attribute name="name"&#62;node1&#60;/attribute&#62;
     &#60;attribute
       propertyEditor="org.apache.geronimo.farm.config.BasicExtendedJMXConnectorInfoEditor"
       name="extendedJMXConnectorInfo"&#62;          
       &#60;ns:javabean
          xmlns=""
          xmlns:ns4="http://geronimo.apache.org/xml/ns/attributes-1.2"
          xmlns:ns="http://geronimo.apache.org/xml/ns/deployment/javabean-1.0"&#62;
            &#60;ns:property name="username"&#62;node1User&#60;/ns:property&#62;
            &#60;ns:property name="password"&#62;node1Passwd&#60;/ns:property&#62;
            &#60;ns:property name="protocol"&#62;rmi&#60;/ns:property&#62;
            &#60;ns:property name="host"&#62;localhost&#60;/ns:property&#62;
            &#60;ns:property name="port"&#62;21374&#60;/ns:property&#62;
            &#60;ns:property name="urlPath"&#62;/jndi/rmi://localhost:21372&#60;/ns:property&#62;
            &#60;ns:property name="local"&#62;true&#60;/ns:property&#62;
       &#60;/ns:javabean&#62;    
     &#60;/attribute&#62; 
   &#60;/gbean&#62;
 &#60;/module&#62;</pre>
<p><strong>Step 5:</strong> If you restart the servers now, you will notice that they look for some specific folders that do not exist. Create the &#60;server&#62;/master-repository &#38; &#60;server&#62;/cluster-repository folders under the App1Node1Server and App1Node2Server folders (these folders must be created at the same level as the var folder, which holds the config, logs, etc.). Since I was setting this up as root and the actual instances were running under another ID, I had to make sure the ownership and permissions on the new folders gave that ID read/write access.</p>
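<p>A sketch of the resulting layout for one instance, assuming the instance folder is named after the server (the same applies to App1Node2Server):</p>
<pre> App1Node1Server/
   var/                  (existing: config, logs, ...)
   master-repository/    (new)
   cluster-repository/   (new)</pre>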
<p><strong>Step 6:</strong> Now restart both server instances, and you will notice that one node is added as a cluster member on the other node.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[PhotoSketch: Internet Image Montage]]></title>
<link>http://homeilja.wordpress.com/2009/10/09/photosketch-internet-image-montage/</link>
<pubDate>Fri, 09 Oct 2009 03:15:08 +0000</pubDate>
<dc:creator>ilja</dc:creator>
<guid>http://homeilja.wordpress.com/2009/10/09/photosketch-internet-image-montage/</guid>
<description><![CDATA[PhotoSketch is a system created by Chinese students. It automatically composes simple freehand sketc]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>PhotoSketch is a system created by Chinese students. It automatically composes simple freehand sketches into realistic pictures.</p>
<div id="attachment_41" class="wp-caption alignnone" style="width: 710px"><img class="size-full wp-image-41" title="PhotoSketch" src="http://homeilja.wordpress.com/files/2009/10/photosketch.jpg" alt="PhotoSketch pipeline" width="700" height="217" /><p class="wp-caption-text">PhotoSketch pipeline (Source: Tsinghua University)</p></div>
<p>How it works:</p>
<p><strong>Step 1:</strong> The user draws simple items on the screen. Every item can be drawn as a shape contour or an ellipse, which is the default. All items and the background must be annotated with text labels, so that the system knows how to interpret the figures.</p>
<p><strong>Step 2:</strong> The system searches the Internet (currently Flickr, Google and Yahoo) for pictures matching the given text labels and downloads a large number of them.</p>
<p><strong>Step 3:</strong> Now the system tries to filter out false matches and choose the best-fitting pictures:</p>
<p style="padding-left:30px;"><strong>Choosing appropriate background pictures:</strong><br />
The first step is to cluster all pictures by their <a title="CIELUV color space" href="http://en.wikipedia.org/wiki/CIELUV_color_space" target="_blank">L*, u*, v*</a> histograms. All pictures in the biggest cluster are used for the next step. Example: if the user annotated the background as &#8220;beach&#8221;, the pictures should consist mostly of yellow and blue colors, so these pictures should end up together in the biggest cluster. Pictures with a lot of red or green fall into smaller clusters and are thus excluded. The next step ensures that the horizon lines in the pictures are well aligned. Finally a <a title="Segmentation" href="http://en.wikipedia.org/wiki/Segmentation_%28image_processing%29" target="_blank">segmentation</a> algorithm tests how many segments of the background pictures are covered by the items. The fewer, the better.</p>
<p style="padding-left:30px;"><strong>Choosing appropriate item pictures:</strong><br />
The first step is to discard all pictures with complicated backgrounds, because in these pictures it is hard to separate the figure from the background. Next the items are segmented by iteratively applying dilation and a <a title="Grabcut" href="http://en.wikipedia.org/wiki/Grabcut" target="_blank">grabcut</a> algorithm until a satisfactory result or the computation limit is reached. The segmented pictures are then compared with the item shapes drawn by the user to find the pictures that match the shape best. Finally the pictures are clustered to keep only the consistent ones.</p>
<p><strong>Step 4:</strong> The images are composed seamlessly into several compositions by a novel <a title="Alpha compositing" href="http://en.wikipedia.org/wiki/Alpha_compositing" target="_blank">blending</a> method developed by the authors. The user can then choose the best one and refine the composition.</p>
<p>In the experiments (&#8220;two PCs with 2.66 GHz Quad core CPUs and 6 GB RAM&#8221;) the total computation took about 15*n+5 minutes (n = number of items). Computing a full scene with, e.g., 6 items therefore takes about 95 minutes (15*6+5), from downloading the pictures until the composition is ready. With a slow Internet connection it would probably take much longer to download all the pictures. But I think it pays off. Can&#8217;t wait to try it out.</p>
<p>More info and the paper can be found <a title="PhotoSketch" href="http://cg.cs.tsinghua.edu.cn/montage/main.htm" target="_blank"><strong>here</strong></a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Things I don't like about CPanel]]></title>
<link>http://janeve.wordpress.com/2009/10/08/things-i-dont-like-about-cpanel/</link>
<pubDate>Thu, 08 Oct 2009 06:46:59 +0000</pubDate>
<dc:creator>Janeve</dc:creator>
<guid>http://janeve.wordpress.com/2009/10/08/things-i-dont-like-about-cpanel/</guid>
<description><![CDATA[CPanel is one of the most popular Hosting Control Panels out there today. I have looked at CPanel mo]]></description>
<content:encoded><![CDATA[CPanel is one of the most popular Hosting Control Panels out there today. I have looked at CPanel mo]]></content:encoded>
</item>

</channel>
</rss>
