<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>data-mining « WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/data-mining/</link>
	<description>Feed of posts on WordPress.com tagged "data-mining"</description>
	<pubDate>Tue, 01 Dec 2009 21:54:22 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Social networking boon for data mining tools]]></title>
<link>http://rlwilsonconsulting.wordpress.com/2009/12/01/social-networking-boon-for-data-mining-tools/</link>
<pubDate>Tue, 01 Dec 2009 16:16:55 +0000</pubDate>
<dc:creator>Randy Wilson</dc:creator>
<guid>http://rlwilsonconsulting.wordpress.com/2009/12/01/social-networking-boon-for-data-mining-tools/</guid>
<description><![CDATA[This Computerworld article discusses a couple data mining tools that were created for law enforcemen]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>This Computerworld <a href="http://www.computerworld.com/s/article/9141601/Scammers_get_better_tools_for_tapping_social_networks?taxonomyId=82&#38;pageNumber=2">article</a> discusses a couple of data mining tools that were created for law enforcement to track potentially criminal activity and the like. However, these tools can also be purchased by customers for less upstanding uses.</p>
<p>The tools grab bits and pieces of data about an individual and synthesize that information into a fuller profile. More significantly, the tools track relationships between IP addresses in order to evaluate the social relationships the person has with various entities. As an example, the article mentions that one of the tools, Maltego, can compile a list of Gmail users at the National Security Agency.</p>
<p><strong>Competitive Intelligence: </strong>Exomind, another tool, can track activity between people. For example, it could observe a sudden rush of employees at a company giving and receiving LinkedIn recommendations, which might signal that the company will shortly experience a layoff.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[A Simple Search Strategy to beat them all]]></title>
<link>http://irthoughts.wordpress.com/2009/12/01/a-simple-search-strategy-to-beat-them-all/</link>
<pubDate>Tue, 01 Dec 2009 13:20:24 +0000</pubDate>
<dc:creator>E. Garcia</dc:creator>
<guid>http://irthoughts.wordpress.com/2009/12/01/a-simple-search-strategy-to-beat-them-all/</guid>
<description><![CDATA[Now that I&#8217;m out of school, I am doing what I love the most: programming and testing IR system]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Now that I&#8217;m out of school, I am doing what I love the most: programming and testing IR systems.</p>
<p>I&#8217;m currently testing a ranking algorithm for an IR system built over the last few years. The answer set is based on a simple matching (SM) search strategy.</p>
<p>Do not mistake a simple matching strategy for a simple or basic search approach; it can evolve into a highly complex one.</p>
<p>Unlike classic Boolean searches (e.g., AND, OR, XOR), SM is suitable for constructing answer sets and subsets based on coordination levels. Add a supporting scoring function (tf-IDF derivatives, RSJ-PM, BM25, etc.) and&#8230; TA DA: a customizable clustering algorithm for retrieving and ranking search results.</p>
<p>With proper fine-tuning, end users are presented with answer sets in which AND results accumulate at the top of the search results. As users move down the list, they are presented with OR results, so the search experience is perceived as if the system were expanding the answer set by switching query modes.</p>
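<p>As a rough illustration of the idea above (not the author's actual algorithm, which the post does not show), the following Python sketch ranks documents first by coordination level, i.e. the number of distinct query terms matched, and then by a toy term-frequency sum standing in for a real scoring function such as BM25. The function name <code>rank_by_coordination</code> and the sample documents are hypothetical.</p>

```python
from collections import Counter


def rank_by_coordination(query, docs):
    """Rank docs by coordination level, then by a toy term-frequency score.

    Hypothetical helper for illustration only; the post does not give
    the author's real matching or scoring code.
    """
    terms = set(query.lower().split())
    ranked = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        matched = [t for t in terms if counts[t]]
        if not matched:
            continue  # no query term present: not part of the answer set
        level = len(matched)  # coordination level
        score = sum(counts[t] for t in matched)  # placeholder scoring function
        ranked.append((level, score, doc_id))
    # Highest coordination level first, then highest score, then doc id.
    ranked.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [(doc_id, level, score) for level, score, doc_id in ranked]


# Made-up sample corpus (assumption, not from the post).
docs = {
    "d1": "data mining tools for search",
    "d2": "search engines rank search results",
    "d3": "mining gold in the hills",
    "d4": "cooking with wax",
}
results = rank_by_coordination("data mining search", docs)
for doc_id, level, score in results:
    print(doc_id, level, score)
```

<p>Documents matching every query term (the AND results) sort to the top, and partial matches (the OR results) follow, which mirrors the mode-switching effect described above.</p>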
<p>I&#8217;ve also added a query reduction mechanism for discovering related searches. Brazilian Wax, nice!</p>
<p>In preliminary tests, results compare favorably with answer sets from search engines that claim to do search expansion/reduction, query mode switching, or clustering.</p>
<p>The next step is to check whether, with a large corpus and a thesaurus, results compare favorably with results from search engines that claim to use semantics.</p>
<p>So far, my approach is cost-effective and does not require extra libraries.</p>
<p>PS: I forgot to mention that my ranking algorithm is not based on computing vectors or cosine similarities, so any overhead from a Vector Space Model is avoided. That&#8217;s the icing on the cake!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Machine Learning in Bioinformatics: A Review]]></title>
<link>http://biointelligence.wordpress.com/2009/12/01/machine-learning-in-bioinformatics-a-review/</link>
<pubDate>Tue, 01 Dec 2009 12:12:29 +0000</pubDate>
<dc:creator>biointelligence</dc:creator>
<guid>http://biointelligence.wordpress.com/2009/12/01/machine-learning-in-bioinformatics-a-review/</guid>
<description><![CDATA[Due to continued research there is a continuous growth in the amount of biological data available. Th]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:justify;"><span style="font-size:x-small;">Continued research is driving steady growth in the amount of biological data available. This exponential growth raises two problems:</span></p>
<p style="text-align:justify;"><span style="font-size:x-small;">1. Efficient information storage and management.</span></p>
<p><span style="font-size:x-small;">2. The extraction of useful information from these data, which requires the development of tools and methods capable of transforming all these heterogeneous data into biological knowledge about the underlying mechanism.</span></p>
<p><span style="font-size:x-small;"> </span><span style="font-size:x-small;"><span style="font-size:x-small;">Machine learning techniques are applied to extract knowledge from data in various biological domains. The figure below shows the main areas of biology where computational methods are being applied, such as genomics, proteomics, microarrays, evolution and text mining.</span><span style="font-size:x-small;"></span><span style="font-size:x-small;"></p>
<p style="text-align:center;"><img class="aligncenter size-full wp-image-533" title="Areas where Computational Biology has been applied" src="http://biointelligence.wordpress.com/files/2009/12/30_nov-img.jpg" alt="" width="387" height="324" /><span style="font-size:x-small;"> </span></p>
<p style="text-align:justify;"><span style="font-size:x-small;">In addition to all the above applications, computational techniques are used to solve other problems, such as efficient primer design for PCR, biological image analysis and backtranslation of proteins (which is, given the degeneracy of the genetic code, a complex combinatorial problem). Machine learning consists of programming computers to optimize a performance criterion by using example data or past experience. The optimized criterion can be the accuracy of a predictive model in a modelling problem, or the value of a fitness or evaluation function in an optimization problem. Machine learning uses statistical theory when building computational models, since the objective is to make inferences from a sample. The two main steps in this process are:</span></p>
<p style="text-align:justify;"><span style="font-size:x-small;"> </span><span style="font-size:x-small;"><span style="font-size:x-small;">1. To induce the model by processing the huge amount of data</span></span></p>
<p><span style="font-size:x-small;">2. To represent the model and make inferences efficiently.</span></p>
<p><span style="font-size:x-small;"> </span><span style="font-size:x-small;"><span style="font-size:x-small;">The process of transforming data into knowledge is both iterative and interactive. The iterative phase consists of several steps. In the first step, we integrate and merge the different sources of information into a single format. Data warehouse techniques are used to detect and resolve outliers and inconsistencies. In the second step, the data are selected, cleaned and transformed. To carry out this step, we need to eliminate or correct the incorrect data and decide on a strategy for imputing missing data. This step also selects the relevant and non-redundant variables; the same selection can be applied to the instances. In the third step, called data mining, we take the objectives of the study into account in order to choose the most appropriate analysis for the data. In this step, the type of paradigm for supervised or unsupervised classification is selected and the model is induced from the data. Once the model is obtained, it should be evaluated and interpreted, from both statistical and biological points of view, and, if necessary, we should return to the previous steps for a new iteration. This includes resolving conflicts with the current knowledge in the domain. The satisfactorily checked model and the newly discovered knowledge are then used to solve the problem.</span></span></p>
<p><span style="font-size:x-small;"> </span><span style="font-size:x-small;"><span style="font-size:x-small;">An article published in the journal &#8216;Briefings in Bioinformatics&#8217; gives insight into the various machine learning techniques used in bioinformatics. It also sheds light on some major techniques such as Bayesian classifiers, logistic regression, discriminant analysis, classification trees, nearest neighbour, neural networks, support vector machines, clustering, Hidden Markov Models and much more.</span></span></p>
<p><span style="font-size:x-small;"> </span><span style="font-size:x-small;"><span style="font-size:x-small;">The article can be found here: <a href="http://bib.oxfordjournals.org/cgi/content/full/7/1/86?maxtoshow=&#38;HITS=&#38;hits=&#38;RESULTFORMAT=&#38;fulltext=bioinformatics&#38;andorexactfulltext=and&#38;searchid=1&#38;FIRSTINDEX=0&#38;resourcetype=HWCIT">http://bib.oxfordjournals.org/cgi/content/full/7/1/86?maxtoshow=&#38;HITS=&#38;hits=&#38;RESULTFORMAT=&#38;fulltext=bioinformatics&#38;andorexactfulltext=and&#38;searchid=1&#38;FIRSTINDEX=0&#38;resourcetype=HWCIT</a></span></span></p>
<p style="text-align:justify;"> </p>
<p></span></p>
<p></span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Russian localization script for AdventureWorksDW and a Russian-localized sample dataset for Data Mining Add-ins for Office 2007]]></title>
<link>http://microsoftbi.ru/2009/11/30/russianaw/</link>
<pubDate>Mon, 30 Nov 2009 11:43:41 +0000</pubDate>
<dc:creator>Иван Косяков</dc:creator>
<guid>http://microsoftbi.ru/2009/11/30/russianaw/</guid>
<description><![CDATA[The long-awaited Russian localization script for the AdventureWorksDW database can be downloaded from http://RussianAW.codep]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The long-awaited Russian localization script for the AdventureWorksDW database can be downloaded from <a href="http://russianaw.codeplex.com/">http://RussianAW.codeplex.com</a>. A description of the localized columns and values is available in the documentation on the project site. The script will continue to be extended; for now it is only an alpha version.</p>
<p>Accordingly, a Russian-localized sample Excel table for Data Mining Add-ins for Office 2007 can be downloaded from <a href="http://russiandmaddins.codeplex.com/">http://RussianDMAddins.codeplex.com</a>.</p>
<p>Comments and additions are welcome.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Mind Your Tweets: The CIA Social Networking Surveillance System]]></title>
<link>http://axiomamuse.wordpress.com/2009/11/28/mind-your-tweets-the-cia-social-networking-surveillance-system/</link>
<pubDate>Sun, 29 Nov 2009 03:39:13 +0000</pubDate>
<dc:creator>AxXiom</dc:creator>
<guid>http://axiomamuse.wordpress.com/2009/11/28/mind-your-tweets-the-cia-social-networking-surveillance-system/</guid>
<description><![CDATA[From Wikileaks. October 24, 2009. By Tom Burghardt (Global Research)[1] Th]]></description>
<content:encoded><![CDATA[From Wikileaks. October 24, 2009. By Tom Burghardt (Global Research)[1] Th]]></content:encoded>
</item>
<item>
<title><![CDATA[Dear f., here is what I feel about "paper" Journals]]></title>
<link>http://chemoton.wordpress.com/2009/11/25/dear-f-here-is-what-i-feel-about-paper-journals/</link>
<pubDate>Wed, 25 Nov 2009 16:49:16 +0000</pubDate>
<dc:creator>Vitorino Ramos</dc:creator>
<guid>http://chemoton.wordpress.com/2009/11/25/dear-f-here-is-what-i-feel-about-paper-journals/</guid>
<description><![CDATA[Pranav Mistry and SixthSense technology &#8211; Part 1 of 2 Pranav Mistry and SixthSense technology ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:center;"><span style='text-align:center; display: block;'><object width='425' height='350'><param name='movie' value='http://www.youtube.com/v/mzKmGTVmqJs&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' /><param name='allowfullscreen' value='true' /><param name='wmode' value='transparent' /><embed src='http://www.youtube.com/v/mzKmGTVmqJs&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' type='application/x-shockwave-flash' allowfullscreen='true' width='425' height='350' wmode='transparent'></embed></object></span><br />
Pranav Mistry and <em>SixthSense </em>technology &#8211; Part 1 of 2</p>
<p style="text-align:center;"><span style='text-align:center; display: block;'><object width='425' height='350'><param name='movie' value='http://www.youtube.com/v/8NK295ldF_g&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' /><param name='allowfullscreen' value='true' /><param name='wmode' value='transparent' /><embed src='http://www.youtube.com/v/8NK295ldF_g&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' type='application/x-shockwave-flash' allowfullscreen='true' width='425' height='350' wmode='transparent'></embed></object></span><br />
Pranav Mistry and <em>SixthSense </em>technology &#8211; Part 2 of 2</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Data Mining - Decision Tree Concepts]]></title>
<link>http://fairuzelsaid.wordpress.com/2009/11/24/data-mining-konsep-pohon-keputusan/</link>
<pubDate>Tue, 24 Nov 2009 11:50:16 +0000</pubDate>
<dc:creator>Fairuz El Said</dc:creator>
<guid>http://fairuzelsaid.wordpress.com/2009/11/24/data-mining-konsep-pohon-keputusan/</guid>
<description><![CDATA[Decision Tree. This session briefly discusses the concept of one data mining method, namely]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><div id="attachment_522" class="wp-caption alignright" style="width: 290px"><a href="http://fairuzelsaid.wordpress.com/files/2009/11/decision-tree.gif"><img class="size-full wp-image-522" title="Decision Tree" src="http://fairuzelsaid.wordpress.com/files/2009/11/decision-tree.gif" alt="Decision Tree" width="280" height="274" /></a><p class="wp-caption-text">Decision Tree</p></div>
<p style="text-align:left;">This session briefly discusses the <strong>concept </strong>of one <strong>data mining </strong>method, the <strong>decision tree</strong>. The discussion covers:</p>
<ul>
<li><em>Background of Decision Trees<br />
</em></li>
<li><em>Definition of </em><em>Decision Trees</em></li>
<li><em>Benefits of </em><em>Decision Trees</em></li>
<li><em>Decision Tree Models</em></li>
</ul>
<p><strong>Background of Decision Trees</strong></p>
<p>In everyday life, people are constantly confronted with all kinds of problems from many different fields. These problems vary widely in difficulty and complexity, from the very simple, with only a few related factors to take into account, to the very complicated, with a great many related factors that must be considered.</p>
<p>To deal with these problems, people began developing systems that help them solve such problems more easily. The decision tree is one answer: a system developed to help find and make decisions for these problems while taking into account the various factors within the scope of the problem. With a decision tree, one can easily identify and see the relationships between the factors that influence a problem, and find the best solution by weighing those factors. A decision tree can also analyze the risk value and the value of the information contained in an alternative solution. The role of the decision tree as a decision support tool has been developed since the emergence of tree theory, which is grounded in graph theory. Because of its many uses, the decision tree has been adopted in a wide variety of decision-making systems.</p>
<p><strong>Definition of Decision Trees</strong></p>
<p>In the analysis of decision-making problems, a tree is a mapping of the alternative solutions that can be taken for the problem. The tree also shows the probability factors that will influence those decision alternatives, together with an estimate of the final outcome that would be obtained if a given alternative were chosen.</p>
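<p>To make this mapping concrete, here is a minimal Python sketch, assuming the decision criterion is expected payoff (the post does not name a criterion). Each alternative is mapped to a list of (probability, estimated outcome) pairs. The alternatives, probabilities, payoffs and function names are all invented for illustration.</p>

```python
def expected_value(outcomes):
    """Probability-weighted payoff of one alternative."""
    return sum(p * payoff for p, payoff in outcomes)


def best_alternative(tree):
    """Return the alternative with the highest expected payoff.

    `tree` maps each alternative to its (probability, payoff) branches.
    Expected payoff as the criterion is an assumption of this sketch.
    """
    values = {name: expected_value(outcomes) for name, outcomes in tree.items()}
    best = max(values, key=values.get)
    return best, values


# Hypothetical decision problem (all numbers are made up).
tree = {
    "launch now": [(0.5, 100.0), (0.5, -40.0)],
    "run a pilot first": [(0.25, 80.0), (0.75, 20.0)],
    "do nothing": [(1.0, 0.0)],
}
best, values = best_alternative(tree)
print(best, values)
```

<p>Other criteria, such as minimizing the worst-case loss, could be swapped in by replacing <code>expected_value</code>.</p>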
<p><strong>Structure of Decision Trees</strong></p>
<p><strong> </strong></p>
<p><strong> </strong></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Government 2.0]]></title>
<link>http://thegeoffblog.wordpress.com/2009/11/24/government-2-0/</link>
<pubDate>Tue, 24 Nov 2009 02:43:45 +0000</pubDate>
<dc:creator>thegeoffblog</dc:creator>
<guid>http://thegeoffblog.wordpress.com/2009/11/24/government-2-0/</guid>
<description><![CDATA[I just finished a group presentation for a Knowledge Management course I am taking for my MBA. The c]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I just finished a group presentation for a Knowledge Management course I am taking for my MBA. The company we presented is a huge data mining firm called Attensity. They serve a wide variety of industries, with particular attention to government. Check it out&#8230;</p>
<p><span style='text-align:center; display: block;'><object width='425' height='350'><param name='movie' value='http://www.youtube.com/v/ynGkdhaDa_Q&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' /><param name='allowfullscreen' value='true' /><param name='wmode' value='transparent' /><embed src='http://www.youtube.com/v/ynGkdhaDa_Q&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' type='application/x-shockwave-flash' allowfullscreen='true' width='425' height='350' wmode='transparent'></embed></object></span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Coding Collective Intelligence]]></title>
<link>http://chemoton.wordpress.com/2009/11/24/1366/</link>
<pubDate>Mon, 23 Nov 2009 23:04:59 +0000</pubDate>
<dc:creator>Vitorino Ramos</dc:creator>
<guid>http://chemoton.wordpress.com/2009/11/24/1366/</guid>
<description><![CDATA[Figure &#8211; Book cover of Toby Segaran&#8217;s, &#8220;Programming Collective Intelligence ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:center;"><a href="http://chemoton.wordpress.com/files/2009/11/pci-book.jpg"><img class="aligncenter size-full wp-image-1367" title="PCI Book" src="http://chemoton.wordpress.com/files/2009/11/pci-book.jpg" alt="" width="500" height="655" /></a>Figure &#8211; Book cover of Toby Segaran&#8217;s &#8220;<a href="http://oreilly.com/catalog/9780596529321" target="_blank">Programming Collective Intelligence &#8211; Building Smart Web 2.0 Applications</a>&#8221;, O&#8217;Reilly Media, 368 pp., August 2007.</p>
<p>{<strong>scopus online description</strong>} Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting data-sets from other web sites, collect data from users of your own applications, and analyze and understand the data once you&#8217;ve found it.  <em>Programming Collective Intelligence</em> takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general — all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application.</p>
<p style="text-align:justify;">{<strong>even if I don&#8217;t totally agree, here&#8217;s an &#8220;over-rated&#8221; description &#8211; especially on the scientific side, by someone &#8220;dwa&#8221; &#8211; link above</strong>} <em>Programming Collective Intelligence</em> is a new book from O&#8217;Reilly, written by Toby Segaran. The author graduated from MIT and is currently working at Metaweb Technologies, where he develops ways to put large public data-sets into Freebase, a free online semantic database. You can find more information about him on his blog: http://blog.kiwitobes.com/. Web 2.0 cannot exist without Collective Intelligence. The &#8220;giants&#8221; use it everywhere: YouTube recommends similar movies, Last.fm knows what you would like to listen to, Flickr knows which photos are your favorites, and so on. This technology powers <em>intelligent search</em>, <em>clustering</em>, <em>building price models</em> and <em>ranking on the web</em>. I cannot imagine a modern service without <em>data analysis</em>. That is why it is worth starting to read about it. There are many titles about <em>collective intelligence</em>, but recently I have read two: this one and &#8220;<em>Collective Intelligence in Action</em>&#8221;. Both are very pragmatic, but the O&#8217;Reilly one is more focused on the substance of CI. Its code listings are much shorter (the examples are written in <em>Python</em>, so that was easy). In general, comparing these books is like comparing <em>Java</em> and <em>Python</em>. If you would like to build a recommendation engine the &#8220;in Action&#8221;/Java way, you would have to read the whole book, attach extra JARs and design dozens of classes. The rapid <em>Python</em> way requires reading only 15 pages and voila, you have got your first recommendations. It is awesome!</p>
<p style="text-align:justify;">So how about the rest of the book? There are still 319 pages! Further chapters cover <em>discovering groups</em>, <em>searching</em>, <em>ranking</em>, <em>optimization</em>, <em>document filtering</em>, <em>decision trees</em>, <em>price models</em> and <em>genetic algorithms</em>. The book explains how to implement <em>Simulated Annealing</em>, <em>k-Nearest Neighbors</em>, a <em>Bayesian Classifier</em> and many more. Take a look at the table of contents (here: http://oreilly.com/catalog/9780596529321/preview.html); it does not list all the algorithms, but you can find more information there. Each chapter has about 20-30 pages. You do not have to read them all; you can choose the most important ones and still know what is going on. Every chapter contains a minimal amount of theoretical introduction, which might not be enough for total beginners. I recommend this book to students who have taken a statistics course (not only in IT or computing science): it will show you how to use your knowledge in practice, with many inspiring examples. For those who do not know <em>Python</em>, do not be afraid: at the beginning you will find a short introduction to the language syntax. All listings are very short and well described by the author, sometimes line by line. The book also contains the necessary information about the basic standard libraries responsible for XML processing and downloading web pages. If you would like to start learning about <em>collective intelligence</em>, I would strongly recommend reading &#8220;<em>Programming Collective Intelligence</em>&#8221; first, then &#8220;Collective Intelligence in Action&#8221;. The first one shows how easy it is to implement basic algorithms; the second one shows you how to use existing open source projects related to <em>machine learning</em>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[At a Software Powerhouse, the Good Life Is Under Siege]]></title>
<link>http://enterpriseinformationmanagement.wordpress.com/2009/11/23/at-a-software-powerhouse-the-good-life-is-under-siege/</link>
<pubDate>Mon, 23 Nov 2009 10:01:27 +0000</pubDate>
<dc:creator>Andy Painter</dc:creator>
<guid>http://enterpriseinformationmanagement.wordpress.com/2009/11/23/at-a-software-powerhouse-the-good-life-is-under-siege/</guid>
<description><![CDATA[By STEVE LOHR A TOUR of its carefully tended, 300-acre corporate campus here leaves little doubt why]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>By <a title="Steve Lohr - The New York Times" href="http://topics.nytimes.com/top/reference/timestopics/people/l/steve_lohr/index.html?inline=nyt-per" target="_blank">STEVE LOHR</a></p>
<p>A TOUR of its carefully tended, 300-acre corporate campus here leaves little doubt why surveys, year after year, rate the <a title="SAS" href="http://www.sas.com/">SAS Institute</a>, the world’s largest private software company, among the best places to work.</p>
<p>There is the subsidized day care and preschool. There are the four company doctors and the dozen nurses who provide free primary care. The recreational amenities include basketball and racquetball courts, a swimming pool, exercise rooms and 40 miles of running and biking trails. There is a meditation garden, as well as on-site haircuts, manicures, and jewelry repair. Employees are encouraged to work 35-hour weeks.</p>
<p>Academics have studied the company’s benefit-enhanced corporate culture as a model for nurturing creativity and loyalty among engineers and other workers. Six years ago, in a report on <a title="Overview of SAS segment on “60 Minutes.“" href="http://www.cbsnews.com/stories/2003/04/18/60minutes/main550102.shtml">“60 Minutes,”</a> Morley Safer called working at SAS “the good life.”</p>
<p>But that good life is under threat today as never before. SAS’s specialty, a lucrative niche called business intelligence software, is becoming mainstream. Free, open-source alternatives to some of the company’s products are increasingly popular. On the other end of the spectrum, the heavyweights of the software industry — <a title="More information about Oracle Corporation" href="http://topics.nytimes.com/top/news/business/companies/oracle_corporation/index.html?inline=nyt-org">Oracle</a>, <a title="More information about SAP AG" href="http://topics.nytimes.com/top/news/business/companies/sap-ag/index.html?inline=nyt-org">SAP</a>, <a title="More information about Microsoft Corp" href="http://topics.nytimes.com/top/news/business/companies/microsoft_corporation/index.html?inline=nyt-org">Microsoft</a> and, especially, <a title="More information about International Business Machines Corporation" href="http://topics.nytimes.com/top/news/business/companies/international_business_machines/index.html?inline=nyt-org">I.B.M.</a> — are plunging in and investing billions of dollars.</p>
<p>“It will be a dogfight,” says Bill Hostmann, an analyst at Gartner. “SAS has never faced a competitor like I.B.M. And I do think I.B.M. sees SAS as a big, fatted cow.”</p>
<p>The term “business intelligence software” applies to a wide range of products and services, but all the technology is aimed at helping businesses mine nuggets of insight from mountains of data. SAS has traditionally specialized in advanced software to analyze huge data sets and to generate predictive statistical models for large corporations and government agencies.</p>
<p>Credit card companies, for example, use SAS to detect unusual buying patterns in real time, and to spot potentially fraudulent charges. Giant retail chains use SAS to tailor pricing and product offerings down to the store level. Telecommunications companies use SAS to identify the few thousand customers, among millions, most likely to switch to another cellphone carrier, and to aim marketing at them. SAS software is also used to parse sensor signals from North Sea oil rigs, combined with weather and structural data, to predict failure of parts before it happens. Of the 100 largest companies worldwide, 92 use SAS software.</p>
<p>But as the stream of companies’ collected data turns into a torrent, SAS and other software companies are trying to find new ways to harness it. The information is generated not only by computerized systems for tracking operations, customers and sales. It also comes from new data sources like Web site visits, social network chatter and public records accessible over the Internet, as well as genome sequences, sensor signals and surveillance tapes, all in digital form.</p>
<p>This data explosion, experts say, is an untapped asset at most companies, which lack the tools and skills to exploit it. Yet the long-range potential, they say, is to use this data for far more fine-grained analysis of markets, customer behavior and operations, making business more of a science and less a seat-of-the-pants art.</p>
<p>“Now, the data is available so business can move toward evidence-based decision-making,” says Erik Brynjolfsson, an economist and director of the <a title="The center’s home page." href="http://ebusiness.mit.edu/">Center for Digital Business</a> at the <a title="More articles about Massachusetts Institute of Technology" href="http://topics.nytimes.com/top/reference/timestopics/organizations/m/massachusetts_institute_of_technology/index.html?inline=nyt-org">Massachusetts Institute of Technology</a>. “This market is a huge opportunity.”</p>
<p>That opportunity is not lost on SAS. “Our advantage is the incredible depth of our technology, developed over years and applied to specific industries,” says James H. Goodnight, the chief executive and a co-founder of SAS. “No one can match our toolbox.”</p>
<p>Indeed, no one underestimates SAS’s technical prowess. The big question is whether the company’s seemingly pampered culture can embrace the higher-octane institutional metabolism that it will need to succeed.</p>
<p>“We know we have to change — no question about it,” says Jim Davis, 51, a senior vice president at SAS. “Our market space has changed dramatically in the last 18 months or so, more than at any time over the 33-year history of the company. We can’t sit back. Things are only going to get faster.”</p>
<p>THE company traces its roots to a time when computing was costly and for the few. Originally called Statistical Analysis System, it was founded in 1976 by Mr. Goodnight and three colleagues from the agricultural statistics department at <a title="More articles about North Carolina State University" href="http://topics.nytimes.com/top/reference/timestopics/organizations/n/north_carolina_state_university/index.html?inline=nyt-org">North Carolina State University</a>. Its techniques were initially used to calculate the intricacies of soil, weather, seed varieties and other factors to improve crop yields.</p>
<p>To build an audience, Mr. Goodnight spent nights packing up boxes of computer tapes and manuals, which he sent to university and corporate researchers. Soon, companies wanted him and his academic colleagues to develop software tools tailored for industry. In 1976 at a users’ conference, 300 or so people showed up, many from business.</p>
<p>“That was pretty much an ‘aha’ moment for us, that it was time to expand beyond the university,” Mr. Goodnight recalls. “It was a little scary, cutting the academic umbilical cord. But I was convinced we could do it.”</p>
<p>He and his colleagues at SAS developed their own programming language and software tools, and designed them for eggheads like themselves. Users were analysts with Ph.D.’s, working with programmers and employed by the largest companies at the forefront of using computing in their businesses, including banks, national retailers, insurers and drug companies.</p>
<p>SAS invested heavily in research and development, and even today allocates 22 percent of the company’s revenue to research. The formula has paid off in steady growth, year after year. Revenue reached $2.26 billion in 2008, up from $1.34 billion five years earlier.</p>
<p>Yet the company also faces the classic challenge of being the innovative pioneer — enjoying rich profit margins but facing new competition from rivals seeking to gain market share with lower prices and substitute technology.</p>
<p>In the last two years, the major software companies have scooped up companies in the business intelligence market. Among the larger moves, SAP bought Business Objects for $6.8 billion, I.B.M. bought Cognos for $4.9 billion and Oracle picked up Hyperion for $3.3 billion.</p>
<p>Still, those companies compete in the broad swath of the business intelligence market for reporting and analysis products. Such data on sales, shipments, customers and operations amount to a numbers-laden portrait of the recent past. The SAS stronghold is a more sophisticated kind of software typically called “advanced analytics and predictive modeling,” which uses historical and current data to try to peer into the future and model likely outcomes.</p>
<p>The competitive thrust that really grabbed SAS’s attention came in late July, when I.B.M. announced that it planned to pay $1.2 billion for SPSS, a maker of predictive modeling software. I.B.M. has placed SPSS and Cognos into a new business analytics and optimization group. That business will be supported by 200 scientists, and the company has said it will retrain or hire 4,000 consultants and analysts to work in the group.</p>
<p>“This is the big growth strategy for I.B.M., the company’s next big play for this decade,” says Ambuj Goyal, a computer scientist who is general manager of I.B.M.’s business analytics software unit. “SAS comes from the legacy world of statisticians and programmers. The real opportunity is in deploying this technology broadly in corporations.”</p>
<p>To counter I.B.M. and others, SAS is looking to forge a tighter relationship with a big technology services company. It is also shortening product development cycles to 12 to 18 months, down from 24 to 36. “That’s what the market expects,” Mr. Davis says.</p>
<p>The most sweeping change is the company’s move toward the Internet model of software delivery — as a service that customers tap into over the Web, much as <a title="More information about Google Inc" href="http://topics.nytimes.com/top/news/business/companies/google_inc/index.html?inline=nyt-org">Google</a> and other Internet companies do. SAS has dipped its toe in, with some initial products. But a major expansion is planned, supported by a sprawling $70 million data center scheduled to begin operating next year.</p>
<p>The remotely delivered software is part of a drive to broaden the market for SAS technology beyond an elite corps of quantitative analysts and into the rank-and-file of corporate professionals.</p>
<p>Analysts say the company’s strategy looks sound, even if the outcome is uncertain. “SAS has to do a lot of things right to succeed,” says Peter Sondergaard, senior vice president of research for Gartner. “But if it executes correctly, it could be a winner.”</p>
<p>ACROSS its campus here, there are signs that the SAS culture is evolving with the times. Rick Langston, 54, a senior software manager who joined the company 29 years ago, smiles and shrugs when asked about the 35-hour workweek. After leaving the office, Mr. Langston routinely checks on work e-mail at home.</p>
<p>These days, he explains, SAS is a global company with far-flung project teams, and overnight e-mails can resolve problems and speed things along. Deadline work to meet product development schedules, he adds, can mean long hours at times. “But this is certainly not a place where you are working 60-hour weeks, week in and week out,” he said.</p>
<p>To be sure, the corporate cocoon in Cary can breed insularity. SAS, for example, was slow to recognize the brewing challenge from free, open-source alternatives to some of its products. A free programming language and set of software tools for statistical computing, called <a title="The R Project for Statistical Computing" href="http://www.r-project.org/" target="_blank">R</a>, has become increasingly popular at universities and labs.</p>
<p>The company shifted course earlier this year and modified its software so programs written with R work seamlessly with SAS technology. “Shame on us for not engaging more with the open-source community,” says Keith Collins, senior vice president and chief technology officer. “But we’re committed to doing that now.”</p>
<p>THE architect of the SAS culture is Mr. Goodnight, a lanky, laconic billionaire. The benefits have built up gradually over the years as a series of pragmatic steps, he says. The day-care program began after a valued employee was about to leave to take care of her young child. The on-site medical checkups grew out of the belief that “good health is good business,” he says.</p>
<p>Today, SAS estimates that its health care center saves the company $5 million a year, by providing care more cheaply than an outside insurer and by not having employees leave the campus for doctor’s visits. Employee turnover at SAS averages 4 percent a year, versus about 20 percent for the overall software industry.</p>
<p>The office atmosphere is sedate. There are no dogs roaming the halls, no Nerf-ball fights, no one jumping on trampolines — no whiff of Silicon Valley. The SAS culture is engineered for its own logic: to reduce distractions and stress, and thus foster creativity.</p>
<p>“The SAS model is sensible and durable; there’s nothing faddish or ephemeral,” says Richard Florida, a professor at the Rotman School of Management at the University of Toronto, who has studied SAS and is the author of “The Rise of the Creative Class.”</p>
<p>During the technology boom at the start of this decade, SAS considered a drastic change in its model: going public. <a title="More information about Goldman Sachs Group Incorporated" href="http://topics.nytimes.com/top/news/business/companies/goldman_sachs_group_inc/index.html?inline=nyt-org">Goldman Sachs</a> bankers were brought in as advisers, and in 2000 SAS recruited a former Oracle executive, Andre Boisvert, as its president.</p>
<p>Under Mr. Boisvert, SAS installed a new financial reporting system and paid the sales force incentive commissions rather than salary only. But when technology stocks plummeted, the appeal of selling shares to the public also receded. Mr. Boisvert resigned from SAS in 2001 and is now an independent investor and consultant.</p>
<p>Mr. Goodnight recalls those days as a brief period of New Economy surrealism, and going public as a path wisely avoided. SAS, he says, is a culture averse to the short-term pressures of Wall Street, which he characterizes as “a bunch of 28-year-olds, hunched over spreadsheets, trying to tell you how to run your business.”</p>
<p>Unlike many other tech companies, SAS has had no recession-related layoffs this year. “I’ve got a two-year pipeline of projects in R &#38; D,” Mr. Goodnight says. “Why would I lay anyone off?”</p>
<p>Mr. Goodnight, though 66, has no plans to retire himself. His fingerprints, colleagues say, remain all over the business, especially in meeting with customers and in overseeing research.</p>
<p>He is not only a statistician, but also a bit of a gambler who enjoys calculating his chances. For example, he is co-author of a paper that simulated millions of possible outcomes in blackjack.</p>
<p>Mr. Goodnight regards his new rivals the way a confident card player might. He likes the odds, and he likes his hand.</p>
<p>“We’re pushing as fast as we can to stay ahead — on the cutting edge of everything,” he says. “We’ll do fine.”</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[SWAT4LS2009 - James Eales: Mining Semantic Networks of Bioinformatics eResources from Literature]]></title>
<link>http://semanticscience.wordpress.com/2009/11/20/swat4ls2009-james-eales-mining-semantic-networks-of-bioinformatics-eresources-from-literature/</link>
<pubDate>Fri, 20 Nov 2009 14:06:49 +0000</pubDate>
<dc:creator>na303</dc:creator>
<guid>http://semanticscience.wordpress.com/2009/11/20/swat4ls2009-james-eales-mining-semantic-networks-of-bioinformatics-eresources-from-literature/</guid>
<description><![CDATA[eResource Annotations could help with making better choices: which resource is best? which is availa]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>eResource Annotations could help with</p>
<ul>
<li> making better choices: which resource is best? </li>
<li> which is available?</li>
<li> reduce curation</li>
<li> help with service discovery</li>
</ul>
<p>Approach: link bioinformatics resources using semantic descriptors generated from text mining. Head terms for services can be used to assign services to types, e.g. applications, data sources, etc.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Data Mining - The Concept of Artificial Neural Networks (Jaringan Syaraf Tiruan, JST)]]></title>
<link>http://fairuzelsaid.wordpress.com/2009/11/19/metode-data-mining-jaringan-syaraf-tiruan/</link>
<pubDate>Thu, 19 Nov 2009 15:22:22 +0000</pubDate>
<dc:creator>Fairuz El Said</dc:creator>
<guid>http://fairuzelsaid.wordpress.com/2009/11/19/metode-data-mining-jaringan-syaraf-tiruan/</guid>
<description><![CDATA[This section briefly discusses the concept of artificial neural networks, which are one]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>This section briefly discusses the <strong>concept of artificial neural networks</strong> (ANNs), one of the analysis methods that can be used in data mining.</p>
<p><strong>Definitions of an artificial neural network</strong></p>
<ul>
<li><em>Hecht-Nielsen (1988), </em>&#8220;A neural network (NN) is a distributed information processing structure that works in parallel. It consists of processing elements (each with local memory and operating on local information) interconnected by unidirectional signal paths called connections. Each processing element has a single output connection that branches (fans out) into as many collateral connections as desired (each connection carries the same signal, the output of that processing element). The output of a processing element can be any kind of mathematical expression desired. All processing within a processing element must be strictly local: the output depends only on the current input values arriving through the connections and on the values stored in local memory.&#8221;</li>
<li><em>Haykin, S. (1994), </em>A neural network is a parallel distributed processor that has a tendency to store the knowledge it gains from experience and keep it available for use. It resembles the brain in two respects: 1. Knowledge is acquired by the network through a learning process. 2. The strengths of the connections between nerve cells, known as synaptic weights, are used to store that knowledge.</li>
<li><em>Zurada, J.M. (1992),</em> An artificial neural system, or artificial neural network, is a physical cellular system that can acquire, store and use knowledge gained from experience.</li>
<li>DARPA Neural Network Study (1988), A neural network is a system made up of a number of simple processing elements working in parallel, whose function is determined by the network structure, the connection strengths, and the processing performed at the computing elements or nodes.</li>
<li>
<div>JJ Siang, an information processing system with characteristics similar to those of the human nervous system.</div>
</li>
</ul>
<p style="text-align:center;"><!--more--></p>
<p><strong>Assumptions of artificial neural networks</strong></p>
<p>An artificial neural network is built as a generalization of mathematical models of the human nervous system, under the following assumptions:</p>
<ul>
<li>
<div>Processing takes place in many simple elements (neurons).</div>
</li>
<li>
<div>Signals are passed between neurons through synapses.</div>
</li>
<li>
<div>Each synapse has a weight that strengthens or weakens the signal.</div>
</li>
<li>
<div>The output is determined by an activation function applied to the sum of the inputs received.</div>
</li>
<li>
<div>The output is compared with a threshold.</div>
</li>
</ul>
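<p>The assumptions above can be sketched as a single artificial neuron. This is only a minimal illustration, not code from any particular library: the function name <em>neuron_output</em>, the step activation and the example weights are all assumptions chosen for the sketch.</p>

```python
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> int:
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0.

    Hypothetical helper for illustration: a step activation stands in for
    the activation function mentioned in the text.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each synapse scales its incoming signal by a weight,
    # strengthening or weakening it.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step activation: compare the summed input with the threshold.
    return 1 if weighted_sum >= threshold else 0


# With two equal weights and a threshold of 2, the neuron acts like a logical AND.
print(neuron_output([1, 1], [1.0, 1.0], threshold=2.0))  # 1
print(neuron_output([1, 0], [1.0, 1.0], threshold=2.0))  # 0
```

<p>Changing only the weights or the threshold changes which input patterns make the neuron fire, which is why the weights are where such a network stores what it has learned.</p>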
<p><strong>Biological neurons</strong></p>
<p>Characteristics of biological neurons:</p>
<ul>
<li>Artificial neural networks grew out of artificial intelligence research, in particular out of attempts to mimic the fault tolerance and learning ability of biological nervous systems by modelling the low-level structure of the brain.<br />
The brain consists of roughly 10,000,000,000 interconnected nerve cells.</li>
<li>A nerve cell has a branching input structure (the dendrites), a cell nucleus, and a branching output structure (the axon). The axon of one cell connects to the dendrites of another through a synapse.</li>
<li>When a nerve cell is activated, it fires an electrochemical signal along its axon. This signal crosses the synapses to other nerve cells.</li>
<li>Another nerve cell picks up the signal only if it reaches a certain limit, commonly called the threshold.</li>
</ul>
<p style="text-align:center;">
<div id="attachment_449" class="wp-caption aligncenter" style="width: 310px"><a href="http://fairuzelsaid.wordpress.com/files/2009/11/jst-susunan-syaraf-manusia1.gif"><img class="size-medium wp-image-449" title="JST - Susunan Syaraf manusia" src="http://fairuzelsaid.wordpress.com/files/2009/11/jst-susunan-syaraf-manusia1.gif?w=300" alt="JST - Susunan Syaraf manusia" width="300" height="172" /></a><p class="wp-caption-text">ANN - Structure of the human nervous system</p></div>
<ul>
<li>No two human brains are the same. They differ in acuity, size and organization. One way to understand how the brain works is to collect information from as many human brain scans as possible and map them, in an effort to find out how the average human brain works. A map of the human brain is expected to explain how the brain controls every human action, from the use of language to movement. Even so, exactly how the human brain works remains a mystery. Some aspects of this remarkable processor are known, but not many. These aspects are:
<ul>
<li>Every part of the human brain has an address, in the form of a chemical formula, and the human nervous system tries to find the matching address for each axon (connecting nerve fibre) that is formed.</li>
<li>Through learning, experience and interaction between systems, the structure of the brain itself organizes the functions of each of its parts.</li>
<li>Axons in neighbouring regions develop together and have a similar physical form, so they are grouped into particular architectures within the brain.</li>
<li>Axons grow in a time sequence according to their architecture, and connect to brain structures that develop in the same time sequence.</li>
</ul>
</li>
<li>From these four aspects it can be concluded that the brain is not formed entirely by genetic processes. Other processes also shape the functions of the parts of the brain, and ultimately determine how the brain processes information.</li>
<li>The most basic element of a neural network is the nerve cell. Nerve cells form the part of human consciousness that covers several general abilities. In essence, a biological nerve cell receives inputs from other sources, combines them in some way, performs a non-linear operation on the result, and then emits the final result.</li>
<li>The human body contains many variations on the basic type of nerve cell, which makes human thought hard to replicate electrically. Nevertheless, all natural nerve cells share the same four basic components, known by their biological names: dendrites, soma, axon and synapses. Dendrites are hair-like extensions of the soma that act as input channels. These input channels receive input from other nerve cells through synapses. The soma then processes the incoming values into an output, which is sent to other nerve cells through the axon and synapses.</li>
<li>Recent research provides further evidence that biological nerve cells are more complex and more sophisticated in structure than the artificial neurons from which today's artificial neural networks are built. Biology offers an ever better understanding of nerve cells, which helps network designers keep improving artificial neural network systems based on that understanding of the biological brain.</li>
<li>These nerve cells are connected to one another through synapses. A nerve cell can receive stimuli in the form of electrochemical signals from other nerve cells. Depending on those stimuli and on certain conditions, the cell either sends a signal or does not. This is the basic concept that researchers try to reproduce when building artificial cells.</li>
</ul>
<p><strong>The human nervous system</strong></p>
<ul>
<li>A highly complex structure</li>
<li>Extraordinary capabilities</li>
<li>Made up of neurons and connections (synapses)</li>
<li>Neurons: 10<sup>12</sup>; synapses: 6 &#215; 10<sup>18</sup></li>
<li>Because of these large numbers, it can recognize patterns, perform calculations and control the body faster than a digital computer. For example, it can recognize a person's face even when the face has changed slightly.</li>
<li>
<div>The brain has a remarkable structure because it can form its own rules or patterns from experience.</div>
</li>
<li>
<div>The number of these elements and their capability grow along with a person's physical growth.</div>
</li>
<li>
<div>During the first year of life, about 1 million synapses form every second.</div>
</li>
</ul>
<p><strong>History of ANNs</strong></p>
<p>In its effort to imitate human intelligence, the field of artificial intelligence has so far not approached the problem through the brain's physical form, but from another direction. First, the basic theory of the mechanisms by which intelligence arises was studied. This field is called <em>‘Cognitive Science’</em>. From this basic theory, models were built to be simulated on computers, and further development produced various artificial intelligence systems, one of which is the artificial neural network. Compared with other fields of science, artificial neural networks are relatively new. Much of the literature traces the concept to the 1943 paper by Warren McCulloch and Walter Pitts, in which they tried to formulate a mathematical model of brain cells. This method, developed on the basis of the biological nervous system, was a step forward for the computer industry.</p>
<p>The history of artificial neural networks, in chronological order:</p>
<ul>
<li>1943, McCulloch &#38; Pitts introduce a simple ANN.</li>
<li>1958, Rosenblatt: ANN with the perceptron model.</li>
<li>1960, Widrow and Hoff: perceptron ANN with training.</li>
<li>1976, Kohonen: the Kohonen ANN model.</li>
<li>1982, Hopfield: the Hopfield ANN model.</li>
<li>1986, Rumelhart: backpropagation ANN (multiple layers).</li>
</ul>
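<p>To make "a perceptron with training" concrete, here is a small sketch of the classic perceptron learning rule applied to the logical AND function. It is an illustration under stated assumptions, not the exact procedure from any of the works listed: the function names, the learning rate, the number of epochs and the AND data set are all chosen for this example.</p>

```python
def predict(weights, bias, inputs):
    """Step activation on the weighted sum of the inputs plus a bias."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train one perceptron; samples are (inputs, target) pairs with 0/1 targets.

    Hypothetical helper: the defaults are assumptions that happen to be
    enough for the small AND example below.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is +1, 0 or -1; the weights move only on a mistake.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: the output is 1 only when both inputs are 1.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, inputs) for inputs, _ in and_gate])  # [0, 0, 0, 1]
```

<p>A single perceptron can only learn patterns that a straight line can separate, such as AND. That limitation is what multi-layer networks trained with backpropagation were later designed to overcome.</p>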
<p><strong>Components of a neuron</strong></p>
<ul>
<li><em>Dendrites</em> act as the input device, receiving impulses sent electrochemically from other neurons across the synaptic gap. At the synaptic gap, the impulse is strengthened or weakened.</li>
<li><em>Soma</em> sums the incoming impulses.</li>
<li><em>Axon</em> takes the summed impulse, once it is strong enough to exceed the threshold, and passes it on to other neurons.</li>
</ul>
<p><strong>Applications of artificial neural networks</strong></p>
<div>
<ul>
<li><em>Pattern recognition: </em>recognizing patterns such as letters, digits, voices and signatures.</li>
<li><em>Signal processing:</em> reducing noise on telephone lines.</li>
<li><em>Forecasting: </em>predicting what will happen in the future based on patterns of past events.</li>
</ul>
</div>
<div><strong>Advantages of artificial neural networks</strong></div>
<div>
<ul>
<li><em>Powerful</em>. An artificial neural network is a very capable modelling technique that can model extremely complex functions, nonlinear ones in particular. For years linear models were the usual choice, because their optimization strategies are well known. Artificial neural networks also handle nonlinear models with many variables.</li>
<li><em>Easy to use. </em>Artificial neural networks learn by example. The user gathers data and runs a training algorithm that learns the structure of the data automatically, so the user needs little specialized knowledge of how to select and prepare data, how to choose a suitable network, or how to read the results. The level of knowledge needed to use artificial neural networks successfully is no greater than that needed to solve a problem with established nonlinear statistical methods.</li>
</ul>
</div>
<div><strong>Limitations of artificial neural networks</strong></div>
<div>
<ul>
<li>The results obtained can be inaccurate.</li>
<li>They work from the patterns formed in their input.</li>
</ul>
</div>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[this is not how i expected "friending" you to come back to haunt me]]></title>
<link>http://permut.wordpress.com/2009/11/19/oh-dear/</link>
<pubDate>Thu, 19 Nov 2009 13:49:44 +0000</pubDate>
<dc:creator>jimi adams</dc:creator>
<guid>http://permut.wordpress.com/2009/11/19/oh-dear/</guid>
<description><![CDATA[This story&#8217;s been making the rounds rather quickly. For now, i&#8217;ll link it sans commentar]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://www.fastcompany.com/blog/lucas-conley/advertising-branding-and-marketing/company-we-keep">This story</a>&#8217;s been making the rounds rather quickly. For now, i&#8217;ll link it sans commentary. Basically it&#8217;s a company that&#8217;s claiming to be able to make inferences about individuals&#8217; credit ratings based solely on their social networks as publicly observable through Facebook, Twitter, etc.<br />
(via <a href="http://twitter.com/mysocnet">Keith Hampton</a> and <a href="http://twitter.com/valdiskrebs">Valdis Krebs</a>)</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Personas]]></title>
<link>http://interactivecity.wordpress.com/2009/11/18/personas/</link>
<pubDate>Wed, 18 Nov 2009 22:44:38 +0000</pubDate>
<dc:creator>candreoli</dc:creator>
<guid>http://interactivecity.wordpress.com/2009/11/18/personas/</guid>
<description><![CDATA[In a world where fortunes are sought through data-mining vast information repositories, the computer]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://interactivecity.wordpress.com/files/2009/11/personas_mit1.jpg"><img class="alignnone size-full wp-image-52" title="personas_mit" src="http://interactivecity.wordpress.com/files/2009/11/personas_mit1.jpg" alt="" width="510" height="255" /></a></p>
<p>In a world where fortunes are sought through data-mining vast information repositories, the computer is our indispensable but far from infallible assistant. Personas demonstrates the computer&#8217;s uncanny insights and its inadvertent errors, such as the mischaracterizations caused by the inability to separate data from multiple owners of the same name. It is meant for the viewer to reflect on our current and future world, where digital histories are as important if not more important than oral histories, and computational methods of condensing our digital traces are opaque and socially ignorant.</p>
<p><a href="http://personas.media.mit.edu/">Link</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Apache Mahout 0.2 Released - Now classify, cluster and generate recommendations!]]></title>
<link>http://techdigger.wordpress.com/2009/11/18/apache-mahout-0-2-released-now-classify-cluster-and-generate-recommendations/</link>
<pubDate>Wed, 18 Nov 2009 13:48:32 +0000</pubDate>
<dc:creator>TechDigger</dc:creator>
<guid>http://techdigger.wordpress.com/2009/11/18/apache-mahout-0-2-released-now-classify-cluster-and-generate-recommendations/</guid>
<description><![CDATA[Apache Mahout For the past two years, I have been working with this amazing bunch of people whilst, ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><div class="wp-caption alignright" style="width: 92px"><a href="http://lucene.apache.org/mahout"><img src="http://lucene.apache.org/mahout/images/Mahout-logo-82x100.png" alt="Apache Mahout" width="82" height="100" /></a><p class="wp-caption-text">Apache Mahout</p></div>
<p align="justify">
For the past two years, I have been working with this amazing bunch of people, while being paid by Google through its Summer of Code program, on a project called <a href="http://lucene.apache.org/mahout">Mahout</a>. As the name says, it is trying to tame the young beast known as <a href="http://hadoop.apache.org">Hadoop</a>. I have received a lot from the community. Being part of the project has given me real exposure to Java, data mining and machine learning, and hands-on experience with distributed systems like <a href="http://hadoop.apache.org">Hadoop</a>, <a href="http://hadoop.apache.org/hbase">Hbase</a> and <a href="http://hadoop.apache.org/pig">Pig</a>. The project is still in its infancy, but its ambitions are sky-high. I am happy to announce the second release of the project, and proud to be a part of it. I hope people will adopt it in their projects and that it becomes the de facto standard machine learning library, the way Lucene and Hadoop have in their respective focus areas.
</p>
<p>If you are already excited and want to take it for a ride, read Grant&#8217;s article on IBM developerWorks <a href="https://www.ibm.com/developerworks/java/library/j-mahout/index.html">here</a>.<br />
The release announcement follows:</p>
<div align="justify" style="font-size:90%;border:1px dashed #337733;padding:10px;">
<p>Apache Mahout 0.2 has been released and is now available for public download at <a href="http://www.apache.org/dyn/closer.cgi/lucene/mahout">http://www.apache.org/dyn/closer.cgi/lucene/mahout</a></p>
<p>Up to date maven artifacts can be found in the Apache repository at<br />
<a href="https://repository.apache.org/content/repositories/releases/org/apache/mahout/">https://repository.apache.org/content/repositories/releases/org/apache/mahout/</a></p>
<p>Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0</p>
<p>Mahout is a machine learning library meant to scale: Scale in terms of community to support anyone interested in using machine learning. Scale in terms of business by providing the library under a commercially friendly, free software license. Scale in terms of computation to the size of data we manage today.</p>
<p>Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problems like clustering, collaborative filtering and classification<br />
over terabytes of data across thousands of computers.</p>
<p>Implemented with scalability in mind, the latest release brings many performance optimizations so that the library performs well even in a single-node setup.</p>
<p>The complete changelist can be found here:</p>
<p><a href="http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278">http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278</a></p>
<p>New Mahout 0.2 features include</p>
<ul>
<li>Major performance enhancements in Collaborative Filtering, Classification and Clustering</li>
<li>New: Latent Dirichlet Allocation (LDA) implementation for topic modelling</li>
<li>New: Frequent Itemset Mining for mining top-k patterns from a list of transactions</li>
<li>New: Decision Forests implementation for Decision Tree classification (In Memory &#38; Partial Data)</li>
<li>New: HBase storage support for Naive Bayes model building and classification</li>
<li>New: Generation of vectors from Text documents for use with Mahout Algorithms</li>
<li>Performance improvements in various Vector implementations</li>
<li>Tons of bug fixes and code cleanup</li>
</ul>
<p>Getting started: New to Mahout?</p>
<ul>
<li> Download Mahout at <a href="http://www.apache.org/dyn/closer.cgi/lucene/mahout">http://www.apache.org/dyn/closer.cgi/lucene/mahout</a></li>
<li> Check out the Quick start: <a href="http://cwiki.apache.org/MAHOUT/quickstart.html">http://cwiki.apache.org/MAHOUT/quickstart.html</a></li>
<li> Read the Mahout Wiki: <a href="http://cwiki.apache.org/MAHOUT">http://cwiki.apache.org/MAHOUT</a></li>
<li> Join the community by subscribing to mahout-user@lucene.apache.org</li>
<li> Give back: <a href="http://www.apache.org/foundation/getinvolved.html">http://www.apache.org/foundation/getinvolved.html</a></li>
<li> Consider adding yourself to the Powered By wiki page: <a href="http://cwiki.apache.org/MAHOUT/poweredby.html">http://cwiki.apache.org/MAHOUT/poweredby.html</a></li>
</ul>
<p>For more information on Apache Mahout, see <a href="http://lucene.apache.org/mahout">http://lucene.apache.org/mahout</a></p>
</div>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Proposals for Big Data web mining talk]]></title>
<link>http://bixolabs.com/2009/11/16/proposals-for-big-data-web-mining-talk/</link>
<pubDate>Mon, 16 Nov 2009 19:44:05 +0000</pubDate>
<dc:creator>kkrugler</dc:creator>
<guid>http://bixolabs.com/2009/11/16/proposals-for-big-data-web-mining-talk/</guid>
<description><![CDATA[I&#8217;m going to be giving a talk at the Bay Area ACM data mining SIG in December, and I need to f]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I&#8217;m going to be giving a talk at the <a href="http://sfbayacm.org/dmsig.php" target="_blank">Bay Area ACM data mining SIG</a> in December, and I need to finalize my topic soon &#8211; like today <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I was going to expand on my <a href="/2009/11/02/elastic-web-mining-talk/">Elastic Web Mining talk</a> (&#8220;Web mining for SEO keywords&#8221;) from the <a href="http://events.linkedin.com/events/142420/clickthru" target="_blank">ACM data mining unconference</a> a few weeks back.</p>
<p>But the fact that I&#8217;ll have data from tens to hundreds of millions of web pages to work with, from the <a href="/datasets/public-terabyte-dataset-project/">public terabyte dataset</a> crawl, makes me want to apply <a href="http://lucene.apache.org/mahout/" target="_blank">Mahout</a> to it.</p>
<p>I tossed out one idea on the Mahout list, looking for input:</p>
<ul>
<li>I&#8217;d like to automatically generate a timeline of events.</li>
<li>I can extract potential dates from web pages, using simple patterns.</li>
<li>I can extract 2-to-4 word terms (skipping those which start/end with stop words) from pages that have extracted dates.</li>
<li>And then by the miracle of LDA (<a href="http://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html" target="_blank">latent dirichlet allocation</a>), I get clusters of date+terms.</li>
</ul>
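<p>The two extraction steps in the list above can be sketched roughly as follows. This is a minimal Python sketch, not code from the actual crawl: the date pattern, the tiny stop-word list, and the function names <code>extract_dates</code> and <code>extract_terms</code> are all assumptions made for illustration.</p>

```python
import re

# Hypothetical stop-word list; a real crawl would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "at", "by", "is", "was"}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Assumed "simple patterns": "January 20, 2009" or ISO style "2009-01-20".
DATE_PATTERN = re.compile(
    r"\b(?:(?:%s) \d{1,2}, \d{4}|\d{4}-\d{2}-\d{2})\b" % MONTHS)


def extract_dates(text):
    """Return every substring of text that looks like a date."""
    return DATE_PATTERN.findall(text)


def extract_terms(text, min_words=2, max_words=4):
    """Return 2-to-4 word terms, skipping those that start or end with a stop word."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = []
    for size in range(min_words, max_words + 1):
        for start in range(len(words) - size + 1):
            gram = words[start:start + size]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            terms.append(" ".join(gram))
    return terms
```

<p>Run on &#8220;the inauguration of Barack Obama&#8221;, <code>extract_terms</code> keeps &#8220;barack obama&#8221; and drops &#8220;the inauguration&#8221;, because the latter starts with a stop word.</p>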
<p>But in this example, I don&#8217;t actually need LDA &#8211; I have my &#8220;topic&#8221;, which is the date. So it might not be a very good example. And will LDA scale to 100M web pages (which implies many billions of terms)? And how will I handle the same term (e.g. &#8220;barack inauguration&#8221;) being associated with a cluster of dates, since stories from a range of dates before/after the event will contain that same term?</p>
<p>So it could be a non-starter &#8211; I&#8217;m hoping for input on feasibility and level of effort. And if somebody has a suggestion for something simple that could provide interesting/obvious results, I&#8217;m all ears.</p>
<p>Thanks!</p>
<p>&#8211; Ken</p>
<p>PS &#8211; my current fall-back is to just do brute-force map-reduce to come up with lists of terms per unique date, pick the top N, and maybe do some filtering for top-level terms that have too many associated unique dates. Which unfortunately wouldn&#8217;t use Mahout, but would be an example of crunching lots of data.</p>
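<p>The fall-back described in the PS might look something like the sketch below. It simulates the map and reduce steps in memory with plain Python rather than on Hadoop, and the function names, the <code>top_n</code> and <code>max_dates_per_term</code> parameters, and the input shape (one list of dates and one list of terms per page) are hypothetical choices for illustration.</p>

```python
from collections import Counter, defaultdict


def map_page(dates, terms):
    """Map step: emit ((date, term), 1) once per date/term pair found on a page."""
    for date in sorted(set(dates)):
        for term in sorted(set(terms)):
            yield (date, term), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (date, term) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def top_terms_per_date(pages, top_n=3, max_dates_per_term=2):
    """Return the top_n terms per date, dropping terms tied to too many dates."""
    pairs = (pair for dates, terms in pages for pair in map_page(dates, terms))
    totals = reduce_counts(pairs)

    # Record which unique dates each term is associated with.
    dates_for_term = defaultdict(set)
    for date, term in totals:
        dates_for_term[term].add(date)

    per_date = defaultdict(Counter)
    for (date, term), count in totals.items():
        # Filter out generic terms that show up under too many unique dates.
        if len(dates_for_term[term]) > max_dates_per_term:
            continue
        per_date[date][term] = count

    return {date: counter.most_common(top_n) for date, counter in per_date.items()}
```

<p>The filter on <code>max_dates_per_term</code> is one crude way to drop generic terms; it does not solve the problem of a single event term legitimately appearing under a range of nearby dates.</p>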
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Collecting and Building a Twitter Search Database for Data Mining]]></title>
<link>http://fbong.wordpress.com/2009/11/16/%e0%b9%80%e0%b8%81%e0%b9%87%e0%b8%9a-%e0%b8%aa%e0%b8%a3%e0%b9%89%e0%b8%b2%e0%b8%87-%e0%b8%90%e0%b8%b2%e0%b8%99%e0%b8%82%e0%b9%89%e0%b8%ad%e0%b8%a1%e0%b8%b9%e0%b8%a5-twitter-mining/</link>
<pubDate>Mon, 16 Nov 2009 11:06:38 +0000</pubDate>
<dc:creator>fbong</dc:creator>
<guid>http://fbong.wordpress.com/2009/11/16/%e0%b9%80%e0%b8%81%e0%b9%87%e0%b8%9a-%e0%b8%aa%e0%b8%a3%e0%b9%89%e0%b8%b2%e0%b8%87-%e0%b8%90%e0%b8%b2%e0%b8%99%e0%b8%82%e0%b9%89%e0%b8%ad%e0%b8%a1%e0%b8%b9%e0%b8%a5-twitter-mining/</guid>
<description><![CDATA[Many of you have probably used search.twitter.com (I have posted an article about it before), which lets us search for content or]]></description>
<content:encoded><![CDATA[Many of you have probably used search.twitter.com (I have posted an article about it before), which lets us search for content or]]></content:encoded>
</item>
<item>
<title><![CDATA[Is Big Brother Watching?]]></title>
<link>http://charlenecroft.wordpress.com/2009/11/15/is-big-brother-watching/</link>
<pubDate>Sun, 15 Nov 2009 23:50:08 +0000</pubDate>
<dc:creator>charlenecroft</dc:creator>
<guid>http://charlenecroft.wordpress.com/2009/11/15/is-big-brother-watching/</guid>
<description><![CDATA[As with other increasingly complex concepts, privacy is one that has many nuanced meanings.  That is]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>As with other increasingly complex concepts, privacy is one that has many nuanced meanings.  That is, the more we experience issues of privacy, and are forced to create our own boundaries of public and private, a sliding scale of acceptability emerges for us individually.  Then, we have to mix in those personal expectations of privacy, and reconcile them with third-party definitions and policies of privacy.</p>
<p>Everyone needs a privacy policy these days.</p>
<p>And it makes sense to a certain degree, but ultimately I wonder, does anyone ever actually read privacy policies?  And if they do pay attention to such things, could a bad privacy policy change a consumer&#8217;s mind about using the product?  I also wonder, to what extent do we value our privacy?  Sure, when people think that their privacy has been violated, it is a big deal&#8230; but we sign over our rights to privacy on a daily basis.  Especially those of us who are heavily engaged with the internet.</p>
<p>Last month, in an editorial piece on CNN, Pete Cashmore (a social media consultant) boldly stated that, &#8220;<a href="http://edition.cnn.com/2009/OPINION/10/28/cashmore.online.privacy/" target="_blank">Privacy is dead, and social media is holding the smoking gun.</a>&#8221; He gets into the nuts and bolts of why people embed themselves in these digital networks.  He speaks of the &#8220;attention economy&#8221; and the notion that a public life is a successful one.  The more public you are, the more capital you will earn.  An interesting notion, and probably not too far from the truth.</p>
<p>By engaging in the participatory infrastructure of Cyberspace, we record and post our lives, for all to see and analyse.  Even when we are clever and set up our privacy controls so that our &#8220;work friends&#8221; can&#8217;t invade our personal profiles, all of our online activity is continually fed into a massive stream of data which I imagine looks something like the Matrix.  Every keystroke, every website visit, every tweet, every photo, every video we share and look at&#8230; all being fed into numerous databases for numerous purposes.</p>
<p>As a heavy feeder of data into these streams, I have tried to reconcile my private life with my public one; but I know that if I want to use them, and try to make some headway into the &#8220;attention economy&#8221;, I must reasonably expect that the price for participating in Cyberspace, is the recording of my every movement within it.</p>
<p>Of course, the underlying assumption in my (and perhaps your) use is that there is no unifying program tying all the little data droppings we leave behind in our daily lives&#8230; no one  actually listening to and watching the Matrix&#8230; This is what allows us to easily invoke Big Brother as if it were still a fictional archetype of a society. Big Brother may have the capability of watching, but he only pays attention when you are breaking the rules, or exploiting personal data.</p>
<p>But what about the people who don&#8217;t participate in Cyberspace, and cite privacy issues as their number one reason?</p>
<p>Well, <a href="http://thechronicleherald.ca/NovaScotian/1152888.html" target="_blank">as Kelly Shiers of the Chronicle Herald reminds us today, the allegorical Big Brother is potentially watching just as closely in Natural Space as he is in Cyberspace.</a></p>
<p>The HRM has over 1200 cameras in use across the city in facilities and on Metro Transit buses.  That figure is nowhere close to the total number of CCTV cameras in use across the city, and indeed the whole province.  The article indicates (and I suspect most public opinion agrees) that the primary purpose of these cameras is safety and crime prevention. Although no one is monitoring the cameras, and it is hard to imagine a camera stopping a crime in progress even if it is being monitored&#8230; it is generally accepted that CCTV cameras are a good way to enhance our personal safety.</p>
<p>The Brits have been doing it for years already, and major cities across Canada seem to be adopting a model of surveillance, with one noticeable difference from the way it is carried out over there.  In the UK, you are constantly reminded that you are being watched: an ominous voice comes across the subway speakers every 10 minutes, asking you to assist the CCTV cameras and report &#8220;all suspicious activity to authorities.&#8221;  The authorities want people to feel like Big Brother is watching (even if he isn&#8217;t).</p>
<p>In Canada we like to do these things in a more subtle and friendly way&#8230; just check out the picture with the associated Herald story&#8230; Smile, you are on camera.</p>
<p>And we complacently smile and wave away our expectations of privacy&#8230; enthusiastically even, when the Google car drives by.</p>
<p>But where do we draw the line in the sand?  We accept public surveillance in the name of security and public safety.  We find electronic banking convenient and reward cards rewarding.  We accept most of the cameras and data-tracking.  We accept the technology which invades and kills our privacy&#8230; in fact we love it. We assist in the creation of the panoptic mosaic which is our technocracy by documenting our lives ourselves, and sharing it with anyone who cares to take an interest.</p>
<p>Perhaps CCTV recording will always remain acceptable to the public, so long as it is limited to the activities we conduct in public.</p>
<p>And perhaps it will remain okay when we install cameras in the houses of people on welfare, like they are now doing in the UK (<a href="http://www.wired.com/gadgetlab/2009/08/britain-to-put-cctv-cameras-inside-private-homes/" target="_blank">as reported by Wired Magazine in August</a>). It is apparently a reasonable and rational thing to do over there&#8230; so why not here?</p>
<p>I&#8217;m glad there are watchdog organizations out there who make it their business to advocate for a human right to privacy&#8230; but ultimately I&#8217;m just happy for the little claims to privacy I can still make.  I still feel as though I am in relative control of my public/private boundaries. Though I acknowledge that control is fairly superficial, because as we are so often reminded &#8211; Big Brother could watch if he wanted to.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[The Generalized Sequential Pattern algorithm for mining sequential book-circulation data at the UK Petra library]]></title>
<link>http://wahyudisetiawan.wordpress.com/2009/11/15/algoritma-generalized-sequential-pattern-untuk-menggali-data-sekuensial-sirkulasi-buku-pada-perpustakaan-uk-petra/</link>
<pubDate>Sat, 14 Nov 2009 22:08:05 +0000</pubDate>
<dc:creator>admin</dc:creator>
<guid>http://wahyudisetiawan.wordpress.com/2009/11/15/algoritma-generalized-sequential-pattern-untuk-menggali-data-sekuensial-sirkulasi-buku-pada-perpustakaan-uk-petra/</guid>
<description><![CDATA[By knowing the sequential patterns of book borrowing at the library, many decisions/policies str]]></description>
<content:encoded><![CDATA[By knowing the sequential patterns of book borrowing at the library, many decisions/policies str]]></content:encoded>
</item>
<item>
<title><![CDATA[U.S. Spies Buy Stake in Firm That Monitors Blogs, Tweets ]]></title>
<link>http://yahstruthseeker.wordpress.com/2009/11/13/u-s-spies-buy-stake-in-firm-that-monitors-blogs-tweets/</link>
<pubDate>Fri, 13 Nov 2009 05:09:02 +0000</pubDate>
<dc:creator>yahstruthseeker</dc:creator>
<guid>http://yahstruthseeker.wordpress.com/2009/11/13/u-s-spies-buy-stake-in-firm-that-monitors-blogs-tweets/</guid>
<description><![CDATA[Source: Just get us there By Noah Shachtman Wired | America’s spy agencies want to read your blog po]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><strong>Source: Just get us there</strong></p>
<p><strong>By Noah Shachtman</strong></p>
<p><!-- s9ymdb:3476 --><img src="http://justgetthere.us/blog/uploads/cia-men.jpg" alt="" width="191" height="215" /><a title="http://www.wired.com/dangerroom/2009/10/exclusive-us-spies-buy-stake-in-twitter-blog-monitoring-firm/" href="http://justgetthere.us/blog/exit.php?url_id=28392&#38;entry_id=5204"><strong><span style="color:#000099;font-size:small;">Wired</span></strong></a> &#124; America’s spy agencies want to read your blog posts, keep track of your Twitter updates — even check out your book reviews on Amazon.</p>
<p><strong><span style="color:#000099;font-size:small;">In-Q-Tel</span></strong>, the investment arm of the CIA and the wider intelligence community, is putting cash into <strong><span style="color:#000099;font-size:small;">Visible Technologies</span></strong>, a software firm that specializes in monitoring social media. It’s part of a larger movement within the spy services to get better at using “<strong><span style="color:#000099;font-size:small;">open source intelligence</span></strong>” — information that’s publicly available, but often hidden in the flood of TV shows, newspaper articles, blog posts, online videos and radio reports generated every day.</p>
<p>Visible crawls over half a million web 2.0 sites a day, scraping more than a million posts and conversations taking place on blogs, online forums, Flickr, YouTube, Twitter and Amazon. (It doesn’t touch closed social networks, like Facebook, at the moment.) Customers get customized, real-time feeds of what’s being said on these sites, based on a series of keywords.</p>
<p>“That’s kind of the basic step — get in and monitor,” says company senior vice president Blake Cahill.</p>
<p>Then Visible “scores” each post, labeling it as positive or negative, mixed or neutral. It examines how influential a conversation or an author is. (“Trying to determine who really matters,” as Cahill puts it.) Finally, Visible gives users a chance to tag posts, forward them to colleagues and allow them to respond through a web interface.</p>
<p>In-Q-Tel says it wants Visible to keep track of foreign social media, and give spooks “early-warning detection on how issues are playing internationally,” spokesperson Donald Tighe tells Danger Room.</p>
<p>Of course, such a tool can also be pointed inward, at domestic bloggers or tweeters. Visible already keeps tabs on web 2.0 sites for Dell, AT&#38;T and Verizon. For Microsoft, the company is monitoring the buzz on its Windows 7 rollout. For Spam-maker Hormel, Visible is tracking animal-rights activists’ online campaigns against the company.</p>
<p>“Anything that is out in the open is fair game for collection,” says <strong><span style="color:#000099;font-size:small;">Steven Aftergood</span></strong>, who tracks intelligence issues at the Federation of American Scientists. But “even if information is openly gathered by intelligence agencies it would still be problematic if it were used for unauthorized domestic investigations or operations. Intelligence agencies or employees might be tempted to use the tools at their disposal to compile information on political figures, critics, journalists or others, and to exploit such information for political advantage. That is not permissible even if all of the information in question is technically ‘open source.’”</p>
<p>Read rest of article <a href="http://justgetthere.us/blog/archives/U.S.-Spies-Buy-Stake-in-Firm-That-Monitors-Blogs,-Tweets.html">HERE</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Arctic Adventurer: We Feel Fine]]></title>
<link>http://oceanflynn.wordpress.com/2009/11/13/arctic-adventurer-we-feel-fine/</link>
<pubDate>Thu, 12 Nov 2009 22:09:40 +0000</pubDate>
<dc:creator>Maureen Flynn-Burhoe</dc:creator>
<guid>http://oceanflynn.wordpress.com/2009/11/13/arctic-adventurer-we-feel-fine/</guid>
<description><![CDATA[Arctic Adventurer: We Feel Fine, originally uploaded by ocean.flynn. DRAFT Photos of Iqaluit cemeter]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>	<a href="http://www.flickr.com/photos/oceanflynn/4098528929/" title="photo sharing"><img src="http://farm3.static.flickr.com/2627/4098528929_fbbcf1f182.jpg" class="flickr-photo" width="500" alt="Arctic Adventurer: We Feel Fine" /></a><br />
	<span class="flickr-caption"><br />
		<a href="http://www.flickr.com/photos/oceanflynn/4098528929/">Arctic Adventurer: We Feel Fine</a>,<br /> originally uploaded by <a href="http://www.flickr.com/people/oceanflynn/">ocean.flynn</a>.<br />
	</span><br />
DRAFT<br />
Photos of Iqaluit cemetery taken October 2002; Uploaded to Flickr, Trawled by wefeelfine, Linked to wordpress, wefeelfine.org</p>
<p>American artist Jonathan Harris describes his work on his <a href="http://www.number27.org">website</a>:</p>
<blockquote><p>&#8220;I make (mostly) online projects that reimagine how we relate to our machines and to each other. I use computer science, statistics, storytelling, and visual art as tools. I believe in technology, but I think we need to make it more human. I believe that the Internet is becoming a planetary meta-organism, but that it is up to us to guide its evolution, and to shape it into a space we actually want to inhabit—one that can understand and honor both the individual human and the human collective, just like real life does (<a href="http://www.number27.org/">Harris</a>).&#8221;</p></blockquote>
<p>&#8220;Sep Kamvar is a consulting professor of Computational Mathematics at Stanford University. His research focuses on data mining and information retrieval in large-scale networks. He also is interested in using large amounts of data and accessible media in the study of human nature through art. [Among his other areas of interest he includes] probabilistic models for classification where there is little labeled data (<a href="http://kamvar.org/profile">Sep Kamvar&#8217;s blog profile</a>).&#8221;</p>
<p><strong>Glossary of Terms</strong></p>
<p>Nonlinearity: &#8220;At the beginning of Chapter 5 in Kurt Vonnegut&#8217;s <em>Slaughterhouse-Five</em>, Billy Pilgrim finds himself in jail on the planet of Tralfamadore. Billy&#8217;s captors give him some Tralfamadorian books to pass the time, and while Billy can&#8217;t read Tralfamadorian, he does notice that the books are laid out in brief clumps of text, separated by stars. &#8220;Each clump of symbols is a brief, urgent message &#8212; describing a situation, a scene,&#8221; explained one of his captors. &#8220;We Tralfamadorians read them all at once, not one after the other. There isn&#8217;t any relationship between all the messages, except that the author has chosen them carefully, so that, when seen all at once, they produce an image of life that is beautiful and surprising and deep. There is no beginning, no middle, no end, no suspense, no moral, no causes, no effects. What we love in our books are the depths of many marvelous moments seen all at one time.&#8221; Harris and Kamvar aimed to write <em>Almanac of Human Emotions</em> in the telegraphic, schizophrenic manner of tales from Tralfamadore, where the flying saucers are.&#8221; </p>
<p><strong>Open Platforms</strong>: &#8220;The power of open platforms in enabling the easy generation of consumable content has been demonstrated repeatedly on the internet, not only with the web itself, but also with sub-platforms like Facebook, Flickr, Google Gadgets, among others. I am interested in platforms that easily enable high-quality content creation for developers and provide a straightforward content consumption and navigation experience for users.&#8221;</p>
<p><strong>Open Sub-platforms</strong>: Open sub-platforms like Facebook, Flickr and Google Gadgets, among others, make it easier for developers to create high-quality consumable content and easier for users to access and consume it.</p>
<p><strong>Timeline</strong></p>
<p><strong>Webliography and Bibliography</strong></p>
<p>http://wp.me/p1TTs-j6</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[A Midnight Marriage Proposal, Web 2.0 and Soft Security]]></title>
<link>http://essajiwa.wordpress.com/2009/11/12/ajakan-nikah-ditengah-malam-web-2-0-dan-soft-security/</link>
<pubDate>Thu, 12 Nov 2009 08:39:13 +0000</pubDate>
<dc:creator>essajiwa</dc:creator>
<guid>http://essajiwa.wordpress.com/2009/11/12/ajakan-nikah-ditengah-malam-web-2-0-dan-soft-security/</guid>
<description><![CDATA[In the middle of the night, while I was happily browsing FB, &#8220;Buzz&#8221;, someone buzzed me on YM, and then a]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>In the middle of the night, while I was happily browsing FB, &#8220;Buzz&#8221;, someone buzzed me on YM, and then a message popped up:</p>
<p>&#8220;<strong>Sa, let&#8217;s get married!</strong>&#8221;.</p>
<p>Whoa!! I was shocked out of my mind. Which girl was drunk enough to ask me to marry her?! But then another message appeared:</p>
<p>&#8220;<strong>Married on FB <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong>&#8221;.<!--more--></p>
<p>Ooooh, ha ha ha, so that explained the shock. I thought she had lost her mind, but it turned out she only wanted to change her relationship status on FB. She is a high school friend of mine who really is a prankster and a bit crazy <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_lol.gif' alt=':lol:' class='wp-smiley' />  , so I just went along with it. After all, it was only for fun and to stir up some gossip so we would get famous <em>*ha ha ha, acting like such a celebrity!!*.</em></p>
<p>Anyway, talking about relationship status on FB means talking about the data people fill in on their profiles, <em>right?</em> And most of the friends I ask <em>&#8220;Hey, do you put your real data in your profile?&#8221;</em> answer <em>&#8220;Of course not, are you crazy? What if our data gets misused for something shady?&#8221; </em>(By the way, what exactly counts as something shady? :p ).</p>
<p>This, in my view, is a problem when we want to look for information on the internet: the truthfulness of the data is very low, because most internet users themselves do not trust that their data is safe when it is stored online. That is why data mining from the internet has its own particular challenges.</p>
<p>This is even more true in the Web 2.0 era, where information no longer flows from a single provider but can come from the consumers themselves. In other words, users can create information, write news, and interact with other users in a social network; every internet user can now actively contribute information.</p>
<p>So the trustworthiness of information becomes crucial here. It reminds me of my network security course: my master&#8217;s program lecturer, <a title="Dosen Network Security Gua" href="http://avinanta.staff.gunadarma.ac.id/" target="_blank">Dr.rer.nat. Avinanta Tarigan</a>, once did research on <strong>Reputation Systems</strong> (Soft-Security), which are expected to act as a safeguard for data by indicating that the data can be trusted, based on the reputation of the entity behind it.</p>
<p>So? If someone on FB says they are single, do we believe it right away? Heh heh, but quite a few people do put a lot of trust in FB data. I even heard of a married couple who fought because the husband changed his status from &#8220;married&#8221; to &#8220;single&#8221;. If I may coin a term, they are an example of <strong>User 2.0 </strong>:p.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[What do I do with all this data?]]></title>
<link>http://aodaniel.wordpress.com/2009/11/11/what-do-i-do-with-all-this-data/</link>
<pubDate>Wed, 11 Nov 2009 15:54:48 +0000</pubDate>
<dc:creator>Ann O&#39;Daniel</dc:creator>
<guid>http://aodaniel.wordpress.com/2009/11/11/what-do-i-do-with-all-this-data/</guid>
<description><![CDATA[Many companies are now realizing that gathering lots of information on customers can create overwhel]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Many companies are now realizing that gathering lots of information on customers can create overwhelming uncertainty as to how to identify the most valuable and useful information. Even integrating  various sources of data into one central database and organizing it one way or another isn’t actually using this data to drive fact-based action plans that create brand loyalty and exceptional customer experiences.</p>
<p>These companies are asking “How do I know which data gives me the most valuable customer insight?”</p>
<p>As Stuart Lauchlin advises in his Customer Intelligence post, <a href="http://www.mycustomer.com/topic/customer-intelligence-supplement-do-you-really-know-your-customers">Do you really know your customers?</a>, there are plenty of technology options to help you gather and sort data. CRM applications, Business Intelligence (BI), Web Analytics, Data Warehousing, Speech Analytics (for unstructured data and social network conversations) and even Loyalty Cards can provide vast amounts of customer intelligence.</p>
<p>And he rightly points out that: “Customer intelligence needs to also reach further than traditional CRM. Organizations have spent millions on customer data capture, storage and analytics but this is based on actions which took place. But what about the actions that didn’t take place? What about the customer who wandered into a shop and didn’t buy anything? Why did they leave? Has anyone asked them? Is there any data on file to monitor track and spot a negative trend that can be addressed?”</p>
<p>These are provocative questions that need to be asked but he stops short of providing a clear, step-by-step roadmap for answering them. How can you translate data into fact-based insights and action plans that can advance the business goals of the organization and fulfill the needs of its most valuable customers?</p>
<p>There is a pretty simple (but not necessarily easy) approach for putting all that data to work to reach your business goals and build a system for ongoing creation of new value for customers.</p>
<p><strong>Step 1: </strong> Don’t try to boil the ocean. Identify just a few of your most important business problems or questions you need the data to help solve. For instance, if customer churn is a key business problem, then you can narrow down the specific questions you need your data to answer. Who is leaving (demographics, geography, etc.)?  When?  Connected to which transactions?</p>
<p><strong>Step 2:  </strong>Identify your most valuable customer segments and draw a preliminary profile of their common characteristics and behavior. Identify what is unknown about them as well as what is known. For instance, why does one customer in Segment A remain loyal for 5 years while another in that same segment defects after one year?</p>
<p><strong>Step 3:  </strong>Explore all data sources and consider data mining and predictive analytics to uncover the “why” of customer churn. You can then start to draw “predictive” rather than just behavioral or transactional profiles of your most valuable customer segments.  These two data analysis techniques can provide attitudinal insight by exploring correlations, patterns and trends in large amounts of data. Also consider new “Sentiment Analysis” technology that can mine unstructured data from social network conversations and open-ended survey questions to identify emotional attitudes that are driving customer satisfaction (or lack thereof).</p>
<p><strong>Step 4:</strong> Refine your most valuable customer profiles to reflect attitudinal characteristics as well as behavioral. Include qualitative insights gained through other forms of research. If you use data mining and predictive analytics you should be able to determine which customers look like they are “most grow-able” based on a deeper understanding of how their needs may change over time.</p>
<p><strong>Step 5:</strong> Using these profiles, you can then begin to design the optimal customer experience for each valuable customer segment based on their emotional needs as well as transactional behavior and map that experience back to the original business goal and your brand&#8217;s unique value proposition. For instance, to reduce churn you could create a special service, cross-sell an existing one, or offer an exclusive incentive or act of recognition that will surprise and delight a valuable customer who appears likely to churn based on a predictive trigger.  The key here is to make the customer feel like you understand their needs better than your competitors do. For instance, a good car insurance customer might defect when their adult child no longer needs the family coverage if you don’t anticipate their need for a more favorable rate or type of coverage. You can then create in-market tests to determine which offers, messaging and customer touch points deliver the most impact on reducing churn.</p>
<p><strong>Step 6:</strong> Pilot, test and continuously measure the impact of your brand experience action plan tactics on your most valuable customers and your overall business goal. Build new data gathering techniques into your tests to continuously refine and enrich your customer profiles for further experience design and testing.</p>
<p>The ultimate goal should be to use your data to become predictive rather than reactive in how you design customer experiences and anticipate your customer’s emotional as well as transactional needs.  Over time you will spend less time solving those vexing business problems and more time innovating new customer insight-informed brand experiences that will create more brand value and customer loyalty.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[NTT Docomo launched PETA mining system]]></title>
<link>http://jclouds.wordpress.com/2009/11/01/ntt-docomo-launched-peta-mining-system/</link>
<pubDate>Sun, 01 Nov 2009 22:41:23 +0000</pubDate>
<dc:creator>Agile Cat</dc:creator>
<guid>http://jclouds.wordpress.com/2009/11/01/ntt-docomo-launched-peta-mining-system/</guid>
<description><![CDATA[Nov.1 &#8211; The company has started the peta mining project&#160; and built facilities named ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Nov. 1 &#8211; The company started its peta mining project and built a facility named &#34;Social Brain&#34; with 200 servers last July. In this project, Docomo will analyze large-scale data for city planning and traffic network improvement, using its real-time population research.</p>
<p>However, the project is still in the experimental phase and will need some other factors, the company added.</p>
<p><font color="#000080">J</font> &#60;<a href="http://itpro.nikkeibp.co.jp/article/COLUMN/20091029/339683/">http://itpro.nikkeibp.co.jp/article/COLUMN/20091029/339683/</a>&#62;</p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
