<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>machine-translation &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/machine-translation/</link>
	<description>Feed of posts on WordPress.com tagged "machine-translation"</description>
	<pubDate>Sat, 28 Nov 2009 12:27:54 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[IBM seeks to burnish its machine translation solution with a "human" touch]]></title>
<link>http://pangeanic.wordpress.com/2009/11/25/ibm-seeks-to-burnish-his-automatic-translator-with-a-human-touch/</link>
<pubDate>Wed, 25 Nov 2009 21:11:00 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/25/ibm-seeks-to-burnish-his-automatic-translator-with-a-human-touch/</guid>
<description><![CDATA[The software giant is improving translations of its expected &#8220;n.Fluent&#8221;, aimed at instan]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The software giant is improving translations in its upcoming &#8220;<a href="http://www.research.ibm.com/social/projects_nfluent.html" target="_blank">n.Fluent</a>&#8221;, aimed at instant messaging, <span style="text-decoration:underline;">with the help of thousands of employees</span>. The system converts text in real time and is being tested internally.</p>
<p>Using a series of crowdsourcing strategies and events, IBM&#8217;s <a href="http://www.research.ibm.com/social/projects_nfluent.html" target="_blank">n.Fluent</a> has managed to engage and nurture an active, multinational pool of volunteer translators dedicated to innovation. This is just one of the strategies IBM will use to add the &#8220;human&#8221; touch to its announced and <em>soon-to-be-released</em> machine translation solution. The tool aims to become an important channel for communication across languages in instant messaging systems, both commercially and socially.</p>
<p>The IBM statement is important for the translation and localization community: &#8220;one [of the] key cornerstones of the n.Fluent project is its Crowdsourcing strategy, which enables us to effectively tap into the collective power of bilingual IBMers for translating sentences or correcting machine translated sentences, for improving translation accuracy and quality.&#8221;</p>
<p>In-house, a team of about a hundred people, including developers, linguists and mathematicians, is working on a comprehensive Internet machine translation program that will be useful for websites and documents, but especially for IM.</p>
<p>The system is supposed to support languages including English, Spanish, French, German, Italian, Japanese, Arabic, Chinese, Korean, Portuguese and Russian.</p>
<p>Now that n.Fluent is in the last stage of development, the multinational, headquartered in New York, is looking for that little &#8220;human&#8221; touch. The human touch will come from contributions and comments on the translations, which are analyzed and corrected by the company&#8217;s own employees (IBM has about 400,000 in nearly 161 countries), in an effort to refine the computing work done by the servers.</p>
<p>According to The New York Times, n.Fluent was launched internally a year ago. It can now draw on a large amount of parallel data: around 3,000 volunteers have collectively contributed about 36 million words (from crowdsourced instant-message chats and crowdsourced translations). This is expected to further improve accuracy, meaning and quality, and programmers will try to make the machines &#8220;learn&#8221; the most accurate expressions for each language.</p>
<p>IBM is the world&#8217;s largest IT services company, with revenues of $103.6 billion last year.</p>
<h4 style="text-align:center;"><em>Next time you think languages, think Pangeanic</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Universal Translators Are All Around Us ]]></title>
<link>http://gambari.wordpress.com/2009/11/24/universal-translators-are-all-around-us/</link>
<pubDate>Tue, 24 Nov 2009 12:26:33 +0000</pubDate>
<dc:creator>Daniel Radev</dc:creator>
<guid>http://gambari.wordpress.com/2009/11/24/universal-translators-are-all-around-us/</guid>
<description><![CDATA[Since machine translation is one of the topics on this blog, here is an interesting article plus some video demos]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Since machine translation is one of the topics on this blog, here is an interesting article plus some video demos:</p>
<p><a href="http://singularityhub.com/2009/11/23/universal-translators-are-all-around-us-video">http://singularityhub.com/2009/11/23/universal-translators-are-all-around-us-video</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Machine Translation and the Porpoise Corpus]]></title>
<link>http://healthyalgorithms.wordpress.com/2009/11/24/machine-translation-and-the-porpoise-corpu/</link>
<pubDate>Tue, 24 Nov 2009 01:12:57 +0000</pubDate>
<dc:creator>Abraham Flaxman</dc:creator>
<guid>http://healthyalgorithms.wordpress.com/2009/11/24/machine-translation-and-the-porpoise-corpu/</guid>
<description><![CDATA[I might have mentioned that I got to do some world traveling for my work recently. Seeing rural Tanz]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://en.wikipedia.org/wiki/File:Humpback_Whale_underwater_shot.jpg"><img class="aligncenter size-full wp-image-731" src="http://healthyalgorithms.wordpress.com/files/2009/11/wikipedia_whale.jpg" alt="" width="280" height="170" /></a>I might have mentioned that I got to do some world traveling for my work recently.  Seeing rural Tanzania was an experience that I still don&#8217;t really have good words to describe.  But this is not a post about that.  This is a post about a sticky idea I got stuck on in some science fiction I was reading during my multi-day to and fro travel.</p>
<p>On my around-the-world-in-4.5-days journey, I read the Jewish feminist sci-fi novel <em>He, She, and It</em> by Marge Piercy. It&#8217;s got a classic hard AI theme, about a robot that is so, so human&#8230; I&#8217;d recommend it. But the dilemma of whether a robot can make a minyan in the Reform tradition of 2059 has not stuck in my mind the way <a href="http://books.google.com/books?id=WqJRCX2pWf8C&#38;lpg=PP1&#38;dq=he%20she%20and%20it&#38;client=firefox-a&#38;pg=PA77#v=onepage&#38;q=whales&#38;f=false">this one line about whales</a> has:<!--more--></p>
<blockquote><p>The great whales&#8212;we had just about killed off the last of them before we began to translate their epic and lyric poetry.</p></blockquote>
<p>Okay, I&#8217;m a little embarrassed by it when I re-read it, but seriously, could we do it? That is, does a serious attempt to translate whale songs into English have a chance in this modern age? I once had a dusty book about an effort by Carl Sagan and his buddies to learn to communicate with dolphins in the 1960s, but technology has seriously advanced since then.</p>
<p>The last talk I saw on statistical machine translation described an effort to do Arabic-to-English translation without telling the computer anything about the structure of sentences in either language.</p>
<p>Whale-to-English translation is at least one step harder, since a whale-speech-to-whale-text component has to precede the machine translation part (and, I suppose, there is the possibility that whale songs cannot be translated into English at all).</p>
<p>A few questions:  Has it already been done/proven impossible?  Do you think we could do it?  Do you have a vast collection of whale songs available to aid in the quest?</p>
<p>Regarding question 3, after a little searching, I&#8217;ve found <a href="http://www.birds.cornell.edu/brp/publications/iuss-research/">the lab at Cornell</a> that probably has the necessary data set.  They seem more interested in counting and tracking whales than translating them, but I&#8217;ve seen many a health researcher be protective of this sort of precious data.  I wonder if the Bioacoustics Research Program could share a few thousand hours of recordings.</p>
<p><a href="http://www3.interscience.wiley.com/journal/119929954/abstract"><img class="aligncenter size-full wp-image-730" src="http://healthyalgorithms.wordpress.com/files/2009/11/whale_spectrogram.png" alt="" width="430" height="302" /></a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Automatic Subtitles Rolling Out on Youtube]]></title>
<link>http://komplettie.wordpress.com/2009/11/23/automatic-subtitles-rolling-out-on-youtube/</link>
<pubDate>Mon, 23 Nov 2009 09:51:59 +0000</pubDate>
<dc:creator>komplettie</dc:creator>
<guid>http://komplettie.wordpress.com/2009/11/23/automatic-subtitles-rolling-out-on-youtube/</guid>
<description><![CDATA[Google has announced that it is to deploy automatic captions across certain YouTube channels in an a]]></description>
<content:encoded><![CDATA[Google has announced that it is to deploy automatic captions across certain YouTube channels in an a]]></content:encoded>
</item>
<item>
<title><![CDATA[Musings on Machine Translation]]></title>
<link>http://cetrainc.com/2009/11/22/musings-on-machine-translation/</link>
<pubDate>Sun, 22 Nov 2009 19:07:38 +0000</pubDate>
<dc:creator>cetrablog</dc:creator>
<guid>http://cetrainc.com/2009/11/22/musings-on-machine-translation/</guid>
<description><![CDATA[At the recent OTTIAQ Conference, the keynote speaker &#8211; Pierre Isabelle of the National Resear]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>At the recent <a title="OTTIAQ conference" href="http://www.ottiaq.org/communications/congres_en.php" target="_blank">OTTIAQ Conference</a>, the keynote speaker, Pierre Isabelle of the <a href="http://www.nrc-cnrc.gc.ca" target="_blank">National Research Council Canada</a>, talked about machine translation (MT) and suggested new approaches that may allow translators to incorporate MT into their work. The fact that the organizers chose to open a conference attended mainly by freelance translators with a talk on MT ties in nicely with my efforts to establish communication between &#8220;human translators&#8221; and MT developers, which until very recently was virtually non-existent. In my role as President of <a href="https://www.atanet.org" target="_blank">ATA</a> (American Translators Association), I reached out to Laurie Gerber, then President of <a href="http://www.eamt.org/iamt.php" target="_blank">IAMT</a> (International Association for Machine Translation) and also Past President of <a href="http://www.amtaweb.org" target="_blank">AMTA</a> (Association for Machine Translation in the Americas). We agreed that it would be good to have MT representatives give presentations at translation association events and vice versa, and we have successfully put this into practice. Among other things, the AMTA Summit will be co-located with the ATA Conference in Denver in 2010.</p>
<p>The outcomes of the dialogue that ensued were quite interesting. For example, MT developers learned that translators are not interested in post-editing MT output, and translators learned that human translation versus MT is not a zero-sum proposition (where more MT means less work for translators); rather, MT creates markets that did not exist before. In other words, translators should view MT as an opportunity rather than a threat. By the same token, MT developers should not expect translators to post-edit; this will be done by a new breed of linguists, while translators continue to do what they enjoy most and are good at: translating.</p>
<p>Jiri Stejskal</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Microsoft Translator Widget &amp; APIs, now Beta]]></title>
<link>http://pangeanic.wordpress.com/2009/11/18/microsoft-translator-widget-apis-now-beta/</link>
<pubDate>Wed, 18 Nov 2009 16:45:25 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/18/microsoft-translator-widget-apis-now-beta/</guid>
<description><![CDATA[Microsoft’s statistical machine-translation technology designed for integration into third-party web]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://www.microsoft.com" target="_blank">Microsoft</a>’s <a href="http://en.wikipedia.org/wiki/Statistical_machine_translation" target="_blank">statistical machine-translation</a> technology, designed for integration into third-party web properties, has now reached its Beta stage. The announcement, made at <a href="http://www.microsoft.com/europe/TechEd/" target="_blank">TechEd Europe</a>, also means that the Redmond company is opening up access to the <a href="http://www.microsofttranslator.com/Widget/" target="_blank">Microsoft Translator Widget</a> and the Microsoft Translator AJAX API. Before TechEd Europe, both the Translator widget and the application programming interface were available by invitation only.</p>
<p>This move reflects the current push toward ever more machine-translation integration, a trend followed by several language companies, such as Pangeanic. Language Service Providers have been feeling the pressure to innovate technologically for some time now. Data-sharing initiatives like <a href="http://www.translationautomation.com" target="_blank">TAUS</a>&#8217; <a href="http://www.tausdata.org" target="_blank">TDA</a> have paved the way for greater availability of data, which in turn has accelerated several developments. On other occasions, the TDA initiative has given rise to <a href="http://www.tausdata.org/index.php/news/news/113" target="_blank">new revenue streams for LSPs</a>.</p>
<p>With this new Beta, website owners who want to use Microsoft Translator technology in their online content can do so without restriction, as the Beta program is public. In a nutshell, the company is enabling all customers to generate either a widget snippet or an application based on its own MT technology. Website owners only need to visit the widget and AJAX API adoption portals and get the generated code from there.</p>
<p>Vikram Dendi, from the Microsoft Translator team, highlighted a few points of interest from TechEd Europe: “Microsoft Translator APIs and the webpage widget are now in beta. Generate a translator widget for your webpage here, or use the AJAX API to further customize the translation experience.&#8221;</p>
<h4 style="text-align:center;"><em> Next time you think languages, think Pangeanic</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[For the Advancement of Arabic/English Machine-Translation Technology (and others): IBM and KACST]]></title>
<link>http://pangeanic.wordpress.com/2009/11/18/for-the-advancement-of-arabicenglish-machine-translation-technology-and-others-ibm-and-kacst/</link>
<pubDate>Wed, 18 Nov 2009 16:21:42 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/18/for-the-advancement-of-arabicenglish-machine-translation-technology-and-others-ibm-and-kacst/</guid>
<description><![CDATA[The Saudi Arabian National Research and Development Organization announced yesterday a multi-year ag]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The Saudi Arabian National Research and Development Organization announced yesterday a multi-year agreement with IBM to collaborate on, among other areas, the advancement of machine-translation technologies.</p>
<p>Under terms of the agreement, King Abdulaziz City for Science and Technology (KACST) will purchase an IBM Blue Gene supercomputer that will enable its researchers to perform complex simulations and computational modelling.</p>
<p>IBM will provide training services to KACST researchers on the functionality and features of Statistical Machine Translation technology. IBM&#8217;s Research and Development team will be in charge of building the machine translation system with the initial basic capabilities; it will be trained on several million words of data, the basis of the translation training process.</p>
<p>IBM will commit researchers and business consultants, who will work together with KACST scientists to further develop the IBM Machine Translation Engine into a powerful engine for translating Arabic into other languages. This project deals with natural language analysis and computational methods for language translation. Technologies used for machine translation, such as syntactic parsing and word sense disambiguation, are also commonly used in other natural language processing applications.</p>
<p>The agreement is one of several joint research projects undertaken by the two organizations.</p>
<p>IBM will also help with intellectual property management, applying its expertise to help KACST develop the tools and processes that turn its inventions into patents.</p>
<p>The agreement also includes collaboration to create the National Center for Women Engineers.</p>
<p>The full story has been reported on several sites: <a href="http://www.zawya.com/story.cfm/sidZAWYA20091117091656" target="_blank">zawya.com</a>, <a href="http://www.ameinfo.com/216490.html" target="_blank">ameinfo.com</a>, and the <a href="http://www.us-sabc.org/custom/news/details.cfm?id=540" target="_blank">U.S.-Saudi Arabian Business Council</a>.</p>
<h4 style="text-align:center;"><em>Next time you think languages, think Pangeanic.</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Great presentation at the ATA conference about the future of translation]]></title>
<link>http://martinoprada.wordpress.com/2009/11/17/great-presentation-at-the-ata-conference-about-the-future-of-translation/</link>
<pubDate>Wed, 18 Nov 2009 02:20:56 +0000</pubDate>
<dc:creator>martinho21</dc:creator>
<guid>http://martinoprada.wordpress.com/2009/11/17/great-presentation-at-the-ata-conference-about-the-future-of-translation/</guid>
<description><![CDATA[Renato Beninatto, CEO of milengo, has posted his presentation at the ATA conference on his blog. You can]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://renatobeninatto.blogspot.com/">Renato Beninatto</a>, CEO of <a href="http://www.milengo.com/">milengo</a>, has posted his presentation at the ATA conference on his blog. You can check it out <a href="http://www.slideshare.net/renatob/signals-of-shift-in-the-language-industry-are-you-in-or-are-you-out">here</a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[How to measure machine-translation quality]]></title>
<link>http://pangeanic.wordpress.com/2009/11/15/how-to-measure-machine-translation-quality/</link>
<pubDate>Sun, 15 Nov 2009 08:52:25 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/15/how-to-measure-machine-translation-quality/</guid>
<description><![CDATA[Many people have asked me how they can reliably use a system to measure/benchmark the quality of the]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Many people have asked me how they can reliably measure or benchmark the quality of their translation system (rule-based, example-based or statistical). Some have bought commercial rule-based software and are trying it out, building dictionaries and normalization rules; others are having a first try at working with a Moses engine.</p>
<p>There are two free evaluation tools that take your system&#8217;s output (together with reference translations) as input and give you an idea of how your system is scoring.</p>
<p>Some people use them to compare their system against <a href="http://translate.google.com" target="_blank">Google Translate</a>, raw MT output or other texts. For example, you can use them to check how your system is doing compared with the free Google Translate, Systran&#8217;s online tools, BabelFish, etc. They may give you an idea of your progress as you customize your own tool for a particular application, taking generalist online tools as a basic reference.</p>
<p>The tests are not difficult to carry out. If you are not familiar with Linux and running commands from the command line, you may need some help at the installation stage. Once you get used to it, you can run <em>progress check tests</em> at will.</p>
<p><a href="http://web.science.mq.edu.au/~szwarts/MT-Evaluation.php" target="_blank">BLEU</a> is the industry standard. Sooner or later, most MT systems will report some kind of BLEU score to demonstrate their progress and reliability. However, BLEU has some drawbacks, and you may find that some of its high scores do not reflect a comparable improvement when you look at the translated files.</p>
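<p>To make the BLEU idea concrete, here is a minimal Python sketch of sentence-level BLEU as it is commonly defined: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty that punishes output shorter than the reference. It is an illustration under simplifying assumptions, not the code of the evaluation tool linked above. The function names are our own, tokenization is a plain whitespace split, and there is no smoothing or corpus-level aggregation, so real toolkits will report different numbers.</p>

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty.

    Simplified sketch: whitespace tokenization and no smoothing, so any
    n-gram order with zero matches gives a score of 0.0.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in refs:
            max_ref |= ngrams(ref, n)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


print(round(sentence_bleu("the cat sat on the", ["the cat sat on the mat"]), 4))  # 0.8187
```

<p>In that example every n-gram of the candidate appears in the reference, but the candidate is one word short, so only the brevity penalty applies: exp(1 - 6/5), roughly 0.82.</p>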
<p>We favour <a href="http://www.cs.cmu.edu/~alavie/METEOR/" target="_blank">Meteor</a> at Pangeanic. It does not only count word-for-word matches; it also brings linguistic information into the test, such as word stems and synonyms. Your BLEU scores will usually be lower than your Meteor scores, although this is only a very rough rule of thumb. At Pangeanic we have seen higher Meteor scores when measuring engines that translate marketing texts or general content. This is because Meteor takes into account not just exact word matches but other relations, such as words of the same family. From a post-editing point of view, this may be more relevant, because it takes little time to correct a wrong verb tense or a singular that should be a plural.</p>
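<p>For comparison, below is a simplified METEOR-style scorer in Python, following the original (2005) formulation. Unigram precision P and recall R are combined as F = 10PR / (R + 9P), so recall dominates, and F is then reduced by a fragmentation penalty of 0.5 × (chunks / matches)³. The assumptions are loud ones: only exact word matches are used (the real tool also matches stems and synonyms, which is where the word-family effect described above comes from), and words are aligned greedily left to right rather than with METEOR&#8217;s own alignment search. The function name is hypothetical.</p>

```python
def meteor_exact(candidate, reference):
    """Simplified METEOR-style score using exact unigram matches only.

    Recall-weighted F-mean times a fragmentation penalty, as in the
    original METEOR formula. Stemming and synonym stages are omitted, and
    the alignment is greedy rather than a minimum-crossing alignment.
    """
    cand = candidate.split()
    ref = reference.split()
    used = set()
    alignment = []  # (candidate index, reference index) pairs
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if j not in used and word == ref_word:
                used.add(j)
                alignment.append((i, j))
                break
    m = len(alignment)
    if m == 0:
        return 0.0
    precision = m / len(cand)
    recall = m / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a run of matches that are adjacent in both sentences.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)


print(round(meteor_exact("the cat sat on the mat", "the cat sat on the mat"), 4))  # 0.9977
```

<p>Even an identical six-word sentence scores slightly below 1 (1 - 0.5 × (1/6)³), because a single chunk still carries a small penalty; scrambled word order produces more chunks and a larger penalty.</p>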
<p>We recommend that you initially take 60% of the BLEU score as your productivity target. Once an MT system is up and running, and it has been updated and refined over some months, your scores and productivity should rise considerably.</p>
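<p>BLEU: <a href="http://web.science.mq.edu.au/~szwarts/MT-Evaluation.php" target="_blank">http://web.science.mq.edu.au/~szwarts/MT-Evaluation.php</a></p>
<p>METEOR: <a href="http://www.cs.cmu.edu/~alavie/METEOR/" target="_blank">http://www.cs.cmu.edu/~alavie/METEOR/</a></p>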
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[How to use GIZA++]]></title>
<link>http://mmannot.wordpress.com/2009/11/10/how-to-use-giza/</link>
<pubDate>Tue, 10 Nov 2009 17:02:11 +0000</pubDate>
<dc:creator>Ali Reza Ebadat</dc:creator>
<guid>http://mmannot.wordpress.com/2009/11/10/how-to-use-giza/</guid>
<description><![CDATA[GIZA++ is a program for aligning words and sequences of words in sentence-aligned corpora. You can f]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><strong>GIZA++</strong> is a program for aligning words and sequences of words in sentence-aligned corpora. You can find more information about how to use it at the following links:</p>
<p><a href="http://kwang.blogdns.com/research/how-to-compile-install-run-giza.html/comment-page-1" target="_blank">http://kwang.blogdns.com/research/how-to-compile-install-run-giza.html/comment-page-1</a></p>
<p><a href="http://wiki.apertium.org/wiki/Using_GIZA%2B%2B" target="_blank">http://wiki.apertium.org/wiki/Using_GIZA%2B%2B</a></p>
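<p>GIZA++ learns word alignments with the IBM translation models (plus an HMM alignment model), and its training normally starts with IBM Model 1. As a rough illustration of what that first stage estimates, here is a minimal Python sketch of Model 1 trained with expectation-maximization on a toy corpus. It is our own simplified code, not GIZA++&#8217;s implementation: it ignores GIZA++&#8217;s input formats, its later models and its pruning, and the function name is hypothetical.</p>

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with IBM Model 1 EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    A NULL token is added to every target sentence so that source words
    can align to nothing.
    """
    corpus = [(f, ["NULL"] + e) for f, e in corpus]
    f_vocab = {w for f, _ in corpus for w in f}
    # Uniform initialisation of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each source word among all target words in its pair.
        for f_sent, e_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities per target word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


toy = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(toy)
```

<p>On this toy corpus, EM learns that das goes with the and Haus with house without being given any dictionary, purely from co-occurrence across sentence pairs. A table of this kind is what the alignment step hands on to the rest of an SMT pipeline.</p>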
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Statistical Machine Translation Tutorial Reading]]></title>
<link>http://mmannot.wordpress.com/2009/11/10/statistical-machine-translation-tutorial-reading/</link>
<pubDate>Tue, 10 Nov 2009 15:25:13 +0000</pubDate>
<dc:creator>Ali Reza Ebadat</dc:creator>
<guid>http://mmannot.wordpress.com/2009/11/10/statistical-machine-translation-tutorial-reading/</guid>
<description><![CDATA[I have found a good tutorial about Statistical Machine Translation on Dr. David Kauchak's webpage an]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>I have found a good tutorial about statistical machine translation on <a href="http://cseweb.ucsd.edu/~dkauchak/mt-tutorial/">Dr. David Kauchak&#8217;s webpage</a> and on the <a href="http://www.52nlp.com/statistical-machine-translation-tutorial-reading/">I love NLP weblog</a>. The second is a copy of the first.</p>
<p>The items listed are essential reading for anyone interested in SMT.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[How to build a Statistical Machine Translation very fast?]]></title>
<link>http://mmannot.wordpress.com/2009/11/10/how-to-build-a-statistical-machine-translation-very-fast/</link>
<pubDate>Tue, 10 Nov 2009 14:27:14 +0000</pubDate>
<dc:creator>Ali Reza Ebadat</dc:creator>
<guid>http://mmannot.wordpress.com/2009/11/10/how-to-build-a-statistical-machine-translation-very-fast/</guid>
<description><![CDATA[There is a guide to using several toolkits to build a Statistical Machine Translation system. Y]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>There is a guide to using several toolkits to build a Statistical Machine Translation system. All you need is enough translated sentences in two languages (a sentence-aligned parallel corpus).</p>
<p>The guide uses:</p>
<ul>
<li><a href="http://mi.eng.cam.ac.uk/%7Eprc14/toolkit.html" target="new">CMU Statistical Language Modelling Toolkit</a> (version 2) for language model training</li>
<li><a href="http://www.fjoch.com/mkcls.html" target="new">mkcls</a> for training of word classes</li>
<li><a href="http://www.fjoch.com/GIZA++.html" target="new">GIZA++</a> for training of statistical translation models</li>
<li><a href="http://www.isi.edu/licensed-sw/rewrite-decoder/" target="new">ISI ReWrite Decoder</a> (version 1.0.0a) for decoding (translation)</li>
<li><a href="ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-kit-v09.tar.gz">NIST MT evaluation kit</a> (version 9) for BLEU and NIST evaluation</li>
</ul>
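<p>The first component in this pipeline is the language model, which scores how fluent a sentence is in the target language. The CMU toolkit builds back-off n-gram models; the Python sketch below is a much simpler stand-in, an add-one (Laplace) smoothed bigram model, meant only to show what &#8220;language model training&#8221; estimates. The class name and the BOS/EOS sentence-boundary tokens are our own choices, not the toolkit&#8217;s.</p>

```python
import math
from collections import Counter


class BigramLM:
    """Add-one (Laplace) smoothed bigram language model (toy example).

    A stand-in for the language-model step; real toolkits use back-off
    models with better smoothing.
    """

    def __init__(self, sentences):
        self.unigrams = Counter()  # how often each token is used as a history
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["BOS"] + sentence.split() + ["EOS"]
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Possible next tokens: the training words, EOS, and one unknown-word slot.
        self.vocab_size = len({w for s in sentences for w in s.split()}) + 2

    def prob(self, prev, word):
        """Smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, sentence):
        """Natural-log probability of a whole sentence, including boundaries."""
        tokens = ["BOS"] + sentence.split() + ["EOS"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


lm = BigramLM(["the house is small", "the book is small"])
print(lm.log_prob("the house is small") > lm.log_prob("small is house the"))  # True
```

<p>The fluent word order gets the higher score. In a noisy-channel system, a decoder such as ReWrite weighs this fluency score against the translation model&#8217;s score when choosing an output sentence.</p>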
<p>You can also use other tools, such as:</p>
<ul>
<li><a class="wiki" href="http://www.isi.edu/licensed-sw/pharaoh/" target="_blank">Pharaoh</a>, a decoder for phrase-based SMT</li>
<li><a class="wiki" href="http://www.isi.edu/licensed-sw/rewrite-decoder/" target="_blank">Rewrite</a>, a decoder for IBM Model 4</li>
</ul>
<p><a href="http://ufal.mff.cuni.cz/pcedt/tools/SMT_QuickRun/Doc/SMT_QuickRun.html" target="_blank">More detail about how to build Statistical Machine Translation</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Will translators become an endangered species? ]]></title>
<link>http://martinoprada.wordpress.com/2009/11/07/will-translators-become-an-endangered-species/</link>
<pubDate>Sun, 08 Nov 2009 06:21:17 +0000</pubDate>
<dc:creator>martinho21</dc:creator>
<guid>http://martinoprada.wordpress.com/2009/11/07/will-translators-become-an-endangered-species/</guid>
<description><![CDATA[Japanese manufacturer NEC unveiled a pair of glasses that can automatically translate spoken words a]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Japanese manufacturer NEC has unveiled a pair of glasses that can automatically translate spoken words and phrases. I still think that no machine will be able to translate as well as a good &#8220;human&#8221; translator. Read <a href="http://www.telegraph.co.uk/technology/news/6493869/NEC-unveils-Tele-Scouter-translation-glasses.html">the article at telegraph.co.uk</a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Twitter launches Spanish crowdsourced version]]></title>
<link>http://pangeanic.wordpress.com/2009/11/04/twitter-launches-spanish-crowdsourced-version/</link>
<pubDate>Wed, 04 Nov 2009 13:29:12 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/04/twitter-launches-spanish-crowdsourced-version/</guid>
<description><![CDATA[Twitter has launched its Spanish-language version, a successful, fully crowdsourced project. The rel]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://www.twitter.com" target="_blank">Twitter</a> has launched its Spanish-language version, the result of a successful, fully crowdsourced project. The news comes from <a href="http://news.cnet.com/8301-13577_3-10390027-36.html" target="_blank">CNET</a>.</p>
<p>Twitter&#8217;s foreign-language release follows <a href="http://www.facebook.com" target="_blank">Facebook</a>&#8217;s initiative, which has been an example of &#8220;community-based&#8221; localization but has also been plagued by translation errors and misinterpretations, drawing unfavourable comments from some Facebook communities (the Italian one in particular).</p>
<p>Well, Twitter&#8217;s Spanish interface begins with a typo on the front page: right above the little bird, &#8220;esta pasando&#8221; (is happening) is missing the accent on &#8220;está&#8221;. Just click the CNET link above to check (or change your language setting in Twitter to Spanish). Another typo appears when accessing the application from a third-party site with Twithis.<img class="alignleft size-full wp-image-213" title="twitterspanish_610x204" src="http://pangeanic.wordpress.com/files/2009/11/twitterspanish_610x204.png" alt="twitterspanish_610x204" width="500" height="167" /></p>
<p><a href="http://pangeanic.wordpress.com/files/2009/11/twitter-aplicaicon.jpg"><img class="alignleft size-medium wp-image-250" title="twitter aplicaicón" src="http://pangeanic.wordpress.com/files/2009/11/twitter-aplicaicon.jpg?w=300" alt="Another typo when accessing from 3rd-party applications" width="300" height="240" /></a>Still, not bad for what it cost. This raises once again the question of how useful crowdsourcing or community-based translation is for brands, and how best to apply it. It also raises the question of how much &#8220;bad&#8221; translation users will put up with in exchange for usefulness.</p>
<p>Some companies are taking a very professional approach to crowdsourcing (see Adobe&#8217;s efforts, which even provide a structure for the Adobe user community). Many others see the hype as an opportunity to get localization for free.</p>
<p>Our position at Pangeanic is that whilst crowdsourcing is very good, particularly for under-funded open source projects, it can only complement professional efforts when it comes to entrusting a brand&#8217;s image in a foreign language. Crowdsourcing can be a way to have documentation translated that would otherwise not be translated at all, and translated by enthusiasts in the field, as long as the content is not urgent and a team leader looks after terminology consistency and all the typical issues an experienced project manager would handle.</p>
<p>Other alternatives, such as (statistical) machine translation with human post-editing, can also be considered when time is critical, as long as the engine has been properly developed for the particular field (or, preferably, for the specific client).</p>
<h4><em>Next time you think languages, think Pangeanic</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[A DIY Translation Toolkit]]></title>
<link>http://pangeanic.wordpress.com/2009/11/03/a-diy-translation-toolkit/</link>
<pubDate>Tue, 03 Nov 2009 08:50:13 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/03/a-diy-translation-toolkit/</guid>
<description><![CDATA[Rule-based, but still good to tweak at entry level. Inexpensive. http://ai-depot.com/noli]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Rule-based, but still good to tweak at entry level. Inexpensive.</p>
<p><a href="http://ai-depot.com/nolimits/snapshot.htm">http://ai-depot.com/nolimits/snapshot.htm</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Machine Translation for Urdu - English]]></title>
<link>http://pangeanic.wordpress.com/2009/11/03/machine-translation-for-urdu-english/</link>
<pubDate>Tue, 03 Nov 2009 07:31:29 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/11/03/machine-translation-for-urdu-english/</guid>
<description><![CDATA[If Urdu/English machine translation combination is accurate up to 85%, what is stopping the developm]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p style="text-align:left;">If the Urdu/English machine translation pair is up to 85% accurate, what is stopping development of other, more common pairs? In our opinion, what matters is not just &#8220;the service&#8221;, online or otherwise, but the &#8220;application&#8221;.</p>
<p style="text-align:left;">New startups like Tradukka may mimic the way Google Translate works, better or worse, and claim to be faster. As was discussed at the recent TAUS Summit in Portland, what we need are ways of integrating machine translation into existing workflows, from translation management systems (TMS) to community/crowdsourcing projects.</p>
<p style="text-align:left;">Whilst these efforts are admirable, MT output also needs to be integrated with current CAT tools in order to leverage as much legacy material as possible. And that is what we do at <a href="http://www.pangeanic.com.mt" target="_blank">pangeanic.com.mt</a>: integrate statistical machine translation with the open <a href="http://www.lisa.org/tmx" target="_blank">TMX format</a> to leverage both.</p>
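<p style="text-align:left;">As a rough illustration of what putting MT output into TMX involves, the Python sketch below (not Pangeanic&#8217;s implementation) uses the standard <code>xml.etree.ElementTree</code> module to write a TMX 1.4 document with one translation unit per sentence pair, which CAT tools can import as a translation memory. The sentence pairs and the <code>creationtool</code> name are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="en", tgt_lang="es"):
    """Return a TMX 1.4 document (as a string) with one translation unit per pair.

    `pairs` is a list of (source_text, target_text) tuples, e.g. raw or
    post-edited SMT output aligned with its source sentence (hypothetical data).
    """
    tmx = ET.Element("tmx", version="1.4")
    # These are the header attributes TMX 1.4 marks as required.
    ET.SubElement(tmx, "header", {
        "creationtool": "mt-export-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            # ElementTree escapes any markup characters in the segment text.
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(build_tmx([("Hello", "Hola"), ("Thank you", "Gracias")]))
```

<p style="text-align:left;">Because each unit carries both language variants, the same file can be re-imported after post-editing, so corrected segments become reusable legacy material.</p>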
<p style="text-align:left;"><em>Quoting the article from </em><a href="http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/Regional/Islamabad/02-Nov-2009/Now-english-no-barrier-to-use-computer" target="_blank"><em>The Nation</em></a></p>
<p>&#8220;ISLAMABAD (APP) &#8211; ‘English is no more a barrier to learn and use computer’ as National Language Authority (NLA) has introduced the first-ever Urdu computer software for the users.</p>
<p>The software was developed by Center of Excellence for Urdu Informatics of NLA with the help of <a href="http://www.microsoft.com" target="_blank">Microsoft</a> mainly to preserve <a href="http://en.wikipedia.org/wiki/Urdu" target="_blank">Urdu language</a>.</p>
<p>Through this system, one can follow the instructions of computer in Urdu language and work in Microsoft Windows. A single keyboard and font will be usable for Urdu as well as other Pakistani languages.<br />
NLA’s efforts to make Urdu language a part of the computer, internet and informatics, like other world languages, will definitely serve the country and future generations, said Chairman <a href="http://en.wikipedia.org/wiki/National_Language_Authority" target="_blank">National Language Authority (NLA)</a> Iftikhar Arif in an interview with the agency.<br />
The future of Urdu language is linked with the modern technology of computer and Internet. Languages can only be survived if they adopt changes according to the latest developments and modern trends, he remarked.</p>
<p>It is not an achievement but an effort to further maintain the status of Urdu as national language, he said. Besides, &#8220;Machine Translation software&#8221; (MTs) has also been developed to perform the automatic translation from English to Urdu which is a step towards availing all the information in Urdu.</p>
<p>The software has stored Urdu Lughat (Dictionary) which help translate the material from English to Urdu language ensuring correct translation upto 85 percent, he said. &#8220;</p>
<h4><em>Next time you think languages, think Pangeanic</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Google to Start Charging for Translations]]></title>
<link>http://pangeanic.wordpress.com/2009/10/28/google-to-start-charging-for-translations/</link>
<pubDate>Wed, 28 Oct 2009 21:56:43 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/10/28/google-to-start-charging-for-translations/</guid>
<description><![CDATA[[Quoting David Grunwald From the GTS Blog] &#8220;While reading Google’s website today on this page,]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>[Quoting <strong><a href="http://www.linkedin.com/profile?viewProfile=&#38;key=9253560&#38;authToken=mDuR&#38;authType=name">David Grunwald</a></strong> from the GTS Blog]</p>
<p>&#8220;While reading Google’s website today on <a href="http://translate.google.com/support/toolkit/bin/answer.py?answer=147829&#38;cbid=yv67qoc32ejx&#38;src=cb&#38;lev=topic" target="_blank">this page</a>, I was amazed to read that Google plans to start charging for using its Translator Toolkit. Google states: “Google Translator Toolkit is free, but in the future, we plan to charge users whose translations exceed high-volume thresholds.”</p>
<p>This is really an amazing piece of news, since Google gives away all of its services for free. Is this a trend which Google will follow with other services? And why start charging for translation? If anyone has an answer to this I would appreciate hearing it.</p>
<p>In a discussion I initiated on LinkedIn’s Automated Language Translation Group back in July 2009, I raised the status of Google’s MT initiative, where they were headed with it and what it means to the commercial MT (machine translation) vendors. Specifically, why should people pay for translation services when Google offers them for free? I guess that this announcement puts Google in the category of commercial MT vendors after all.</p>
<p>This announcement should come as good news to the vendors whose main business is MT, as they should be able to compete with a giant company that “also sells MT.” The announcement should be treated with caution by customers: what looks great today may not be that great tomorrow.</p>
<p>I would like to congratulate Jaap van der Meer of TAUS who predicted that Google will start charging for MT <a href="http://www.translationautomation.com/technology/google-translation-toolkit.html" target="_blank">in his article published in June</a>.  Jaap, you are a true visionary – chapeau.&#8221;</p>
<p>Full story and comments: <a href="http://www.linkedin.com/news?viewArticle=&#38;articleID=80906470&#38;gid=148593&#38;srchCat=WOTC&#38;articleURL=http%3A%2F%2Fblog%2Egts-translation%2Ecom%2F2009%2F10%2F28%2Fgoogle-to-start-charging-for-translations%2F&#38;urlhash=DBms">LinkedIn discussion</a></p>
<h4><em>Next time you think languages, think Pangeanic</em></h4>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[What happened to Google Translate?]]></title>
<link>http://slowfox.wordpress.com/2009/10/19/what-happened-to-google-translate/</link>
<pubDate>Mon, 19 Oct 2009 03:40:53 +0000</pubDate>
<dc:creator>Karl-Erik Tallmo</dc:creator>
<guid>http://slowfox.wordpress.com/2009/10/19/what-happened-to-google-translate/</guid>
<description><![CDATA[Some four months ago, I was planning to write an enthusiastic article about how great Google&#8217;s]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Some four months ago, I was planning to write an enthusiastic article about how great Google&#8217;s translation service had become. Outside linguistic institutions, machine translation from Swedish into other languages has not been available to the public for very long (I believe Babelfish was the first), and I was extremely impressed by how well Google handled even complex sentence patterns with parenthetical subordinate clauses, and how the translation engine kept track of pronouns and their various correlates. Sure, there were errors, but still, it was quite remarkable. </p>
<p>Now, little of this seems to work any longer, and I get the feeling that a hard drive containing exceptions and phraseology might have crashed in Googleland. This is very sad, since the current translations are almost unreadable. The Swedish word &#8220;hyllningskör&#8221; (approx. &#8216;unanimous praise&#8217;) became &#8220;tribute fragile&#8221; the other day. And today it became &#8220;tribute shoes&#8221;. So, obviously Google learns. But it learns erroneously.</p>
<p>I have recently worked with a few texts in Italian, and I discovered something rather strange. Take the following sentence:</p>
<blockquote><p>Questo diritto non può tuttavia mai devolversi per successioni al fisco, ed è riconosciuto e protetto nei due Stati per trent&#8217;anni dopo la morte dell&#8217;Autore. </p></blockquote>
<p>Google renders this into English thus:</p>
<blockquote><p>This right can not ever give me for succession to the tax authorities, and is recognized and protected in both states for decades after his death.</p></blockquote>
<p>There are several oddities here, but let&#8217;s focus on &#8220;decades&#8221;. As far as I know, <em>trent&#8217;anni</em> means &#8220;thirty years&#8221;.</p>
<p>If I simply enter <em>trent&#8217;anni</em> into the source language field, the translation yields: &#8220;thirty&#8221;.</p>
<p>Where did all the years go? Finally, if I enter only the word <em>anni</em>, the result is &#8220;years&#8221;. </p>
<p>Now, take this sentence:</p>
<blockquote><p>Ogni spacciatore di edizione contraffatta, s’egli non è riconosciuto il contraffattore, sarà tenuto di pagare al vero proprietario una somma equivalente al prezzo di quattrocento esemplari della edizione originale. </p></blockquote>
<p>Google now suggests:</p>
<blockquote><p>Each issue of counterfeit <strong>drug dealer</strong>, if he has not recognized the infringer will be required to pay the true owner a sum equivalent to the price of four hundred copies of the original edition. </p></blockquote>
<p>&#8220;Drug dealer&#8221;? I try to strip down the sentence, first in half – not good either – then I enter only <em> Ogni spacciatore di edizione contraffatta</em>, and then Google finally says: &#8220;Every dealer edition counterfeit&#8221;. Not very nice, but at least I got rid of the pusher.</p>
<p>One wonders how the famous algorithms are designed, when the results are so different in different contexts.</p>
<p>Someone suggested that this may not be about hard drive crashes or modified algorithms at all, but simply that Google had so far cooperated with some commercial machine translation vendor and is now trying to bring it all in-house.</p>
<p>Maybe so. I just hope Google will fix this soon. I can hardly be the only one who has noticed this deterioration in quality. If Google wishes to be the choice of professionals, it can&#8217;t suddenly reduce the quality of a service without warning its users. Goodbye, goodwill.</p>
<p>Several of my blog entries here end with links to automatic translations from Swedish to English, German, and French. Right now, I can&#8217;t say I am proud of the gibberish that is the result when you click there. Maybe it&#8217;s better than no translation at all. But only maybe.</p>
<div style="font-size:10px;line-height:15px;margin-top:20px;"><strong>PS.</strong> I wrote a little about machine translation back in 1998, see &#8220;<a href="http://www.art-bin.com/art/ababele.html">The Debabelizing of the Internet</a>&#8220;.</div>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Probeklinge]]></title>
<link>http://enjoymentandcontemplation.wordpress.com/2009/10/15/probeklinge/</link>
<pubDate>Fri, 16 Oct 2009 02:26:38 +0000</pubDate>
<dc:creator>Chillingworth</dc:creator>
<guid>http://enjoymentandcontemplation.wordpress.com/2009/10/15/probeklinge/</guid>
<description><![CDATA[I recently bought a Merkur old-fashioned safety razor from Vintage Blades (I&#8217;m not getting any]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://www.vintagebladesllc.com/vshop/xcart/home.php?cat=129"><img class="alignleft size-full wp-image-99" title="safety razor" src="http://enjoymentandcontemplation.wordpress.com/files/2009/10/safety-razor.jpg" alt="safety razor" width="150" height="184" /></a>I recently bought a <a href="http://www.vintagebladesllc.com/vshop/xcart/product.php?productid=87&#38;cat=129&#38;page=1">Merkur</a> old-fashioned safety razor from <a href="http://www.vintagebladesllc.com/">Vintage Blades</a> (I&#8217;m not getting anything from either of them; I just thought this was funny), and the &#8220;sample blade&#8221; came wrapped in helpful advice (&#8220;After shave rinse only &#8211; do not wipe !&#8221;) and friendly assurances in five languages.  My favorite was this one:  &#8220;Please try this magnificient [sic] stainless steel razor blade.  You will be enthusiastic !&#8221;  <!--more-->In Spanish and French, even better, you&#8217;ll be &#8220;enthusiasmed&#8221;: <em>entusiasmado </em>and <em>enthousiasmés</em>, respectively.  (Note also that in Spanish there&#8217;s only one of you, but in French at least two; you&#8217;ll be sharing the razor with your mistress, I suppose?)</p>
<p>Other highlights:</p>
<p>Apparently &#8220;stainless steel&#8221; becomes <em>rostfrei</em> (&#8220;rust-free&#8221;) in German, and variations on &#8220;inoxidizable&#8221; in Spanish, French, and Italian.</p>
<p>In French, &#8220;After shave rinse only &#8211; do not wipe !&#8221; becomes <em>La barbe faite, rincez la lame &#8211; ne l&#8217;essuyez pas! </em>(“The beard done, rinse the blade—don&#8217;t wipe it!&#8221;—ironically, in this sentence, French, the one language that I&#8217;m sure is supposed to have a space before the exclamation point, is the only one that doesn&#8217;t).</p>
<p>Also, in the last item in English, the spelling of the borrowed French word is incorrectly imported from the actual French below it:  &#8220;This is the connaisseur&#8217;s blade &#8211; uncommonly smooth and lasting in cut !&#8221;</p>
<p>I know, it&#8217;s not news that the instructions packaged with products are often hilariously mistranslated, probably not even by a person.  I just wanted to share the joy!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Semlab in Machine Translation project: Let'sMT!]]></title>
<link>http://semlab.wordpress.com/2009/10/12/semlab-in-machine-translation-project-letsmt/</link>
<pubDate>Mon, 12 Oct 2009 10:16:40 +0000</pubDate>
<dc:creator>dohmen</dc:creator>
<guid>http://semlab.wordpress.com/2009/10/12/semlab-in-machine-translation-project-letsmt/</guid>
<description><![CDATA[Semlab has joined an international consortium of researchers, linguistic experts and software develo]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignright size-medium wp-image-205" title="Letsmt" src="http://semlab.wordpress.com/files/2009/09/letsmt.jpg?w=300" alt="Letsmt" width="166" height="53" /><br />
Semlab has joined an international consortium of researchers, linguistic experts and software developers to create a system that automatically translates relatively small languages.</p>
<p>The project focuses on smaller languages such as Latvian, Lithuanian and Croatian, and one of the main methods it will use is statistical machine translation (SMT). SMT systems are built by analyzing huge volumes of parallel text (source sentences paired with their translations) and learning translation models from these data. This is particularly tricky for smaller languages, since far less parallel data is available for them.</p>
<p>The project, which is currently under evaluation by the European Union, aims among other things to develop a widget or browser add-on that translates these smaller languages.</p>
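<p>To make &#8220;learning translation models from parallel text&#8221; concrete, here is a minimal Python sketch of IBM Model 1, the simplest of the word-alignment models that SMT training tools such as GIZA++ start from, trained with expectation-maximization (EM). It is not the Let&#8217;sMT! project&#8217;s code: the three-sentence German-English corpus is a textbook toy, and real systems use millions of sentence pairs and richer models.</p>

```python
from collections import defaultdict

# Toy parallel corpus of (foreign, English) sentences, a classic textbook example.
corpus = [
    ("das Haus", "the house"),
    ("das Buch", "the book"),
    ("ein Buch", "a book"),
]
pairs = [(f.split(), e.split()) for f, e in corpus]


def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, f)], the probability that foreign word f translates as English word e."""
    e_vocab = {e for _, es in pairs for e in es}
    # Start uniform: any English word is an equally likely translation of any foreign word.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in pairs:
            for e in es:
                # E-step: split e's count among the foreign words in the same sentence.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: turn expected counts back into probabilities per foreign word.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


t = train_ibm1(pairs)
for f in ("das", "Haus", "Buch", "ein"):
    best = max((e for (e, ff) in t if ff == f), key=lambda e: t[(e, f)])
    print(f, "->", best, round(t[(best, f)], 2))
```

<p>No dictionary is supplied, yet the table ends up pairing das with the, Haus with house, Buch with book and ein with a, purely from co-occurrence across sentences. This is also why small languages suffer: with fewer sentence pairs, there is less co-occurrence evidence to learn from.</p>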
<p>For more information: <a href="http://www.letsmt.com/" target="other">Let&#8217;sMT! Project website</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[NLP and Ontologies]]></title>
<link>http://gambari.wordpress.com/2009/10/09/nlp-and-ontologies/</link>
<pubDate>Fri, 09 Oct 2009 19:40:17 +0000</pubDate>
<dc:creator>jyonkov</dc:creator>
<guid>http://gambari.wordpress.com/2009/10/09/nlp-and-ontologies/</guid>
<description><![CDATA[For a while now I&#8217;ve been thinking about using knowledge representation &#8211; Ontologies as ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>For a while now I&#8217;ve been thinking about using knowledge representation (ontologies) as the basis for a modular natural language processing system focused on extracting structured data from unstructured text. For example, we can create or reuse ontologies (models) that describe &#8220;simple&#8221; concepts such as Address, Time, Task, Expense and Transaction, and use them to &#8220;match&#8221; information in a text stream. I&#8217;m writing this because I think there is common ground for collaboration: I know Stefan is interested in RDF/OWL, the company Neven is involved with works in a closely related domain, and finally I have been playing with Google Wave, which I think is a good platform for creating intelligent bots that would be very easy to distribute if they turn out to be useful :)</p>
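<p>A minimal Python sketch of the matching idea, assuming each &#8220;simple&#8221; concept is flattened into a name plus a few surface patterns. A real system would derive these from an OWL/RDF ontology (for example with Jena or Sesame/OpenRDF) rather than hard-code regular expressions; the <code>Concept</code> class, the concept list and the patterns are all hypothetical.</p>

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical flattened stand-in for an ontology class such as Time or Expense."""
    name: str
    patterns: list  # regular expressions that signal an instance of the concept


# Illustrative concept models; a real system would load these from an ontology.
CONCEPTS = [
    Concept("Time", [r"\b\d{1,2}:\d{2}\s?(?:am|pm)?\b"]),
    Concept("Expense", [r"\$\s?\d+(?:\.\d{2})?", r"\b\d+(?:\.\d{2})?\s?(?:USD|EUR)\b"]),
    Concept("Task", [r"\b(?:remind me to|todo:)\s+([^.]+)"]),
]


def extract(text):
    """Match every concept against the text; return (concept, value) pairs in text order."""
    found = []
    for concept in CONCEPTS:
        for pattern in concept.patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                # Use the first capture group when the pattern defines one.
                value = m.group(1) if m.groups() else m.group(0)
                found.append((m.start(), concept.name, value.strip()))
    return [(name, value) for _, name, value in sorted(found)]


print(extract("Remind me to pay the hosting bill. Lunch at 12:30 pm cost $18.50."))
```

<p>Keeping each concept as a separate, self-describing unit is what makes the system modular: adding Address or Transaction means adding a model, not rewriting the extractor.</p>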
<p>Here are some references:<br />
<a href="http://wordnet.princeton.edu/">http://wordnet.princeton.edu/</a><br />
<a href="http://protege.stanford.edu/">http://protege.stanford.edu/</a><br />
<a href="http://jena.sourceforge.net/">http://jena.sourceforge.net/</a><br />
<a href="http://www.openrdf.org/">http://www.openrdf.org/</a><br />
<a href="http://code.google.com/apis/wave/guide.html">http://code.google.com/apis/wave/guide.html</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Something about our company Interlecta]]></title>
<link>http://gambari.wordpress.com/2009/10/04/something-about-our-company-interlecta/</link>
<pubDate>Sun, 04 Oct 2009 20:47:04 +0000</pubDate>
<dc:creator>Neven Boyanov</dc:creator>
<guid>http://gambari.wordpress.com/2009/10/04/something-about-our-company-interlecta/</guid>
<description><![CDATA[Hi guys, As you already know from my post on Facebook our BlackBerry product was promoted on RIM]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Hi guys,</p>
<p>As you already know from my post on <a href="http://www.facebook.com/boyanov" target="_blank">Facebook</a>, our <a href="http://appworld.blackberry.com/webstore/content/2009" target="_blank">BlackBerry</a> product was promoted on <a href="http://appworld.blackberry.com/webstore/" target="_blank">RIM&#8217;s App World</a> as a featured application last Friday. We are now getting thousands of downloads and activations, about 400 per hour. That is good.</p>
<p><a href="http://appworld.blackberry.com/webstore/content/2009"><img class="alignright size-full wp-image-95" style="border:0 none;margin:10px;" title="20091002-2118_InterlectaAppWorld_2001_crop320x130rns" src="http://gambari.wordpress.com/files/2009/10/20091002-2118_interlectaappworld_2001_crop320x130rns.jpg" alt="20091002-2118_InterlectaAppWorld_2001_crop320x130rns" width="320" height="130" /></a>We are getting quite good exposure, not only through App World but also from other mobile portals. However, we still need to develop our business and move the company to the next stage.</p>
<p>It&#8217;s been a couple of months already since we started looking for new funding sources. Our company <a href="http://home.interlecta.com/" target="_blank">Interlecta</a> has been privately held, and self-funded, for almost 3 years, but it seems it is time for a change.</p>
<p>Right now we are talking to several potential investors (corporates, angel investors and VCs) who are current or potential customers of our products, but not all of the opportunities look promising or suitable for us.</p>
<p>So, if you know someone, or have a friend of a friend who may know someone&#8230; any ideas are welcome.</p>
<p>And of course, the standard finder&#8217;s fee will apply to anyone who refers an investor that leads to a deal.</p>
<p>According to quite a few specialists I have been talking to recently, the next several months will be the best time to invest in start-ups, simply because there are not many left, and those that have survived are expected to be of good value.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Translation Outsourcing Cost Considerations]]></title>
<link>http://multilingualcmsguide.wordpress.com/2009/09/22/cost-considerations-for-outsourcing-translation/</link>
<pubDate>Tue, 22 Sep 2009 21:16:21 +0000</pubDate>
<dc:creator>envoytrading</dc:creator>
<guid>http://multilingualcmsguide.wordpress.com/2009/09/22/cost-considerations-for-outsourcing-translation/</guid>
<description><![CDATA[When it really matters, use people to translate There&#8217;s pretty wide-spread acceptance that a h]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><strong>When it really matters, use people to translate</strong><br />
There&#8217;s fairly widespread acceptance that a human translator will produce a better final product than machine translation. There may be times when using machine translation software is an acceptable alternative, or even an effective starting point for human translation, but when it comes to communicating the essence of your company and product&#8230; there&#8217;s really no substitute for thoughtful human translation.</p>
<p><strong>Using internal staff for translation work</strong><br />
Some companies keep costs low by tapping their own staff. If a staff member can translate your content into a desired language, this can cost less than paying an outside agency, and you may even gain an added benefit if your staff translator is a subject-matter expert. As you expand the number of languages you offer content in, it&#8217;s almost inevitable that you&#8217;ll need to look outside for human translation help. </p>
<p><strong>Several factors affect translation outsourcing costs</strong><br />
The cost of outsourcing a translation from one language to another depends on several factors. These can include the number of words it takes to convey the information in the target language, the complexity of the languages involved, the subject matter, the turnaround time and which specific languages are involved.</p>
<p><strong>Translation Pairs</strong><br />
The language you are translating from and the language you are translating into also affect costs. For example, 600 words of general-topic English translated into Japanese might cost about $115 at an agency, whereas 600 words of Dutch, Czech or Greek translated into Japanese at the same agency could cost 50% more, and French into Japanese could be 15% less than English. This can be due to the availability of translators for the languages in question, and to the agency&#8217;s geographic location and reputation. You can run your own calculations, selecting different source and target languages, at <a href="http://www.translated.net/en/preventivo.php?refid=4849" target="_blank">Translated.net</a></p>
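<p>To make the percentages concrete, here is the arithmetic as a small Python sketch. The $115 base and the +50% and -15% adjustments are the article&#8217;s illustrative figures, not real agency rates, and the <code>quote</code> helper is hypothetical.</p>

```python
BASE_WORDS = 600
BASE_PRICE = 115.00  # USD, English into Japanese (the article's example figure)

# Relative price adjustment by source language, taken from the example above.
ADJUSTMENT = {"English": 0.0, "Dutch": 0.50, "Czech": 0.50, "Greek": 0.50, "French": -0.15}


def quote(source_language, words=BASE_WORDS):
    """Estimate the cost of translating `words` words from source_language into Japanese."""
    per_word = BASE_PRICE / BASE_WORDS * (1 + ADJUSTMENT[source_language])
    return round(per_word * words, 2)


for lang in ("English", "Dutch", "French"):
    print(f"{lang} -> Japanese, 600 words: ${quote(lang):.2f}")
```

<p>Per word, those figures come to roughly $0.19 for English, $0.29 for Dutch, Czech or Greek, and $0.16 for French source text, so 600 words cost $115.00, $172.50 and $97.75 respectively.</p>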
<p> <a href="http://www.translated.net/en/preventivo.php?refid=4849"><img class="alignnone size-full wp-image-96" title="language translation pairs" src="http://multilingualcmsguide.wordpress.com/files/2009/09/translations.jpg" alt="language translation pairs" width="349" height="200" /></a><br />
<strong>A web page is easier to translate than an entire web site<br />
</strong>Translating the text of a document or web page is a more straightforward, lower-cost process than translating or localizing an entire web site. &#8220;Isn&#8217;t a web site just a collection of web pages?&#8221; you may ask. Yes, but&#8230; localizing a web site is not limited to translating your content. Error messages, alerts and HTML elements such as page titles, keywords, page descriptions and &#8220;alt&#8221; text also need proper localization to give visitors a good experience and to maintain and maximize search engine indexing. Text in images and graphics should be considered as well. Other considerations may include whether your web site uses a database, Flash components, forms and scripting languages.</p>
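<p>As an illustration of why a site involves more than body text, the Python sketch below uses the standard <code>html.parser</code> module to collect the strings a translator also needs to see: the page title, the meta keywords and description, image alt text, and visible text. It is a simplified sketch under stated assumptions: it skips script and style contents and ignores forms, Flash and database content entirely. The <code>TranslatableStrings</code> class is hypothetical.</p>

```python
from html.parser import HTMLParser


class TranslatableStrings(HTMLParser):
    """Collect (kind, text) pairs that need localization from one HTML page."""

    SKIP = {"script", "style"}  # element contents that are code, not prose

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []  # open elements whose text we treat specially

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.items.append(("meta:" + attrs["name"], attrs.get("content", "")))
        elif tag == "img" and attrs.get("alt"):
            self.items.append(("alt", attrs["alt"]))
        elif tag in self.SKIP or tag == "title":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.items.append(("title", text))
        elif current not in self.SKIP:
            self.items.append(("text", text))
```

<p>Feeding a page to <code>TranslatableStrings().feed(...)</code> and reading <code>.items</code> yields entries such as ("title", ...), ("meta:description", ...) and ("alt", ...) next to the body text, each of which has to go through translation.</p>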
<p>For more specific costs, see the <a title="Translated.net web site translation estimator" href="http://www.translated.net/en/preventivo-web.php?refid=4849" target="_blank">web site translation estimator</a><br />
<em>(by Translated.net)</em></p>
<p><strong>Using a CMS</strong><br />
A web content management system (or &#8220;CMS&#8221;) can automate many aspects of running a multilingual web site. One translation company even offers a <a title="Human Translation API" href="http://www.translated.net/en/hts.php?refid=4849://" target="_blank">Human Translation API</a>, whereby your content management system can automatically send text out for translation and receive the translation back into the system. Human translation is generally not thought to be ripe for automation, but by using innovative web services such as a human translation API, or leveraging a <a href="http://multilingualcmsguide.wordpress.com/2009/09/01/usingxliff/" target="_self">CMS that can support XLIFF</a>, the process of including high-quality human-translated content is speeding up.</p>
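<p>For readers wondering what a CMS that supports XLIFF actually hands to a translator, here is a minimal Python sketch that exports content strings as an XLIFF 1.2 file using only the standard library. The string IDs, the <code>original</code> name and the sample text are hypothetical, and the sketch does not call the Human Translation API mentioned above, whose interface is not described here.</p>

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize as the default namespace, not ns0:


def _q(tag):
    """Return the namespace-qualified name ElementTree uses for an XLIFF element."""
    return f"{{{XLIFF_NS}}}{tag}"


def export_xliff(strings, source_lang="en", target_lang="fr", original="site-content"):
    """Build an XLIFF 1.2 document with one trans-unit per (id, source text) pair.

    Target elements are omitted: the translator (or vendor service) fills them in.
    """
    root = ET.Element(_q("xliff"), version="1.2")
    file_el = ET.SubElement(root, _q("file"), {
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
        "original": original,
    })
    body = ET.SubElement(file_el, _q("body"))
    for unit_id, text in strings:
        unit = ET.SubElement(body, _q("trans-unit"), id=unit_id)
        ET.SubElement(unit, _q("source")).text = text
    return ET.tostring(root, encoding="unicode")


print(export_xliff([("home.title", "Welcome"), ("home.cta", "Request a quote")]))
```

<p>The stable <code>id</code> on each unit is what lets the CMS put each returned translation back in the right place.</p>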
<p>If you have questions about this post, you can <a title="email us" href="http://multilingualcmsguide.wordpress.com/contact-us/" target="_self">send us an email</a>.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Machine Translation - I still do not get it]]></title>
<link>http://pangeanic.wordpress.com/2009/08/24/machine-translation-i-still-do-not-get-it/</link>
<pubDate>Mon, 24 Aug 2009 12:15:57 +0000</pubDate>
<dc:creator>pangeanic</dc:creator>
<guid>http://pangeanic.wordpress.com/2009/08/24/machine-translation-i-still-do-not-get-it/</guid>
<description><![CDATA[I recently received an email from a company I am trying to introduce to the advantages of MT. They d]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><span style="font-family:georgia;">I recently received an email from a company I am trying to introduce to the advantages of MT. They work mostly in a closed domain, and the source language is English (though sometimes poor English). They are a perfect candidate for automation, as they deal mostly with user manuals and controlled documentation.</span><br />
<span style="font-family:georgia;"> The comment in question was:</span></p>
<p><span style="font-family:arial;font-style:italic;">&#8220;I understand [...] to share TMs for translation, however, I still do not have an understanding on &#8220;Machine Translation&#8221;</span><span style="font-family:arial;font-style:italic;"> which quality still cannot apply to real job yet in Japanese related translation. Even though the technology developed among European languages, the people who do not know European languages like us still worry about the quality, because we cannot judge them. This is my honest impression.</span><br />
<span style="font-family:arial;font-style:italic;"> </span><span style="font-family:georgia;"><span style="font-family:arial;font-style:italic;">It sounds simple question to you, but it&#8217;s my primitive question!&#8221;</span></span></p>
<p>Creating a solution which is good for everything is out of the question (for now). Many have tried to climb up that mountain only to die in the attempt.</p>
<p>Recently, Google has attempted such a solution with its <a href="http://translate.google.com" target="_self">Google Translate</a> tool, and it works more or less well. It is particularly useful for general information and gisting. I know it has become a reference tool for many linguists (novices who lack knowledge, and experts whose brains are too full of information or who just can&#8217;t remember). It is much quicker to ask GT than to look up terminology in <a href="http://iate.europa.eu/" target="_blank">IATE</a>, the EU&#8217;s official terminology database, for example.</p>
<p>GT works well if you try to translate an EULA, for example. The quality is rather good, as it has plenty of aligned material from companies&#8217; websites. In many other areas, the results range from &#8220;good enough&#8221; (i.e. usable with some post-editing) to &#8220;gisting&#8221; quality to plainly bad. GT has been extremely valuable to me when I needed to know what was being said in certain documents in Polish, Chinese, Russian or Japanese. It wasn&#8217;t a professional translation, but hey! it was free, and most importantly it was there when I needed it and served an invaluable gisting purpose. It was the difference between not knowing and knowing, even if the meaning came across badly or <span style="font-style:italic;">mechanically</span>.</p>
<p>Serious machine translation is a rather different matter. I favour statistical over rule-based MT for many reasons. SMT (statistical machine translation) is based on mathematics and statistics. Given that a language normally has somewhere between 10,000 basic words (as in German, where the rest are compound words) and around 30,000 (the vast majority of languages), one might guess that a 2M-word corpus would contain almost everything that language has to say. This is not so, as there are numerous repetitions, shifts in meaning, technical terms, set expressions, etc. You may reach 2M words and still not have every verb conjugated in all its forms. But a large share will be covered, at least the forms we use for 90% of our daily communication. You can build on this to create a model with which a machine can predict and compute the matches that nowadays are made by hand, in translators&#8217; heads.</p>
<p>If you narrow the scope of your expectations (e.g. &#8220;I only want the electronics / automotive / legal / agricultural / physics domain&#8221;), you can be more precise. You need texts from each domain, and you can, for example, disregard words and texts about &#8220;butter&#8221;, &#8220;international relations&#8221;, &#8220;motorbike instructions&#8221;, &#8220;coffee&#8221; or &#8220;fishing rights&#8221; when building a model for electronics. &#8220;Motorbike instructions&#8221;, on the other hand, would be fine for a model dealing with engineering.</p>
<p><a href="http://statmt.org" target="_blank">Moses</a>, the best state-of-the-art open source engine, estimates how likely a given source word is to be translated as a given target word by applying a set of equations based on co-occurrence (how often target word Y appears whenever source word X appears in the aligned training text). It works wonderfully, and the more source material used in training, the better.</p>
<p>My colleague (not new to translation, but fairly uncomfortable with the concept of machine translation) mentioned regulating (controlling) the way the source material behaves. This can work well, and it is especially true for rule-based systems (such as Systran), which work on the basis that A is always 1, B is always 2, C is 3, and therefore AB must be 12 and CB 32, etc. Controlling the input also helps statistical models, but the need is much smaller, because their statistics are computed over millions of words: the &#8220;correspondence&#8221; equations learned from a corpus of 2M, 3M, 4M or 5M words are applied to every sentence. This is how Google Translate behaves, and how our engine PangeaMatic is learning to behave within specific language domains.</p>
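<p>The coverage argument can be checked on any corpus. The Python sketch below assumes simple regex tokenization and uses a tiny inline text in place of a real 2M-word corpus. It counts word forms (types) and reports what share of running words (tokens) the most frequent types cover. On real data this typically shows a long tail: a small set of frequent forms covers most tokens, while many inflected forms appear rarely or not at all.</p>

```python
import re
from collections import Counter


def coverage(text, top_n):
    """Return (tokens, types, share of tokens covered by the top_n most frequent types)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0, 0, 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return len(tokens), len(counts), covered / len(tokens)


sample = ("the blade is sharp and the blade is stainless "
          "rinse the blade and do not wipe the blade")
tokens, types, share = coverage(sample, top_n=3)
print(f"{tokens} tokens, {types} types; top 3 types cover {share:.0%} of tokens")
```
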
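<p>One simple way to act on this when assembling training data is to keep only the sentence pairs whose source side looks in-domain. The Python sketch below scores each sentence by the share of its words found in a domain term list. The term lists, the <code>select_in_domain</code> helper and the 0.2 threshold are hypothetical. Production pipelines more often use statistical selection (for example, language-model cross-entropy difference) rather than keyword lists.</p>

```python
import re

# Hypothetical domain term lists; in practice they would come from a termbase.
DOMAIN_TERMS = {
    "electronics": {"voltage", "circuit", "battery", "resistor", "power", "button"},
    "engineering": {"engine", "torque", "gear", "brake", "clutch", "voltage"},
}


def domain_score(sentence, terms):
    """Fraction of the sentence's words that appear in the domain term list."""
    words = re.findall(r"\w+", sentence.lower())
    return sum(w in terms for w in words) / len(words) if words else 0.0


def select_in_domain(pairs, domain, threshold=0.2):
    """Keep (source, target) pairs whose source side scores at or above the threshold."""
    terms = DOMAIN_TERMS[domain]
    return [(s, t) for s, t in pairs if domain_score(s, terms) >= threshold]


corpus = [
    ("Check the battery voltage.", "Compruebe el voltaje de la batería."),
    ("Add butter to the pan.", "Añada mantequilla a la sartén."),
    ("Adjust the clutch and brake.", "Ajuste el embrague y el freno."),
]
print(select_in_domain(corpus, "electronics"))
print(select_in_domain(corpus, "engineering"))
```

<p>The butter sentence is dropped for both domains, while the battery sentence survives for engineering too, because the term lists overlap. This mirrors the point that some &#8220;other&#8221; texts are still useful for a neighbouring domain.</p>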
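<p>At their simplest, the &#8220;equations&#8221; described here are relative frequencies: p(target | source) = count(source, target) / count(source). Moses obtains word alignments with GIZA++ and estimates phrase probabilities this way over extracted phrase pairs. The Python sketch below applies the same formula to single words, and assumes the alignment step has already been done. The aligned English-Spanish word pairs are a hypothetical toy input.</p>

```python
from collections import Counter, defaultdict


def translation_table(aligned_pairs):
    """Relative-frequency estimate of p(target | source) from aligned (source, target) pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


# Hypothetical word-aligned English-Spanish pairs taken from training data.
aligned = [("house", "casa"), ("house", "casa"), ("house", "vivienda"),
           ("book", "libro"), ("book", "libro")]
table = translation_table(aligned)
for src, options in table.items():
    best = max(options, key=options.get)
    print(src, "->", best, f"{options[best]:.2f}")
```

<p>More training material means more counts behind each ratio, which is why the estimates, and the translations, improve as the corpus grows.</p>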
<p><em>Next time you think languages, think Pangeanic</em></p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
