<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>invisible-web &#171; WordPress.com Tag Feed</title>
	<link>http://en.wordpress.com/tag/invisible-web/</link>
	<description>Feed of posts on WordPress.com tagged "invisible-web"</description>
	<pubDate>Wed, 10 Feb 2010 06:33:13 +0000</pubDate>

	<generator>http://en.wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Investigative reporting 3.0--or, Web stalking]]></title>
<link>http://newscrucible.wordpress.com/2010/01/25/investigative-reporting-or-web-stalking/</link>
<pubDate>Mon, 25 Jan 2010 22:01:01 +0000</pubDate>
<dc:creator>newscrucible</dc:creator>
<guid>http://newscrucible.wordpress.com/2010/01/25/investigative-reporting-or-web-stalking/</guid>
<description><![CDATA[The following is based on an Investigative Reporters and Editors seminar this weekend in Birmingham,]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The following is based on an <a href="http://www.ire.org/" target="_blank">Investigative Reporters and Editors</a> seminar this weekend in Birmingham, Alabama.</p>
<p style="text-align:center;">
<div id="attachment_259" class="wp-caption aligncenter" style="width: 416px"><a href="http://newscrucible.files.wordpress.com/2010/01/comic-reporters.jpg"><img class="size-full wp-image-259 " title="Comic - reporters" src="http://newscrucible.files.wordpress.com/2010/01/comic-reporters.jpg?w=406&#038;h=264" alt="" width="406" height="264" /></a><p class="wp-caption-text">IRE talked about how the Web--both the Surface (&#34;visible&#34;) and Deep (&#34;invisible&#34;) Webs--can help reporters address the occupational hazard of having to know everything about anything at any given moment.</p></div>
<p>The hour-long presentation, <em>Effective use of the Internet, </em>was fittingly framed by the first word in the title. Mark Horvit, IRE&#8217;s executive director, began by emphasizing that reporters should approach online research armed with a strategy (i.e., key words to search and a general idea of what&#8217;s available and desirable) to avoid getting distracted by the Web&#8217;s potentially cavernous detours. Step one, Horvit said, is not to log on, but to sketch out a plan.</p>
<p>Every investigative journalist should know that a search engine query does not look through the live Internet itself. A Google search, for example, searches Google&#8217;s own servers, which are stocked with the pages that the company&#8217;s Web &#8220;crawlers&#8221; have found and stored.</p>
<p><strong>What they&#8217;re missing &#8211; eye-opening stats:</strong></p>
<ul>
<li>Google searches far less than half of what&#8217;s out there</li>
<li>Total shared results of any two search engines: 8.9 percent</li>
<li>Any three search engines: 2.2 percent</li>
<li><em>Above figures from 2007 study by Dogpile, Penn State and Queensland University of Technology</em></li>
<li>Some estimate the &#8220;invisible&#8221; Web is 550 times bigger than the &#8220;visible&#8221; Web.</li>
<li>Google says more than 1,000 federal government sites can&#8217;t be crawled.</li>
</ul>
<p>If (way) more than half the Web isn&#8217;t showing up in a search engine result, then it is important for investigative reporters to know where to go to find it. Here are some of the principles behind efficiently conducting those searches, with both superficial tools and subterraneous means.</p>
<p><strong>Surface Web &#8211; Savvy searching tips:</strong></p>
<ul>
<li>Treat info online as one would any source (confirm)</li>
<li>Find out who owns the Web site</li>
<li>Know Google advanced search options (esp. domain and file type)</li>
<li>Archived Web: Gone doesn&#8217;t mean forever. (Google cache, Wayback Machine)</li>
<li>Consult at least two other search engines&#8211;each has its own strengths and weaknesses.</li>
<li>People finders (e.g., www.pipl.com, www.whitepages.com)</li>
<li>Social media searches (e.g., www.whostalkin.com&#8230; Who&#8217;s Talkin&#8217;, not Who Stalkin&#8217;&#8230; or so they say)</li>
<li>Use Wikipedia for the footnotes only</li>
</ul>
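<p>The advanced search options mentioned in the list (domain and file type) can also be typed straight into the search box as operators. A minimal Python sketch, assuming Google&#8217;s <code>site:</code> and <code>filetype:</code> operators; the function name, the example domain and the search terms are made up for illustration:</p>

```python
from urllib.parse import urlencode


def build_query(terms, domain=None, filetype=None):
    """Combine plain search terms with optional site: and filetype: operators."""
    parts = [terms]
    if domain:
        parts.append(f"site:{domain}")        # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    return " ".join(parts)


# Hypothetical example: spreadsheets about bridge inspections on .gov sites.
query = build_query("bridge inspection", domain="gov", filetype="xls")
print(query)
print("https://www.google.com/search?" + urlencode({"q": query}))
```

<p>Keeping the operators in a small helper makes it easy to rerun the same search against several domains or file types.</p>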
<p>The session then took Web searches to the next level&#8230; well, at least a step above what amateur voyeurs might use to get information.</p>
<p><strong>Deep Web &#8211; Search like a pro:</strong></p>
<ul>
<li>Know what search engines typically miss (databases, content behind firewalls and registration screens, ASP/dynamically generated pages, pages excluded by robots.txt)</li>
<li>The information is out there, but the key is to find organizations that make it more easily accessible. Bookmark these!</li>
<li>Directories by and for journalists (<a href="http://www.ire.org/resourcecenter/nettour/index.html">&#8216;Net Tour</a> and <a href="http://www.reporter.org/desktop/">Reporter&#8217;s Desktop</a>)</li>
<li>Know the gateways to public records</li>
<li>Pipl actually claims to access the Deep Web. Try it. Pipl yourself. It&#8217;s scary how much information it digs up with just a name.</li>
<li>The census is your friend, especially in 2010</li>
<li>To get fully submerged&#8230; go to IRE&#8217;s Web site!</li>
</ul>
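<p>One item on the list, pages excluded by robots.txt, is easy to check for yourself. The sketch below uses Python&#8217;s standard <code>urllib.robotparser</code> module; the robots.txt rules, the crawler name <code>ExampleBot</code> and the example.gov addresses are hypothetical and only illustrate the idea:</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /records/.
robots_txt = """\
User-agent: *
Disallow: /records/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in ("http://example.gov/about.html",
            "http://example.gov/records/permits.html"):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "->", "crawlable" if allowed else "excluded by robots.txt")
```

<p>A page that a well-behaved crawler skips this way can still be opened directly in a browser, which is why it pays to visit the site itself.</p>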
<p>I&#8217;m not going to copy-paste in this post all of the useful links for discovering the &#8220;hidden Web&#8221; and the &#8220;dead Web,&#8221; which were hyper-linked in the PowerPoint presentation that Mark offered to send out to anybody at the day-long seminar who asked for it. All of this stuff is available at the organization&#8217;s site, and I can see what the nominal membership fees pay for, seriously.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[The Dark Web Explained]]></title>
<link>http://witnessthis.wordpress.com/2009/12/14/the-dark-web-explained-2/</link>
<pubDate>Mon, 14 Dec 2009 08:06:31 +0000</pubDate>
<dc:creator>Galen Schultz</dc:creator>
<guid>http://witnessthis.wordpress.com/2009/12/14/the-dark-web-explained-2/</guid>
<description><![CDATA[DEEP NET: The darkness that lies beneath &#8230; I HEARD an interesting fact on Stephen Fry’s quiz s]]></description>
<content:encoded><![CDATA[DEEP NET: The darkness that lies beneath &#8230; I HEARD an interesting fact on Stephen Fry’s quiz s]]></content:encoded>
</item>
<item>
<title><![CDATA[Internet Research Project]]></title>
<link>http://4rxt.wordpress.com/2009/09/28/internet-research-project-fa-09/</link>
<pubDate>Mon, 28 Sep 2009 19:34:38 +0000</pubDate>
<dc:creator>Elizabeth</dc:creator>
<guid>http://4rxt.wordpress.com/2009/09/28/internet-research-project-fa-09/</guid>
<description><![CDATA[For this project, you will be evaluating and reviewing an internet research tool.  You will find the]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>For this project, you will be evaluating and reviewing an <a href="http://4rxt.wetpaint.com/page/Internet+Research" target="_blank">internet research tool</a>.  You will find the name of and link to your assigned tool on the class wiki (<a href="http://eng1020ec.wetpaint.com/page/Students+Pages" target="_blank">ENG 1020</a>, <a href="http://eng122ec.wetpaint.com/page/Students+Pages" target="_blank">ENG 122</a>).</p>
<p>Write a review of the research tool assigned to you and post the review in your blog.  Your review should include all of the following information that is relevant for the type of tool you were assigned:</p>
<ul>
<li>Name of and link to the tool</li>
<li>Summary or description of the tool</li>
<li>Strengths</li>
<li>Weaknesses</li>
<li>Search engines, directories, and other applications searched</li>
<li>Databases</li>
<li>Operators</li>
<li>Case sensitivity</li>
<li>Stop words</li>
<li>Advanced search function</li>
<li>Limits</li>
<li>Sorting</li>
<li>Display</li>
<li>Help function</li>
<li>Special features</li>
</ul>
<p>See the <a href="http://4rxt.wordpress.com/2009/09/28/presentation-internet-research-finding-websites-blogs-wikis-and-more/" target="_blank"></a>column headings of the <a href="http://www.searchengineshowdown.com/features/" target="_blank">Search Engine Features Chart</a> for explanations if needed.  You can see sample reviews by clicking on links <a href="http://www.searchengineshowdown.com/reviews/" target="_blank">here</a> and <a href="http://4rxt.wetpaint.com/page/Meta+and+Multi+Search+Engines" target="_blank">here</a>.</p>
<p>Use the department and course number (<em>ENG 1020</em> or <em>ENG 122</em>) and other appropriate tags (“Labels” on Blogger) for the post.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[The Invisible Web]]></title>
<link>http://researchci.com/2009/09/14/%d7%94%d7%a8%d7%a9%d7%aa-%d7%94%d7%91%d7%9c%d7%aa%d7%99-%d7%a0%d7%a8%d7%90%d7%99%d7%aa-the-invisible-web/</link>
<pubDate>Mon, 14 Sep 2009 17:49:51 +0000</pubDate>
<dc:creator>Hamutal Schieber</dc:creator>
<guid>http://researchci.com/2009/09/14/%d7%94%d7%a8%d7%a9%d7%aa-%d7%94%d7%91%d7%9c%d7%aa%d7%99-%d7%a0%d7%a8%d7%90%d7%99%d7%aa-the-invisible-web/</guid>
<description><![CDATA[When we look for information on the web, we face two major challenges: the first is the difficulty of locating information ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>When we look for information on the web, we face two major challenges. The first is the difficulty of locating information quickly and efficiently, which should become easier with the operators I have already <a href="http://researchci.com/2009/07/15/hello-world/" target="_blank">written about here.</a> The second challenge, one we are usually less aware of, is searching the &#8220;<a href="http://www.google.com/search?hl=en&#38;rlz=1G1GGLQ_ENIL329&#38;q=&#34;the+invisible+web&#34;&#38;aq=f&#38;oq=&#38;aqi=g8" target="_blank">invisible web</a>&#8221;.</p>
<p>I will start with a simplified explanation of how search engines such as Google work. Software known by the mysterious, skin-prickling names &#8220;crawlers&#8221;, &#8220;robots&#8221; or &#8220;spiders&#8221; makes its way across the Internet and tags content. These &#8220;crawlers&#8221; move from link to link and store pages in memory for fast retrieval (in the past they could handle only pages written in HTML or plain text, but today search engines can also convert and store formats such as PDF).</p>
<p>The &#8220;invisible web&#8221; consists of all the pages that the search engines&#8217; &#8220;crawlers&#8221; do not show us. This is an enormous mass of pages (and <a href="http://brightplanet.com/" target="_blank">some believe</a> that the &#8220;invisible web&#8221; is as much as 500 times larger than the volume of pages available through search engines). For example, crawlers cannot get past the login screen of databases, catalogs, article repositories and the like. They also deliberately ignore some pages in a search, which they judge would only burden the user, and they ignore pages whose owners do not want the search engine to find them.</p>
<p>Therefore, when we set out to do research on the web, we must not rely on a Google search alone; we need to include a search of the invisible web as well.</p>
<p>Here are a few tips:</p>
<p>1. The gold mines on the web include the following:</p>
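<p>The way crawlers follow links, and why some pages never reach the index, can be shown with a toy model. This is only an illustrative Python sketch: the page names and the link graph are invented, and real crawlers are far more elaborate.</p>

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
links = {
    "home": ["news", "catalog-search-form"],
    "news": ["home", "archive"],
    "archive": [],
    "catalog-search-form": [],   # the form is linked, its results are not
    "catalog-record-42": [],     # exists only as the answer to a query
    "members-only-report": [],   # sits behind a login screen
}


def crawl(start):
    """Breadth-first walk that follows links, as a crawler would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links[page]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home")
invisible = sorted(set(links) - indexed)
print("indexed:  ", sorted(indexed))
print("invisible:", invisible)
```

<p>The two pages that no link points to stay out of the index even though they exist, which is the essence of the invisible web.</p>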
<ul>
<li><a href="http://www.intute.ac.uk" target="_blank">http://www.intute.ac.uk</a>/, <a href="http://infomine.ucr.edu" target="_blank">http://infomine.ucr.edu</a>/, <a href="http://lii.org/" target="_blank">http://lii.org</a>/ &#8211; resource directories on a range of subjects</li>
<li>The directories of <a href="http://directory.google.com/" target="_blank">Google</a> and <a href="http://dir.yahoo.com/" target="_blank">Yahoo</a></li>
<li><a href="http://aip.completeplanet.com/" target="_blank">Complete Planet</a> and <a href="http://www.oaister.org/" target="_blank">Oaister</a> – search engines for the invisible web.</li>
</ul>
<p>2. Prepare a list of useful sources and search those sites separately every time, in addition to searching the web. For example: professional or personal blogs on the search topic, forums, company websites, consulting firms and more.</p>
<p>How do you find such sources? With a combined approach that includes:</p>
<ul>
<li>A general search on the topic to identify companies whose sites come up in that context, along with a careful look at the names of companies mentioned in articles or profiles about the relevant industry or topic. If one of the companies is publicly traded, it is worth reading its annual reports to learn more about the subject.</li>
<li>Tracing a source from a single article you have found – sometimes pages are stored on the web and can be reached through search engines, but for various reasons searching the site directly will yield far more results than the ones you reached through the search engine. It is always advisable to examine the site that is the source of the interesting article you found, either by clicking through to the site&#8217;s home page or by deleting the end of the URL until you reach a directory you can access (sometimes that will be only the main page, meaning you will have to delete everything that appears after the com, org, co.uk or similar suffix).</li>
<li>Searching a specific source through Google – once you have identified relevant sites, you can try a deep search of them with the site: operator. That is, enter the search terms and then the operator, immediately followed by the address of the site you uncovered, preferably without the www, because sometimes the name of a specific directory takes the place of www (such as finance.google.com, ir.nestle.com…), like this: site:XYZ.com</li>
<li>Finding a forum, directory or blog by searching for the relevant words and adding the word directory, forum, database, blog and so on.</li>
</ul>
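<p>Two of the steps in the list, trimming a URL back to an accessible directory and building a site: query without the www, can be sketched in a few lines of Python. The function names and the example.com address are hypothetical and serve only as an illustration:</p>

```python
from urllib.parse import urlsplit


def parent_urls(url):
    """Yield the URL with its path trimmed one segment at a time, down to the root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for end in range(len(segments) - 1, -1, -1):
        path = "/".join(segments[:end])
        yield f"{parts.scheme}://{parts.netloc}/{path + '/' if path else ''}"


def site_query(terms, url):
    """Build a 'terms site:host' query, dropping a leading www. from the host."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return f"{terms} site:{host}"


# Hypothetical article address, used only to show the two steps.
article = "http://www.example.com/reports/2009/market-overview.html"
for candidate in parent_urls(article):
    print(candidate)
print(site_query("annual report", article))
```

<p>Each candidate address can be tried in the browser in turn; the first one that opens is the directory worth exploring by hand.</p>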
<p>Of course, ideally, anyone who conducts research professionally, whether as a function within a company or as an information specialist or market researcher (like me, for example), should also buy a subscription to one paid database that aggregates articles from various sources, such as the business database <a href="http://arad-ophir.co.il/" target="_blank">Nexis.com</a>, as well as subscriptions to other paid professional databases, because there is no way around the payment requirement for higher-quality articles.</p>
<p><strong>I would be glad to receive your comments and pointers to additional sources, and of course to answer questions.</strong></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Video:  "[OSTI] Deep Web Video"]]></title>
<link>http://4rxt.wordpress.com/2009/05/28/video-osti-deep-web-video/</link>
<pubDate>Fri, 29 May 2009 00:11:59 +0000</pubDate>
<dc:creator>Elizabeth</dc:creator>
<guid>http://4rxt.wordpress.com/2009/05/28/video-osti-deep-web-video/</guid>
<description><![CDATA[Video:  &#8220;[OSTI] Deep Web Video&#8221; Office of Scientific &amp; Technical Information]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Video:  &#8220;<a href="http://www.youtube.com/watch?v=YskdGh8XU5I" target="_blank">[OSTI] Deep Web Video</a>&#8221;</p>
<p style="padding-left:30px;"><span style='text-align:center; display: block;'><object width='425' height='350'><param name='movie' value='http://www.youtube.com/v/YskdGh8XU5I&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' /><param name='allowfullscreen' value='true' /><param name='wmode' value='transparent' /><embed src='http://www.youtube.com/v/YskdGh8XU5I&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;hd=0' type='application/x-shockwave-flash' allowfullscreen='true' width='425' height='350' wmode='transparent'></embed></object></span></p>
<p><a href="http://www.osti.gov/" target="_blank">Office of Scientific &#38; Technical Information</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Searching the Deep Web]]></title>
<link>http://librarianbrain.wordpress.com/2009/05/27/searching-the-deep-web/</link>
<pubDate>Wed, 27 May 2009 16:25:47 +0000</pubDate>
<dc:creator>virtualnotes</dc:creator>
<guid>http://librarianbrain.wordpress.com/2009/05/27/searching-the-deep-web/</guid>
<description><![CDATA[Experts say search engines such as Yahoo! and Google only pick up about 1% of the information availa]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><span style="color:#000000;">Experts say search engines such as Yahoo! and Google only pick up about 1% of the information available on the Internet. The rest of that information is considered to be hidden in the deep web, also referred to as the invisible web. So how can you find all the rest of this information? This list -</span> <a title="Permanent Link: 100 Useful Tips and Tools to Research the Deep Web" rel="bookmark" href="http://www.online-college-blog.com/index.php/features/100-useful-tips-and-tools-to-research-the-deep-web/"><span style="color:#0000ff;"><em><strong>100 Useful Tips and Tools to Research the Deep Web</strong></em></span></a><span style="color:#0000ff;"> </span><span style="color:#000000;">offers 100 tips and tools to help you get the most out of your Internet searches.</span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Searching the Deep Web]]></title>
<link>http://virtualnotes.wordpress.com/2009/05/26/searching-the-deep-web/</link>
<pubDate>Tue, 26 May 2009 16:32:34 +0000</pubDate>
<dc:creator>virtualnotes</dc:creator>
<guid>http://virtualnotes.wordpress.com/2009/05/26/searching-the-deep-web/</guid>
<description><![CDATA[Experts say search engines such as Yahoo! and Google only pick up about 1% of the information availa]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><span style="color:#000000;">Experts say search engines such as Yahoo! and Google only pick up about 1% of the information available on the Internet. The rest of that information is considered to be hidden in the deep web, also referred to as the invisible web. So how can you find all the rest of this information? This list -</span> <a title="Permanent Link: 100 Useful Tips and Tools to Research the Deep Web" rel="bookmark" href="http://www.online-college-blog.com/index.php/features/100-useful-tips-and-tools-to-research-the-deep-web/"><span style="color:#0000ff;"><em><strong>100 Useful Tips and Tools to Research the Deep Web</strong></em></span></a><span style="color:#0000ff;"> </span><span style="color:#000000;">offers 100 tips and tools to help you get the most out of your Internet searches.</span></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Deep Web]]></title>
<link>http://ziaahmedshaikh.wordpress.com/2009/03/19/deep-web/</link>
<pubDate>Thu, 19 Mar 2009 10:40:06 +0000</pubDate>
<dc:creator>ziaahmedshaikh</dc:creator>
<guid>http://ziaahmedshaikh.wordpress.com/2009/03/19/deep-web/</guid>
<description><![CDATA[The Deep Web is also known as the &#8216;Hidden Web&#8217;, &#8216;Dark Web&#8217;, or the &#8216;Invisible ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>The <strong>Deep Web</strong> is also known as the &#8216;Hidden Web&#8217;, &#8216;Dark Web&#8217;, or the &#8216;Invisible Web&#8217;.  It consists mostly of database-driven pages that are usually visible only to authorized members, who can access the hidden information after logging into the system.</p>
<p>Such data is completely hidden from search engine crawlers (or &#8216;spiders&#8217;), which means that Google, Yahoo, MSN, AltaVista and other search engines cannot search the data on these pages.</p>
<p>Today&#8217;s Web crawlers can read only the &#8216;<a title="Surface Web" href="http://ziaahmedshaikh.wordpress.com/2009/03/19/surface-websurface-web/" target="_blank">surface web</a>&#8217;.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Internet Research Presentation]]></title>
<link>http://4rxt.wordpress.com/2009/03/01/internet-research-presentation-sp-09/</link>
<pubDate>Sun, 01 Mar 2009 20:43:35 +0000</pubDate>
<dc:creator>Elizabeth</dc:creator>
<guid>http://4rxt.wordpress.com/2009/03/01/internet-research-presentation-sp-09/</guid>
<description><![CDATA[Internet Research Presentation]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://www.slideshare.net/eclark131/internet-research-1087597">Internet Research Presentation</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Deep Web]]></title>
<link>http://deancorner.wordpress.com/2009/02/27/deep-web/</link>
<pubDate>Fri, 27 Feb 2009 21:23:01 +0000</pubDate>
<dc:creator>Dean</dc:creator>
<guid>http://deancorner.wordpress.com/2009/02/27/deep-web/</guid>
<description><![CDATA[When we perform Google searches how do we know that Google has searched in every database on the Int]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>When we perform Google searches how do we know that Google has searched in every database on the Internet to come up with suggested matches for what we&#8217;re looking for?  The easy answer is that Google can&#8217;t do that and we are presented only with hits for Web pages that Google knows about.   Considering that this past summer Google added the one trillionth address to its list of Web pages, wouldn&#8217;t one think that&#8217;s far and away enough?</p>
<p>That&#8217;s where the Deep Web or the Invisible Web comes into play.  Chris Sherman and Gary Price wrote a terrific book in 2001, <em><a href="http://www.amazon.com/Invisible-Web-Uncovering-Information-Sources/dp/091096551X/ref=sr_1_1?ie=UTF8&#38;s=books&#38;qid=1235764735&#38;sr=1-1" target="_self">The Invisible Web</a></em>, that covered dozens and dozens of information sources that search engines haven&#8217;t found.  There have been a few more books on this topic since then but I wish Sherman &#38; Price would update their book. </p>
<p>Most of the Web pages in the Deep Web are from associations, businesses, libraries, universities and government agencies.  The amount of information, statistics, data, etc. that can be found within these is enormous.</p>
<p>There are some great developments in deeper searching that have popped up recently.  Kosmix (<a href="http://www.kosmix.com">www.kosmix.com</a>) started out as a search engine for health and travel information.  It has since developed a platform for a universal search engine that snags data from lots of sources &#8211; Flickr, Google, Wikipedia, Yahoo Answers, YouTube, and others. </p>
<p>It then creates sort of a customized web page that breaks your search into segments.  I searched for the topic &#8220;Burma&#8221; and Kosmix returned more information than I knew was available.  Everything from reference, media, news &#38; blogs, to ethnic groups, history, shopping, and books.  Sources included Wikipedia, BBC &#38; CNN, Shopping.com, Flickr, SeeqPod, the blog Backtype, and Slideshare.net.  Yes, my search did uncover <em>Burma Shave </em>but the other riches outshone it.  And, yes, Kosmix is one of those Mountain View, CA companies.</p>
<p>Another Deep Web crawler is DeepPeep (<a href="http://www.deeppeep.org">www.deeppeep.org</a>) which is being developed by a professor at the University of Utah.  When I entered my search term, &#8220;Burma,&#8221; I received 143 documents.  I initially thought the search was totally off the wall but when I investigated each retrieved document I discovered what DeepPeep is trying to do.</p>
<p>I had also told DeepPeep to search in &#8220;all domains&#8221; rather than the more selective subjects airfare, book, rental, job, or biology.  Therefore, I hit the mother lode of stuff.  One of the first hits was for horse jobs and, sure enough, Burma is one of the countries listed in the horse-jobs.biz web site.  I couldn&#8217;t figure out how the Hotel Oscar in Athens could be related to Burma, but a very close look at the bottom of its home page turned up links to other hotels.  What russiamaritime.com had to do with Burma (and just who knew there was a russiamaritime.com?) was also easily discovered.  This is really deep web searching and totally fascinating for those of us who love to bounce around the web discovering Web databases.</p>
<p>So, is there one search engine that does it all?  Obviously, no.  It&#8217;s great to have multiple search engines which create search strategies so differently.  It makes searchers think harder about how to formulate their keyword strategies.  Now, if I could just whittle down my favorite search engines to five or six from several dozen.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Business and Sales Speaker Sam Richter Shows How to Get the Edge in 2009]]></title>
<link>http://motivationalspeakersbyfivestarspeakers.wordpress.com/2009/01/06/marketing-sales-business-speaker-sam-richter-2/</link>
<pubDate>Tue, 06 Jan 2009 06:55:07 +0000</pubDate>
<dc:creator>Paul Schmidt</dc:creator>
<guid>http://motivationalspeakersbyfivestarspeakers.wordpress.com/2009/01/06/marketing-sales-business-speaker-sam-richter-2/</guid>
<description><![CDATA[10, 9, 8, 7, 6, 5, 4, 3, 2, 1… Happy New Year! 2009 is here. Getting off to a fast start in 2009 is ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><img class="alignleft size-full wp-image-412" style="margin-left:10px;margin-right:10px;" title="Marketing Speaker, Business Speaker, Sales Speaker Sam Ritcher" src="http://fivestarmotivationalspeakersbureau.files.wordpress.com/2009/01/sales_marketing_business_speaker_sam_richter_cold_book.jpg?w=157&#038;h=201" alt="Marketing Speaker, Business Speaker, Sales Speaker Sam Ritcher" width="157" height="201" />10, 9, 8, 7, 6, 5, 4, 3, 2, 1… Happy New Year!  2009 is here.  Getting off to a fast start in 2009 is paramount for everyone in your sales team.   Every Sales person is looking for an edge to shorten a sales cycle or build a tighter relationship with a prospect.</p>
<p>Recently, <a title="Motivational Speakers Bureau, Keynote Speakers Bureau, Leadership Speakers Bureau, Business Speakers Bureau, Leadership Speakers Bureau" href="http://www.fivestarspeakers.com" target="_blank">FIVE STAR Business Speakers</a> had the opportunity to hear Sam Richter.  Sam is a Business and Sales Speaker who wrote <em>Take The Cold Out of Cold Calling</em>.  Sam has developed extensive knowledge of how to leverage the web efficiently to gain a deep understanding of a client or prospect.  The tools are practical, ethical, and available to all who will use them. The tools will help you be more knowledgeable about, connect quicker with, and build confidence with your client or prospect.</p>
<p>Read the review of <a title="Business Speaker, Sales Speaker, Marketing Speaker Blog Entry Sam Richter" href="http://fivestarmotivationalspeakersbureau.wordpress.com/2009/01/08/marketing-sales-business-speaker-sam-richter/" target="_blank"><strong>Business, Marketing and Sales Speaker Sam Richter&#8217;s book</strong></a>, <em>Take The Cold Out of Cold Calling</em>.</p>
<p>To book Sam Richter &#8211; Business, Marketing, and Sales Speaker contact <a title="Motivational Speakers Bureau, Business Speakers Bureau, Leadership Speakers Bureau" href="http://www.myfivestarspeakers.com/inforequestforms/contactrequest.asp" target="_self">FIVE STAR Speakers</a> @ 913.648.6480.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Business and Sales Speaker Sam Richter Shows How to Get the Edge in 2009]]></title>
<link>http://fivestarmotivationalspeakersbureau.wordpress.com/2009/01/06/marketing-sales-business-speaker-sam-richter-2/</link>
<pubDate>Tue, 06 Jan 2009 06:55:07 +0000</pubDate>
<dc:creator>Paul Schmidt</dc:creator>
<guid>http://fivestarmotivationalspeakersbureau.wordpress.com/2009/01/06/marketing-sales-business-speaker-sam-richter-2/</guid>
<description><![CDATA[10, 9, 8, 7, 6, 5, 4, 3, 2, 1… Happy New Year! 2009 is here. Getting off to a fast start in 2009 is ]]></description>
<content:encoded><![CDATA[10, 9, 8, 7, 6, 5, 4, 3, 2, 1… Happy New Year! 2009 is here. Getting off to a fast start in 2009 is ]]></content:encoded>
</item>
<item>
<title><![CDATA[Deep Web]]></title>
<link>http://answermaven.com/2008/12/30/deep-web/</link>
<pubDate>Wed, 31 Dec 2008 01:14:58 +0000</pubDate>
<dc:creator>answermaven</dc:creator>
<guid>http://answermaven.com/2008/12/30/deep-web/</guid>
<description><![CDATA[Tuesday, December 30, 2008 8:00 p.m. EST This LLRX.com article by Marcus P. Zillman offers a wealth o]]></description>
<content:encoded><![CDATA[Tuesday, December 30, 2008 8:00 p.m. EST This LLRX.com article by Marcus P. Zillman offers a wealth o]]></content:encoded>
</item>
<item>
<title><![CDATA[Timeline of events related to the Deep Web]]></title>
<link>http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/</link>
<pubDate>Mon, 06 Oct 2008 18:24:43 +0000</pubDate>
<dc:creator>Maureen Flynn-Burhoe</dc:creator>
<guid>http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/</guid>
<description><![CDATA[Timeline of selected events related to the Deep Web (work in progress) 1980 Tim Berners-Lee &#8220;d]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><strong>Timeline of selected events related to the Deep Web (work in progress)</strong></p>
<p><strong>1980</strong> Tim Berners-Lee &#8220;developed his first hypertext system, &#8220;Enquire&#8221; for his own use (although unaware of the existence of the term HyperText). With a background in text processing, real-time software and communications, Tim decided that high energy physics needed a networked hypertext system and CERN was an ideal site for the development of wide-area hypertext ideas (<a href="http://lost-contact.mit.edu/afs/net/project/afs32/cern.ch/w3.org/www/People.html#17">CERN</a>).&#8221;</p>
<p><strong>1989</strong> Tim Berners-Lee started the WorldWideWeb project at <a href="http://lost-contact.mit.edu/afs/net/project/afs32/cern.ch/w3.org/www/People.html#17">CERN</a>.</p>
<p><strong>1992-09</strong> Arthur Secret at CERN created the first web gateway to a relational database system, RDB (Shestakov 2008-05).</p>
<p><strong>1994</strong> Dr. Jill Ellsworth &#8220;first coined the phrase &#8220;invisible Web&#8221; to refer to information content that was &#8220;invisible&#8221; to conventional search engines (Bergman 2001 citing Garcia 1996).&#8221; See <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">also</a></p>
<p><strong>1996</strong> Frank Garcia (<a href="http://web.archive.org/web/19961205083117/http://tcp.ca/Jan96/BusandMark.html" target="_blank">1996</a>) claimed Texas-based university professor Jill H. Ellsworth (d.2002), Internet consultant for Fortune 500 companies, coined the term “Invisible Web” in 1996 to refer to websites that are not registered with any search engine. ” “Ellsworth is co-author with her husband, Matthew V. Ellsworth, of <em>The Internet Business Book</em> (John Wiley &#38; Sons, Inc., 1994), <em>Marketing on the Internet: Multimedia Strategies for the World Wide Web </em>(John Wiley &#38; Sons, Inc.), and <em>Using CompuServe</em>. She has also explored education on the Internet, and contributed chapters on business and education to the massive tome, <em>The Internet Unleashed</em>.”</p>
<blockquote><p>[S]igns of an unsuccessful or poor site are easily identified, says Jill Ellsworth. “Without picking on any particular sites, I’ll give you a couple of characteristics. It would be a site that’s possibly reasonably designed, but they didn’t bother to register it with any of the search engines. So, no one can find them! You’re hidden. I call that the invisible Web. Ellsworth also makes reference to the “dead Web,” which no one has visited for a long time, and which hasn’t been regularly updated (<a href="http://web.archive.org/web/19961205083117/http://tcp.ca/Jan96/BusandMark.html" target="_blank">Garcia 1996</a>).</p></blockquote>
<p><strong>1996-12-01</strong> &#8220;The first commercial Deep Web tool (although they referred to it as the &#8220;Invisible Web&#8221;) was @1, announced December 12th, 1996 in partnership with large content providers. According to a December 12th, 1996 press release, @1 started with 5.7 terabytes of content which was estimated to be 30 times the size of the nascent World Wide Web. ( &#8220;<a href="http://web.archive.org/web/19971021232106/www.pls.com/news/pr961212_aol.html" target="_blank">America Online to Place AT1 from PLS in Internet Search Area: <em>New AT1 Service Allows AOL Members to Search &#8220;The Invisible Web&#8221;</em></a>).&#8221;See <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
<p><strong>1996-12-12 &#8220;</strong>Personal Library Software, Inc. (PLS), the leading supplier of search and retrieval software to the online publishing industry, ushered in the next generation of Internet search engines with the introduction of a new Internet based service, AT1 which combines the best of PLS&#8217;s search, agent and database extraction technology to offer publishers and users something they have never had before: the ability to search for content residing in &#8220;hidden&#8221; databases — those large collections of documents managed by publishers not viewable by Web spiders. AT1 also allows users to create intelligent agents to search newsgroups and websites with E-Mail notification of results (<a href="http://web.archive.org/web/19971021232057/www.pls.com/news/pr961212_at1.html" target="_blank">Press release</a>).&#8221;</p>
<p><strong>1997</strong> Michael Lesk wrote an <a href="http://www.lesk.com/mlesk/ksg97/ksg.html" target="_blank">unpublished paper</a> entitled &#8220;How much information is there in the world?&#8221;, in which he estimated that in 1997 the Library of Congress held between 20 terabytes and 3 petabytes. See Choi (2008).</p>
<p><strong>1999-02</strong> Lawrence and Giles (1999) claimed that the publicly indexable World Wide Web (PIW) contained about 800 million pages; the search engine with the largest index, Northern Light, indexed roughly 16% of the publicly indexable World Wide Web;  the combined index of 11 large search engines covered (very) roughly 42% of the publicly indexable World Wide Web.</p>
<p><strong>2000-03</strong> c. 43,000–96,000 Deep Web sites existed (Bergman 2001).</p>
<p><strong>2000-07-26 </strong><a href="http://www.brightplanet.com/news/prs/deep-web-500-times-larger.html" target="_blank">BrightPlanet</a> released a study documenting the Deep Web (a massive storehouse of databases and information that was invisible to search engines in 2000), claiming that the Deep Web was 500 times larger than the indexed Web accessible by most search engines. BrightPlanet researchers also released their direct-query search technology, called LexiBot™, which automatically identifies, retrieves, qualifies, and classifies content from Deep Web sites. They listed c. 20,000 Deep Web searchable sites. Because direct-query search technology can access searchable databases that most search engines cannot, the Invisible Web is not really invisible, just harder to reach. <a href="http://www.brightplanet.com/news/prs/deep-web-500-times-larger.html" target="_blank">BrightPlanet Unveils the &#8216;Deep&#8217; Web:  500 Times Larger than the Existing Web</a>.</p>
<p><strong>2001</strong> BrightPlanet</p>
<blockquote><p>&#8220;quantified the size and relevancy of the deep Web in a study based on data collected between March 13 and 30, 2000. Our key findings include: Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web; The deep Web contains 7,500 terabytes of information compared to nineteen terabytes of information in the surface Web; The deep Web contains nearly 550 billion individual documents compared to the one billion of the surface Web; More than 200,000 deep Web sites presently exist; Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times; On average, deep Web sites receive fifty per cent greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep Web site is not well known to the Internet-searching public; The deep Web is the largest growing category of new information on the Internet; Deep Web sites tend to be narrower, with deeper content, than conventional surface sites; Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web; Deep Web content is highly relevant to every information need, market, and domain; More than half of the deep Web content resides in topic-specific databases; A full ninety-five per cent of the deep Web is publicly accessible information — not subject to fees or subscriptions (<a href="http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104" target="_blank">Bergman 2001</a>).&#8221;</p></blockquote>
<p><strong>2001</strong> <a href="http://www.alltheweb.com/" target="_blank">AlltheWeb</a>, a public search engine, was launched (AlltheWeb is now owned by Yahoo). It was a redesign of Fast (1999-05 to 2001). <a href="http://www.fastsearch.com" target="_blank">Fast Search &#38; Transfer</a> is a Microsoft subsidiary.</p>
<p><strong>2000</strong> Shestakov (2008) cites Bergman (2001) as the source for the claim that the term deep Web was coined in 2000. Bergman distinguished the Surface Web from the Deep Web using the metaphor of Surface and Deep water fishing or trawling. Deep Web is preferred over the term Invisible Web.</p>
<p><strong>2000</strong> UC-Berkeley biologist Michael Eisen, Nobel laureate Harold Varmus and Stanford biochemist Patrick Brown helped start the Public Library of Science (<a href="http://www.plos.org" target="_blank">PLoS</a>), a &#8220;nonprofit organization of scientists and physicians committed to making the world&#8217;s scientific and medical literature a freely available public resource&#8221; by encouraging scientists to insist on open-access publishing models rather than being forced to sign over their (often publicly funded) research to expensive scientific journals. Wright <a title="The next generation of Web search engines will do more than give you a longer list of search results. They will disrupt the information economy.&#34; Salon." href="http://archive.salon.com/tech/feature/2004/03/09/deep_web/index.html" target="_blank">(2004)</a> cited Eisen, Varmus and Brown as examples of scientists who are making some areas of the Deep Web more accessible to the public.</p>
<p><strong>2001</strong> Raghavan and Garcia-Molina (2001) &#8220;presented an architectural model for a hidden-Web crawler that used key terms provided by users or collected from the query interfaces to query a Web form and crawl the deep Web resources  <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
<p><strong>2002-02</strong> StumbleUpon began to use human crawlers or human-based computation techniques to uncover data on the Deep Web. Human crawlers can find relevant links that algorithmic crawlers miss <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.</p>
<p><strong>2002-12</strong> There were c. 130,000 Deep Web sites (He, Patel, Zhang and Chang 2007, Shestakov 2008).</p>
<p><strong>2003-06-01 </strong>Dorner and Curtis (<a title="Dorner, Daniel G.; Curtis, Anne Marie. 2003-06-01. &#34;A comparative review of common user interface software products for libraries.&#34; National Library of New Zealand. 67 pp." href="http://www.natlib.govt.nz/downloads/Comparative_review_common_user_interface_software.pdf" target="_blank">2003-06-01</a>) conducted a survey (data collected from 2002-12 through 2003-04) of librarians in New Zealand to compare the common user interface software products supplied by vendors: Endeavour, ExLibris, Follet, Fretwell-Downing, Innovative Interfaces, MuseGlobal, OCLC, SIRSI, WebFeat and VTLS. MuseSearch, ENCompass, MetaLib, SingleSearch and WebFeat received the highest scores in 2003 (Dorner and Curtis 2003-06-01:2). SingleSearch was noted as having an added cost advantage for libraries since it was open access, open source (Dorner and Curtis 2003-06-01:2). In 2002-2003 a successful common user interface software product was expected to support formats and protocols other than Z39.50, such as OpenURL, HTTP, SQL, XML, MARC, CrossRef, DOI, EAD, Dublin Core and Telnet (Dorner and Curtis 2003-06-01:8).</p>
<p><strong>2004-04</strong> There were c. 310,000 Deep Web sites (He, Patel, Zhang and Chang 2007, Shestakov 2008).</p>
<p><strong>2004</strong> Between 2000 and 2004 the Deep Web increased in size by 3-7 times (He, Patel, Zhang and Chang 2007, Shestakov 2008).</p>
<p><strong>2004-03-02</strong> Yahoo announced its Content Acquisition Program, under which users paid for enhanced search coverage by &#8220;unlocking&#8221; the deep Web <a title="The next generation of Web search engines will do more than give you a longer list of search results. They will disrupt the information economy.&#34; Salon." href="http://archive.salon.com/tech/feature/2004/03/09/deep_web/index.html" target="_blank">(Wright 2004).</a></p>
<p><strong>2005</strong> Yahoo released Yahoo! Subscriptions which searched a few of the Deep Web&#8217;s subscription-only web sites.</p>
<p><strong>2005</strong> Ntoulas et al. (2005) &#8220;created a hidden-Web crawler that automatically generated meaningful queries to issue against search forms. Their crawler generated promising results, but the problem is far from being solved. Since a large amount of useful data and information resides in the deep Web, search engines have begun exploring alternative methods to crawl the deep Web  <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
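<p>As a rough illustration of the first step such a hidden-Web crawler takes, the sketch below turns a list of candidate key terms into GET requests against a search form. It is not the method of Raghavan and Garcia-Molina or of Ntoulas et al.; the form address <code>http://example.org/search</code>, the field name <code>q</code> and the helper <code>form_queries</code> are all hypothetical.</p>

```python
from urllib.parse import urlencode


def form_queries(action_url, field_name, keywords):
    """Build one query URL per key term for a GET-style search form.

    action_url and field_name describe a hypothetical form; a real crawler
    would first have to read them from the form's HTML.
    """
    return [action_url + "?" + urlencode({field_name: kw}) for kw in keywords]


# Hypothetical key terms, e.g. supplied by a user or harvested from the form page.
terms = ["deep web", "patents"]
for url in form_queries("http://example.org/search", "q", terms):
    print(url)
```

<p>A crawler would then fetch each URL, store the result pages, and possibly pick new key terms from them; choosing those terms well is the hard part the papers address.</p>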
<p>The crawlers of the search engine <a href="http://www.pipl.com" target="_blank">Pipl</a> can identify, interact with and retrieve some information from the deep Web.</p>
<p>Deep Web &#8220;search engines like CloserLookSearch and Northern Light create specialty engines by topic to search the deep Web. Because these engines are narrow in their data focus, they are built to access specified deep Web content by topic. These engines can search dynamic or password protected databases that are otherwise closed to search engines <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
<p>Google’s &#8220;Sitemap and mod_oai are mechanisms that allow search engines and other interested parties to discover deep Web resources on particular Web servers. Both mechanisms allow Web servers to advertise the URLs that are accessible on them, thereby allowing automatic discovery of resources that are not directly linked to the surface Web <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
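<p>To make the idea of advertising URLs concrete, here is a minimal sketch of both sides of a sitemap exchange. It assumes the sitemaps.org 0.9 namespace; the <code>example.org</code> addresses and the helper names <code>build_sitemap</code> and <code>discover_urls</code> are invented for illustration, and mod_oai is not covered.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (assumed here; not named in the post).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Server side: advertise URLs, including ones that no page links to."""
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(entry, "{%s}loc" % SITEMAP_NS)
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


def discover_urls(sitemap_xml):
    """Crawler side: read the advertised URLs back out of the sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter("{%s}loc" % SITEMAP_NS)]


# Hypothetical database-backed pages that are not linked from the surface Web.
hidden_pages = [
    "http://example.org/records?id=1",
    "http://example.org/records?id=2",
]
sitemap = build_sitemap(hidden_pages)
print(discover_urls(sitemap))
```

<p>A crawler that reads such a file learns about the record pages without ever having to follow a link or fill in a form.</p>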
<p><strong>2007-06</strong> WorldWideScience was created to provide access to the Deep Web. When it began it linked to 12 databases from 10 countries. It is a &#8220;science portal developed and maintained by the <a href="http://www.osti.gov/">Office of Scientific and Technical Information (OSTI)</a>, an element of the <a href="http://www.science.doe.gov/">Office of Science</a> within the <a href="http://www.doe.gov/">U.S. Department of Energy</a>. The <a href="http://worldwidescience.org/alliance.html">WorldWideScience Alliance</a>, a partnership consisting of participating member countries provides the governance structure for the WorldWideScience.org portal (<a href="http://www.readwriteweb.com/archives/worldwidescience_like_google_for_deep_web_science_stuff.php" target="_blank">RWW</a>).&#8221;</p>
<p><strong>2007-07-27</strong> &#8220;Indiana University faculty member Javed Mustafa appeared on National Public Radio&#8217;s <a href="http://www.sciencefriday.com/pages/2007/Jul/hour2_072707.html" target="_blank">Science Friday</a>, and drawing on information in a published study from University of California, Berkeley entitled &#8220;How much information is there?&#8221;, estimated that the deep web consists of about 91,000 terabytes. By contrast, the surface web, which is easily reached by search engines, is only about 167 terabytes. The Library of Congress contains about 11 terabytes, for comparison. Mustafa noted that these numbers were a bit dated and were just rough estimates <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
<p><strong>2008-05-14</strong> ReadWriteWeb contributor Sarah Perez listed a number of &#8220;<a href="http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php" target="_blank">Digital Image Resources on the Deep Web</a>.&#8221;</p>
<p><strong>2008-06</strong> WorldWideScience portal to the Deep Web linked to 32 national, scientific databases and portals from 44 different countries. <a href="http://www.readwriteweb.com/archives/worldwidescience_like_google_for_deep_web_science_stuff.php" target="_blank">RWW</a>.</p>
<p><strong>2008</strong> Several &#8220;Deep Web directories are under development such as <a href="http://www.oaister.org/" target="_blank">OAIster</a> by the University of Michigan, <a href="http://infomine.ucr.edu/" target="_blank">INFOMINE</a> at the University of California at Riverside and <a href="http://www.freepint.com/gary/direct.htm" target="_blank">DirectSearch</a> by Gary Price to name a few <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">(Choi 2008-01-07)</a>.&#8221;</p>
<p><strong>2008-09-22</strong> Infovell launched its research engine for the Deep Web. &#8220;Available initially on a subscription basis, Infovell gives users access to hard to find, in-depth, expert information spanning Life Sciences, Medicines, Patents, and other reference categories with more to be added over time.&#8221; &#8220;Infovell’s research engine will be available beginning September 22 as a premium service for individual researchers and corporations who are seeking more affordable access to expert information. The Company is offering a risk-free trial through its website www.infovell.com. Later this year, Infovell will be beta-releasing a free version of its research engine on a limited basis for those individuals who want to search the Deep Web but don’t have the need for some of the advanced features available in the premium version.&#8221;</p>
<p><strong>2009</strong> United States &#8220;Congressional Representative John Conyers (D-MI) re-introduced a bill (HR801) that essentially would negate the National Institutes of Health (NIH) policy concerning depositing research in Open Access (OA) repositories. The bill goes further than prohibiting open access requirements, however, as the bill also prohibits government agencies from obtaining a license to publicly distribute, perform, or display such work by, for example, placing it on the Internet, and would repeal the longstanding &#8216;federal purpose&#8217; doctrine, under which all federal agencies that fund the creation of a copyrighted work reserve the &#8216;royalty-free, nonexclusive right to reproduce, publish, or otherwise use the work&#8217; for any federal purpose. The National Institutes of Health require NIH-funded research to be published in open-access repositories (<a href="http://www.boingboing.net/2009/02/16/scientific-publisher.html" target="_blank">Doctorow 2009</a>).&#8221; HR801 would benefit for-profit science publishers and increase the challenges of making the Deep Web more accessible. See Doctorow, Cory. 2009-02-16. &#8220;<a href="http://www.boingboing.net/2009/02/16/scientific-publisher.html" target="_blank">Scientific publishers get a law introduced to end free publication of govt-funded research</a>.&#8221; <a href="http://www.boingboing.net/" target="_blank">Boing Boing</a>.</p>
<h3>Notes</h3>
<p><span>&#8220;<strong>Metasearch</strong> technology, also known as <strong>federated search</strong> or <strong>broadcast search</strong>, creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive </span><a title="Luther, Judy. 2003-10-01. &#34;Trumping Google? Metasearching's Promise.&#34; Library Journal. " href="http://www.libraryjournal.com/article/CA322627.html" target="_blank">(Luther 2003-10-01)</a>.&#8221;</p>
<p>Joo-Won Choi&#8217;s (2008-01) useful categories of Deep Web resources include:</p>
<p><strong>Dynamic content:</strong> &#8220;Dynamic Web page and/or dynamic pages, which are returned in response to a submitted query or accessed only through a form (especially if open-domain input elements e.g. text fields are used; such fields are hard to navigate without domain knowledge).&#8221;<br />
<strong><br />
Unlinked content:</strong> &#8220;pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks or inlinks.&#8221;<br />
<strong><br />
Private Web</strong>: &#8220;sites that require registration and login (password-protected resources).&#8221;</p>
<p><strong>Contextual Web:</strong> &#8220;pages with content varying for different access contexts (e.g. ranges of client IP addresses or previous navigation sequence).&#8221;</p>
<p><strong>Limited access content</strong>: &#8220;sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs or HTTP headers, prohibiting search engines from browsing them and creating cached copies).&#8221;</p>
<p><strong>Scripted content: </strong>&#8220;pages that are only accessible through links produced by JavaScript as well as content dynamically downloaded from Web servers via Macromedia Flash  or AJAX solutions.&#8221;</p>
<p><strong>Non-HTML/text content</strong>: &#8220;textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.&#8221; For more see Choi (2008-01).</p>
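<p>Of the categories above, limited access content is the easiest to demonstrate. The sketch below uses Python&#8217;s standard <code>urllib.robotparser</code> module to show how a well-behaved crawler consults the Robots Exclusion Standard before fetching a page. The robots.txt rules, the site <code>example.org</code>, the crawler name <code>ExampleBot</code> and the helper <code>allowed_paths</code> are made up for illustration.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of an imaginary site that hides its members area
# and its search results from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""


def allowed_paths(robots_txt, base_url, paths, agent="ExampleBot"):
    """Return a dict mapping each path to True if the crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, base_url + path) for path in paths}


result = allowed_paths(
    ROBOTS_TXT,
    "http://example.org",
    ["/about.html", "/members/profile", "/search/results"],
)
for path, allowed in result.items():
    print(path, "crawlable" if allowed else "excluded, so part of the Deep Web")
```

<p>Pages under the disallowed paths may be perfectly public for human visitors, yet they never reach a search index that honours these rules.</p>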
<h3>Webliography and Bibliography</h3>
<p>Bergman, Michael K. 2001-09-24. &#8220;<a href="http://www.brightplanet.com/pdf/deepwebwhitepaper.pdf" target="_blank">The Deep Web: Surfacing Hidden Value</a>.&#8221; White Paper.</p>
<p>Bergman, Michael. 2001. &#8220;<a href="http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104">The Deep Web: Surfacing Hidden Value</a>.&#8221; <em>Journal of Electronic Publishing</em>. 7:1.</p>
<p>Choi, Joo-Won. 2008-01-07 &#8220;<a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">Deep Web</a>.&#8221; KAIST</p>
<p>Dorner, Daniel G.; Curtis, Anne Marie. 2003-06-01. &#8220;<a title="Dorner, Daniel G.; Curtis, AnneMarie. 2003-06-01. &#34;A comparative review of common user interface software products for libraries.&#34; National Library of New Zealand. 67 pp." href="http://www.natlib.govt.nz/downloads/Comparative_review_common_user_interface_software.pdf" target="_blank">A comparative review of common user interface software products for libraries</a>.&#8221; National Library of New Zealand. 67 pp.</p>
<p>Ellsworth, Jill H.; Ellsworth, Matthew V. 1994. <em>The Internet Business Book</em>. John Wiley &#38; Sons, Inc.</p>
<p>Ellsworth, Jill H.; Ellsworth, Matthew V. 1997. <a title="Ellsworth, Jill H.; Ellsworth, Matthew V. 1997. The Internet Business Book. John Wiley &#38; Sons, Inc." href="http://web.archive.org/web/19971012054051/www.oak-ridge.com/topnib.html" target="_blank"><em>The Internet Business Book</em></a>. John Wiley &#38; Sons, Inc.</p>
<p>Ellsworth, Jill H.; Ellsworth, Matthew V. 1995. <a title="Multimedia Strategies for the World Wide Web. John Wiley &#38; Sons, Inc." href="http://www.amazon.co.uk/Marketing-Internet-Multimedia-Strategies-World/dp/0471165042" target="_blank"><em>Marketing on the Internet: Multimedia Strategies for the World Wide Web</em></a>. John Wiley &#38; Sons, Inc.</p>
<p>Ellsworth, Jill H.; Ellsworth, Matthew V. 1996. <a title="Multimedia Strategies for the World Wide Web. John Wiley &#38; Sons, Inc." href="http://www.amazon.co.uk/Marketing-Internet-Multimedia-Strategies-World/dp/0471165042oak-ridge.com/topnib.html" target="_blank"><em>Marketing on the Internet: Multimedia Strategies for the World Wide Web</em></a>. 2nd Edition. John Wiley &#38; Sons, Inc.</p>
<p>Ellsworth, Jill H.; Ellsworth, Matthew V. <em>Using CompuServe</em>. John Wiley &#38; Sons, Inc.</p>
<p>Ellsworth, Jill H. Chapters? <em>The Internet Unleashed.</em></p>
<p>Garcia, Frank. 1996. “<a href="http://web.archive.org/web/19961205083117/http://tcp.ca/Jan96/BusandMark.html" target="_blank">Business and Marketing on the Internet</a>.” <em>Masthead</em>. 9:1. January. <a href="http://web.archive.org/web/19961205083117/http://tcp.ca/Jan96/BusandMark.html" target="_blank">Alternate url @ web.archive.org</a></p>
<p>Guernsey, Lisa. 2001-01-25. &#8220;<a href="http://query.nytimes.com/gst/fullpage.html?res=9404EEDA1F3CF936A15752C0A9679C8B63" target="_blank">Mining the deep web with sharper shovels</a>.&#8221; <em>New York Times</em>, No.25: pp.G1.</p>
<p>Lawrence, Steve; Giles, C. Lee. 1999-07-08. &#8220;Accessibility of Information on the Web.&#8221; <em>Nature</em>. 400:6740:107&#8211;109. See <a href="http://www.wwwmetrics.com/">http://www.wwwmetrics.com</a>.</p>
<p>Luther, Judy. 2003-10-01. &#8220;<a title="Luther, Judy. 2003-10-01. &#34;Trumping Google? Metasearching's Promise.&#34; Library Journal. " href="http://www.libraryjournal.com/article/CA322627.html" target="_blank">Trumping Google? Metasearching&#8217;s Promise</a>.&#8221; <em>Library Journal</em>.</p>
<p>PLS. 1996-12-12. &#8220;<a href="http://web.archive.org/web/19971021232106/www.pls.com/news/pr961212_aol.html" target="_blank">America Online to Place AT1 from PLS in Internet Search Area: <em>New AT1 Service Allows AOL Members to Search &#8220;The Invisible Web&#8221;</em></a>.&#8221; Press Release.</p>
<p>Shestakov, Dennis. 2008-05. <a href="https://oa.doria.fi/bitstream/handle/10024/38506/diss2008shestakov.pdf?sequence=3">deep web</a></p>
<p>Smith, Richard.  2008-10-07. &#8220;<a href="http://www.plos.org/cms/node/409" target="_blank">More evidence on why we need radical reform of science publishing</a>.&#8221; PLoS.</p>
<p>Wright, Alex. 2004-03-09. &#8220;<a title="The next generation of Web search engines will do more than give you a longer list of search results. They will disrupt the information economy.&#34; Salon." href="http://archive.salon.com/tech/feature/2004/03/09/deep_web/index.html" target="_blank">In search of the deep Web: The next generation of Web search engines will do more than give you a longer list of search results. They will disrupt the information economy</a>.&#8221; Salon.</p>
<p>He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan. 2007. &#8220;Accessing the deep Web.&#8221; <em>Communications of the ACM</em>. 50:5:94–101.</p>
<p>See also http://papergirls.wordpress.com/the-ultimate-guide-to-the-invisible-web</p>
<h3>Joo-Won <a href="http://nlplab.kaist.ac.kr/~jwchoi/coreonto_data/Computer_networking/Networking_standards/Internet_standards/Internet_protocols/Application_layer_protocols/World_Wide_Web/00454403.txt">Choi&#8217;s</a> Bibliography:</h3>
<p>Panagiotis Ipeirotis, Luis Gravano, and Mehran Sahami. 2001. &#8220;<a href="http://qprober.cs.columbia.edu/publications/sigmod2001.pdf" target="_blank">Probe, Count, and Classify: Categorizing Hidden-Web Databases</a>.&#8221; Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. pp. 67-78.</p>
<p>Gary Price &#38; Chris Sherman. July 2001. <em>The Invisible Web: Uncovering Information Sources Search Engines Can&#8217;t See</em>. CyberAge Books, ISBN 0-910965-51-X.</p>
<p>Michael K. Bergman. 2001-08. &#8220;<a href="http://www.press.umich.edu/jep/07-01/bergman.html" target="_blank">The Deep Web: Surfacing Hidden Value.</a>&#8221; <em>The Journal of Electronic Publishing</em>. 7:1.</p>
<p>Sriram Raghavan and Hector Garcia-Molina. 2001. &#8220;<a href="http://www.dia.uniroma3.it/~vldbproc/017_129.pdf" target="_blank">Crawling the Hidden Web</a>.&#8221; In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). pp. 129-138</p>
<p>Nigel Hamilton (2003). &#8220;<a href="http://turbo10.com/papers/deepnet.pdf" target="_blank">The Mechanics of a Deep Net Metasearch Engine</a>.&#8221; 12th World Wide Web Conference poster.</p>
<p>Bin He and Kevin Chen-Chuan Chang. 2003. &#8220;<a href="http://eagle.cs.uiuc.edu/pubs/2003/unifiedschema-sigmod03-hc-mar03.pdf" target="_blank">Statistical Schema Matching across Web Query Interfaces</a>.&#8221; In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data</p>
<p>Joe Barker (Jan 2004). &#8220;<a href="http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html" target="_blank">Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity</a>.&#8221; UC Berkeley - Teaching Library Internet Workshops.</p>
<p>Alex Wright (Mar 2004). &#8220;<a href="http://archive.salon.com/tech/feature/2004/03/09/deep_web/index_np.html" target="_blank">In Search of the Deep Web</a>.&#8221; <a href="http://www.salon.com/tech/feature/2004/03/09/deep_web" target="_blank">Salon.com</a></p>
<p>Alexandros Ntoulas, Petros Zerfos, and Junghoo Cho. 2005. &#8220;<a href="http://oak.cs.ucla.edu/~ntoulas/pubs/ntoulas_hidden_web.pdf" target="_blank">Downloading Textual Hidden Web Content Through Keyword Queries</a>.&#8221; In Proceedings of the Joint Conference on Digital Libraries (JCDL). pp 100-109.</p>
<p><a href="http://oak.cs.ucla.edu/~cho/papers/ntoulas-hidden.pdf" target="_blank">Extended version</a>.</p>
<p>Frank McCown, Xiaoming Liu, Michael L. Nelson, and Mohammad Zubair. 2006-03/04. &#8220;<a href="http://library.lanl.gov/cgi-bin/getfile?LA-UR-05-9158.pdf" target="_blank">Search Engine Coverage of the OAI-PMH Corpus</a>.&#8221; IEEE Internet Computing. 10:2. pp. 66-73.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Going beyond Google]]></title>
<link>http://lgaqlibrary.wordpress.com/2008/08/27/going-beyond-google/</link>
<pubDate>Wed, 27 Aug 2008 05:48:07 +0000</pubDate>
<dc:creator>lgaqlibrary</dc:creator>
<guid>http://lgaqlibrary.wordpress.com/2008/08/27/going-beyond-google/</guid>
<description><![CDATA[Like most people, my search engine of choice is Google. It&#8217;s the world&#8217;s largest search ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>Like most people, my search engine of choice is Google. It&#8217;s the world&#8217;s largest search engine, accessing over 8 billion web pages. If Google can&#8217;t find what I&#8217;m looking for, it doesn&#8217;t exist, right?</p>
<p>Wrong. There&#8217;s a whole other internet out there that Google just can&#8217;t reach. It&#8217;s called the deep web, invisible web or hidden web. Invisible because most search engines can&#8217;t see it. And it&#8217;s about 500 times bigger than the internet we all know and love.</p>
<p>So, what&#8217;s in the invisible web? Contents of searchable databases such as library catalogues and article databases. There&#8217;s a wealth of information held in these databases that you&#8217;re missing out on by only using Google or other similar search engines.</p>
<p>The good news is that accessing the invisible web is easy. You can &#8216;google&#8217; for databases on a particular subject by typing, for example, &#8216;plane crash database&#8217; into the search bar. Once you&#8217;ve accessed the database itself, you can then interrogate it for the information you&#8217;re looking for.</p>
<p>There are also some dedicated tools out there that specialise in searching the invisible web. Try some of the following next time you come up empty-handed with Google.</p>
<ul>
<li><a href="http://www.clusty.com" target="_blank">Clusty</a>: a metasearch engine, i.e. it searches several search engines at once</li>
<li><a href="http://www.lii.org" target="_blank">Librarians&#8217; Internet Index</a>: sites compiled by real librarians, not a computer</li>
<li><a href="http://infomine.ucr.edu/" target="_blank">Infomine</a>: excellent for academic research</li>
<li><a title="Directory of Open Access Journals" href="http://www.doaj.org" target="_blank">Directory of Open Access Journals</a>: free access to full text articles from over 1,000 journals in the humanities/social sciences and science fields</li>
</ul>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Search Engines, Your Web Page And The Invisible Web]]></title>
<link>http://bostonwebsitedesigncompany.wordpress.com/2008/08/18/search-engines-your-web-page-and-the-invisible-web/</link>
<pubDate>Mon, 18 Aug 2008 09:28:13 +0000</pubDate>
<dc:creator>Cosmos Creatives  Boston</dc:creator>
<guid>http://bostonwebsitedesigncompany.wordpress.com/2008/08/18/search-engines-your-web-page-and-the-invisible-web/</guid>
<description><![CDATA[They are important ‘creatures’ in the virtual world, the web spiders; more so in the search engine o]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p class="MsoNormal">They are important ‘creatures’ in the virtual world, the <a href="http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Glossary.html#Spider">web spiders</a>; more so in the <a href="http://www.bostonwebsitedesigncompany.com/seo.html">search engine optimization</a> and <a href="http://www.bostonwebsitedesigncompany.com/seo.html">search engine marketing</a> industry. What these crawlers do is recognize web pages existing in their database using a particular system, and include them in a web search.</p>
<p class="MsoNormal"><a href="http://bostonwebsitedesigncompany.files.wordpress.com/2008/08/697304_blog.jpg"><img class="alignright size-medium wp-image-55" style="border:2px solid black;margin:3px;" src="http://bostonwebsitedesigncompany.wordpress.com/files/2008/08/697304_blog.jpg?w=300" alt="" width="215" height="215" /></a></p>
<p class="MsoNormal">Obtaining new pages, updating existing ones, and deleting obsolete pages are some of the functions they perform. If no other page links to a new page, another way to bring it to the crawlers’ notice is to send the site URL to the search engine companies and ask them to include the new page in their search.</p>
<p class="MsoNormal">The next step is indexing: the pages are passed to another computer program, in which links, keywords, etc. play a major role in determining how relevant a web page is to a search. The pages are then featured in search engine result pages (SERPs).</p>
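<p class="MsoNormal">As a rough illustration of the indexing step, here is a minimal Python sketch of an inverted index, the kind of word-to-page lookup table that indexers are generally understood to build. It is not how any particular search engine works; the function names <code>build_index</code> and <code>search</code> and the example.com URLs are hypothetical, chosen only for this illustration.</p>

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it.

    `pages` is a dict of URL to page text (a stand-in for crawled pages).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages matching the first word, then narrow down.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches = matches.intersection(index.get(word, set()))
    return sorted(matches)


# Hypothetical crawled pages, for illustration only.
pages = {
    "http://example.com/a": "deep web databases",
    "http://example.com/b": "surface web pages",
}
index = build_index(pages)
print(search(index, "deep web"))
```

<p class="MsoNormal">A page that the crawler never fetched has no entry in such an index, so no query can return it, however relevant its content may be.</p>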
<p class="MsoNormal">Some pages, however, do not come up in search engine result pages. They form that part of the web known as the ‘Invisible Web’ or ‘Deep Web’. In a study, the University of California, Berkeley, estimated that the ‘Deep Web’ contained approximately 91,000 terabytes of data and 550 billion individual documents.</p>
<p class="MsoNormal">One reason this happens is that a page’s content may be random, irrelevant, or badly conceptualized. Another reason is the presence of technical barriers that spiders cannot get past by themselves: for instance, pages where access is granted only by typing something in manually, members-only sites, etc.</p>
<p class="MsoNormal">At <a href="http://www.bostonwebsitedesigncompany.com/">Cosmos Creative Boston</a>, our <a href="http://www.bostonwebsitedesigncompany.com/seo.html">SEO</a> team submits your website using accurate techniques that make it easy for algorithmic crawlers [spiders] to notice it and link to it, so that it is ultimately picked up by a search engine quickly.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[What Search Engines Miss]]></title>
<link>http://snokwami.wordpress.com/2008/07/26/what-search-engines-miss/</link>
<pubDate>Sat, 26 Jul 2008 18:40:59 +0000</pubDate>
<dc:creator>snokwami</dc:creator>
<guid>http://snokwami.wordpress.com/2008/07/26/what-search-engines-miss/</guid>
<description><![CDATA[This U C Berkeley Website offers information about differences between the three main search engines]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>This <a class="wp-caption-dd" title="Differences between Google, Yahoo and Ask!" href="http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html" target="_blank">U C Berkeley Website</a> offers information about differences between the three main search engines: Google, Yahoo and Ask.com. But these major search engines generally function in the same way.</p>
<p>There are specialized and unique search engines, for example, <a class="wp-caption" title="Rollyo" href="http://rollyo.com/index.html" target="_blank">RollYo</a> and <a class="wp-caption" title="Mamma" href="http://www.mamma.com/" target="_blank">Mamma</a>, that eliminate duplication and make searching easier.  <a class="wp-caption" title="Dogpile" href="http://www.dogpile.com/" target="_blank">Dogpile</a> adds another layer of efficiency as it brings all the search engines together on one convenient page.</p>
<p>Now there is what is called the Deepnet or the Invisible Web, which needs attention when we talk about search engines. Search engines are only able to see certain documents on the Internet.  <a class="wp-caption" title="Hidden Web Berkeley" href="http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html" target="_blank">U C Berkeley</a> and <a class="wp-caption" title="Hidden Web Wikipedia" href="http://en.wikipedia.org/wiki/Deep_web" target="_blank">Wikipedia</a> both have articles on the hidden web, although a lot of the documents that used to be hidden are becoming more and more visible.</p>
<p>As I was saying, there is so much that other search engines are unable to see or crawl. A new search engine, Cuil (pronounced to rhyme with cool), has come up claiming it can &#8220;search 121,617,892,992 web pages&#8221;, a lot more than Google, the current top search engine. Check out <a class="wp-caption" title="Cool Cuil Search Engine" href="http://www.cuil.com/" target="_blank">this Cuil search engine.</a></p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[Searching The Surface Web I: Google, Yahoo! And More]]></title>
<link>http://hiddenweb.wordpress.com/2008/06/26/searching-the-surface-web-i-google-yahoo-and-more/</link>
<pubDate>Thu, 26 Jun 2008 18:42:55 +0000</pubDate>
<dc:creator>hiddenweb</dc:creator>
<guid>http://hiddenweb.wordpress.com/2008/06/26/searching-the-surface-web-i-google-yahoo-and-more/</guid>
<description><![CDATA[What the heck is the surface web? It refers to the thin top layer of the Net that the major search e]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>What the heck is the surface web? It refers to the thin top layer of the Net that the major search engines are able to spider.  So, when you&#8217;re working on the surface web, that is, searching only Google, or Yahoo, or MSN, you are only &#8220;scratching the surface&#8221; so to speak.</p>
<p><a href="http://searchthehiddenweb.com" target="_blank">There&#8217;s a lot more available to you on the hidden web</a>&#8230; but first, are you sure you&#8217;re getting everything you can out of the surface web?</p>
<p>Here are some basic ways you can better search the Surface Web.  If you&#8217;re doing market research, for example using something like Google Trends, it&#8217;s a good idea to start here anyway, since the vast majority of online users are using the &#8220;big three&#8221;.</p>
<p>Once you learn some of these search techniques, you can apply them to the Hidden Web too.  First, let&#8217;s examine some of the major search engines out there right now.</p>
<p>First, of course, is Google.  They are growing exponentially, every day indexing more pages, scanning in more documents, and generally sucking up everything they can on the Web. Google&#8217;s even trying to add ways to dig through the &#8220;deep web&#8221; too. (Stay tuned, we&#8217;ll keep you up to date on these new features.)</p>
<p>Next up are Yahoo! and MSN&#8217;s Live Search, the next biggest competitors.  Following those three there are many others, such as Ask! and Alta Vista. However, Google, Yahoo and MSN Live have acquired, and are now the power behind, many of the search engines that started out early on the Web.</p>
<p>These three together account for the vast majority of searches conducted on the Web, and webmasters tailor their pages to show up in these results.  But sometimes you will get different results for your search depending on which engine you use, and what search terms you use.</p>
<p>Next post, we&#8217;ll talk about some basic search terms.</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[What is the Hidden Web?]]></title>
<link>http://hiddenweb.wordpress.com/2008/06/26/what-is-the-hidden-web/</link>
<pubDate>Thu, 26 Jun 2008 18:41:20 +0000</pubDate>
<dc:creator>hiddenweb</dc:creator>
<guid>http://hiddenweb.wordpress.com/2008/06/26/what-is-the-hidden-web/</guid>
<description><![CDATA[&lt;A HREF=&#8221;http://searchthehiddenweb.com&#8221; TARGET=&#8221;_self&#8221;&gt;The &#8220;hidd]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p><a href="http://searchthehiddenweb.com" target="_self">The &#8220;hidden web&#8221; contains the fastest growing storehouse of information on the Internet</a>. Think you&#8217;re getting everything you need when you search Google? We hear how massive it is but it is only about 1/10th of the full amount of data available, for free, on the Internet.</p>
<p>Google and Yahoo, so-called &#8220;surface web&#8221; engines, are big, yes, but their indexing software (called crawlers, spiders, bots, etc.) is unable to access the vast majority of websites and databases.</p>
<p>Some examples of pages these spiders can&#8217;t reach are: databases, dynamically created pages (using Java or other software), newspaper archives, any site that requires a login, certain file formats that are not index-friendly, and so on.</p>
<p>You may have heard of the &#8220;deep web&#8221; or the &#8220;hidden web&#8221; &#8211; these are the names given to the 99% of information that is available on the Internet but not easily found using the major search engines.  It holds by far the greatest amount of the best-quality information available.</p>
<p>And guess what?  Luckily for the smart researcher, marketer, or product developer, nearly 95% of this content is accessible by the public, for free!</p>
<p>There is an amazing wealth of ideas here, for new hot products, for high-quality market research and analyses by academic experts, massive stores of government information, both national and international, and tools for finding detailed information on competitors – both their businesses and their personal information, scary as that may sound.  And that&#8217;s just the beginning; there are materials available to the public for the taking, in the public domain, which you can use to create new and valuable content for your customers.</p>
<p>If you have an ounce of creativity, just try scanning these sources for one of your market niches, and you will begin to see the possibilities immediately.  It&#8217;s an instant brainstorm!</p>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[How to mine the invisible web]]></title>
<link>http://devcompage.com/2008/06/16/what-is-the-invisible-web/</link>
<pubDate>Mon, 16 Jun 2008 01:31:57 +0000</pubDate>
<dc:creator>monina escalada</dc:creator>
<guid>http://devcompage.com/2008/06/16/what-is-the-invisible-web/</guid>
<description><![CDATA[With students back for first semester classes in our part of the world, Internet cafes are abuzz wit]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>With students back for first semester classes in our part of the world, Internet cafes are abuzz with students doing their literature search for assignments, term papers, or thesis review of literature.  At the moment, the most popular search engine appears to be Google. As of June 14, 2008, the estimated size of <a href="http://www.worldwidewebsize.com">Google&#8217;s index</a> is about 20 billion web pages, making it the largest crawler-based search engine, based on reported numbers.</p>
<p>So you think that with an Internet search engine like Google or <a href="http://scholar.google.com.ph/">Google Scholar</a>, you&#8217;ve done a comprehensive review of all available information, besides those articles which are pay-per-view or for paid subscribers only.  Think again. Studies have shown that the hidden web has as much as 500 billion web pages.</p>
<p>Search engines crawl only a small portion, the shallow part, of the web.  &#8220;Invisible web&#8221; or deep web refers to information that is available on the World Wide Web but is not accessible to general all-purpose search engines. Some materials <a href="http://en.wikipedia.org/wiki/Deep_web">hidden</a> from the usual search engines include dynamic content, unlinked content, private web, and limited access content.</p>
<p><strong>How to find the invisible web</strong></p>
<p>To search the invisible web, here&#8217;s a list of some notable databases that we should check out (see Robert Lackie&#8217;s &#8220;<a href="http://www.robertlackie.com/invisible/index.html">Those Dark Hiding Places: Invisible Web Revealed</a>&#8221; and <a href="http://websearch.about.com/od/invisibleweb/a/invisible_web.htm">Wendy Boswell</a>):</p>
<ul>
<li><a href="http://lii.org/">Librarians&#8217; Internet Index</a> &#8211; websites you can trust</li>
<li><a href="http://www.findlaw.com/">FindLaw</a> &#8211; &#8220;The highest-trafficked legal Web site&#8221;</li>
<li><a href="http://www.about.com/">About.com</a></li>
<li><a href="http://www.freepint.com/gary/direct.htm">Direct Search</a> &#8211; site put together by Gary Price</li>
<li><a href="http://www.invisible-web.net/">Invisible Web Directory</a> &#8211; put together by Gary Price and search guru Chris Sherman. This site is a directory of searchable databases, organized by subject</li>
<li><a href="http://www.rdn.ac.uk/">Resource Discovery Network</a> &#8211; has resources mostly from the United Kingdom, and is extremely well-organized and very searchable</li>
<li><a href="http://infomine.ucr.edu/">InfoMine</a> &#8211; an incredible resource that at last count included over 100,000 links and access to hundreds, if not thousands, of databases</li>
<li><a href="http://vlib.org/">Virtual Library</a></li>
<li><a href="http://www.intute.ac.uk/">Intute</a> &#8211; a free online service providing access to the very best Web resources for education and research.</li>
<li><a href="http://www.archive.org/index.php">Internet archive</a> &#8211;  a digital library of internet sites and other cultural artifacts in digital form.</li>
<li><a href="http://www.beaucoup.com/">Beaucoup!</a> &#8211; a search spot to help search the invisible web.</li>
<li><a href="http://www.digital-librarian.com/">Digital Librarian</a> &#8211; a librarian&#8217;s choice of the best of the web.</li>
<li><a href="http://www.scienceresearch.com/search/">ScienceResearch.com</a> &#8211; A portal allowing searchable access to numerous scientific journals and databases.</li>
<li><a href="http://agricola.nal.usda.gov/">Agricola Database</a> &#8211; provides citations to agriculture literature.</li>
<li><a href="http://www.osti.gov/energycitations/">Energy Citations Database</a> &#8211; provides free access to over 2.3 million science research citations.</li>
<li><a href="http://www.epa.gov/enviro/index_java.html">Envirofacts</a> &#8211; EPA&#8217;s one-stop source for environmental information.</li>
<li><a href="http://plants.usda.gov/">Plants Database</a> &#8211; provides standardized information about the vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories.</li>
<li><a href="http://plantfacts.osu.edu/">PlantFacts</a> &#8211; an international knowledge bank and multimedia learning center on plants.</li>
<li><a href="http://www.epa.gov/enviro/wme/">Window to My Environment database</a> &#8211; provides a wide range of federal, state, and local information about environmental conditions and features in an area of your choice.</li>
</ul>
</div>]]></content:encoded>
</item>
<item>
<title><![CDATA[AADLIS meeting and seminar on "Deep Web"]]></title>
<link>http://musingsunclassified.wordpress.com/2008/06/01/aadlis-meeting-and-seminar-on-deep-web/</link>
<pubDate>Sun, 01 Jun 2008 11:31:16 +0000</pubDate>
<dc:creator>slfaizal</dc:creator>
<guid>http://musingsunclassified.wordpress.com/2008/06/01/aadlis-meeting-and-seminar-on-deep-web/</guid>
<description><![CDATA[This years AADLIS (Alumni Association of Dept. of Library and Information Science, Univ. of Kerala) ]]></description>
<content:encoded><![CDATA[<div class='snap_preview'><p>This year&#8217;s <a href="http://www.aadlis.org/index.htm">AADLIS (Alumni Association of Dept. of Library and Information Science, Univ. of Kerala)</a> meeting was held on the 24th of May, 2008.</p>
<p><span style="color:#800080;">Although the annual meeting is the Association&#8217;s only activity for the whole year, they have been doing good work by honouring the old BLISc batch students and the PhD and MLISc rank holders. Some endowments have also been instituted, such as one by Smt. K.K.Lalitha Bai.</span></p>
<p><span style="color:#800080;">The concept of the Deep web (better known as the &#8220;invisible web&#8221;) is not clear to most of the professionals working in the field. More classes on it are needed.</span></p>
<p><span style="color:#800080;">The possibilities of harnessing the information or knowledge hidden in the deep areas beneath the surface web (the known or currently searchable web) are enormous. The academic invisible web is a separate area of study.</span></p>
<p><span style="color:#800080;">It is true that the lion&#8217;s share of authentic information, such as that in databases, catalogues, digital libraries, etc., is available only in the deep web, which is not indexed and hence not available to all (search engines).</span></p>
<p><span style="color:#800080;">More exploration is needed in the field.</span></p>
<p> </p>
</div>]]></content:encoded>
</item>

</channel>
</rss>
