Tags » Celerycrawler

Beating Google With CouchDB, Celery and Whoosh (Part 8)

In the previous seven posts I’ve gone through all the stages in building a search engine. If you want to try and run it for yourself and tweak it to make it even better then you can.

Django

Beating Google With CouchDB, Celery and Whoosh (Part 7)

The key ingredients of our search engine are now in place, but we face a problem. We can download webpages and store them in CouchDB…

Django

Beating Google With CouchDB, Celery and Whoosh (Part 6)

We’re nearing the end of our plot to create a Google-beating search engine (in my dreams at least) and in this post we’ll build the interface to query the index we’ve built up.

Django

Beating Google With CouchDB, Celery and Whoosh (Part 5)

In this post we’ll continue building the backend for our search engine by implementing the algorithm we designed in the last post for ranking pages. We’ll also build an index of our pages with…

Django

Beating Google With CouchDB, Celery and Whoosh (Part 4)

In this series I’m showing you how to build a webcrawler and search engine using standard Python-based tools like Django, Celery and Whoosh with a CouchDB backend.

Django

Beating Google With CouchDB, Celery and Whoosh (Part 3)

In this series I’ll show you how to build a search engine using standard Python tools like Django, Whoosh and CouchDB. In this post we’ll start crawling the web and filling our database with the contents of pages.

Django

Beating Google With CouchDB, Celery and Whoosh (Part 2)

In this series I’ll show you how to build a search engine using standard Python tools like Django, Whoosh and CouchDB. In this post we’ll begin by creating the data structure for storing the pages in the database and writing the first parts of the webcrawler.

Django