Web Crawling Project

Hamiltonian Consulting Develops Custom Web Crawling Application

Web Crawling

For the CityDirect network

Task: A system was developed for web data mining. Users could enter specialized URLs to guide the crawler and manually rank data sources by reliability.
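As an illustration only, the idea of user-supplied URLs combined with manual reliability rankings can be sketched as a priority queue of URLs to crawl. This is a minimal sketch, not the delivered system: the class name SeedFrontier, the convention that a lower rank means a more reliable source, and the default rank of 5 are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(order=True)
class FrontierEntry:
    # Entries are ordered by (priority, sequence); the URL itself is not compared.
    priority: int
    sequence: int
    url: str = field(compare=False)


class SeedFrontier:
    """Hypothetical crawl frontier: URLs from more reliable hosts come out first."""

    def __init__(self, reliability_by_host, default_rank=5):
        # Assumption: lower rank = more reliable; unranked hosts get default_rank.
        self._reliability = dict(reliability_by_host)
        self._default_rank = default_rank
        self._heap = []
        self._seen = set()
        self._counter = 0

    def add(self, url):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        host = urlparse(url).netloc
        rank = self._reliability.get(host, self._default_rank)
        # The counter keeps insertion order among URLs with equal rank.
        heapq.heappush(self._heap, FrontierEntry(rank, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Return the next URL to crawl (most reliable source first)."""
        return heapq.heappop(self._heap).url

    def __len__(self):
        return len(self._heap)
```

A user would register ranked hosts once, add seed URLs as they are entered, and let the crawler call pop() to decide what to fetch next.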

Achievements

  • Stored large volumes of crawled data (hundreds of GB) in a relational database (PostgreSQL).
  • Multi-process and multi-threaded application written in Python.
  • Thanks to code reuse and good engineering practices, delivered the system in half the time others had quoted.
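One common way to combine multiple processes with multiple threads in Python is to split the URL list across worker processes and let each process fetch its share with a thread pool. The sketch below shows that general pattern using the standard library; it is not the project's actual code, and the function names (split_batches, fetch_batch, crawl) and the fetch callable are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def split_batches(urls, n):
    """Split urls into at most n non-empty batches (round-robin)."""
    if not urls:
        return []
    n = max(1, min(n, len(urls)))
    return [urls[i::n] for i in range(n)]


def fetch_batch(urls, fetch, threads=8):
    """Fetch one batch inside a single process using a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map returns results in input order, so zip pairs them correctly.
        return list(zip(urls, pool.map(fetch, urls)))


def crawl(urls, fetch, processes=4, threads=8):
    """Fan batches out to worker processes; each process runs its own threads.

    Assumption: fetch is a picklable, module-level function taking one URL.
    """
    batches = split_batches(urls, processes)
    if not batches:
        return []
    worker = partial(fetch_batch, fetch=fetch, threads=threads)
    results = []
    with ProcessPoolExecutor(max_workers=processes) as pool:
        for batch_result in pool.map(worker, batches):
            results.extend(batch_result)
    return results
```

When run as a script, crawl() should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Threads suit the network-bound fetching, while separate processes allow CPU-bound work such as parsing to use more than one core.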
