Hamiltonian Consulting develops Custom Web Crawling Application
Web Crawling
For the CityDirect network
Task: Build a system for web data mining. Users could enter specialized URLs to guide the crawler and manually rank data sources by reliability.
Achievements
- Gathered hundreds of gigabytes of data into a relational database (PostgreSQL).
- Built as a multi-process, multi-threaded application in Python.
- Delivered the system in half the time others had quoted, thanks to code reuse and good engineering practices.
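
To give a rough idea of the approach, the sketch below shows how user-supplied URLs with manual reliability ranks might be fetched concurrently in Python. It is an illustration only, not the delivered code: the `Source` class, the integer reliability scale, the `crawl` function and its parameters are all assumptions made for this example, and it covers only the threaded part (a multi-process layer and the PostgreSQL storage step are left out).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.request import urlopen


@dataclass(frozen=True)
class Source:
    """A user-supplied URL with a manually assigned reliability rank (hypothetical)."""
    url: str
    reliability: int  # assumed scale: a higher number means a more trusted source


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download one page. A real crawler would add retries and politeness delays."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(sources: List[Source],
          fetch: Callable[[str], bytes] = fetch_url,
          max_workers: int = 8) -> Dict[str, bytes]:
    """Fetch all sources on a thread pool, submitting the most reliable first.

    Sources that fail to download are skipped so one bad URL does not
    abort the whole batch.
    """
    ordered = sorted(sources, key=lambda s: s.reliability, reverse=True)
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front so they run concurrently.
        futures = [(s.url, pool.submit(fetch, s.url)) for s in ordered]
        # Collect results in reliability order.
        for url, future in futures:
            try:
                results[url] = future.result()
            except Exception:
                continue  # skip sources that could not be fetched
    return results
```

Passing the `fetch` function as a parameter keeps the sketch testable without network access. CPU-heavy parsing could be spread over several processes in a similar way with `concurrent.futures.ProcessPoolExecutor`.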
