
Real Time Data Reporting Dashboard

Tuesday, October 26th, 2010

Developing reporting applications with HTML 5 web standards gives users access to data anywhere, at any time, on a variety of devices. Well suited to managers and CRM-type systems, these tools support decisions based on real-time, up-to-date data.

Examples of HTML 5 reports we developed for our clients using the canvas tag:

Example of a report generated with HTML 5 canvas


More information on how to use the canvas tag can be found here:

http://programminglinuxgames.blogspot.com/2010/07/using-canvas-in-firefox-and-safari-for.html

Linux HA – Load Balancing and High Availability

Tuesday, September 21st, 2010

Linux HA – Heartbeat, ldirectord, apache, mysql, debian

Today, the internet is booming. Millions of people are viewing billions of pages. Larger sites need robust hardware setups if the content they serve is even remotely complex, involving things like relational databases and dynamic content. We are going to look at a practical solution that is both cost effective and able to scale with varying loads. High availability and load balancing can be implemented for Apache and MySQL using Heartbeat, a small open source software package, together with ldirectord.

There are many ways this package can be set up. What we require is a number of physical servers, all behind a router. Our goals are fault tolerance and scalable performance.

Simple Linux HA – Scalable Performance

Our first configuration lacks some high availability features. It has a single point of failure: the load balancing machine, Linux HA-1. Other possible points of failure are the router and the network file storage. Those two can be addressed separately and are outside the scope of this article.

This is our original layout and has served quite well in many cases. It is simpler to set up and scales well as we add more machines.

The MySQL servers are set up in master-master replication mode, so a write on either machine replicates to the other.

Diagram of Linux HA simple layout

The performance here is achieved by the ldirectord daemon running on Linux HA-1. To clients, Linux HA-1 appears to be both the Apache server and the MySQL server, but it only forwards connections to the real servers behind it. It keeps track of the open connections on each machine and routes each new connection (Apache or MySQL) to the least loaded host. The scheduling algorithm is chosen in the ldirectord configuration.
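To make the "least loaded host" idea concrete, here is a minimal Python sketch of least-connection selection. It is only an illustration of the scheduling idea, not ldirectord's own code, and the names `RealServer`, `pick_least_connections` and `route_connection`, as well as the server names and connection counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    """One backend (Apache or MySQL) behind the load balancer. Hypothetical type."""
    name: str
    active_connections: int = 0
    alive: bool = True


def pick_least_connections(servers):
    """Return the live server with the fewest open connections."""
    candidates = [s for s in servers if s.alive]
    if not candidates:
        raise RuntimeError("no real servers available")
    # min() returns the first minimum, so ties go to the earliest server listed.
    return min(candidates, key=lambda s: s.active_connections)


def route_connection(servers):
    """Assign one new connection and record it on the chosen server."""
    chosen = pick_least_connections(servers)
    chosen.active_connections += 1
    return chosen


# Made-up pool of three web servers with some connections already open.
web_pool = [
    RealServer("apache-1", 3),
    RealServer("apache-2", 1),
    RealServer("apache-3", 2),
]
first = route_connection(web_pool)   # apache-2 has the fewest connections
web_pool[1].alive = False            # simulate apache-2 failing a health check
second = route_connection(web_pool)  # apache-2 is skipped while it is down
```

Servers marked as down are simply left out of the selection, so as long as one real server is still alive, new connections keep being served.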

A step-by-step how-to for this setup can be found on our blog: http://programminglinuxgames.blogspot.com/2010/09/load-balancing-web-servers-using.html

Note that this example does not use Heartbeat. Heartbeat is the tool that adds fault tolerance to this setup, as discussed in the next section.

Linux HA – Fault Tolerance and Scalable Performance

The next logical step is to remove the single point of failure, the Linux HA-1 machine. We can set up Heartbeat on two machines and let one take over if the other fails at any time.

For one machine to take over from the other, both are configured with a “virtual” network interface. This is just a second IP address, using the virtual interface support in Linux, and only the active machine answers on it at any given time. The circle surrounding Linux HA-1 and Linux HA-2 represents this network interface. You must assign it a separate, new IP address and have your router send all traffic to that virtual IP address.
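The takeover logic can be sketched as a small Python simulation. This is not how Heartbeat is implemented or configured. It only models the idea that the standby claims the virtual IP once the active machine has been silent for too long. The `Director` type, the `check_failover` function and the 10-second `DEADTIME` are assumptions made for illustration.

```python
from dataclasses import dataclass

DEADTIME = 10.0  # assumed value: seconds of silence before a node is declared dead


@dataclass
class Director:
    """One of the two load balancing machines. Hypothetical type."""
    name: str
    last_heartbeat: float = 0.0      # time the peer last heard from this node
    holds_virtual_ip: bool = False   # True for the node answering on the virtual IP


def check_failover(active, standby, now, deadtime=DEADTIME):
    """Move the virtual IP to the standby if the active node has gone silent.

    Returns the node that holds the virtual IP after the check.
    """
    if now - active.last_heartbeat > deadtime:
        active.holds_virtual_ip = False
        standby.holds_virtual_ip = True
        return standby
    return active


ha1 = Director("Linux HA-1", last_heartbeat=100.0, holds_virtual_ip=True)
ha2 = Director("Linux HA-2", last_heartbeat=100.0)

owner_before = check_failover(ha1, ha2, now=105.0)  # 5 s of silence: no change
owner_after = check_failover(ha1, ha2, now=115.0)   # 15 s of silence: HA-2 takes over
```

Because the router only knows about the virtual IP address, clients do not need to change anything when the address moves from one machine to the other.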

Highly Available and Performance Scalable set-up

More Information

http://www.linux-ha.org/wiki/Main_Page – Linux HA
http://www.ultramonkey.org/3/linux-ha.html – Ultramonkey (Heartbeat and Ldirectord)
http://en.wikipedia.org/wiki/Linux-HA – Wikipedia

Web Crawling Project

Tuesday, April 13th, 2010

Hamiltonian Consulting develops Custom Web Crawling Application

Web Crawling

For the CityDirect network

Task: A system was developed for web data mining. Users were able to input specialized URLs to guide the crawler, and manually rank data sources for their reliability.

Achievements

  • Gathered large amounts of data (100s of GB) in a relational database (PostgreSQL).
  • Multi-process and multi-threaded application written in Python.
  • Due to code reuse and good practices, delivered the system in half the time others had quoted.
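
As a rough illustration of the approach described in the task above, here is a minimal Python sketch of a crawler that starts from user-supplied URLs and visits manually ranked sources in order of reliability, fetching pages with a thread pool. It is not the delivered system: the `crawl` function, its parameters and the in-memory `FAKE_WEB` data are hypothetical, only threads are shown (no separate processes), and PostgreSQL storage is left out.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def crawl(seeds, fetch, source_rank, max_pages=100, workers=4):
    """Crawl from seed URLs, visiting higher-ranked sources first.

    seeds:       iterable of (source_name, url) pairs supplied by users
    fetch:       callable(url) -> (page_text, [(source_name, url), ...])
    source_rank: dict of source_name -> reliability rank (higher is better)
    """
    frontier = []   # min-heap of (-rank, url)
    seen = set()    # URLs already queued, so each is fetched at most once
    results = {}    # url -> page text, in the order pages were processed

    def push(source, url):
        if url not in seen:
            seen.add(url)
            # heapq is a min-heap, so negate the rank to pop the best source first.
            heapq.heappush(frontier, (-source_rank.get(source, 0), url))

    for source, url in seeds:
        push(source, url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(results) < max_pages:
            # Take a batch of the best-ranked URLs and fetch them in parallel.
            size = min(workers, len(frontier), max_pages - len(results))
            batch = [heapq.heappop(frontier)[1] for _ in range(size)]
            for url, (text, links) in zip(batch, pool.map(fetch, batch)):
                results[url] = text
                for source, link in links:
                    push(source, link)
    return results


# Made-up in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": ("page a", [("b", "http://b.example/")]),
    "http://b.example/": ("page b", []),
    "http://c.example/": ("page c", []),
}


def fake_fetch(url):
    return FAKE_WEB[url]


pages = crawl(
    [("a", "http://a.example/"), ("c", "http://c.example/")],
    fake_fetch,
    {"a": 5, "b": 3, "c": 1},
    workers=1,
)
```

Passing `fetch` in as a parameter keeps the ranking and scheduling logic separate from the network code, which is one way to make this kind of component easy to reuse and test.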