ACM – Testing a Distributed System

I want to sing the praises of one of our lead engineers, Phil Maddox, for authoring a very interesting paper, "Testing a Distributed System," which was published in Communications of the ACM, Vol. 58, No. 9.

A brief excerpt follows:

“Distributed systems can be especially difficult to program for a variety of reasons. They can be difficult to design, difficult to manage, and, above all, difficult to test. Testing a normal system can be trying even under the best of circumstances, and no matter how diligent the tester is, bugs can still get through. Now take all of the standard issues and multiply them by multiple processes written in multiple languages running on multiple boxes that could potentially all be on different operating systems, and there is potential for a real disaster.

Individual component testing, usually done via automated test suites, certainly helps by verifying that each component is working correctly. Component testing, however, usually does not fully test all of the bits of a distributed system. Testers need to be able to verify that data at one end of a distributed system makes its way to all of the other parts of the system and, perhaps more importantly, is visible to the various components of the distributed system in a manner that meets the consistency requirements of the system as a whole.”
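
To make the last point of the excerpt a bit more concrete, here is a minimal sketch of an end-to-end visibility check. It is not code from the paper: the `Node` and `Cluster` classes are a hypothetical in-memory stand-in for a real system, and `wait_until_visible` is an illustrative helper that assumes an eventual-consistency requirement (every node must see the write within a timeout).

```python
import time


class Node:
    """Hypothetical stand-in for one component of a distributed system."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: a write lands on one node and reaches the
    others only when replicate() is called."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.pending = []  # writes not yet copied to the other nodes

    def write(self, node_name, key, value):
        self.nodes[node_name].data[key] = value
        self.pending.append((node_name, key, value))

    def replicate(self):
        # Copy every pending write to all nodes except its origin.
        for origin, key, value in self.pending:
            for name, node in self.nodes.items():
                if name != origin:
                    node.data[key] = value
        self.pending.clear()


def wait_until_visible(cluster, key, expected, timeout=1.0, interval=0.01,
                       on_poll=None):
    """Poll every node until all of them return `expected` for `key`.

    Returns True if the value became visible everywhere before the
    timeout, False otherwise. `on_poll` is an optional hook called
    between polls (used here to drive the toy replication).
    """
    deadline = time.monotonic() + timeout
    while True:
        lagging = [name for name, node in cluster.nodes.items()
                   if node.data.get(key) != expected]
        if not lagging:
            return True
        if time.monotonic() >= deadline:
            return False
        if on_poll is not None:
            on_poll()
        time.sleep(interval)


if __name__ == "__main__":
    cluster = Cluster(["a", "b", "c"])
    cluster.write("a", "order-42", "shipped")
    # Replication runs between polls, so the write should become visible.
    ok = wait_until_visible(cluster, "order-42", "shipped",
                            on_poll=cluster.replicate)
    print("visible on all nodes:", ok)
```

In a real test the `Cluster` would be replaced by clients talking to the actual components, and the timeout would come from whatever consistency guarantee the system as a whole is supposed to provide.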

Read the entire paper here: Testing a Distributed System
