ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures


Cloud systems have become increasingly popular because they remove the need for users to maintain complex computing infrastructures themselves. Users can instead reserve resources dynamically in a pay-as-you-go fashion. Many cloud systems, such as Amazon Elastic Compute Cloud (EC2) and the NCSU Virtual Computing Lab (VCL), provision resources in the form of virtual machines (VMs) that come installed with the desired application software and operating systems. Cloud systems can encounter a range of runtime problems, such as hardware failures, software misconfigurations, and corrupted VM images. Cloud system management nodes typically produce console logs continuously, recording important runtime operations and their status. We address the challenge of troubleshooting cloud systems using these management console logs. Our research aims to devise automated techniques and tools that perform this troubleshooting directly from the logs.

Figure: Cloud system operation