E-Cloud: Resource and Energy Efficient Cloud Computing Infrastructure

Background

Cloud computing allows users to access computing resources and services over the Internet without worrying about the complex infrastructure that supports them. To reduce resource and energy costs, applications must be consolidated so that the fewest possible physical hosts hold all running applications. However, manual system monitoring and tuning are infeasible given the scale and diversity of cloud systems and application workloads. One fundamental problem, therefore, is to automatically find a good match between system resources and application tasks on the fly. The key challenges include handling the variability and heterogeneity of both application requirements and system resources. Previous approaches often use coarse-grained information (e.g., mean, max, min) to allocate resources for dynamic applications, which forces the system to either over-provision or under-provision resources.
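The over- and under-provisioning trade-off of coarse-grained statistics can be illustrated with a small sketch. The demand values and the provisioning_gap helper below are hypothetical and chosen only for illustration; they are not measurements or code from this project.

```python
from statistics import mean

# Hypothetical CPU demand samples (percent) taken at fixed intervals.
demand = [20, 25, 80, 85, 30, 20, 75, 90]

def provisioning_gap(demand, allocation):
    """Return (wasted, unmet) totals for one fixed allocation level.

    wasted: capacity reserved but not used, summed over all samples.
    unmet:  demand exceeding the allocation, summed over all samples.
    """
    wasted = sum(max(allocation - d, 0) for d in demand)
    unmet = sum(max(d - allocation, 0) for d in demand)
    return wasted, unmet

# Allocating by the mean leaves demand unmet during the peaks.
mean_gap = provisioning_gap(demand, mean(demand))

# Allocating by the max never falls short but wastes capacity off-peak.
max_gap = provisioning_gap(demand, max(demand))

print("mean-based (wasted, unmet):", mean_gap)
print("max-based  (wasted, unmet):", max_gap)
```

In this toy series a single summary value cannot track both the peaks and the valleys, which is the gap that fine-grained patterns are meant to close.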

The objective of this project is to develop a new predictive elastic load management system that reduces the resource and energy costs of cloud systems. Our approach dynamically captures precise patterns, called signatures, of both application tasks and system resources using fine-grained time series of multi-dimensional metrics (e.g., CPU, memory, disk). The system then performs similarity matching between the signatures of available resources on physical hosts and those of running application tasks. By similarity matching, we mean that the time series of the available resources on a particular system node has a shape similar to that of a job/task (i.e., the two peak and bottom out at the same times). Thus, the system can always use just enough resources to run application jobs, avoiding both over-provisioning and under-provisioning. However, in dynamic environments, the signature patterns of both application tasks and system resources vary over time, so we need to perform the similarity matching periodically to keep hosts and jobs/tasks well matched. In contrast to dynamic resource allocation schemes based on instantaneous values, our approach looks ahead over an extended period of time so that we can reduce unnecessary job migration and resource scaling.
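The idea of signature matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: Pearson correlation is used here as a stand-in similarity measure, and the names pearson, fits, similarity, and best_host, as well as the host and task data, are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def fits(available, demand):
    """True if the demand never exceeds availability, for every metric and sample."""
    return all(d <= a
               for metric in demand
               for a, d in zip(available[metric], demand[metric]))

def similarity(available, demand):
    """Average per-metric shape similarity (assumed measure: Pearson correlation)."""
    return sum(pearson(available[m], demand[m]) for m in demand) / len(demand)

def best_host(hosts, task):
    """Pick the feasible host whose available-resource signature best matches the task."""
    scores = {name: similarity(sig, task)
              for name, sig in hosts.items() if fits(sig, task)}
    return max(scores, key=scores.get) if scores else None

# Hypothetical signatures: one time series per metric, same sampling interval.
task = {"cpu": [10, 40, 10, 40], "mem": [20, 20, 30, 30]}
hosts = {
    "host-a": {"cpu": [60, 45, 60, 45], "mem": [40, 40, 40, 40]},  # fits, opposite shape
    "host-b": {"cpu": [15, 45, 15, 45], "mem": [25, 25, 35, 35]},  # fits, same shape
    "host-c": {"cpu": [5, 35, 5, 35], "mem": [25, 25, 35, 35]},    # same shape, too small
}

print(best_host(hosts, task))
```

Because signatures drift over time, a matcher of this kind would be rerun periodically on freshly collected time series, as described above.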

People

Faculty

Xiaohui (Helen) Gu

Students

Hiep Nguyen (PhD student)
Zhiming Shen (PhD student)
Sethuraman Subbiah (MS student, graduated, first employment: NetApp ATG)

Collaborators

John Wilkes (Google)

Publications

Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, John Wilkes, "AGILE: elastic distributed resource scaling for Infrastructure-as-a-Service", USENIX International Conference on Autonomic Computing (ICAC), San Jose, CA, June, 2013 (acceptance rate: 16/73 = 21%).
Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes, "CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems", ACM Symposium on Cloud Computing (SOCC) in conjunction with SOSP, Cascais, Portugal, October, 2011.
Zhenhuan Gong, Xiaohui Gu, "Predictive Elastic Load Management for Cloud Computing Infrastructures", ACM Symposium on Operating Systems Principles (SOSP) poster session, Big Sky, MT, October, 2009.
Zhenhuan Gong, Prakash Ramaswamy, Xiaohui Gu, Xiaosong Ma, "SigLM: Signature-Driven Load Management for Cloud Computing Infrastructures", Proc. of IEEE International Workshop on Quality of Service (IWQoS), Charleston, South Carolina, July, 2009.

Related Projects

Virtual Computing Lab (VCL)
Amazon EC2
