E-Cloud: Resource and Energy Efficient Cloud Computing Infrastructure
Background
Cloud computing allows users to access computing
resources and services from the Internet without worrying
about the complex infrastructure that supports them. To reduce
resource and energy cost, application consolidation must be
performed to use the fewest physical hosts to hold all running
applications. However, manual system monitoring and tuning are
infeasible given the scale and diversity of cloud systems and
application workloads. Therefore, one fundamental problem here
is to automatically find a good match between system resources
and application tasks on-the-fly. The key challenges include
handling the variability and heterogeneity of both application
requirements and system resources. Previous approaches often
use coarse-grained information (e.g., mean, max, min) to
perform resource allocation for dynamic applications, which
forces the system to either over-provision or
under-provision resources.
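To make the over-/under-provisioning point concrete, here is a minimal sketch with made-up numbers. The host capacity, the four task traces, and the helper names (`max_based`, `mean_based`, `true_peak`) are all hypothetical and are not taken from the project; they only illustrate how a single summary statistic can mislead placement on one host.

```python
# Hypothetical CPU demand traces (percent of one host) over four time steps.
CAPACITY = 100
task_a = [70, 20, 70, 20]   # peaks on even steps
task_b = [20, 70, 20, 70]   # peaks on odd steps (complementary to task_a)
task_c = [70, 20, 70, 20]   # peaks at the same time as task_d
task_d = [70, 20, 70, 20]

def max_based(tasks):
    """Reservation if each task is sized by its maximum demand."""
    return sum(max(t) for t in tasks)

def mean_based(tasks):
    """Reservation if each task is sized by its mean demand."""
    return sum(sum(t) / len(t) for t in tasks)

def true_peak(tasks):
    """Actual peak of the combined demand, computed step by step."""
    return max(sum(step) for step in zip(*tasks))

# Max-based sizing rejects a pairing that would actually fit (over-provisioning).
print(max_based([task_a, task_b]), true_peak([task_a, task_b]))   # 140 90

# Mean-based sizing accepts a pairing that overloads the host (under-provisioning).
print(mean_based([task_c, task_d]), true_peak([task_c, task_d]))  # 90.0 140
```

In the first pairing the summed maxima exceed `CAPACITY` although the combined trace never does; in the second the summed means fit although the combined trace does not.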
The objective of this project
is to develop a new predictive elastic load management system
for reducing the resource and energy cost of cloud systems.
Our approach dynamically captures precise patterns, called
signatures, of both application tasks and system resources
using fine-grained time series of multi-dimensional metrics
(e.g., CPU, memory, disk). The system then performs similarity
matching between the signatures of available resources on
physical hosts and those of running application tasks. By
similarity matching, we mean that the time series of the
available resources on a particular system node has a similar
shape to that of a job/task (i.e., they peak and bottom out at
the same times). Thus, the system can always use just enough
resources for running application jobs, without over-provisioning
or under-provisioning. However, in dynamic environments,
the signature patterns of both application tasks and system
resources will vary over time. Thus, we need to periodically
repeat the similarity matching to find good matches between
hosts and jobs/tasks. In contrast to dynamic resource allocation
schemes based on instantaneous values, our approach strives to
look ahead over an extended period of time so that we can
reduce unnecessary job migration or resource scaling.
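The signature matching idea described above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the project's algorithm: it assumes a signature is a dictionary of per-metric time series, uses Pearson correlation as the similarity measure, and requires that demand never exceed availability. The host names, traces, and function names (`pearson`, `fits`, `similarity`, `best_host`) are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def fits(available, demand):
    """True if demand stays within availability for every metric and time step."""
    return all(d <= a
               for metric in demand
               for a, d in zip(available[metric], demand[metric]))

def similarity(available, demand):
    """Mean per-metric correlation between a host signature and a task signature."""
    return sum(pearson(available[m], demand[m]) for m in demand) / len(demand)

def best_host(hosts, demand):
    """Name of the feasible host whose signature is most similar, or None."""
    scored = [(similarity(sig, demand), name)
              for name, sig in hosts.items() if fits(sig, demand)]
    return max(scored)[1] if scored else None

# Hypothetical available-resource signatures (percent free) over four time steps.
hosts = {
    "host-a": {"cpu": [90, 90, 90, 90], "mem": [60, 60, 60, 60]},  # flat, fits
    "host-b": {"cpu": [30, 80, 30, 80], "mem": [50, 70, 50, 70]},  # same shape as task
    "host-c": {"cpu": [80, 30, 80, 30], "mem": [70, 50, 70, 50]},  # opposite shape
}
task = {"cpu": [20, 60, 20, 60], "mem": [30, 40, 30, 40]}

print(best_host(hosts, task))  # host-b
```

Here `host-a` could hold the task but would leave a large block of idle capacity reserved, and `host-c` is rejected because the task's peaks coincide with its troughs. Re-running `best_host` on refreshed signatures at a fixed interval would correspond to the periodic re-matching step.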
People
Faculty
Students
- Hiep Nguyen (PhD student)
- Zhiming Shen (PhD student)
- Sethuraman Subbiah (MS student, graduated, first employment: NetApp ATG)
Collaborators
Publications
- Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, John Wilkes, "AGILE: elastic distributed resource scaling for Infrastructure-as-a-Service", USENIX International Conference on Autonomic Computing (ICAC), San Jose, CA, June, 2013 (acceptance rate: 16/73 = 21%).
- Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes, "CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems", ACM Symposium on Cloud Computing (SOCC) in conjunction with SOSP, Cascais, Portugal, October, 2011.
- Zhenhuan Gong, Xiaohui Gu, "Predictive Elastic Load Management for Cloud Computing Infrastructures", ACM Symposium on Operating Systems Principles (SOSP) poster session, Big Sky, MT, October, 2009.
- Zhenhuan Gong, Prakash Ramaswamy, Xiaohui Gu, Xiaosong Ma, "SigLM: Signature-Driven Load Management for Cloud Computing Infrastructures", Proc. of IEEE International Conference on Quality of Service (IWQoS), Charleston, South Carolina, July, 2009.
Related Projects
Sponsors
- This research is supported by NSF and two Google Research Awards.