InfoScope: Continuous Information Monitoring for Large-Scale Distributed Systems
Background
Large-scale distributed computing infrastructures have become important
platforms for many real-world applications, such as corporate data
centers, multi-tier web servers, massive data analysis, and
service-oriented cloud computing. An information management service is
one of the fundamental building blocks of automatic system management:
it collects dynamic system information (e.g., resource availability,
service response time, virtual machine (VM) states, application
component states) and resolves information queries issued by system
administrators and other system controllers. However, providing
scalable and efficient information management for large-scale
distributed systems is challenging. Existing system monitoring
solutions lack scalability, adaptability, and query support, which
makes them insufficient for managing large-scale computing
infrastructures. The goal of this project is to develop a scalable
information management service that performs adaptive information
collection to resolve various information queries (e.g.,
multi-attribute range queries, aggregation queries, top-k queries) with
minimum monitoring overhead. InfoScope aims to achieve the following
goals:
- Self-compressing information tracking that exploits inherent
redundancy properties in distributed monitoring data streams to reduce
information monitoring overhead;
- Continuous and instant query support (e.g.,
multi-attribute range queries, aggregation queries);
- Query-aware information monitoring that dynamically configures
the report operations of different monitoring sensors based on query
patterns;
- Comprehensive and extensible monitoring that supports
user-supplied monitoring sensors and collects OS-level, VM-level, and
application-level metrics.
Live Demo
People
Faculty
Students
Collaborators
- Ying Zhao (Tsinghua University)
- Mike Wamboldt (IBM)
- Brent Miller (IBM)
- Jin Liang (Google)
- Klara Nahrstedt (UIUC)
Publications
- Yongmin Tan, Vinay Venkatesh, Xiaohui Gu, "Resilient Self-Compressive
Monitoring for Large-Scale Hosting Infrastructures," IEEE Transactions
on Parallel and Distributed Systems (TPDS), 2012 (preprint, with
supplemental material).
- Yongmin Tan, Vinay Venkatesh, Xiaohui Gu, "OLIC: OnLine Information
Compression for Scalable Hosting Infrastructure Monitoring," ACM/IEEE
International Workshop on Quality of Service (IWQoS), San Jose, CA,
June 2011. (acceptance rate: 23/80)
- Ying Zhao, Yongmin Tan, Zhenhuan Gong, Xiaohui Gu, Mike Wamboldt,
"Self-Correlating Predictive Information Tracking for Large-Scale
Production Systems," IEEE International Conference on Autonomic
Computing and Communications (ICAC), Barcelona, Spain, June 2009.
(acceptance rate: 15/96 = 15.6%)
- Jin Liang, Xiaohui Gu, Klara Nahrstedt, "Self-Configuring Information
Management for Large-Scale Service Overlays," IEEE INFOCOM, Anchorage,
Alaska, May 2007. (acceptance rate: 18%)
Related Projects
Data Release
The following data are downloadable at our project website. They were
collected at different times using our InfoScope software. We have a
PlanetLab node pool of about 400 nodes. We deploy an rdaemon sensor on
each node and collect up to 66 system-level metrics. The sampling
interval is 10 seconds. All information is sent over UDP to a central
management node, which is a dedicated server of our research group. We
would be glad if our data proves helpful to your research and
publications. Please cite the following paper if you use our data:
Ying Zhao, Yongmin Tan, Zhenhuan Gong, Xiaohui Gu, Mike Wamboldt,
"Self-Correlating Predictive Information Tracking for Large-Scale
Production Systems," IEEE International Conference on Autonomic
Computing and Communications (ICAC), Barcelona, Spain, June 2009.
The format of each log file is as follows:
Timestamp MetricName1 MetricValue1 .... MetricNameN MetricValueN
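As a rough illustration of the log format above, the following Python sketch splits one line into a timestamp and a name-to-value mapping. It is not part of the InfoScope software. It assumes that fields are separated by whitespace, that metric names contain no spaces, and that every metric value is numeric; the function name parse_log_line and the metric names in the usage note are hypothetical. The timestamp is kept as a string because its exact encoding is not specified here.

```python
from typing import Dict, Tuple


def parse_log_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Split one trace line into (timestamp, {metric_name: value}).

    Assumed layout (whitespace-separated):
        Timestamp MetricName1 MetricValue1 ... MetricNameN MetricValueN
    """
    fields = line.split()
    # A well-formed line has one timestamp plus name/value pairs,
    # so the total number of fields must be odd.
    if len(fields) % 2 == 0:
        raise ValueError(f"malformed log line: {line!r}")

    timestamp = fields[0]
    metrics: Dict[str, float] = {}
    # Names sit at positions 1, 3, 5, ...; values at 2, 4, 6, ...
    for name, value in zip(fields[1::2], fields[2::2]):
        metrics[name] = float(value)  # assumption: values are numeric
    return timestamp, metrics
```

For example, parse_log_line("1233200000 LOAD1 0.52 MEMFREE 10240") yields the timestamp string "1233200000" and a dictionary with the two (made-up) metrics LOAD1 and MEMFREE. Lines with a dangling metric name and no value raise ValueError rather than being silently truncated.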
PlanetLab1: PlanetLab traces collected from 01/29/2009 - 02/06/2009
PlanetLab2: PlanetLab traces collected from 09/20/2008 - 10/02/2008
PlanetLab3: PlanetLab traces collected from 10/02/2008 - 10/15/2008