DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems


Background

Cloud server systems such as Hadoop and Cassandra have enabled many real-world data-intensive applications, ranging from security attack detection to business intelligence. However, their inherent complexity gives rise to many performance challenges. In particular, previous studies have shown that many tricky performance bugs in cloud server systems are caused by unexpected data corruptions, which developers tend to overlook. For example, in May 2017, a data corruption bug triggered during a data center failover operation brought down British Airways services for hours.

Performance bugs are notoriously difficult to debug because they typically produce little useful debugging information. The problem is exacerbated in cloud server systems, where the developer typically has access neither to the original input data that triggered the performance bug nor to the large-scale infrastructure needed to replay the failed production run. Although previous work has extensively studied data corruptions and performance bugs separately, little research has examined their intersection, that is, performance problems caused by data corruptions. In particular, our work focuses on detecting software hang bugs that are triggered by data corruptions in cloud server systems. A software hang bug makes the system unavailable to some or all of its users, which makes it one of the most severe performance problems that production systems try to avoid.
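To make this bug class concrete, here is a minimal, hypothetical Java sketch. The class `SkipHangDemo` and its methods are illustrative names, not code taken from Hadoop, Cassandra, or DScope. It assumes a common I/O pattern: a loop that skips a byte count read from a length field. Because `InputStream.skip` may legitimately return 0, a corrupted length field or a truncated stream leaves the unguarded loop making no progress, so it spins forever. The guarded variant detects end of stream and fails fast. This only illustrates the kind of hang described above; it is not DScope's detection technique.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipHangDemo {

    // Hang-prone pattern (hypothetical): the loop exits only when `remaining`
    // reaches zero. If the stream is shorter than the (possibly corrupted)
    // length says, skip() keeps returning 0 and the loop never terminates.
    static void skipFullyUnsafe(InputStream in, long len) throws IOException {
        long remaining = len;
        while (remaining > 0) {
            remaining -= in.skip(remaining);
        }
    }

    // Guarded variant: when skip() makes no progress, probe with read().
    // A result of -1 means end of stream, so throw instead of spinning.
    static void skipFullySafe(InputStream in, long len) throws IOException {
        long remaining = len;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() == -1) {
                    throw new EOFException("Premature EOF: " + remaining + " bytes left to skip");
                }
                skipped = 1; // read() consumed exactly one byte
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // A (corrupted) header claims 100 bytes follow, but only 10 are present.
        InputStream truncated = new ByteArrayInputStream(new byte[10]);
        try {
            skipFullySafe(truncated, 100);
        } catch (EOFException e) {
            System.out.println("Detected: " + e.getMessage());
        }
    }
}
```

Running `main` prints `Detected: Premature EOF: 90 bytes left to skip`, whereas calling `skipFullyUnsafe(truncated, 100)` on the same input would never return: the first `skip` consumes the 10 available bytes and every later call returns 0.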

Publications


Benchmark

  #   Bug name        System version  Bug type  Known or new  Detected (DScope, Findbugs, Infer)
  1   Cassandra-7330  v2.0.8          #1        known
  2   Cassandra-9881  v2.0.8          #3        known
  3   Compress-87     v1.0            #1        known
  4   Compress-451    v1.0            #2        new
  5   Hadoop-8614     v0.23.0         #1        known
  6   Hadoop-15088    v2.5.0          #1        new
  7   Hadoop-15415    v0.23.0         #2        new
  8   Hadoop-15415    v2.5.0          #2        new
  9   Hadoop-15417    v0.23.0         #2        new
  10  Hadoop-15417    v2.5.0          #2        new
  11  Hadoop-15424    v2.5.0          #2*       new
  12  Hadoop-15425    v2.5.0          #2*       new
  13  Hadoop-15429    v0.23.0         #2        new
  14  Hadoop-15429    v2.5.0          #2        new
  15  HDFS-4882       v0.23.0         #3        known
  16  HDFS-5892       v2.5.0          #2        known
  17  HDFS-13513      v2.5.0          #2        new
  18  HDFS-13514      v2.5.0          #2        new
  19  Mapreduce-2185  v0.23.0         #2        known
  20  Mapreduce-2862  v0.23.0         #2        known
  21  Mapreduce-6990  v0.23.0         #1        new
  22  Mapreduce-7088  v2.5.0          #2*       new
  23  Mapreduce-7089  v2.5.0          #2*       new
  24  Yarn-163        v0.23.0         #1        known
  25  Yarn-2905       v2.5.0          #1        known
  26  Yarn-6991       v0.23.0         #4        new
  27  Yarn-6991       v2.5.0          #4        new
  28  Hive-5235       v1.0.0          #1*       known
  29  Hive-13397      v1.0.0          #2        known
  30  Hive-18142      v1.0.0          #2        new
  31  Hive-18216      v2.3.2          #1*       new
  32  Hive-18217      v2.3.2          #1*       new
  33  Hive-18219      v1.0.0          #2        new
  34  Hive-18219      v2.3.2          #2        new
  35  Hive-19391      v1.0.0          #2        new
  36  Hive-19392      v1.0.0          #2        new
  37  Hive-19392      v2.3.2          #2        new
  38  Hive-19395      v1.0.0          #1*       new
  39  Hive-19406      v2.3.2          #2        new
  40  Kafka-6271      v0.10.0         #1        new
  41  Lucene-772      v2.1.0          #2*       known
  42  Lucene-8294     v2.1.0          #2        new


Sponsors