Title: Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Type: Software

Citation: Alireza Khoshkbarforoushha, Rajiv Ranjan (2016): Resource and Performance Distribution Prediction for Large Scale Analytics Queries. Zenodo. Software. https://zenodo.org/record/44902

Authors: Alireza Khoshkbarforoushha (Australian National University and CSIRO); Rajiv Ranjan (Newcastle University, UK)

Summary

Efficient estimation of the resource consumption and performance of data-intensive workloads is central to the design and development of workload management techniques. Recent work has explored distribution-based estimation of workload performance, as opposed to single-point prediction, for workload management problems such as query scheduling and admission control. However, the proposed approaches do not address how to predict the workload performance distribution efficiently: they simply assume that the probability density function (pdf) of the target value is already available. This paper aims to address this problem for an inseparable portion of big data analytics workloads, Hive queries. To this end, we combine knowledge of Hive query executions with the novel usage of mixture density networks to predict the whole spectrum of resource and performance as probability density functions. We evaluate our technique using the TPC-H benchmark, showing that it not only produces accurate pdf predictions but also outperforms the state-of-the-art single-point techniques in half of the experiments.
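
To illustrate what "predicting a pdf" means for a mixture density network, the following is a minimal Python sketch and not the deposited code (which, per the Notes, builds on the Netlab toolbox). It assumes a Gaussian mixture output layer in which softmax turns raw outputs into mixing weights and exp turns raw outputs into standard deviations. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Map raw network outputs to mixing weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mdn_params(raw_weights, raw_means, raw_log_sigmas):
    """Convert raw output-layer values into Gaussian mixture parameters.

    Assumption: weights use softmax and sigmas use exp, a common
    MDN parameterization; the deposited code may differ.
    """
    weights = softmax(raw_weights)
    sigmas = [math.exp(s) for s in raw_log_sigmas]
    return weights, list(raw_means), sigmas


def mixture_pdf(y, weights, means, sigmas):
    """Density of the target value y under the predicted mixture."""
    return sum(
        w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, mu, s in zip(weights, means, sigmas)
    )


def mixture_mean(weights, means):
    """Single-point summary (expected value) of the predicted distribution."""
    return sum(w * mu for w, mu in zip(weights, means))


if __name__ == "__main__":
    # Hypothetical raw outputs for one query: two mixture components,
    # e.g. a fast and a slow execution mode (values are made up).
    w, mu, s = mdn_params([0.0, 0.0], [10.0, 30.0], [0.0, 0.0])
    print("density at y=10:", mixture_pdf(10.0, w, mu, s))
    print("expected value:", mixture_mean(w, mu))
```

A single-point predictor would report only something like `mixture_mean`; the mixture keeps the full shape, so a scheduler could, for example, ask how much probability mass lies above a deadline.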

More information

  • DOI: 10.5281/zenodo.44902

Subjects

  • MDN, HiveQL, Big Data

Dates

  • Publication date: 2016
  • Issued: January 17, 2016

Notes

Other: The MDN code is based on the Netlab toolbox [1], which is designed for the simulation of neural network algorithms and related models, in particular MDNs.

[1] I. Nabney. NETLAB: Algorithms for Pattern Recognition. Springer Science & Business Media, 2002.

Rights

  • Open Access (info:eu-repo/semantics/openAccess)

Format

electronic resource

Related items

  • Relationship: IsPartOf; URI: https://zenodo.org/communities/spec-rg
  • Relationship: IsPartOf; URI: https://zenodo.org/communities/zenodo