X. Xu, W. Dou, X. Zhang, and J. Chen, Enreal: An energyaware resource allocation method for scientific workflow executions in cloud environment, IEEE Transactions on Cloud Computing, vol.4, issue.2, pp.166-179, 2016.

Z. Liu and T. S. Ng, Leaky buffer: A novel abstraction for relieving memory pressure from cluster data processing frameworks, IEEE Transactions on Parallel Distributed System, vol.28, issue.1, pp.128-140, 2017.

H. Moniz, J. Leitão, R. J. Dias, J. Gehrke, N. M. Preguiça et al., Blotter: Low latency transactions for geo-replicated storage, Proceedings of WWW, pp.3-7, 2017.

Q. Gan, X. Wang, and X. Fang, Efficient and secure auditing scheme for outsourced big data with dynamicity in cloud, SCIENCE CHINA Information Sciences, vol.61, issue.12, p.15, 2018.

K. Wang, C. Xu, Y. Zhang, S. Guo, and A. Y. Zomaya, Robust big data analytics for electricity price forecasting in the smart grid, IEEE Transactions on Big Data, vol.5, issue.1, pp.34-45, 2019.

J. Dean and S. Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.

W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang, Maptask scheduling in mapreduce with data locality: Throughput and heavy-traffic optimality, IEEE/ACM Transactions on Networking, vol.24, issue.1, pp.190-203, 2016.

S. M. Nabavinejad, M. Goudarzi, and S. Mozaffari, The memory challenge in reduce phase of mapreduce applications, IEEE Transactions on Big Data, vol.2, issue.4, pp.380-386, 2016.

Y. Zhu, Y. Jiang, W. Wu, L. Ding, A. Teredesai et al., Minimizing makespan and total completion time in mapreduce-like systems, Proceedings of INFOCOM, 2014.

M. Chowdhury, Y. Zhong, and I. Stoica, Efficient coflow scheduling with varys, Proceedings of SIGCOMM, pp.17-22, 2014.

A. G. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim et al., VL2: a scalable and flexible data center network, Proceedings of SIGCOMM, pp.16-21, 2009.

M. Chowdhury, S. Kandula, and I. Stoica, Leveraging endpoint flexibility in data-intensive clusters, Proceedings of SIGCOMM, pp.12-16, 2013.

M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker et al., Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, Proceedings of EuroSys, pp.13-16

F. Ahmad, S. T. Chakradhar, A. Raghunathan, and T. N. Vijaykumar, Shufflewatcher: Shuffle-aware scheduling in multi-tenant mapreduce clusters, Proceedings of USENIX ATC, 2014.

S. Venkataraman, A. Panda, G. Ananthanarayanan, M. J. Franklin, and I. Stoica, The power of choice in data-aware cluster scheduling, Proceedings of OSDI, 2014.

Y. Chen, A. Ganapathi, R. Griffith, and R. H. Katz, The case for evaluating mapreduce performance using workload suites, Proceedings of MASCOTS, pp.25-27

I. Goiri, R. Bianchini, S. Nagarakatte, and T. D. Nguyen, Approxhadoop: Bringing approximations to mapreduce frameworks, Proceedings of ASPLOS, 2015.

M. Mitzenmacher, The power of two choices in randomized load balancing, IEEE Transactions on Parallel and Distributed Systems, vol.12, issue.10, pp.1094-1104, 2001.

Y. Lu, A. Shanbhag, A. Jindal, and S. Madden, Adaptdb: Adaptive partitioning for distributed joins, PVLDB, vol.10, issue.5, pp.589-600, 2017.

S. Ghemawat, H. Gobioff, and S. Leung, The google file system, Proceedings of SOSP, 2003.

B. Dong, K. Wu, S. Byna, J. Liu, W. Zhao et al., Arrayudf: User-defined scientific data analysis on arrays, Proceedings of HPDC, 2017.

W. Fan, J. Xu, Y. Wu, W. Yu, J. Jiang et al., Parallelizing sequential graph computations, Proceedings of SIGMOD, 2017.

M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, Managing data transfers in computer clusters with orchestra, Proceedings of SIGCOMM, 2011.

V. Jalaparti, P. Bodík, I. Menache, S. Rao, K. Makarychev et al., Network-aware scheduling for dataparallel jobs: Plan when you can, Proceedings of SIG-COMM, pp.17-21, 2015.

F. R. Dogar, T. Karagiannis, H. Ballani, and A. I. Rowstron, Decentralized task-aware scheduling for data center networks, Proceedings of SIGCOMM, pp.17-22, 2014.

Y. Ying, R. Birke, C. Wang, L. Y. Chen, and N. Gautam, Optimizing energy, locality and priority in a mapreduce cluster, Proceedings of International Conference on Autonomic Computing, 2015.

X. Ma, X. Fan, J. Liu, H. Jiang, and K. Peng, vlocality: Revisiting data locality for mapreduce in virtualized clouds, IEEE Network, vol.31, issue.1, pp.28-35, 2017.

M. Hammoud and M. F. Sakr, Locality-aware reduce task scheduling for mapreduce, Proceedings of IEEE CloudCom, 2011.

F. Liang and F. C. Lau, Bashuffler: Maximizing network bandwidth utilization in the shuffle of YARN, Proceedings of HPDC, 2016.

H. Zheng, Z. Wan, and J. Wu, Optimizing mapreduce framework through joint scheduling of overlapping phases, Proceedings of ICCCN, pp.1-4

H. Zhang, L. Chen, B. Yi, K. Chen, M. Chowdhury et al., CODA: toward automatically identifying and scheduling coflows in the dark, Proceedings of SIG-COMM, pp.22-26, 2016.

G. Ananthanarayanan, S. Kandula, A. G. Greenberg, I. Stoica, Y. Lu et al., Reining in the outliers in map-reduce clusters using mantri, Proceedings of OSDI, pp.4-6, 2010.

S. Rao, R. Ramakrishnan, A. Silberstein, M. Ovsiannikov, and D. Reeves, Sailfish: a framework for large scale data processing, Proceedings of SOCC, pp.14-17, 2012.

A. Rasmussen, V. T. Lam, M. Conley, G. Porter, R. Kapoor et al., Themis: an i/o-efficient mapreduce, Proceedings of SOCC, pp.14-17, 2012.

H. Zhang, B. Cho, E. Seyfe, A. Ching, and M. J. , Riffle: optimized shuffle service for large-scale data analytics, Proceedings of EuroSys, pp.23-26

A. , , 2018.

Z. Fu, T. Song, Z. Qi, and H. Guan, Efficient shuffle management with scache for DAG computing frameworks, Proceedings of PPoPP, pp.24-28, 2018.

K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B. Chun, Making sense of performance in data analytics frameworks, Proceedings of NSDI, pp.4-6, 2015.

P. X. Gao, A. Narayan, S. Karandikar, J. Carreira, S. Han et al., Network requirements for resource disaggregation, Proceedings of OSDI, 2016.

A. Trivedi, P. Stuedi, J. Pfefferle, R. Stoica, B. Metzler et al., On the [ir]relevance of network performance for data processing, Proceedings of HotCloud, pp.20-21, 2016.

A. Vahdat, M. Al-fares, N. Farrington, R. N. Mysore, G. Porter et al., Scale-out networking in the data center, IEEE Micro, vol.30, issue.4, pp.29-41, 2010.

M. Alizadeh, A. G. Greenberg, D. A. Maltz, J. Padhye, P. Patel et al., Proceedings of SIGCOMM, 2010.

A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, Inside the social network's (datacenter) network, Proceedings of SIGCOMM, pp.17-21, 2015.

P. Bodík, I. Menache, M. Chowdhury, P. Mani, D. A. Maltz et al., Surviving failures in bandwidthconstrained datacenters, Proceedings of SIGCOMM, pp.13-17, 2012.

P. Song, Y. Liu, T. Liu, and D. Qian, Flow stealer: lightweight load balancing by stealing flows in distributed SDN controllers, SCIENCE CHINA Information Sciences, vol.60, issue.3, p.32202, 2017.

S. Venkataraman, Z. Yang, M. J. Franklin, B. Recht, and I. Stoica, Ernest: Efficient performance prediction for large-scale advanced analytics, Proceedings of NSDI, pp.16-18, 2016.

F. Ahmad, S. Lee, M. Thottethodi, and T. N. Vijaykumar, Mapreduce with communication overlap (marco), Journal of Parallel and Distributed Computing, vol.73, pp.608-620, 2013.

X. Ren, G. Ananthanarayanan, A. Wierman, and M. Yu, Hopper: Decentralized speculation-aware cluster scheduling at scale, Proceedings of SIGCOMM, pp.17-21, 2015.

G. Amvrosiadis, J. W. Park, G. R. Ganger, G. A. Gibson, E. Baseman et al., On the diversity of cluster workloads and its impact on research results, Proceedings of USENIX ATC, pp.11-13, 2018.

G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica, Effective straggler mitigation: Attack of the clones, Proceedings of NSDI, 2013.

A. C. Zhou, T. Phan, S. Ibrahim, and B. He, Energyefficient speculative execution using advanced reservation for heterogeneous clusters, Proceedings of ICPP, pp.13-16, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01807496

Y. Chen, S. Alspaugh, and R. H. Katz, Interactive analytical processing in big data systems: A cross-industry study of mapreduce workloads, PVLDB, vol.5, issue.12, pp.1802-1813, 2012.