\documentclass[times]{speauth}
\usepackage{relsize}
% \usepackage{moreverb}
% \usepackage[dvips,colorlinks,bookmarksopen,bookmarksnumbered,citecolor=red,urlcolor=red]{hyperref}
\usepackage{ctable}
\usepackage{cite}
% \usepackage[cmex10]{amsmath}
\usepackage{acronym}
\usepackage{graphicx}
\usepackage{multirow}
% \usepackage{balance}
\usepackage{algorithm2e}

\def\volumeyear{2014}

\acrodef{KWAPI}{KiloWatt API}
\acrodef{KWRanking}{KiloWatt Ranking}
\acrodef{VPC}{Virtual Private Cloud}
\acrodef{HPC}{High Performance Computing}
\acrodef{GPU}{Graphics Processing Unit}
\acrodef{AWS}{Amazon Web Services}
\acrodef{VM}{Virtual Machine}
\acrodef{REST}{REpresentational State Transfer}
\acrodef{RPC}{Remote Procedure Call}
\acrodef{PUE}{Power Usage Effectiveness}
\acrodef{IPMI}{Intelligent Platform Management Interface}
\acrodef{PDU}{Power Distribution Unit}
\acrodef{ePDU}{enclosure PDU}
\acrodef{JSON}{JavaScript Object Notation}
\acrodef{RRD}{Round-Robin Database}

\begin{document}
\runningheads{F. Rossigneux \textit{et al.}}{A Resource Reservation System for OpenStack}

\title{A Resource Reservation System to Improve the\\Support for HPC Applications in OpenStack}

\author{Fran\c{c}ois Rossigneux, Laurent Lef\`{e}vre and Marcos Dias de Assun\c{c}\~ao}

\address{LIP, ENS de Lyon, University of Lyon, France}

\corraddr{46 all\'{e}e d'Italie, 69364, Lyon, FRANCE}


\begin{abstract}
A reservation system is important to enable users to plan the execution of their applications and for providers to deliver better performance guarantees. We present a reservation framework called Climate that interfaces with Nova, OpenStack's compute scheduler, and enables the provisioning of bare-metal resources. Climate manages reservations and their placement on physical hosts taking into account several factors, including resource constraints and energy efficiency. To select the most energy-efficient resources, Climate uses a software framework called \ac{KWAPI}. This work describes the overall software system that can be used for managing advance reservations and minimising energy consumption in OpenStack clouds.
\end{abstract}

\keywords{resource reservation; high-performance computing; OpenStack}

\maketitle

\section{Introduction}
\acresetall

Cloud computing \cite{ArmbrustCloud:2009} has become an important model for delivering the IT resources and services that organisations require to run their businesses. Amongst the claimed benefits of the cloud computing model, the most appealing derive from economies of scale and often include resource consolidation, elasticity, good availability, and wide geographical coverage. The on-demand resource provisioning scheme explored since the early days of cloud computing enables customers to request a number of resources --- often virtual machines, or storage and network capacity --- and pay by the hour of use. As technology and resource management techniques matured, more elaborate economic models have become available, where customers can reserve a number of machines for a long period at a discounted price, or bid for resources whose price varies dynamically according to current demand (\textit{e.g.} AWS EC2 spot instances \cite{AmazonEC2Spot}).

Current cloud models pose challenges to providers who need to offer the resource elasticity that customers expect. Techniques such as advance reservation, which have been widely explored in other systems such as clusters and grids, are not commonplace in current cloud computing offerings. Resource reservation may be advantageous to certain customers as it enables them to specify a start and finish time dictating when they need resources, and can help providers by giving them better estimates of future resource usage.

In addition, although a large number of current applications and services can deal with the workload consolidation explored by providers via resource virtualisation, certain applications that demand \ac{HPC} are not fully portable to this scenario. They are generally resource intensive and sensitive to performance variations. The means employed by cloud providers to offer customers high and predictable performance mostly consist of deploying bare-metal resources or using specialised virtual machines placed in groups where high network throughput and low latency can be guaranteed. This model may seem at odds with traditional cloud use cases as it results in high operational costs and provides little flexibility in terms of workload consolidation and resource elasticity. Reservations provide means for reliable allocation and allow customers to plan the execution of their applications. Current reservation models employed by public clouds, however, rely on reserving resources in advance for a long time period (\textit{i.e.} from one to three years).

In this paper, we describe a framework for reserving and provisioning bare-metal resources using a popular open source cloud platform. The proposed solution, implemented as a configurable component of OpenStack\footnote{https://wiki.openstack.org}, provides reservation models that are more sophisticated and flexible than those currently offered by public cloud providers. The framework leverages \ac{KWAPI}, a software framework that monitors the energy consumption of data centre resources and interfaces with OpenStack's telemetry infrastructure.

% ----------------------------------------------------------------------------------------

\section{Background and Related Work}
\label{sec:related_work}

Bursts of requests during a few hours of the day have been noticed in several systems whose logs have been extensively studied \cite{FeitelsonPWA:2014}. Although bare-metal or specialised \acp{VM} minimise performance penalties for running \ac{HPC} applications on a cloud, providing the elasticity with which other cloud users are familiar can be prohibitive, and advance reservations may be explored to help minimise these costs. This section describes some of the current reservation models available for clouds and discusses their limitations. It also provides background information on OpenStack and the main components leveraged by the reservation framework.

\subsection{Reservation in Clouds}

The benefits and drawbacks of resource reservations have been extensively studied for various systems, such as clusters of computers \cite{SmithSchedARs:2000, LawsonMultiQueue:2002, MargoReservations:2000, RoblitzARPlacement:2006}, meta-schedulers \cite{SnellARMetaScheduling:2000}, computational grids \cite{ElmrothBrokerCCPE:2009,FarooqlaxityARs:2005,FosterGara:1999}, virtual clusters and virtual infrastructure \cite{ChaseCOD:2003}; and have been applied under multiple scenarios including co-allocation of resources \cite{NettoFlexibleARs:2007}, and improving performance predictability of certain applications \cite{WieczorekARWorkflow:2006}. At the time of writing, \ac{AWS}\footnote{http://aws.amazon.com/ec2/} offers cloud services that suit several of today's use cases and provides the richest set of reservation options for virtual machine instances. \ac{AWS} offers four models of allocating \ac{VM} instances, namely on-demand, reserved instances, spot instances and dedicated --- the latter are allocated within a \ac{VPC}. Under on-demand use, customers are charged on an hourly basis for the instances they allocate. The performance is not guaranteed as resources are shared among customers and their performance depends on the cloud workload. Reserved instances can be requested at a discounted price under the establishment of long term contracts, and provide more guarantees in terms of performance than their on-demand counterparts. To request spot instances, a customer must specify a limit price --- often referred to as bid price --- she is willing to pay for a given instance type. If the bid price surpasses the spot market price, the user receives the requested instances. Existing spot instances can be terminated if the spot price exceeds a user's bid. Instances created within a \ac{VPC} provide some degree of isolation at the network level, but their physical hosts may contain instances from multiple customers. To improve fault tolerance, it is possible to request dedicated instances at a premium, so that instances do not share their host with those of other customers. Under all the models described here, \ac{AWS} allows users to request \ac{HPC} instances, optimised for processing, memory use, I/O, and instances with \acp{GPU}.

The OpenNebula cloud management tool \cite{OpenNebula:2011} provides reservation and scheduling services by using Haizea \cite{SotomayorLeases:2008}. Haizea supports multiple types of reservations (\textit{e.g.} immediate, advance and best-effort) and takes into account the time required to prepare and configure the resources occupied by the virtual machines (\textit{e.g.} time to transfer virtual machine image files) so that a reservation can start at the exact time requested by the user. When used in simulation mode, Haizea offers means to evaluate the scheduling impact of accepting a set of reservations. We found the Haizea model --- where a scheduler able to handle reservations was incorporated as a module of a cloud resource manager --- to be a good starting point to provide reservation and \ac{HPC} support for OpenStack. In the following sections we provide background information on OpenStack's compute management architecture, and how we extend it by providing loosely coupled components to handle resource reservations.

\subsection{OpenStack}
OpenStack is an open source cloud computing platform suitable for both public and private settings. It manages and automates deployment of virtual machines on pools of compute resources, can work with a range of virtualisation technologies and can handle a variety of tenants. OpenStack has gained traction and received support from a wide development community who incorporated several features, including block storage, image library, network provisioning framework, authentication and authorisation, among other services. In this work we focus mainly on the compute management service, called Nova\footnote{https://wiki.openstack.org/wiki/Nova}, and the telemetry infrastructure termed Ceilometer\footnote{http://docs.openstack.org/developer/ceilometer/architecture.html}.

Nova manages a cloud computing fabric and is responsible for instantiating virtual machines on a pool of available hosts running \textit{nova-compute}. Nova uses its \textit{nova-scheduler} service to determine how to place virtual machine requests. Its API (\textit{i.e.} \textit{nova-api}) is used by clients to request virtual machine instances. In order to make a request for a \ac{VM}, a client needs to specify a number of requirements (\textit{e.g.} the flavour), which determine the instance to be created. By default, Nova uses a filter scheduler that initially obtains a list of physical hosts, applies a set of filters on the list to select servers that match the client's criteria, and then ranks the remaining servers according to a pre-configured weighing scheme. Nova comes with a set of pre-configured filters, but the choice of filters and weighing is configurable. Nova can also be configured to accept hints from a client to influence how hosts are filtered and ranked upon a new request.

Ceilometer is OpenStack's framework for collecting performance metrics and information on resource consumption. It supports three data collection methods:

\begin{itemize}
\item \textbf{Bus listener agent}, which picks up events from OpenStack's notification bus and turns them into Ceilometer samples (\textit{e.g.} cumulative type, gauge or delta) that can then be stored into the database or provided to an external system via a publishing pipeline.

\item \textbf{Push agents}, which are more intrusive, are deployed on the monitored nodes and push data remotely to the collector.

\item \textbf{Polling agents}, which poll APIs or other tools to collect information about monitored resources.
\end{itemize}

The last two methods depend on a combination of a central agent, compute agents and a collector. The compute agents run on nodes and retrieve information about resource usage related to a given virtual machine instance and a resource owner. The central agent, on the other hand, executes \textit{pollsters} on the management server to retrieve data that is not linked to a particular instance. Pollsters are software components executed, for example, to poll resources by using an API or other methods. The Ceilometer database, which can be queried via the Ceilometer API, allows an external system to view the history of a resource's metrics, and enables the system to set and receive alarms.

% ----------------------------------------------------------------------------------------

\section{Reservation Framework for OpenStack}
\label{sec:reservation_system}

The reservation system --- termed Climate --- provides means for reserving and deploying bare-metal resources using OpenStack. The framework, whose architecture is depicted in Figure~\ref{fig:reservation_architecture}, has been used as a basis for implementing other types of reservation systems for OpenStack. Climate aims to provide support for scheduling advance reservations, taking into account the energy efficiency of underlying resources and without being intrusive to Nova. In order to do so, Climate comprises the following components:

\begin{itemize}
\item \textbf{Reservation API}: used by client applications and users to reserve resources and query the status of reservations.
\item \textbf{Climate Inventory}: a service that stores information about the physical nodes that can be used for reservations.
\item \textbf{Climate Scheduler}: responsible for scheduling reservation requests on the available nodes.
\item \textbf{Energy-Consumption Monitoring Framework}: component responsible for monitoring the energy consumption of physical resources and interfacing with OpenStack's telemetry infrastructure.
\end{itemize}

The next sections provide more details about each Climate component and how they interact with one another.

\begin{figure}[htb]
\centering
\includegraphics[width=0.85\linewidth]{figs/architecture.pdf}
\caption{Architecture of the proposed reservation framework.}
\label{fig:reservation_architecture}
\end{figure}

\subsection{Reservation API} 
Climate provides a \ac{REST} API that enables users and client applications to manage reservation requests. When requesting a reservation --- Step 1 in Figure~\ref{fig:reservation_architecture} --- a client should supply the following parameters:

\begin{itemize}
\item \textbf{host\_properties}: characteristics of required servers;
\item \textbf{start\_time}: earliest time at which the reservation can start;
\item \textbf{end\_time}: latest time for completing the reservation;
\item \textbf{duration}: time duration of the reservation; and
\item \textbf{quantity}: number of servers required.
\end{itemize}

If \textbf{start\_time} and \textbf{end\_time} are not specified, the request is treated as an immediate reservation, thus starting at the current time. The \ac{REST} API implements the calls described in Table~\ref{tab:climate_api} and relies on two backend services, namely \textbf{Climate Inventory} and \textbf{Climate Scheduler}, for storing information about hosts and managing reservations respectively. Handling a reservation request is performed in two phases. First, the API queries Climate Inventory to discover the available hosts that match the criteria specified by \textbf{host\_properties}. The Inventory interfaces with OpenStack Nova to keep the list of available hosts up-to-date --- as demonstrated by Step 2 in Figure~\ref{fig:reservation_architecture}. Second, the filtered list of hosts, along with the other request parameters, is given to Climate Scheduler, which then finds a time period over which the reservation can be granted.
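
The second phase can be stated compactly. The following is only a sketch in notation introduced here for illustration (none of these symbols comes from the Climate code base): let $s$, $e$, $d$ and $q$ denote \textbf{start\_time}, \textbf{end\_time}, \textbf{duration} and \textbf{quantity}, and let $H$ be the filtered list of hosts returned by Climate Inventory. The reservation can be granted if there exist a start instant $t$ and a set of hosts $S \subseteq H$ such that
\[
s \le t, \qquad t + d \le e, \qquad |S| = q, \qquad \mathrm{free}(h, t, t + d) \mbox{ for all } h \in S,
\]
where $\mathrm{free}(h, a, b)$ is assumed to hold when host $h$ has no other reservation overlapping the interval $[a, b)$. For example, with $s$ at 08:00, $e$ at 18:00 and $d$ of four hours, any $t$ between 08:00 and 14:00 satisfies the two time constraints. Which feasible $t$ the scheduler selects is a policy choice that this sketch does not assume.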
\begin{table}[hbt]
\caption{Climate REST API calls}
\label{tab:climate_api}
\centering
\begin{tabular}{lll}
\toprule
\textbf{Method} & \textbf{URL} & \textbf{Description}\\
\toprule
GET & /properties/ & Lists resource properties\\
\midrule
POST & /reservations/ & Creates a reservation\\
\midrule
GET & /reservations/ & Lists reservations\\
\midrule
GET & /reservations/\{reservation-id\} & Describes a reservation\\
\midrule
DELETE & /reservations/\{reservation-id\} & Cancels a reservation\\
\bottomrule
\end{tabular}
\end{table}

The API is not only an interface for tenants. Nova uses it to find available hosts and to determine a set of resources associated with a reservation when a client claims the reserved resources --- Step 3 in Figure~\ref{fig:reservation_architecture} --- and modules that need to query the reservation schedule do so via the API. Moreover, the API uses the same security infrastructure provided by OpenStack, including messages carrying Keystone tokens\footnote{http://docs.openstack.org/developer/keystone/}, which are used to allow a client application to discover the hosts associated with a reservation.


\subsection{Climate Inventory} 
Climate Inventory is an \ac{RPC} service used by the reservation API to discover the hosts that are possible candidates to serve a reservation request. The candidates are servers that both match the host properties specified in the request and are available during the requested time (\textit{i.e.} their \textbf{running\_vm} field in Nova's database is set to 0). To do so, the Inventory uses the NovaClient, which queries Nova's API and filters the list of potential candidates using the \textbf{json\_filter} syntax specified by Nova.

As mentioned earlier, Climate Inventory uses \textbf{host\_properties} as filtering criteria. In order to specify the required hosts, a user needs to create such a filter. To ease this task, the \textbf{/properties/} call of the reservation API provides a catalogue of properties used for filtering. By default, the call shows the properties used for filtering based on the list of hosts registered with Nova, but an admin can choose to disable or enable certain properties.
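
As an illustration, a \textbf{host\_properties} filter written in Nova's \textbf{json\_filter} syntax might look like the fragment below. The property names \texttt{vcpus} and \texttt{memory\_mb} and the threshold values are assumptions chosen for this example; the properties actually available in a deployment are those returned by the \textbf{/properties/} call.

\begin{verbatim}
["and",
  [">=", "$vcpus", 8],
  [">=", "$memory_mb", 16384]
]
\end{verbatim}

Under these assumptions, the expression selects hosts that report at least 8 virtual CPUs and at least 16384 MB of memory.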
\subsection{Climate Scheduler}
This component manages the reservation schedule and extends Nova's filtering scheduler by providing a set of resource filters and ranking (or weighing) criteria for handling reservation requests, as described below.

\subsubsection{Filtering}
The Nova filter accepts a scheduling hint --- in our case used to provide a reservation ID created by Climate. When an ID is provided, the filter uses the reservation API --- also providing an admin Keystone token --- to retrieve the list of hosts associated with the reservation, Step 4 in Figure~\ref{fig:reservation_architecture}. If no reservation ID is given, Nova still uses the reservation API to establish the list of hosts that have been reserved. Only the hosts that are not in this list can be used to serve the request.
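
The filtering rule can be summarised in set notation. The symbols are introduced here for illustration only: $H$ is the set of hosts considered by the filter, $R$ the set of reservations returned by the reservation API, and $H_r \subseteq H$ the hosts associated with reservation $r$. The set $C$ of candidate hosts for a request is then
\[
C = \left\{ \begin{array}{ll}
H_r & \mbox{if the request carries the ID of reservation } r, \\
H \setminus \bigcup_{r' \in R} H_{r'} & \mbox{otherwise,}
\end{array} \right.
\]
so that a reserved host is offered only to requests that present the ID of its reservation.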
\subsubsection{Ranking}
We created two Nova weighers. The first weigher is ignored by reservation requests and ranks machines according to their free time until the next reservation. If handling a request for a non-reserved instance, the weigher tries to place the instance on a host that is available for the longest period. This helps minimise the chance of having to migrate the instance at a later time to vacate its host for a reservation. The second weigher, termed \ac{KWRanking}, ranks machines by their power efficiency (\textit{i.e.} FLOPS/Watt) and relies on:

\begin{itemize}
\item A software infrastructure called \ac{KWAPI} built for monitoring the power consumed by resources of a data centre and for interfacing with Ceilometer to provide power consumption data. Ceilometer is OpenStack's telemetry infrastructure used to monitor performance metrics\footnote{https://wiki.openstack.org/wiki/Ceilometer}.

\item A benchmark executed on the machines to determine their delivered performance per watt.
\end{itemize}

The goal of this weigher is to prioritise the use of the most power efficient machines, and create windows during which the least efficient resources could be powered off or placed in low power consumption modes. Climate provides an API that enables switching resources on/off, or putting them into standby mode. Choosing between placing a resource in standby mode or switching it off depends on the length of time during which it is expected to remain idle. As switching a resource back on to serve an impending request often takes time, means to estimate future workload are generally important.

To determine the most efficient hosts, \ac{KWRanking} queries the \textbf{Benchmark Execution} module --- Step 5 of Figure~\ref{fig:reservation_architecture} --- which returns the hosts' FLOPS/Watt information. The Benchmark Execution module obtains the performance per watt of hosts by triggering the execution of the benchmarks requested by the scheduler to measure the hosts' performance, and by gathering information on their power consumption from Ceilometer --- Steps 6 and 7, respectively.
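
The two ranking criteria can be written as follows. This is a sketch in our own notation, not taken from the Climate implementation. For a host $h$ evaluated at time $t_{\mathrm{now}}$, the first weigher uses
\[
w_1(h) = t_{\mathrm{next}}(h) - t_{\mathrm{now}},
\]
where $t_{\mathrm{next}}(h)$ is the start of the next reservation on $h$ (assumed here to be treated as arbitrarily far in the future when $h$ has no pending reservation). \ac{KWRanking} uses
\[
w_2(h) = \frac{F(h)}{\bar{P}(h)},
\]
where $F(h)$ is the floating-point rate achieved by the benchmark on $h$ and $\bar{P}(h)$ is the power in watts obtained from Ceilometer; taking $\bar{P}(h)$ as the mean over the benchmark run is an assumption of this sketch. Both weighers prefer larger values. As an arithmetic illustration with made-up figures, a host sustaining 100 GFLOPS at 250 W scores 0.4 GFLOPS/W and is ranked ahead of a host sustaining 120 GFLOPS at 400 W, which scores 0.3 GFLOPS/W, even though the latter is faster in absolute terms.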
% Detail the benchmarks here...

The information on the power consumed by hosts is provided to Ceilometer by \ac{KWAPI} as depicted by Step 8. Although \ac{KWAPI} is described in detail in previous work \cite{Rossigneux:2014}, Section~\ref{sec:kwapi} presents an overview and describes how it fits the reservation system for OpenStack.

% ----------------------------------------------------------------------------------------

\section{Energy-Consumption Monitoring Framework}
\label{sec:kwapi}

\ac{KWAPI} is a generic and flexible framework that interfaces with OpenStack to provide it with power consumption information collected from multiple heterogeneous probes. It is integrated with Ceilometer, the OpenStack component conceived to provide a framework to collect a large range of metrics for metering purposes\footnote{https://wiki.openstack.org/wiki/Ceilometer}. The \ac{KWAPI} architecture, depicted in Figure~\ref{fig:architecture}, follows a publish/subscribe model based on a set of layers:

\begin{itemize}
\item \textbf{Drivers}: data producers responsible for measuring the power consumption of monitored resources and providing the collected data to consumers via a communication bus.
\item \textbf{Data Consumers} (or \textbf{Consumers} for short): components that subscribe to receive and process the measurement information.
\end{itemize}

\begin{figure}[!htb]
\centering
\includegraphics[width=0.95\linewidth]{figs/kwapi_architecture.pdf}
\caption{Overview of \ac{KWAPI}'s architecture.}
\label{fig:architecture}
\end{figure}

The communication between layers is handled by a bus. Data consumers can subscribe to receive information collected by drivers from multiple sites. Both drivers and consumers are easily extensible to support, respectively, several types of wattmeters (\textit{i.e.} energy consumption probes) and additional data processing services. A \ac{REST} API is designed as a data consumer to provide a programming interface for developers and system administrators. It interfaces with OpenStack by providing the information (\textit{i.e.} by polling monitored devices) required by a \textit{\ac{KWAPI} Pollster} to feed Ceilometer. The following sections provide more details on the main architecture components and their relationship with OpenStack Ceilometer.

\subsection{Driver Layer}
Drivers are threads initialised by a Driver Manager with a set of parameters loaded from a file compliant with the OpenStack configuration format. These parameters are used to query the meters (\textit{e.g.} IP address and port) and determine the sensor ID to be used in the collected metrics. The measurements that a driver obtains are represented as \ac{JSON} dictionaries that maintain a small footprint and that can be easily parsed. The size of dictionaries varies depending on the number of fields set by drivers (\textit{e.g.} whether message signing is enabled). Drivers can manage incidents themselves, but the manager also checks periodically if all threads are active, restarting them if necessary. It is important to avoid losing measurements because drivers report instantaneous power in W rather than accumulated energy in kWh, so a lost sample cannot be recovered from later readings.
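
The dependence on complete measurements can be made explicit. Energy is the time integral of power, so a consumer that needs kWh has to accumulate the W samples itself. As a sketch in notation introduced here, for samples $P_1, \ldots, P_n$ (in W) taken at times $t_1 < \cdots < t_n$ (in s), and assuming each sample holds until the next one,
\[
E = \int_{t_1}^{t_n} P(t)\,dt \approx \sum_{i=1}^{n-1} P_i\,(t_{i+1} - t_i) \quad \mbox{(joules)}, \qquad E_{\mathrm{kWh}} = \frac{E}{3.6 \times 10^{6}} .
\]
For instance, a host drawing a constant 200 W and sampled every 5 s for one hour yields 720 intervals and $E = 200 \times 3600 = 7.2 \times 10^{5}$ J, that is 0.2 kWh. When samples are lost, the corresponding intervals can only be bridged by holding or interpolating neighbouring values, and the estimate degrades with the length of the gap.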
212 a04cd5c3 Marcos Assuncao
213 c3cc6a4e Marcos Assuncao
Wattmeters available in the market vary in terms of physical interconnection, communication protocols, packaging and precision of measurements they take. They are mostly packaged in multiple outlet power strips called \acp{PDU} or \acp{ePDU}, and more recently in the \ac{IPMI} cards embedded in the computers themselves. Support for several types of wattmeter has been implemented, which drivers can use to interface with a wide range of equipments. In our work, we used \ac{IPMI} initially at Nova to shutdown and turn on compute nodes, but nowadays we also use it to query a computer chassis remotely.
214 a04cd5c3 Marcos Assuncao
215 c3cc6a4e Marcos Assuncao
Although Ethernet is generally used to transport \ac{IPMI} or SNMP packets over IP, USB and RS-232 serial links are also common. Wattmeters that use Ethernet are generally connected to an administration network (isolated from the data centre main data network). Moreover, wattmeters may differ in the manner they operate; some equipments send measurements to a management node on a regularly basis (push mode), whereas others respond to queries (pull mode). Other characteristics that differ across wattmeters include: 
\begin{itemize}
\item refresh rate (\textit{i.e.} maximum number of measurements per second);
\item measurement precision; and
\item methodology applied to each measurement (\textit{e.g.} mean of several measurements, instantaneous values, and exponential moving averages).
\end{itemize}

Table \ref{tab:wattmeters} shows the characteristics of the equipment we deployed and used with \ac{KWAPI} in our cloud infrastructure.

\begin{table}
\centering
\caption{Wattmeter infrastructure}
\label{tab:wattmeters}
\begin{footnotesize}
\begin{tabular}{llcc}
\toprule
\multirow{2}{18mm}{\textbf{Device Name}} & \multirow{2}{30mm}{\textbf{Interface}} & \multirow{2}{12mm}{\centering{\textbf{Refresh Time (s)}}} & \multirow{2}{10mm}{\centering{\textbf{Precision (W)}}}  \\
& & & \\
\toprule
Dell iDrac6    & IPMI / Ethernet           & 5    & 7 \\
\midrule
Eaton          & Serial, SNMP via Ethernet & 5    & 1 \\
\midrule
OmegaWatt      & IrDA Serial               & 1    & 0.125 \\
\midrule
Schleifenbauer & SNMP via Ethernet         & 3    & 0.1 \\
\midrule
Watts Up?      & Proprietary via USB       & 1    & 0.1 \\
\midrule
ZES LMG450     & Serial                    & 0.05 & 0.01 \\
\bottomrule
\end{tabular}
\end{footnotesize}
\end{table}

\subsection{Data Consumers}
A data consumer retrieves and processes the measurements taken by drivers and published on the bus. Consumers expose the information to other services, including Ceilometer and visualisation tools. By using a system of prefixes, a consumer can subscribe to all producers or to a subset of them. When receiving a message, a consumer verifies the signature, extracts the content and processes the data. By default, \ac{KWAPI} provides two data consumers, namely the KWAPI REST API (used to interface with Ceilometer) and a visualisation consumer.
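
The signing and verification of a measurement can be sketched as follows. The signature scheme is not specified above, so this Python sketch assumes an HMAC-SHA256 digest computed over the \ac{JSON} encoding of the measurement with a key shared by drivers and consumers. The field names (\texttt{probe\_id}, \texttt{w}, \texttt{timestamp}, \texttt{signature}) and the functions \texttt{sign} and \texttt{verify} are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret shared by drivers and consumers (assumption).
SECRET = b"example-shared-secret"


def sign(measurement, secret=SECRET):
    """Return a copy of the measurement with a hex HMAC-SHA256 'signature' field."""
    payload = json.dumps(measurement, sort_keys=True).encode("utf-8")
    signed = dict(measurement)
    signed["signature"] = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return signed


def verify(message, secret=SECRET):
    """Return the measurement without its signature, or None if the check fails."""
    content = {k: v for k, v in message.items() if k != "signature"}
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    received = str(message.get("signature", ""))
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        return content
    return None
```

A driver would call \texttt{sign} before publishing, and a consumer would discard any message for which \texttt{verify} returns \texttt{None}. Sorting the keys before encoding makes the digest independent of the order in which fields were set.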
\subsubsection{REST API:}
The API consumer computes the energy in kWh measured by each driver probe, adds a timestamp, and stores the last value in watts. If a driver has not provided measurements for a long time, the corresponding data is removed. The REST API allows an external system to retrieve the names of probes, measurements in W or kWh, and timestamps. The API is secured by OpenStack Keystone tokens\footnote{http://keystone.openstack.org}, whereby the consumer must validate a token before sending a response to the system.
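
The conversion from power samples to energy can be sketched as follows. The integration rule used by the consumer is not stated here, so the left-rectangle rule below is an assumption, as are the symbols $P_i$, $t_i$ and $E_n$. If a probe reports a power of $P_i$ (in W) at time $t_i$ (in s), the energy accumulated after $n$ samples is approximately
\begin{equation}
E_n = \frac{1}{3.6 \times 10^{6}} \sum_{i=1}^{n-1} P_i \, (t_{i+1} - t_i) \quad \textrm{kWh},
\end{equation}
since 1~kWh corresponds to $3.6 \times 10^{6}$~J. For instance, a node drawing a constant 200~W and sampled for one hour accumulates $200 \times 3600 / (3.6 \times 10^{6}) = 0.2$~kWh.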
\subsubsection{Visualisation:}
The visualisation consumer builds \ac{RRD} files from received measurements and generates graphs that show the energy consumption over a given period, with additional information such as average electricity consumption, minimum and maximum watt values, last value, total energy and cost in euros. \ac{RRD} files are of fixed size and store several collections of metrics with different granularities. A web interface displays the generated graphs, and a cache mechanism triggers the creation of a graph during a query only if the cached copy is out of date. These visualisation resources offer quick feedback to administrators and users during the execution of tasks and applications.
\subsection{Internal Communication Bus}
\ac{KWAPI} uses ZeroMQ \cite{HintjensZeroMQ:2013}, a fast broker-less messaging framework written in C++, in which transmitters play the role of buffers. ZeroMQ supports a wide range of transports, including cross-thread communication, IPC, and TCP, and switching from one to another is straightforward. ZeroMQ also provides several messaging patterns such as publish/subscribe and request/response. As mentioned earlier, in our publish/subscribe architecture drivers are publishers and data consumers are subscribers. If no data consumer is subscribed to receive data from a given driver, the latter does not send any information through the network.
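
The prefix-based subscription can be illustrated without a network. The Python sketch below is an in-process stand-in for the bus and does not use the ZeroMQ library: it only mimics the prefix matching that ZeroMQ applies to subscriptions, where an empty prefix matches every message. The \texttt{Bus} class and the topic names such as \texttt{site1.node7} are hypothetical.

```python
def matches(topic, subscriptions):
    """Prefix matching in the style of ZeroMQ subscriptions; b"" matches all."""
    return any(topic.startswith(prefix) for prefix in subscriptions)


class Bus:
    """In-process stand-in for the communication bus (illustration only)."""

    def __init__(self):
        self.consumers = []   # list of (prefixes, inbox) pairs

    def subscribe(self, *prefixes):
        """Register a consumer interested in the given topic prefixes."""
        inbox = []
        self.consumers.append((prefixes, inbox))
        return inbox

    def publish(self, topic, payload):
        """Deliver to matching consumers and return how many received it.

        A return value of 0 corresponds to the case where no consumer is
        subscribed, so nothing would be sent through the network.
        """
        delivered = 0
        for prefixes, inbox in self.consumers:
            if matches(topic, prefixes):
                inbox.append((topic, payload))
                delivered += 1
        return delivered
```

With this scheme, a consumer subscribed to \texttt{b""} receives the measurements of all drivers, whereas one subscribed to \texttt{b"site1."} receives only those whose topic starts with that prefix.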

Moreover, one or more optional forwarders can be installed between drivers and data consumers to minimise network usage. Forwarders act as special data consumers that subscribe to receive information from a driver and multicast it to all regular data consumers subscribed to that information. Forwarders enable the design of complex topologies and the optimisation of network usage when handling data from multiple sites. They can also be used to bypass network isolation problems and to perform load balancing.
\subsection{Interface with Ceilometer}
We opted to integrate KWAPI with an existing open source cloud platform to ease deployment and use. Leveraging the capabilities offered by OpenStack can help in the adoption of a monitoring system and reduce its learning curve.

Ceilometer's central agent and a dedicated pollster (\textit{i.e.} the \ac{KWAPI} Pollster) are used to publish and store energy metrics in Ceilometer's database. They query the REST API data consumer and publish cumulative (kWh) and gauge (W) counters that are not associated with a particular tenant, since a server can host multiple clients simultaneously.

Depending on the number of monitored devices and the frequency at which measurements are taken, wattmeters can generate a large amount of data, thus demanding storage capacity for further processing and analysis. Management systems often store and pre-process data locally on monitored nodes, but such an approach can affect CPU utilisation and influence the power consumption. In addition, resource managers may switch off idle nodes or set them to standby mode to save energy, which makes them unavailable for processing. Centralised storage, on the other hand, allows for faster data access and processing, but can generate more traffic given that measurements need to be continuously transferred over the network to a central point.
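
As a rough, illustrative estimate of this traffic (the figures below are assumptions chosen for the example, not measurements reported in this paper), $N$ probes that each send $f$ messages per second of $s$ bytes generate
\begin{equation}
R = N \times f \times s \quad \textrm{bytes/s}.
\end{equation}
With $N = 1000$, $f = 1$ and an assumed message size of $s = 200$~bytes, this gives $R = 200$~kB/s, or about $17.3$~GB per day.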

Ceilometer uses its own central database, which is used here to store the energy consumption metrics. In this way, systems that interface with OpenStack's Ceilometer, including Nova, can easily retrieve the data. It is important to note that, even though Ceilometer provides the notion of a central repository for metrics, it also uses a database abstraction that enables the use of distributed systems such as Apache Hadoop HDFS, Apache Cassandra, and MongoDB.

The granularity at which measurements are taken and metrics are computed is another important factor, because user needs vary depending on what they wish to evaluate. Taking measurements at intervals of one second or less is common in several scenarios, which can be a challenge in an infrastructure comprising hundreds or thousands of nodes and demands efficient and scalable mechanisms for transferring information on power consumption. Hence, in the next section we evaluate the throughput of KWAPI under a few scenarios.

% Describe machine wake-up/shutdown here...

% ----------------------------------------------------------------------------------------

\section{Conclusion}
\label{sec:conclusion}

This work discussed the need for reservation support in cloud resource management. It introduced an OpenStack framework for enabling resource reservation, with a focus on bare-metal provisioning for certain high performance computing applications.

% ----------------------------------------------------------------------------------------

\section*{Acknowledgments}

This research is supported by the French Fonds national pour la Soci\'{e}t\'{e} Num\'{e}rique (FSN) XLcloud project. Some experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the Inria ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr). 

\bibliographystyle{wileyj}
%\balance
\bibliography{references}

\end{document}