\documentclass[conference]{IEEEtran}
% Add the compsoc option for Computer Society conferences.

\usepackage{ctable}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{acronym}
\usepackage{graphicx}
\usepackage{multirow}
\usepackage{listings}
\usepackage{color}
\usepackage{xcolor}
\usepackage{balance}

\colorlet{@punct}{red!60!black}
\definecolor{@delim}{RGB}{20,105,176}

\lstdefinelanguage{json}{
    basicstyle=\footnotesize\ttfamily,
    literate=
     *{\ }{{{\ }}}{1}
      {:}{{{\color{@punct}{:}}}}{1}
      {,}{{{\color{@punct}{,}}}}{1}
      {\{}{{{\color{@delim}{\{}}}}{1}
      {\}}{{{\color{@delim}{\}}}}}{1}
      {[}{{{\color{@delim}{[}}}}{1}
      {]}{{{\color{@delim}{]}}}}{1},
}

\newcommand{\includeJSON}[1]{\lstinputlisting[language=json,firstnumber=1]{#1}}

\acrodef{KWAPI}{KiloWatt API}
\acrodef{PUE}{Power Usage Effectiveness}
\acrodef{IPMI}{Intelligent Platform Management Interface}
\acrodef{PDU}{Power Distribution Unit}
\acrodef{ePDU}{enclosure PDU}
\acrodef{JSON}{JavaScript Object Notation}
\acrodef{RRD}{Round-Robin Database}

% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}

\begin{document}

\title{A Generic and Extensible Framework for Monitoring Energy Consumption of OpenStack Clouds}


\author{\IEEEauthorblockN{Fran\c{c}ois Rossigneux, Jean-Patrick Gelas, Laurent Lef\`{e}vre, Marcos Dias de Assun\c{c}\~ao}
\IEEEauthorblockA{Inria Avalon, LIP Laboratory\\
Ecole Normale Sup\'{e}rieure de Lyon\\
University of Lyon, France}
}


\maketitle


\begin{abstract}
Although cloud computing has been transformational to the IT industry, it is built on large data centres that often consume massive amounts of electrical power. Efforts have been made to reduce the energy clouds consume, with certain data centres now approaching a \ac{PUE} factor of 1.08. While this is an incredible mark, it also means that the IT infrastructure accounts for a large part of the power consumed by a data centre. Hence, means to monitor and analyse how energy is spent have never been so crucial. Such monitoring is required not only for understanding how power is consumed, but also for assessing the impact of energy management policies. In this article, we draw lessons from experience on monitoring large-scale systems and introduce an energy monitoring software framework called \ac{KWAPI}, able to handle OpenStack clouds. The framework --- whose architecture is scalable, extensible, and completely integrated into OpenStack --- supports several wattmeter devices, multiple measurement formats, and minimises communication overhead.
\end{abstract}


\IEEEpeerreviewmaketitle


\section{Introduction}
\acresetall

Cloud computing \cite{ArmbrustCloud:2009} has become a key building block in providing IT resources and services to organisations of all sizes. Among the claimed benefits of clouds, the most appealing derive from economies of scale and often include a pay-as-you-go business model, resource consolidation, elasticity, good availability, and wide geographical coverage. Despite these advantages when compared to other provisioning models, in order to serve customers with the resources and elasticity they need, clouds generally rely on large data centres that consume massive amounts of electrical power \cite{BaligaInternet:2011,GreenbergCostCloud:2009}.

Although some data centres now approach a \ac{PUE} factor of 1.08\footnote{http://gigaom.com/2012/03/26/whose-data-centers-are-more-efficient-facebooks-or-googles/}, such a mark means that the IT infrastructure is now responsible for a large part of the consumed power. Means to monitor and analyse how energy is spent are crucial to further improvement, but our previous work in this area has demonstrated that monitoring the power consumed by large systems is not always an easy task \cite{OrgerieSaveWatts:2008,AssuncaoIngrid:2010,DaCostaGreenNet:2010}. There are multiple power probes available in the market, generally with their own APIs, physical connections, precision, and communication protocols~\cite{eelsd2013}. Moreover, cost-related constraints can lead data centre operators to acquire and deploy equipment in multiple stages, or to monitor the power consumption of only part of an infrastructure.

From a cost perspective, monitoring the power consumption of only a small part of the deployed equipment is sound, but it prevents one from capturing important nuances of the infrastructure. Previous work has shown that as a computer cluster ages, certain components wear out, while others are replaced, leading to heterogeneous power consumption among nodes that were seemingly homogeneous \cite{DeanGoogleFailures:2008}. The difference between nodes that consume the least power and nodes that consume the most can reach 20\% \cite{MehdiHeterogeneous:2013}, which reinforces the idea that monitoring the consumption of all equipment is required for exploring further improvements in energy efficiency and evaluating the impact of system-wide policies. Monitoring a great number of nodes, however, requires the design of an efficient infrastructure for collecting and processing the power consumption data.

This paper describes the design and architecture of a generic and flexible framework, termed \ac{KWAPI}, that interfaces with OpenStack to provide it with power consumption information collected from multiple heterogeneous probes. OpenStack is a project that aims to provide a ubiquitous open source cloud computing platform and is currently used by many corporations, researchers and global data centres\footnote{http://www.openstack.org/user-stories/}. We describe how \ac{KWAPI} is integrated into Ceilometer, the OpenStack component conceived to provide a framework for collecting a large range of metrics for metering purposes\footnote{https://wiki.openstack.org/wiki/Ceilometer}. With the increasing use of Ceilometer as the \textit{de facto} metering tool for OpenStack, we believe that such an integration of a power monitoring framework into OpenStack can be of great value to the research community and practitioners.

The remaining part of this paper is organised as follows. Section~\ref{sec:related_work} describes background and related work, whereas Section~\ref{sec:architecture} presents the \ac{KWAPI} architecture. Section~\ref{sec:performance} discusses experimental results on measuring the throughput of \ac{KWAPI}, and Section~\ref{sec:conclusion} concludes the paper.

% ----------------------------------------------------------------------------------------

\section{Background and Related Work}
\label{sec:related_work}

This section provides an overview of Ceilometer's architecture and describes related work on monitoring power consumption of large-scale computing infrastructure.

\subsection{OpenStack Ceilometer}

Ceilometer --- whose logical architecture\footnote{http://docs.openstack.org/developer/ceilometer/architecture.html} is depicted in Figure~\ref{fig:arch_ceilometer} --- is OpenStack's framework for collecting performance metrics and information on resource consumption. At the time of writing, it supports three data collection methods:

\begin{itemize}
\item \textbf{Bus listener agent}, which picks up events from OpenStack's notification bus and turns them into Ceilometer samples (\textit{e.g.} of cumulative, gauge or delta type) that can then be stored into the database or provided to an external system via a publishing pipeline.

\item \textbf{Push agents}, which are more intrusive and consist of agents deployed on the monitored nodes that push data to the collector.

\item \textbf{Polling agents}, which poll APIs or other tools to collect information about monitored resources.
\end{itemize}

\begin{figure}[!htb]
\centering
\includegraphics[width=1.\columnwidth]{figs/ceilometer_logical_architecture.pdf}
\caption{Overview of Ceilometer's logical architecture.}
\label{fig:arch_ceilometer}
\end{figure}

The last two methods depend on a combination of a central agent, compute agents and a collector. The compute agents run on nodes and retrieve information about resource usage related to a given virtual machine instance and a resource owner. The central agent, on the other hand, executes \textit{pollsters} on the management server to retrieve data that is not linked to a particular instance. Pollsters are software components executed, for example, to poll resources by using an API or other methods. The Ceilometer database, which can be queried via the Ceilometer API, allows an external system to view the history of a resource's metrics, and enables the system to set and receive alarms.

The \textit{hmac} module of Python's standard library can be used to sign metering messages with a shared secret provided in the configuration settings. The collector and the systems accessing the API use the signatures included in the messages to verify them.
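
The signing step described above can be sketched as follows. This is a minimal illustration rather than Ceilometer's or \ac{KWAPI}'s actual code: the field names, the \texttt{message\_signature} key, the choice of SHA-256 and the sorted-key \ac{JSON} serialisation are all assumptions made for the example.

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"change-me"  # hypothetical shared secret from the configuration


def compute_signature(message: dict, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 digest over every field except the signature."""
    unsigned = {k: v for k, v in message.items() if k != "message_signature"}
    # Sorting the keys gives the same byte string on the sender and the receiver.
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign(message: dict, secret: bytes) -> dict:
    """Return a copy of the message with its signature attached."""
    signed = dict(message)
    signed["message_signature"] = compute_signature(message, secret)
    return signed


def verify(message: dict, secret: bytes) -> bool:
    """Check the attached signature using a constant-time comparison."""
    expected = compute_signature(message, secret)
    return hmac.compare_digest(expected, message.get("message_signature", ""))


# Hypothetical measurement; the field names are illustrative only.
signed = sign({"probe_id": "node-1", "w": 154.2, "timestamp": 1400000000},
              SHARED_SECRET)
tampered = dict(signed, w=10.0)
```

A receiver that holds the same secret accepts \texttt{signed} and rejects \texttt{tampered}, since any change to a field alters the digest. The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.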

\subsection{Energy Monitoring and Efficiency in Clouds}

Over the past years, several techniques have been proposed to minimise the energy consumed by computing infrastructure. At the hardware level, for instance, processors are able to operate at multiple frequency and voltage levels, and the operating systems or resource managers can choose the level that matches the current workload \cite{LaszewskiDVFS:2009}. At the resource management level, several approaches have been proposed, including resource consolidation \cite{BeloglazovOpenStack:2014} and rescheduling requests \cite{OrgerieSaveWatts:2008}, generally with the goal of switching off unused resources or setting them to low power consumption modes. Attempts have also been made to assess the power consumed by individual applications \cite{NoureddineThesis:2014}.

A means to monitor energy consumption is key to assessing the potential gains of techniques for improving software and cloud resource management systems. Cloud monitoring is not a new topic \cite{AcetoMonitoring:2013}: tools to monitor computing infrastructure \cite{BrinkmannMonitoring:2013,VarretteICPP:2014} as well as ways to address some of the usual issues of management systems have been introduced \cite{WardMonitoring:2013,TanMonitoring:2013}. Moreover, several systems for measuring the power consumed by compute clusters have been described in the literature \cite{AssuncaoIngrid:2010}. As traditional system and network monitoring techniques lack the capability to interface with wattmeters, most approaches for measuring energy consumption have been tailored to the needs of the projects for which they were conceived.

In our work, we draw lessons from previous approaches to monitor and analyse energy consumption of large-scale distributed systems \cite{OrgerieSaveWatts:2008,DaCostaGreenNet:2010,AssuncaoIngrid:2010,MehdiHeterogeneous:2013,CGC2012}. We opt for creating a framework and integrating it with a successful cloud platform (\textit{i.e.} OpenStack), which we believe is of value to the research community and practitioners working on the topic. To the best of our knowledge, this is the first generic energy monitoring framework to be integrated with OpenStack.

% ----------------------------------------------------------------------------------------

\section{The \ac{KWAPI} Architecture}
\label{sec:architecture}

An overview of the \ac{KWAPI} architecture is presented in Figure~\ref{fig:architecture}. The architecture follows a publish/subscribe model based on a set of layers comprising:

\begin{itemize}
\item \textbf{Drivers}, considered data producers responsible for measuring the power consumption of monitored resources and providing the collected data to consumers via a communication bus; and
\item \textbf{Data Consumers} --- or \textbf{Consumers} for short --- that subscribe to receive and process the measurement information.
\end{itemize}

\begin{figure*}[!htb]
\centering
\includegraphics[width=0.7\linewidth]{figs/architecture.pdf}
\caption{Overview of \ac{KWAPI}'s architecture.}
\label{fig:architecture}
\end{figure*}

The communication between layers is handled by a bus, as explained in detail later. Data consumers can subscribe to receive information collected by drivers from multiple sites. Both drivers and consumers are easily extensible, respectively to support several types of wattmeters and to provide additional data processing services. A REST API is designed as a data consumer to provide a programming interface for developers and system administrators. In this work, it is used to interface with OpenStack by providing the information, obtained by polling the monitored devices, that is required by a \textit{\ac{KWAPI} Pollster} that feeds Ceilometer.

The following sections provide more details on the main architecture components and their relationship with OpenStack Ceilometer.

\subsection{Driver Layer}

Drivers are threads initialised by a Driver Manager with a set of parameters loaded from a file compliant with the OpenStack configuration format. These parameters are used to query the meters (\textit{e.g.} IP address and port) and determine the sensor ID to be used in the collected metrics. The measurements that a driver obtains are represented as \ac{JSON} dictionaries that maintain a small footprint and that can be easily parsed. The size of the dictionaries varies depending on the number of fields set by drivers (\textit{e.g.} whether message signing is enabled).
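
To make the variable payload size concrete, the sketch below serialises one measurement as a compact \ac{JSON} dictionary. The field names (\texttt{probe\_id}, \texttt{w}, \texttt{v}, \texttt{a}, \texttt{timestamp}) and the probe identifier are hypothetical and chosen only for illustration; they are not necessarily those used by \ac{KWAPI}.

```python
import json
import time
from typing import Optional


def build_measurement(probe_id: str, watts: float,
                      volts: Optional[float] = None,
                      amperes: Optional[float] = None,
                      timestamp: Optional[float] = None) -> bytes:
    """Serialise one measurement as a compact JSON dictionary."""
    measurement = {
        "probe_id": probe_id,
        "w": watts,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    # Optional fields are added only when the wattmeter reports them,
    # so the payload size depends on the number of fields that are set.
    if volts is not None:
        measurement["v"] = volts
    if amperes is not None:
        measurement["a"] = amperes
    # Compact separators drop the spaces json.dumps inserts by default.
    return json.dumps(measurement, separators=(",", ":")).encode("utf-8")


small = build_measurement("site1.node-1", 154.2, timestamp=1400000000)
large = build_measurement("site1.node-1", 154.2, volts=230.1, amperes=0.67,
                          timestamp=1400000000)
```

Here \texttt{large} carries two more fields than \texttt{small} and is therefore longer, which is the effect the text attributes to optional fields.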

Figure~\ref{fig:json} shows a simple example of a \ac{JSON} payload containing one measurement. Optional fields such as voltage and current can be included. ACK messages have a fixed size of 66 bytes when using a TCP connection; drivers and data consumers communicate via IPC sockets when running on the same machine.

\begin{figure}
\includeJSON{figs/measurement.json}
\caption{Example of \ac{JSON} payload.}
\label{fig:json}
\end{figure}

Drivers can manage incidents themselves, but the manager also periodically checks whether all threads are active, restarting them if necessary. Avoiding the loss of measurements is important because drivers report instantaneous power in W rather than cumulative energy in kWh; a missing measurement cannot be recovered from later readings and may therefore introduce a significant error in the computed energy consumption.
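
The periodic liveness check can be sketched with Python's \texttt{threading} module as shown below. The class and function names are hypothetical, and the example runs the check once instead of on a timer; it is an illustration of the idea under these assumptions, not the \ac{KWAPI} Driver Manager itself.

```python
import threading
from typing import Callable, Dict, List


class DriverManager:
    """Keeps one thread per probe and restarts those that have died."""

    def __init__(self, driver_factory: Callable[[str], threading.Thread]):
        self._factory = driver_factory
        self._drivers: Dict[str, threading.Thread] = {}

    def start(self, probe_id: str) -> None:
        """Create and start a fresh driver thread for the probe."""
        thread = self._factory(probe_id)
        thread.daemon = True
        thread.start()
        self._drivers[probe_id] = thread

    def check_drivers(self) -> List[str]:
        """Restart dead driver threads and return the probes restarted."""
        restarted = []
        for probe_id, thread in list(self._drivers.items()):
            if not thread.is_alive():
                self.start(probe_id)
                restarted.append(probe_id)
        return restarted


stop = threading.Event()


def make_driver(probe_id: str) -> threading.Thread:
    # Hypothetical drivers: "flaky" returns at once, others run until stopped.
    target = (lambda: None) if probe_id == "flaky" else stop.wait
    return threading.Thread(target=target, name=probe_id)


manager = DriverManager(make_driver)
manager.start("flaky")
manager.start("steady")
manager._drivers["flaky"].join()     # wait until the flaky driver has exited
restarted = manager.check_drivers()  # the periodic check, run once here
stop.set()                           # let the steady driver finish
```

In a long-running service, \texttt{check\_drivers} would be called at a fixed interval; the shorter the interval, the fewer measurements are lost while a driver is down.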

Wattmeters available in the market vary in terms of physical interconnection, communication protocols, packaging and the precision of the measurements they take. They are mostly packaged in multiple-outlet power strips called \acp{PDU} or \acp{ePDU}, and more recently in the \ac{IPMI} cards embedded in the computers themselves. Support for several types of wattmeter has been implemented, which drivers can use to interface with a wide range of equipment. In our work, \ac{IPMI} was initially used by Nova to shut down and turn on compute nodes, but nowadays we also use it to query a computer chassis remotely.

Although Ethernet is generally used to transport \ac{IPMI} or SNMP packets over IP, USB and RS-232 serial links are also common. Wattmeters that use Ethernet are generally connected to an administration network (isolated from the data centre main data network). Moreover, wattmeters may differ in the manner they operate; some devices send measurements to a management node on a regular basis (push mode), whereas others respond to queries (pull mode). Other characteristics that differ across wattmeters include:

\begin{itemize}
\item refresh rate (\textit{i.e.} maximum number of measurements per second);
\item measurement precision; and
\item methodology applied to each measurement (\textit{e.g.} mean of several measurements, instantaneous values, and exponential moving averages).
\end{itemize}
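
To illustrate how these methodologies differ, and using notation introduced here only for this example, let $p_t$ denote the instantaneous power sampled at step $t$. A device that reports the mean of the last $n$ samples returns $\bar{p}_t$, whereas one that applies an exponential moving average with smoothing factor $\alpha \in (0,1]$ returns $s_t$:
\begin{equation*}
\bar{p}_t = \frac{1}{n}\sum_{i=0}^{n-1} p_{t-i},
\qquad
s_t = \alpha\, p_t + (1-\alpha)\, s_{t-1}, \quad s_0 = p_0 .
\end{equation*}
For instance, if the power jumps from $p_0 = 100$\,W to $p_1 = 200$\,W, a device reporting instantaneous values returns 200\,W, a device averaging $n = 2$ samples returns 150\,W, and a device using $\alpha = 0.25$ returns $0.25 \times 200 + 0.75 \times 100 = 125$\,W. Readings from different wattmeters are therefore not directly comparable during load changes.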

Table~\ref{tab:wattmeters} shows the characteristics of the equipment we deployed and used with \ac{KWAPI} in our cloud infrastructure.

\begin{table}
\centering
\caption{Wattmeter infrastructure}
\label{tab:wattmeters}
\begin{footnotesize}
\begin{tabular}{llcc}
\toprule
\multirow{2}{18mm}{\textbf{Device Name}} & \multirow{2}{30mm}{\textbf{Interface}} & \multirow{2}{12mm}{\centering{\textbf{Refresh Time (s)}}} & \multirow{2}{10mm}{\centering{\textbf{Precision (W)}}}  \\
& & & \\
\toprule
Dell iDrac6    & IPMI / Ethernet           & 5    & 7 \\
\midrule
Eaton          & Serial, SNMP via Ethernet & 5    & 1 \\
\midrule
OmegaWatt      & IrDA Serial               & 1    & 0.125 \\
\midrule
Schleifenbauer & SNMP via Ethernet         & 3    & 0.1 \\
\midrule
Watts Up?      & Proprietary via USB       & 1    & 0.1 \\
\midrule
ZES LMG450     & Serial                    & 0.05 & 0.01 \\
\bottomrule
\end{tabular}
\end{footnotesize}
\end{table}


\subsection{Data Consumers}

A data consumer retrieves and processes measurements taken by drivers and provided via the bus. Consumers expose the information to other services including Ceilometer and visualisation tools. By using a system of prefixes, consumers can subscribe to all producers or a subset of them. When receiving a message, a consumer verifies the signature, extracts the content and processes the data. By default, \ac{KWAPI} provides two data consumers, namely the REST API (used to interface with Ceilometer) and a visualisation consumer.
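
The prefix-based subscription mentioned above can be sketched with a small in-memory bus. This is a stand-in written for illustration, not the messaging library used by \ac{KWAPI}; the class name, the topic layout (\texttt{site.node}) and the site names are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class PrefixBus:
    """In-memory stand-in for a publish/subscribe bus with prefix filters."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        # An empty prefix matches every topic, i.e. all producers.
        self._subscriptions[prefix].append(handler)

    def publish(self, topic: str, measurement: dict) -> int:
        """Deliver to every handler whose prefix matches; return the count."""
        delivered = 0
        for prefix, handlers in self._subscriptions.items():
            if topic.startswith(prefix):
                for handler in handlers:
                    handler(topic, measurement)
                    delivered += 1
        return delivered


bus = PrefixBus()
everything, lyon_only = [], []
bus.subscribe("", lambda topic, m: everything.append(topic))
bus.subscribe("lyon.", lambda topic, m: lyon_only.append(topic))

bus.publish("lyon.node-1", {"w": 154.2})
bus.publish("rennes.node-7", {"w": 98.0})
```

After the two calls, \texttt{everything} holds both topics while \texttt{lyon\_only} holds only the first, which mirrors a consumer subscribing to all producers or to a subset of them.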

\subsubsection{REST API}

The API consumer computes the energy in kWh measured by each driver probe, adds a timestamp, and stores the last value in watts. If a driver has not provided measurements for a long time, the corresponding data is removed. The REST API allows an external system to retrieve the names of probes, measurements in W or kWh, and timestamps. The API is secured by OpenStack Keystone tokens\footnote{http://keystone.openstack.org}, whereby the consumer needs to ensure the validity of a token before sending a response to the system.
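
One way to derive kWh from a stream of readings in W is to accumulate power multiplied by elapsed time, as in the sketch below. The class name is hypothetical, and the integration rule (the new reading is assumed to hold over the whole interval since the previous one) is an assumption of this example; the text does not state which rule the API consumer applies.

```python
from typing import Optional

JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


class ProbeRecord:
    """Running state kept for one probe."""

    def __init__(self) -> None:
        self.kwh = 0.0
        self.last_watts: Optional[float] = None
        self.last_timestamp: Optional[float] = None

    def add(self, watts: float, timestamp: float) -> None:
        """Add the energy used since the previous sample, then store the new one."""
        if self.last_timestamp is not None:
            elapsed = timestamp - self.last_timestamp
            # Assumption: power stayed at the new reading during the interval.
            self.kwh += watts * elapsed / JOULES_PER_KWH
        self.last_watts = watts
        self.last_timestamp = timestamp


record = ProbeRecord()
for seconds in (0, 1800, 3600):
    record.add(1000.0, seconds)  # a constant 1 kW load over one hour
```

A constant load of 1\,kW sustained for one hour accumulates 1\,kWh. With this rule, a lost reading stretches the next interval, so the energy of the gap is estimated from a single later sample.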

\subsubsection{Visualisation}

The visualisation consumer builds \ac{RRD} files from received measurements, and generates graphs that show the energy consumption over a given period, with additional information such as average electricity consumption, minimum and maximum watt values, last value, total energy and cost in euros. \ac{RRD} files are of fixed size and store several collections of metrics with different granularities. A web interface displays the generated graphs, and a cache mechanism triggers the creation of graphs during queries only if they are out of date. These visualisation resources offer quick feedback to administrators and users during the execution of tasks and applications. Figure~\ref{fig:graph_example} shows an example of a generated graph.
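
The fixed size and multiple granularities of a round-robin database can be illustrated with bounded queues. The sketch below is a simplified model written for this example, with hypothetical names and archive sizes; it does not use or reproduce the file format of any \ac{RRD} tool.

```python
from collections import deque
from typing import List


class RoundRobinArchive:
    """Fixed-size archive that stores one averaged value per `step` samples."""

    def __init__(self, step: int, rows: int) -> None:
        self.step = step
        self.rows = deque(maxlen=rows)  # the oldest row is dropped when full
        self._pending: List[float] = []

    def update(self, value: float) -> None:
        """Buffer a sample and consolidate once `step` samples are available."""
        self._pending.append(value)
        if len(self._pending) == self.step:
            self.rows.append(sum(self._pending) / self.step)
            self._pending.clear()


# Two hypothetical archives: every sample, and averages over five samples.
fine = RoundRobinArchive(step=1, rows=10)
coarse = RoundRobinArchive(step=5, rows=10)
for watts in range(100, 120):  # 20 samples: 100, 101, ..., 119
    fine.update(float(watts))
    coarse.update(float(watts))
```

After 20 samples, \texttt{fine} retains only the ten most recent values, while \texttt{coarse} still covers the whole period with four averaged rows. Storage stays constant regardless of how long monitoring runs.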

\begin{figure*}[!htb]
\centering
\includegraphics[width=0.9\linewidth]{figs/graph_example.jpg}
\caption{Example of graph generated by the visualisation plug-in (4 monitored servers).}
\label{fig:graph_example}
\end{figure*}

\subsection{Internal Communication Bus}

\ac{KWAPI} uses ZeroMQ \cite{HintjensZeroMQ:2013}, a fast broker-less messaging framework written in C++, where transmitters play the role of buffers. ZeroMQ supports a wide range of bus modes, including cross-thread communication, IPC, and TCP. Switching from one mode to another is straightforward. ZeroMQ also provides several design patterns such as publish/subscribe and request/response. As mentioned earlier, in our publish/subscribe architecture drivers are publishers, and data consumers are subscribers. If no data consumer is subscribed to receive data from a given driver, the latter will not send any information through the network.

Moreover, one or more optional forwarders can be installed between drivers and data consumers to minimise network usage. Forwarders are designed to act as special data consumers that subscribe to receive information from a driver and multicast it to all normal data consumers subscribed to receive the information. Forwarders enable the design of complex topologies and the optimisation of network usage when handling data from multiple sites. They can also be used to bypass network isolation problems and perform load balancing.

\subsection{Interface with Ceilometer}

We opted for integrating \ac{KWAPI} with an existing open source cloud platform to ease deployment and use. Leveraging the capabilities offered by OpenStack can help in the adoption of a monitoring system and reduce its learning curve.

Ceilometer's central agent and a dedicated pollster (\textit{i.e.} \ac{KWAPI} Pollster) are used to publish and store energy metrics into Ceilometer's database. They query the REST API data consumer and publish cumulative (kWh) and gauge (W) counters that are not associated with a particular tenant, since a server can host multiple clients simultaneously.
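
The mapping from probe data to the two kinds of counter can be sketched as a plain function. The dictionary layout of the REST API response, the sample fields and the counter names (\texttt{energy}, \texttt{power}) are assumptions made for this example; the sketch does not use Ceilometer's own classes.

```python
from typing import Dict, List


def probes_to_samples(probes: Dict[str, dict]) -> List[dict]:
    """Turn probe records into one cumulative and one gauge sample each."""
    samples = []
    for probe_id, record in sorted(probes.items()):
        common = {
            "resource_id": probe_id,
            "timestamp": record["timestamp"],
            # Not tied to a tenant: a server may host several clients.
            "project_id": None,
            "user_id": None,
        }
        samples.append(dict(common, name="energy", type="cumulative",
                            unit="kWh", volume=record["kwh"]))
        samples.append(dict(common, name="power", type="gauge",
                            unit="W", volume=record["w"]))
    return samples


# Hypothetical response of the REST API data consumer for two probes.
response = {
    "node-2": {"w": 98.0, "kwh": 0.75, "timestamp": 1400000000},
    "node-1": {"w": 154.2, "kwh": 1.25, "timestamp": 1400000000},
}
samples = probes_to_samples(response)
```

Each probe yields two samples: a cumulative one whose volume only grows over time, and a gauge one that reflects the latest reading.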

Depending on the number of monitored devices and the frequency at which measurements are taken, wattmeters can generate a large amount of data, thus demanding storage capacity for further processing and analysis. Management systems often store and pre-process data locally on monitored nodes, but such an approach can affect CPU utilisation and influence the power consumption. In addition, resource managers may switch off idle nodes or set them to standby mode to save energy, which makes them unavailable for processing. Centralised storage, on the other hand, allows for faster data access and processing, but can generate more traffic given that measurements need to be continuously transferred over the network to a central point.

Ceilometer uses its own central database, which is leveraged here to store the energy consumption metrics. In this way, systems that interface with OpenStack's Ceilometer, including Nova, can easily retrieve the data. It is important to note that, even though Ceilometer provides the notion of a central repository for metrics, it also uses a database abstraction that enables the use of distributed systems such as Apache Hadoop HDFS\footnote{http://hadoop.apache.org/}, Apache Cassandra\footnote{http://cassandra.apache.org/}, and MongoDB\footnote{http://www.mongodb.org/}.

The granularity at which measurements are taken and metrics are computed is another important factor because user needs vary depending on what they wish to evaluate. Taking one or more measurements per second is not uncommon under certain scenarios, which can be a challenge in an infrastructure comprising hundreds or thousands of nodes, demanding efficient and scalable mechanisms for transferring information on power consumption. Hence, in the next section we evaluate the throughput of \ac{KWAPI} under a few scenarios.
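
To give an order of magnitude, and with symbols introduced here only for illustration, let $N$ be the number of monitored nodes, $f$ the sampling frequency and $s$ the size of one message on the bus. The sustained data rate $R$ and the number of measurements per day $M_{\mathrm{day}}$ are
\begin{equation*}
R = N f s, \qquad M_{\mathrm{day}} = 86\,400\, N f .
\end{equation*}
For $N = 1{,}000$ nodes sampled at $f = 1$\,Hz, this gives $M_{\mathrm{day}} = 86.4$ million measurements per day. Assuming, purely as an example, $s = 100$ bytes per message, the bus would carry $R = 100$\,kB/s, or about 8.64\,GB per day, before any protocol overhead.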


\section{Performance Evaluation}
\label{sec:performance}

This section provides results of a performance evaluation carried out in our testbed. The goal is not to compare publish/subscribe systems since such work has already been performed elsewhere \cite{EugsterSurvey:2003,FabretPS:2001}. Instead, the evaluation demonstrates that the framework meets the needs of a large range of users of the Grid'5000 platform \cite{Grid5000} --- the infrastructure we use and where the \ac{KWAPI} framework is currently deployed in production mode as the means for collecting and providing energy consumption information to users.

First, we evaluate the CPU and network usage of a typical driver to observe the framework's throughput, since provisioning a large number of resources for monitoring purposes is not desirable. For this experiment we deployed the \ac{KWAPI} drivers and API on a machine with a Core 2 Duo P8770 2.53\,GHz processor and 4\,GB of RAM. We considered:

\begin{itemize}
 \item a scenario where we emulated 1,000 \ac{IPMI} cards, each card monitored by a driver thread placing one measurement per second on the communication bus; and
 \item a case with 100 ten-outlet \acp{PDU}, each monitored by a driver thread placing ten values per second on the communication bus.
\end{itemize}

Under both scenarios, 1,000 measurements per second were placed on the bus, even though monitoring was done using different types of probes. We evaluated these scenarios both with and without message signing. Table~\ref{tab:parameters_usage} summarises the considered scenarios.
246 4fcab614 Marcos Assuncao
247 9d39d328 François Rossigneux
\begin{table}
248 9d39d328 François Rossigneux
\centering
249 389115e3 Marcos Assuncao
\caption{Scenarios considered in the experiments.}
250 46564e42 Marcos Assuncao
\label{tab:parameters_usage}
251 389115e3 Marcos Assuncao
\begin{tabular}{lcc}
252 46564e42 Marcos Assuncao
\toprule
253 389115e3 Marcos Assuncao
\textbf{Scenario name} & \textbf{Agent thread scheme} & \textbf{Message signature}  \\
254 46564e42 Marcos Assuncao
\toprule
255 389115e3 Marcos Assuncao
IPMI message signed     & 1 thread per card & Enabled\\
256 389115e3 Marcos Assuncao
\midrule
257 389115e3 Marcos Assuncao
IPMI message unsigned   & 1 thread per card & Disabled\\
258 46564e42 Marcos Assuncao
\midrule
259 389115e3 Marcos Assuncao
PDU message signed     & 1 thread per PDU & Enabled\\
260 46564e42 Marcos Assuncao
\midrule
261 389115e3 Marcos Assuncao
PDU message unsigned   & 1 thread per PDU & Disabled\\
262 46564e42 Marcos Assuncao
\bottomrule
263 9d39d328 François Rossigneux
\end{tabular}
264 9d39d328 François Rossigneux
\end{table}

Figure~\ref{fig:cpu_usage} shows the CPU usage of the drivers under the evaluated scenarios. The socket type and the number of driver threads have no distinguishable impact on CPU usage. On the test machine, the \ac{KWAPI} drivers with message signature disabled (\textit{i.e.}, the \ac{IPMI} message unsigned and \ac{PDU} message unsigned scenarios) consumed on average 20\% of the total CPU capacity.

% The \ac{KWAPI} API consumed around 10\% with message signature disabled and 16\% when making one request per second querying the last measurements of all probes. Message signature overall increases CPU usage by 30\% (see \ac{IPMI} cards signed and \acp{PDU} signed scenarios).

\begin{figure}[!ht]
\centering
\includegraphics[width=\columnwidth]{figs/cpu_usage_driver.pdf}
\caption{Driver CPU usage under the evaluated scenarios.}
\label{fig:cpu_usage}
\end{figure}

We also evaluated the CPU consumption of the REST API data consumer under the scenarios described in Table~\ref{tab:parameters_usage}. In addition to these scenarios, two conditions were assessed: (i) the REST API acting only as a consumer, requesting data from the drivers at one-second intervals (REST API only); and (ii) the API requesting data at one-second intervals while also answering one call per second to provide the collected data to an external system (REST API + 1 req/s). Figure~\ref{fig:cpu_usage_consumer} summarises the results. The CPU consumption is generally low: even when message signing is enabled and the API serves a query, it remains below 20\%. The small variation between the scenarios without message signing is caused by the way ZeroMQ accumulates data on nodes prior to transmission.

\begin{figure}[!ht]
\centering
\includegraphics[width=\columnwidth]{figs/cpu_usage_api.pdf}
\caption{API consumer CPU usage under the evaluated scenarios.}
\label{fig:cpu_usage_consumer}
\end{figure}

Although CPU usage depends on the drivers, on the data consumers and their complexity, and on whether message signature is enabled, the experiments show that a single machine can manage a large number of probes. In our environment, one management machine per site is more than enough to accommodate the users' monitoring needs. The drivers and API can also reuse a machine that already serves other monitoring purposes.

\begin{figure}[!ht]
\centering
\includegraphics[width=1.0\columnwidth]{figs/measurement_intervals.pdf}
\caption{Number of observations received over a 60-second period under different measurement intervals.}
\label{fig:measurement_intervals}
\end{figure}
%\begin{figure}[!ht]
%\center
%\includegraphics[width=1.0\columnwidth]{figs/packet_size.pdf}
%\caption{Packet sizes under the evaluated scenarios.}
%\label{fig:packet_size}
%\end{figure}
%While measuring the network usage, our experiments showed a transfer rate of around 230KB/s with message signing enabled and around 135KBs/s otherwise. Message signing overall introduces an overhead of 70\%. Sending large packets can be explored to decrease the packet overhead. If several drivers send measurements simultaneously, ZeroMQ provides an optimisation mechanism that aggregates the data into a single TCP datagram. Figure~\ref{fig:packet_size} shows the number of packets under the evaluated scenarios. We noticed that certain packets contain up to forty measurements.
Although a measurement interval of one second meets the requirements of the users of our platform, we wanted to evaluate the impact of using a communication bus to transfer observations between the drivers and the REST API consumer. In a second experiment we used two machines. On the first machine we instantiated 1,000 driver threads placing random observations on the communication bus. On the second machine we counted the measurements that the API was able to receive over one minute. We varied the time between measurements from 0.2 to 1.0 seconds. Figure~\ref{fig:measurement_intervals} summarises the results. Although the number of observations generated in this experiment is much higher than what we currently need to handle in our platform, the framework is able to transfer measurements from the drivers to the API at intervals as short as 0.4 seconds without adding much jitter. Under smaller measurement intervals, however, observations start to accumulate and are transferred in large chunks. Hence, although the framework suits the purposes of a large range of users, if measurements are to be taken at very small time intervals, and a very large number of observations per second must consequently be handled, an architecture based on stream processing systems that guarantee data processing would probably yield better performance by enabling the placement of elements that pre-process data closer to where it is generated.
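
To put these intervals in perspective, the ideal number of observations generated in this experiment can be computed directly. This is a back-of-the-envelope sketch that assumes each of the 1,000 driver threads emits exactly one observation per interval and that nothing is lost or delayed; the symbols $N$ and $\Delta t$ are introduced here only for illustration, and the values are not measured results:
\begin{equation*}
N(\Delta t) = 1{,}000 \times \frac{60~\text{s}}{\Delta t},
\end{equation*}
which gives $N = 60{,}000$ for $\Delta t = 1.0$~s, $N = 150{,}000$ for $\Delta t = 0.4$~s (2,500 observations per second), and $N = 300{,}000$ for $\Delta t = 0.2$~s (5,000 observations per second).
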
% As mentioned earlier, plug-ins can subscribe and select probes from which they want to receive information. If multiple plug-ins select a node, information from the node is sent only once through the network. The architecture also allows for a hierarchy of plug-ins to be established, where a plug-in can be deployed on a site to summarise or compute average values that are placed on the bus to be consumed by higher level plug-ins. 
% ----------------------------------------------------------------------------------------
\section{Conclusion}
\label{sec:conclusion}

In this paper, we described \ac{KWAPI}, a novel framework for monitoring the power consumed by the resources of an OpenStack cloud. Based on lessons learned from monitoring the power consumption of a large distributed infrastructure, we proposed an energy monitoring architecture based on a publish/subscribe model. The framework works in tandem with OpenStack's Ceilometer. Experimental results demonstrate that the overhead posed by the monitoring framework is small, allowing us to serve the monitoring needs of the users of our large-scale infrastructure.

As future work, we intend to explore means of increasing the monitoring granularity and the number of measured devices by applying a hierarchy of plug-ins and a stream processing system with guarantees on data processing \cite{Storm,S4} for processing streams of measurement tuples.
% ----------------------------------------------------------------------------------------
\section*{Acknowledgments}

This research is supported by the French Fonds national pour la Soci\'{e}t\'{e} Num\'{e}rique (FSN) XLCloud project. Some experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the Inria ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://grid5000.fr). The authors wish to thank Julien Danjou for his help during the integration of \ac{KWAPI} with OpenStack and Ceilometer.
\bibliographystyle{IEEEtran}
\balance
\bibliography{references}
\end{document}