1 | e542267e | Marcos Assuncao | |
---|---|---|---|
2 | 46564e42 | Marcos Assuncao | \documentclass[conference]{IEEEtran} |
3 | 46564e42 | Marcos Assuncao | % Add the compsoc option for Computer Society conferences. |
4 | 46564e42 | Marcos Assuncao | |
5 | 46564e42 | Marcos Assuncao | \usepackage{ctable} |
6 | 46564e42 | Marcos Assuncao | \usepackage{cite} |
7 | e542267e | Marcos Assuncao | \usepackage[cmex10]{amsmath} |
8 | e542267e | Marcos Assuncao | % \usepackage{acronym} |
9 | e542267e | Marcos Assuncao | \usepackage{graphicx} |
10 | e542267e | Marcos Assuncao | \usepackage{multirow} |
11 | 46564e42 | Marcos Assuncao | \usepackage{listings} |
12 | 46564e42 | Marcos Assuncao | \usepackage{color} |
13 | 46564e42 | Marcos Assuncao | \usepackage{xcolor} |
14 | 46564e42 | Marcos Assuncao | \usepackage{balance} |
15 | 46564e42 | Marcos Assuncao | |
16 | 46564e42 | Marcos Assuncao | \colorlet{@punct}{red!60!black} |
17 | 46564e42 | Marcos Assuncao | \definecolor{@delim}{RGB}{20,105,176} |
18 | 46564e42 | Marcos Assuncao | |
19 | 46564e42 | Marcos Assuncao | \lstdefinelanguage{json}{ |
20 | 46564e42 | Marcos Assuncao | basicstyle=\footnotesize\ttfamily, |
21 | 46564e42 | Marcos Assuncao | literate= |
22 | 46564e42 | Marcos Assuncao | *{\ }{{{\ }}}{1} |
23 | 46564e42 | Marcos Assuncao | {:}{{{\color{@punct}{:}}}}{1} |
24 | 46564e42 | Marcos Assuncao | {,}{{{\color{@punct}{,}}}}{1} |
25 | 46564e42 | Marcos Assuncao | {\{}{{{\color{@delim}{\{}}}}{1} |
26 | 46564e42 | Marcos Assuncao | {\}}{{{\color{@delim}{\}}}}}{1} |
27 | 46564e42 | Marcos Assuncao | {[}{{{\color{@delim}{[}}}}{1} |
28 | 46564e42 | Marcos Assuncao | {]}{{{\color{@delim}{]}}}}{1}, |
29 | 46564e42 | Marcos Assuncao | } |
30 | 46564e42 | Marcos Assuncao | |
31 | 46564e42 | Marcos Assuncao | \newcommand{\includeJSON}[1]{\lstinputlisting[language=json,firstnumber=1]{#1}} |
32 | 46564e42 | Marcos Assuncao | |
33 | 46564e42 | Marcos Assuncao | % correct bad hyphenation here |
34 | 46564e42 | Marcos Assuncao | \hyphenation{op-tical net-works semi-conduc-tor} |
35 | 46564e42 | Marcos Assuncao | |
36 | 46564e42 | Marcos Assuncao | \begin{document} |
37 | 46564e42 | Marcos Assuncao | |
38 | 46564e42 | Marcos Assuncao | \title{A Generic and Extensible Framework for Monitoring Energy Consumption in OpenStack Clouds} |
39 | 46564e42 | Marcos Assuncao | |
40 | 46564e42 | Marcos Assuncao | |
41 | 46564e42 | Marcos Assuncao | \author{\IEEEauthorblockN{Francois Rossigneux, Jean-Patrick Gelas, Laurent Lef\`{e}vre, Marcos D. Assun\c{c}\~ao} |
42 | 46564e42 | Marcos Assuncao | \IEEEauthorblockA{Inria Avalon team, LIP Laboratory\\ |
43 | 46564e42 | Marcos Assuncao | Ecole Normale Superieure of Lyon\\ |
44 | 46564e42 | Marcos Assuncao | University of Lyon, France} |
45 | 46564e42 | Marcos Assuncao | } |
46 | 46564e42 | Marcos Assuncao | |
47 | 46564e42 | Marcos Assuncao | |
48 | 46564e42 | Marcos Assuncao | \maketitle |
49 | 46564e42 | Marcos Assuncao | |
50 | 46564e42 | Marcos Assuncao | |
51 | 46564e42 | Marcos Assuncao | \begin{abstract} |
52 | 389115e3 | Marcos Assuncao | Although cloud computing has been transformational in the IT industry, it often relies on large data centres that consume massive amounts of electrical power. Efforts have been made to reduce the power consumed by Clouds, with certain data centres now approaching a PUE factor of 1.08. This means that the IT infrastructure is now responsible for a large share of the power a data centre consumes, and hence means to monitor and analyse how energy is spent have never been so crucial. Such monitoring is required for a better understanding of how power is consumed by the IT infrastructure and for assessing the impact of energy management policies. In this article, we draw some lessons from previous experience in monitoring large-scale systems and introduce an energy monitoring software framework called Kwapi. The framework supports several wattmeter devices and multiple measurement formats, and reduces communication overhead. The Kwapi architecture is scalable, extensible, and completely integrated into OpenStack. |
53 | 46564e42 | Marcos Assuncao | |
54 | 46564e42 | Marcos Assuncao | \end{abstract} |
55 | 46564e42 | Marcos Assuncao | |
56 | 46564e42 | Marcos Assuncao | |
57 | 46564e42 | Marcos Assuncao | \IEEEpeerreviewmaketitle |
58 | 46564e42 | Marcos Assuncao | |
59 | 46564e42 | Marcos Assuncao | |
60 | 46564e42 | Marcos Assuncao | \section{Introduction} |
61 | 46564e42 | Marcos Assuncao | % no \IEEEPARstart |
62 | 46564e42 | Marcos Assuncao | |
63 | 46564e42 | Marcos Assuncao | Cloud computing \cite{ArmbrustCloud:2009} has become a key building block in providing IT resources and services to organisations of all sizes. Amongst its claimed benefits, the most appealing derive from economies of scale and often include a pay-as-you-go business model, resource consolidation, elasticity, good availability, and wide geographical coverage. Despite the advantages when compared to other provisioning models, to serve customers with the resources they need Clouds often rely on large data centres that consume massive amounts of electrical power \cite{BaligaInternet:2011}. |
64 | e542267e | Marcos Assuncao | |
65 | 610b40cd | Laurent Lefevre | Numerous efforts have been made to curb the energy consumed by Clouds, with some data centres now approaching a Power Usage Effectiveness (PUE) factor of 1.08\footnote{http://gigaom.com/2012/03/26/whose-data-centers-are-more-efficient-facebooks-or-googles/}. As a result, the IT infrastructure is now responsible for a large share of the power consumed by current data centres, and hence means to monitor and analyse how energy is spent have never been so crucial. Our experience in this area, however, has demonstrated that monitoring the power consumed by large systems is not always an easy task \cite{OrgerieSaveWatts:2008,AssuncaoIngrid:2010,DaCostaGreenNet:2010}. There are multiple power probes available in the market, generally with their own APIs, physical connections, precision, and communication protocols~\cite{eelsd2013}. Moreover, cost-related constraints can lead to decisions to acquire and deploy equipment in multiple stages or to monitor the power consumption of only part of the infrastructure. |
66 | e542267e | Marcos Assuncao | |
67 | 46564e42 | Marcos Assuncao | Although monitoring the power consumption of only part of the deployed equipment is sound from a cost perspective, it prevents one from capturing certain nuances of the infrastructure. Previous work has shown that as a computer cluster ages, certain components wear out, while others are replaced, leading to heterogeneous power consumption among nodes that were seemingly homogeneous. The difference between nodes that consume the least power and nodes that consume the most can reach 20\% \cite{MehdiHeterogeneous:2013}, which reinforces the idea that monitoring the consumption of the whole set of IT equipment can allow for further improvements in energy efficiency. Monitoring a great number of nodes, however, requires the design of an efficient infrastructure for collecting and processing the power consumption data. |
68 | e542267e | Marcos Assuncao | |
69 | 68fb6bef | Marcos Assuncao | This paper describes the design and architecture of a generic and flexible framework, termed Kilowatt API (Kwapi), that interfaces with OpenStack to provide it with power consumption information collected from multiple heterogeneous probes. OpenStack is a project that aims to provide a ubiquitous open source cloud computing platform and is currently used by many corporations, researchers and global data centres\footnote{http://www.openstack.org/user-stories/}. Ceilometer is an OpenStack component conceived to provide a framework for collecting a large range of metrics for metering purposes\footnote{https://wiki.openstack.org/wiki/Ceilometer}. In this work we describe how Kwapi has been integrated into Ceilometer. With the increasing use of Ceilometer as the de facto metering tool for OpenStack, we believe that such an integration of a power monitoring framework into OpenStack can be of great value to the research community and practitioners. |
70 | 46564e42 | Marcos Assuncao | |
71 | 68fb6bef | Marcos Assuncao | The remainder of this paper is organised as follows. Section~\ref{sec:related_work} describes background and related work. Section~\ref{sec:architecture} presents the requirements and introduces the Kwapi architecture. Section~\ref{sec:performance} discusses experimental results measuring the throughput of drivers and plug-ins, and Section~\ref{sec:conclusion} concludes the paper. |
72 | e542267e | Marcos Assuncao | |
73 | e542267e | Marcos Assuncao | % ---------------------------------------------------------------------------------------- |
74 | e542267e | Marcos Assuncao | |
75 | 389115e3 | Marcos Assuncao | \section{Background and Related Work} |
76 | e542267e | Marcos Assuncao | \label{sec:related_work} |
77 | e542267e | Marcos Assuncao | |
78 | 389115e3 | Marcos Assuncao | This section provides an overview of Ceilometer's architecture and describes related work in the field of monitoring the power consumption of large-scale computing infrastructure. |
79 | 389115e3 | Marcos Assuncao | |
80 | 389115e3 | Marcos Assuncao | \subsection{OpenStack Ceilometer} |
81 | 389115e3 | Marcos Assuncao | |
82 | 389115e3 | Marcos Assuncao | Ceilometer, whose logical architecture\footnote{http://docs.openstack.org/developer/ceilometer/architecture.html} is depicted in Figure~\ref{fig:arch_ceilometer}, is OpenStack's framework for collecting performance metrics and information on resource consumption. As of writing, it allows for data collection in three ways: |
83 | 389115e3 | Marcos Assuncao | |
84 | 389115e3 | Marcos Assuncao | \begin{itemize} |
85 | 68fb6bef | Marcos Assuncao | \item \textbf{Bus listener agent}, which picks up events on the Oslo notification bus and turns them into Ceilometer samples (\textit{e.g.} of cumulative, gauge or delta type) that can then be stored in the database or provided to an external system via the publishing pipeline. |
86 | 68fb6bef | Marcos Assuncao | |
87 | 68fb6bef | Marcos Assuncao | \item \textbf{Push agents}, a more intrusive method that consists in deploying agents on the monitored nodes to push data remotely to the collector. |
88 | 68fb6bef | Marcos Assuncao | |
89 | 389115e3 | Marcos Assuncao | \item \textbf{Polling agents}, which poll APIs or other tools to collect information. |
90 | 389115e3 | Marcos Assuncao | \end{itemize} |
91 | 389115e3 | Marcos Assuncao | |
92 | 389115e3 | Marcos Assuncao | \begin{figure}[!htb] |
93 | 389115e3 | Marcos Assuncao | \centering |
94 | 389115e3 | Marcos Assuncao | \includegraphics[width=1.\columnwidth]{figs/ceilometer_logical_architecture.pdf} |
95 | 389115e3 | Marcos Assuncao | \caption{Overview of Ceilometer's logical architecture.} |
96 | 389115e3 | Marcos Assuncao | \label{fig:arch_ceilometer} |
97 | 389115e3 | Marcos Assuncao | \end{figure} |
98 | 389115e3 | Marcos Assuncao | |
99 | 68fb6bef | Marcos Assuncao | The last two methods depend on a combination of a central agent, compute agents and a collector. Whilst the compute agents run on nodes and retrieve information about resource usage related to a given virtual machine instance and a resource owner, the central agent executes \textit{pollsters} on the management server to retrieve data that is not linked to a particular instance. Pollsters are executed, for example, to poll resources by using an API or other method. The Ceilometer database can be queried via the Ceilometer API, which allows an external system to view the history of a resource's metrics. It also enables a system to set and receive alarms. |
100 | 389115e3 | Marcos Assuncao | |
101 | 389115e3 | Marcos Assuncao | Metering messages can be signed using the \textit{hmac} module in Python's standard library, with a shared secret value provided in the configuration settings. The message signature is included in the message to be used for verification by the collector or by systems accessing the API. |
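As an illustration of the signing scheme described above, the following minimal sketch signs a measurement with Python's \textit{hmac} module and verifies it on the receiving side. The field names (\texttt{probe\_id}, \texttt{w}, \texttt{message\_signature}), the choice of SHA-256, and the canonical JSON serialisation are assumptions made for the example; they are not taken from the Ceilometer or Kwapi sources.

```python
import hashlib
import hmac
import json

SIGNATURE_FIELD = "message_signature"  # assumed field name, for illustration only


def _digest(body: dict, secret: bytes) -> str:
    # Serialise with sorted keys so sender and receiver hash the same bytes.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign(measurement: dict, secret: bytes) -> dict:
    """Return a copy of the measurement with its signature attached."""
    signed = dict(measurement)
    signed[SIGNATURE_FIELD] = _digest(measurement, secret)
    return signed


def verify(message: dict, secret: bytes) -> bool:
    """Recompute the signature over all other fields and compare."""
    received = message.get(SIGNATURE_FIELD, "")
    body = {k: v for k, v in message.items() if k != SIGNATURE_FIELD}
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(received, _digest(body, secret))
```

A receiver holding the same shared secret accepts only messages whose fields are unchanged since signing; any modified field or a different secret makes \texttt{verify} return \texttt{False}.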
102 | 389115e3 | Marcos Assuncao | |
103 | 389115e3 | Marcos Assuncao | |
104 | 389115e3 | Marcos Assuncao | \subsection{Related Work} |
105 | 389115e3 | Marcos Assuncao | |
106 | e542267e | Marcos Assuncao | Over the past years, several techniques have been proposed to minimise the energy consumed by computing infrastructure. At the hardware level, for instance, processors are able to operate at multiple frequency and voltage levels, and the operating systems or resource managers can choose the level that matches the current workload \cite{LaszewskiDVFS:2009}. At the resource management level, several approaches have been proposed, including resource consolidation \cite{BeloglazovOpenStack:2014} and rescheduling requests \cite{OrgerieSaveWatts:2008}, generally with the goal of switching off unused resources or setting them to low power consumption modes. Attempts have also been made to assess the power consumed by individual applications \cite{NoureddineThesis:2014}. |
107 | e542267e | Marcos Assuncao | |
108 | 46564e42 | Marcos Assuncao | A means to monitor the energy consumption is often a key component to assess potential gains of techniques that aim to improve software and cloud resource management systems. Monitoring of Clouds is not a new topic \cite{AcetoMonitoring:2013} as tools to monitor computing infrastructure \cite{BrinkmannMonitoring:2013,VarretteICPP:2014} as well as ways to address some of the usual issues in management systems have been introduced \cite{WardMonitoring:2013,TanMonitoring:2013}. Moreover, several systems for measuring the power consumed by compute clusters have been described in the literature \cite{AssuncaoIngrid:2010}. As traditional system and network monitoring techniques lack the capability to interface with wattmeters, most approaches for measuring energy consumption have been tailored to the specific needs of the projects in which they were conceived. |
109 | e542267e | Marcos Assuncao | |
110 | 46564e42 | Marcos Assuncao | In our work we aim to draw some lessons from previous approaches to monitor and analyse the energy consumption of large-scale distributed systems \cite{OrgerieSaveWatts:2008,DaCostaGreenNet:2010,AssuncaoIngrid:2010,MehdiHeterogeneous:2013}. We opted for creating a framework and integrating it with a successful cloud platform, OpenStack. Such a framework can be of value to the research community and practitioners working on the topic. |
111 | e542267e | Marcos Assuncao | |
112 | e542267e | Marcos Assuncao | % ---------------------------------------------------------------------------------------- |
113 | e542267e | Marcos Assuncao | |
114 | 610b40cd | Laurent Lefevre | \section{The Kwapi Architecture} |
115 | e542267e | Marcos Assuncao | \label{sec:architecture} |
116 | e542267e | Marcos Assuncao | |
117 | 46564e42 | Marcos Assuncao | Depending on the number of monitored devices and the frequency at which measurements are taken, wattmeters can generate a large amount of data, which requires storage capacity for further processing and analysis. Although storing and pre-processing data locally on the monitored nodes is an approach often followed by management systems, it poses a few challenges when measuring power consumption: it can increase CPU utilisation and hence influence the power consumed by the nodes, and, depending on the power management policy in place, unused nodes may be switched off or set to standby mode to save energy. Centralised storage, on the other hand, allows for faster access and processing of data, but can generate more network traffic given that all measurements need to be transferred continuously over the network to be stored. Once stored in a central repository, this data can be easily retrieved by components like OpenStack's Ceilometer. |
118 | 46564e42 | Marcos Assuncao | |
119 | 46564e42 | Marcos Assuncao | Wattmeters available in the market vary in terms of physical interconnection, communication protocols, packaging and precision of measurements. They are mostly packaged in multiple-outlet power strips called Power Distribution Units (PDUs) or enclosure PDUs (ePDUs), or more recently in the Intelligent Platform Management Interface (IPMI) cards embedded in computers. IPMI was initially used as an alternative means to shut down or power up a computer chassis remotely, and can also be used to query it. |
120 | e542267e | Marcos Assuncao | |
121 | 46564e42 | Marcos Assuncao | The interconnection used is often either Ethernet, to transport IPMI or SNMP packets over IP, or USB or RS-232 serial links. Wattmeters relying on Ethernet are generally linked to the administration network (off the data centre customer's network). Moreover, wattmeters may differ in the manner they operate. Some wattmeters send measurements to a management node on a regular basis (push mode), whereas others must be queried (pull mode). Amongst the characteristics that differ across wattmeters we can list: |
122 | e542267e | Marcos Assuncao | |
123 | 46564e42 | Marcos Assuncao | \begin{itemize} |
124 | 46564e42 | Marcos Assuncao | \item maximum number of measurements per second (\textit{i.e.} refresh rate); |
125 | 46564e42 | Marcos Assuncao | \item measurement precision; and |
126 | 46564e42 | Marcos Assuncao | \item methodology applied to each measurement (\textit{e.g.} mean value between several measurements, instantaneous values, and exponential moving averages). |
127 | 46564e42 | Marcos Assuncao | \end{itemize} |
128 | e542267e | Marcos Assuncao | |
129 | 610b40cd | Laurent Lefevre | As an example, Table~\ref{tab:wattmeters} shows the characteristics of the energy sensors that we deploy and evaluate in our data centres. |
130 | e542267e | Marcos Assuncao | |
131 | e542267e | Marcos Assuncao | \begin{table} |
132 | e542267e | Marcos Assuncao | \centering |
133 | 610b40cd | Laurent Lefevre | \caption{Wattmeters infrastructure} |
134 | e542267e | Marcos Assuncao | \label{tab:wattmeters} |
135 | e542267e | Marcos Assuncao | \begin{footnotesize} |
136 | e542267e | Marcos Assuncao | \begin{tabular}{llcc} |
137 | e542267e | Marcos Assuncao | \toprule |
138 | e542267e | Marcos Assuncao | \multirow{2}{18mm}{\textbf{Device Name}} & \multirow{2}{30mm}{\textbf{Interface}} & \multirow{2}{12mm}{\centering{\textbf{Refresh Time (s)}}} & \multirow{2}{10mm}{\centering{\textbf{Precision (W)}}} \\ |
139 | e542267e | Marcos Assuncao | & & & \\ |
140 | e542267e | Marcos Assuncao | \toprule |
141 | 610b40cd | Laurent Lefevre | Dell iDrac6 & IPMI / Ethernet & 5 & 7 \\ |
142 | e542267e | Marcos Assuncao | \midrule |
143 | 610b40cd | Laurent Lefevre | Eaton & Serial, SNMP via Ethernet & 5 & 1 \\ |
144 | e542267e | Marcos Assuncao | \midrule |
145 | e542267e | Marcos Assuncao | OmegaWatt & IrDA Serial & 1 & 0.125 \\ |
146 | e542267e | Marcos Assuncao | \midrule |
147 | 610b40cd | Laurent Lefevre | Schleifenbauer & SNMP via Ethernet & 3 & 0.1 \\ |
148 | e542267e | Marcos Assuncao | \midrule |
149 | e542267e | Marcos Assuncao | Watts Up? & Proprietary via USB & 1 & 0.1 \\ |
150 | e542267e | Marcos Assuncao | \midrule |
151 | e542267e | Marcos Assuncao | ZES LMG450 & Serial & 0.05 & 0.01 \\ |
152 | e542267e | Marcos Assuncao | \bottomrule |
153 | e542267e | Marcos Assuncao | \end{tabular} |
154 | e542267e | Marcos Assuncao | \end{footnotesize} |
155 | e542267e | Marcos Assuncao | \end{table} |
156 | e542267e | Marcos Assuncao | |
157 | 46564e42 | Marcos Assuncao | The granularity at which measurements are taken is another important factor as the needs often vary depending on what one wishes to evaluate. Taking measurements at time intervals of one second or smaller is common in several scenarios. This can be a challenge in an infrastructure comprising hundreds or thousands of nodes, demanding efficient and scalable mechanisms for transferring information on power consumption. |
158 | e542267e | Marcos Assuncao | |
159 | 46564e42 | Marcos Assuncao | Furthermore, leveraging the capabilities offered by existing cloud platforms like OpenStack can help the adoption of a monitoring system, ease its deployment, and reduce its learning curve. In addition, users and systems administrators need management reports and visualisation tools to analyse the impact of energy management policies and quickly retrieve relevant data for further analysis. |
160 | e542267e | Marcos Assuncao | |
161 | 389115e3 | Marcos Assuncao | \begin{figure*}[!htb] |
162 | 389115e3 | Marcos Assuncao | \centering |
163 | 389115e3 | Marcos Assuncao | \includegraphics[width=0.6\linewidth]{figs/architecture.pdf} |
164 | 389115e3 | Marcos Assuncao | \caption{Overview of Kwapi's architecture.} |
165 | 389115e3 | Marcos Assuncao | \label{fig:architecture} |
166 | 389115e3 | Marcos Assuncao | \end{figure*} |
167 | 389115e3 | Marcos Assuncao | |
168 | 46564e42 | Marcos Assuncao | Hence, we summarise the main requirements for our energy monitoring platform as follows: |
169 | e542267e | Marcos Assuncao | |
170 | e542267e | Marcos Assuncao | \begin{itemize} |
171 | 46564e42 | Marcos Assuncao | \item \textbf{Reliable data storage}: a centralised storage where energy consumption data can be placed and easily retrieved. Note that centralised storage here does not imply that data is stored on a single node. Systems like Apache Hadoop HDFS\footnote{http://hadoop.apache.org/}, Apache Cassandra\footnote{http://cassandra.apache.org/}, and MongoDB\footnote{http://www.mongodb.org/} could be used. |
172 | e542267e | Marcos Assuncao | |
173 | e542267e | Marcos Assuncao | \item \textbf{Handle heterogeneous wattmeters}: there is a need for handling multiple device types and to design the architecture in a way that support for new wattmeters can be included. |
174 | e542267e | Marcos Assuncao | |
175 | e542267e | Marcos Assuncao | \item \textbf{Efficient communication}: the envisioned system should provide a means for nodes to efficiently communicate their energy consumption to components interested in processing it. A message bus could be used to manage this communication efficiently. |
176 | e542267e | Marcos Assuncao | |
177 | e542267e | Marcos Assuncao | \item \textbf{Integration with open source cloud platform}: the proposed system should interface with existing open source cloud platforms in order to ease deployment and use. |
178 | e542267e | Marcos Assuncao | |
179 | e542267e | Marcos Assuncao | \item \textbf{Visualisation and reports}: the system should offer a set of management reports that provide quick feedback to system administrators and users during execution of tasks or applications. In addition, it should provide means and APIs that allow more advanced queries to be made. |
180 | e542267e | Marcos Assuncao | \end{itemize} |
181 | e542267e | Marcos Assuncao | |
182 | 46564e42 | Marcos Assuncao | The following sections describe the architecture of Kwapi and how it addresses the aforementioned requirements. |
183 | 46564e42 | Marcos Assuncao | |
184 | 46564e42 | Marcos Assuncao | \subsection{Kwapi} |
185 | e542267e | Marcos Assuncao | |
186 | 389115e3 | Marcos Assuncao | Figure~\ref{fig:architecture} depicts the architecture of Kwapi, which is based on a set of layers comprising drivers, responsible for performing the measurements, and plug-ins that subscribe to receive the collected information. The communication between these two layers is handled by a bus as explained later. Because the architecture follows the publish/subscribe model, plug-ins can subscribe to receive information collected by drivers from multiple sites. Drivers and plug-ins are easily extensible to support other types of wattmeters and provide other services. The Kwapi API is designed to provide a programming interface for developers and system administrators, and is used to interface with OpenStack by providing the information (\textit{i.e.} by polling monitored devices) required to feed Ceilometer. |
187 | e542267e | Marcos Assuncao | |
188 | 389115e3 | Marcos Assuncao | In the context of publishing energy metrics, we use the central agent and a dedicated pollster we developed. It queries the Kwapi API plug-in and publishes cumulative (kWh) and gauge (W) counters. These counters are not yet associated with a particular user, since a server can host multiple clients simultaneously. In the following, we provide more details about some of the framework layers. |
189 | e542267e | Marcos Assuncao | |
190 | e542267e | Marcos Assuncao | \subsubsection{Drivers} |
191 | e542267e | Marcos Assuncao | |
192 | 46564e42 | Marcos Assuncao | The drivers are threads initialised by a manager with a set of parameters loaded from a file compliant with the OpenStack configuration format, which is similar to INI. These parameters are used to query the meters (\textit{e.g.} IP address and port) and indicate the sensor IDs in the issued metrics. The measurements that a driver obtains are represented as JSON dictionaries, which have the advantage of being human readable and easy to parse, while keeping a small footprint. The size of the dictionaries may vary depending on the number of fields set by the drivers (\textit{e.g.} whether message signing is enabled). Figure~\ref{fig:json} shows an example of a JSON payload containing one measurement. Optional fields can be added, such as voltage and current. ACK messages have a fixed size of 66 bytes (on a TCP link). When the drivers and the API are on the same machine, they communicate via IPC sockets. |
193 | 46564e42 | Marcos Assuncao | |
194 | 46564e42 | Marcos Assuncao | \begin{figure} |
195 | 46564e42 | Marcos Assuncao | \includeJSON{figs/measurement.json} |
196 | 46564e42 | Marcos Assuncao | \caption{Example of JSON payload.} |
197 | 46564e42 | Marcos Assuncao | \label{fig:json} |
198 | 46564e42 | Marcos Assuncao | \end{figure} |
199 | e542267e | Marcos Assuncao | |
200 | 46564e42 | Marcos Assuncao | The manager periodically checks if all threads are active, restarting them if necessary as incidents may occur; for example, if a meter is disconnected or becomes inaccessible. The drivers can manage incidents themselves, but if for any reason they stop their execution, they are automatically restarted by the manager. It is important to avoid losing measurements because the information reported is instantaneous power in W rather than cumulative energy in kWh; a lost sample cannot be recovered from later readings and therefore degrades any energy total computed from them. |
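The supervision behaviour described above can be sketched as follows. This is a hypothetical, simplified manager written for illustration: the class name \texttt{DriverManager} and its methods are assumptions and do not reproduce the Kwapi implementation.

```python
import threading


class DriverManager:
    """Hypothetical manager that restarts driver threads that have stopped."""

    def __init__(self):
        self._targets = {}  # probe name -> function executed by the driver thread
        self.threads = {}   # probe name -> thread currently running that driver

    def register(self, name, target):
        """Remember how to start a driver and launch it once."""
        self._targets[name] = target
        self._start(name)

    def _start(self, name):
        thread = threading.Thread(target=self._targets[name], name=name, daemon=True)
        thread.start()
        self.threads[name] = thread

    def check_drivers(self):
        """Restart every driver whose thread is no longer alive.

        Returns the names of the drivers that were restarted.
        """
        restarted = []
        for name in list(self.threads):
            if not self.threads[name].is_alive():
                self._start(name)
                restarted.append(name)
        return restarted
```

In a deployment, \texttt{check\_drivers} would be called at a fixed interval, for instance from a timer loop, so that a driver that stops after a meter becomes unreachable is brought back without manual intervention.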
201 | e542267e | Marcos Assuncao | |
202 | e542267e | Marcos Assuncao | |
203 | 46564e42 | Marcos Assuncao | \subsubsection{Plug-ins} |
204 | 46564e42 | Marcos Assuncao | |
205 | 46564e42 | Marcos Assuncao | A plug-in retrieves and processes measurements taken by the drivers and provided via the bus. Plug-ins expose this information to other services like Ceilometer and to the user via visualisation tools. They can subscribe to all sensors, a subset of them, or to other plug-ins by using a system of prefixes. After verifying a message signature, they extract the fields and process the received data. As described in the following, currently Kwapi provides two plug-ins, namely an API to interface with Ceilometer, and a visualisation tool. |
206 | e542267e | Marcos Assuncao | |
207 | 389115e3 | Marcos Assuncao | \begin{figure*}[!htb] |
208 | 389115e3 | Marcos Assuncao | \centering |
209 | 68fb6bef | Marcos Assuncao | \includegraphics[width=0.9\linewidth]{figs/graph_example.jpg} |
210 | 389115e3 | Marcos Assuncao | \caption{Example of graph generated by the visualisation plug-in (4 monitored servers).} |
211 | 389115e3 | Marcos Assuncao | \label{fig:graph_example} |
212 | 389115e3 | Marcos Assuncao | \end{figure*} |
213 | 389115e3 | Marcos Assuncao | |
214 | e542267e | Marcos Assuncao | \begin{itemize} |
215 | 46564e42 | Marcos Assuncao | |
216 | 46564e42 | Marcos Assuncao | \item \textbf{API for Ceilometer}: the API plug-in computes the number of kWh consumed at each probe, adds a timestamp, and stores the last value in watts. This data is not stored in a database as Ceilometer already has its own. If a probe has not provided measurements for a long time, the corresponding data is removed. This plug-in has a REST API that allows a client to retrieve the names of the probes, measurements in W and kWh, and timestamps. The API is secured by using OpenStack Keystone tokens, whereby the client provides a token, and the plug-in contacts the Keystone API to check the token validity before sending its response. |
217 | e542267e | Marcos Assuncao | |
218 | 46564e42 | Marcos Assuncao | \item \textbf{Visualisation}: the visualisation plug-in builds Round-Robin Database (RRD) files from received measurements, and generates graphs that show the energy consumption over a given period, with additional information (average electricity consumption, minimum and maximum watt values, last value, total energy and cost in Euros). RRD files are of fixed size, and store several collections of metrics with different granularities. Figure~\ref{fig:graph_example} shows an example of a generated graph. In addition, a web interface displays the generated graphs, and a cache mechanism triggers the creation of graphs during queries only if they are out of date. |
219 | e542267e | Marcos Assuncao | \end{itemize} |
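To make the conversion performed by the API plug-in concrete, the sketch below accumulates energy in kWh from successive power readings in W. The class name \texttt{EnergyCounter} and the rectangle-rule integration, which applies each new reading to the time elapsed since the previous one, are assumptions made for illustration; the paper does not specify the integration method used by Kwapi.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


class EnergyCounter:
    """Hypothetical per-probe counter turning W samples into a kWh total."""

    def __init__(self):
        self.kwh = 0.0
        self.last_watts = None       # last gauge value, in W
        self.last_timestamp = None   # time of the last sample, in seconds

    def update(self, watts, timestamp):
        """Add the energy consumed since the previous sample."""
        if self.last_timestamp is not None:
            elapsed_s = timestamp - self.last_timestamp
            # Rectangle rule: assume the new reading held over the interval.
            self.kwh += watts * elapsed_s / JOULES_PER_KWH
        self.last_watts = watts
        self.last_timestamp = timestamp
        return self.kwh
```

For example, a probe that reports a constant 100 W at two samples taken one hour apart accumulates 0.1 kWh. The sketch also shows why missing samples matter: the longer the gap between readings, the longer a single value is assumed to have held.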
220 | e542267e | Marcos Assuncao | |
221 | 610b40cd | Laurent Lefevre | |
222 | 46564e42 | Marcos Assuncao | \subsubsection{Internal communication bus} |
223 | e542267e | Marcos Assuncao | |
224 | 46564e42 | Marcos Assuncao | Kwapi uses ZeroMQ\footnote{http://zeromq.org/}, a fast broker-less messaging framework written in C++, where transmitters play the role of buffers. ZeroMQ supports a wide range of bus modes, including cross-thread communication, IPC, and TCP. Switching from one to another is straightforward. It also provides several design patterns such as publish/subscribe and request/response. In our architecture, we use a publish/subscribe design pattern where drivers are publishers and plug-ins are subscribers. Between them, one or more forwarders simply forward packets, and broadcast a packet to all plug-ins subscribed to receive information from a given probe. Thanks to the forwarders, the network usage is optimised because the packets generated by a driver are sent only once, regardless of the number of plug-ins that listen to a probe. If no plug-in listens to a probe, its measurements are neither sent over the network nor to the first forwarder. The forwarders not only dramatically reduce the network usage, but also make it possible to build flexible architectures, by bypassing network isolation problems or performing load balancing. |
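The prefix-based subscription and forwarding behaviour can be modelled in a few lines. The sketch below is a plain Python model written for illustration and does not use ZeroMQ; the class \texttt{Forwarder}, its methods, and the probe naming scheme (\texttt{site.node}) are assumptions.

```python
class Forwarder:
    """Toy model of a forwarder that fans out measurements by probe prefix."""

    def __init__(self):
        self._subscriptions = []  # list of (prefix, callback) pairs

    def subscribe(self, prefix, callback):
        """Register a plug-in callback for all probes whose ID starts with prefix.

        An empty prefix subscribes to every probe.
        """
        self._subscriptions.append((prefix, callback))

    def has_listener(self, probe_id):
        """Tell a driver whether sending this probe's data is worthwhile."""
        return any(probe_id.startswith(prefix) for prefix, _ in self._subscriptions)

    def publish(self, probe_id, measurement):
        """Deliver one incoming measurement to every matching subscriber.

        Returns the number of plug-ins that received it.
        """
        delivered = 0
        for prefix, callback in self._subscriptions:
            if probe_id.startswith(prefix):
                callback(probe_id, measurement)
                delivered += 1
        return delivered
```

In this model the driver hands each measurement to the forwarder once, and the fan-out to plug-ins happens at the forwarder, which mirrors the reason given above for the reduced network usage. A driver can also call \texttt{has\_listener} first and skip sending when no plug-in is subscribed.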
225 | e542267e | Marcos Assuncao | |
226 | e542267e | Marcos Assuncao | |
227 | 46564e42 | Marcos Assuncao | \section{Performance Evaluation} |
228 | 46564e42 | Marcos Assuncao | \label{sec:performance} |
229 | 9d39d328 | François Rossigneux | |
230 | 46564e42 | Marcos Assuncao | In this section we provide results of a simple performance evaluation we carried out in our testbed. Note that our goal is not to compare publish/subscribe systems as such work has already been performed elsewhere \cite{EugsterSurvey:2003,FabretPS:2001}. The evaluation demonstrates that the framework serves well the needs of a large range of users of the Grid'5000 platform \cite{Grid5000}, the system we use and where the framework is currently deployed as a means to collect and provide energy consumption information to users. |
231 | 9d39d328 | François Rossigneux | |
232 | 389115e3 | Marcos Assuncao | We wanted to evaluate the CPU and network usage of a typical driver to observe the framework's throughput, since provisioning a large number of resources for monitoring purposes was not desirable. For this experiment we deployed the Kwapi drivers and API on a machine with a Core 2 Duo P8770 2.53\,GHz processor and 4\,GB of RAM. We considered scenarios where we emulated several IPMI cards, each card monitored by a driver thread placing one measurement per second on the communication bus, and scenarios with multiple PDUs of 10 outlets each, each PDU monitored by a driver thread placing ten values per second on the bus. We evaluated these scenarios with message signing both enabled and disabled. Table~\ref{tab:parameters_usage} summarises the considered scenarios. |
233 | 9d39d328 | François Rossigneux | |
234 | 9d39d328 | François Rossigneux | \begin{table} |
235 | 9d39d328 | François Rossigneux | \centering |
236 | 389115e3 | Marcos Assuncao | \caption{Scenarios considered in the experiments.} |
237 | 46564e42 | Marcos Assuncao | \label{tab:parameters_usage} |
238 | 389115e3 | Marcos Assuncao | \begin{tabular}{lcc} |
239 | 46564e42 | Marcos Assuncao | \toprule |
240 | 389115e3 | Marcos Assuncao | \textbf{Scenario name} & \textbf{Agent thread scheme} & \textbf{Message signature} \\ |
241 | 46564e42 | Marcos Assuncao | \midrule |
242 | 389115e3 | Marcos Assuncao | IPMI message signed & 1 thread per card & Enabled\\ |
243 | 389115e3 | Marcos Assuncao | \midrule |
244 | 389115e3 | Marcos Assuncao | IPMI message unsigned & 1 thread per card & Disabled\\ |
245 | 46564e42 | Marcos Assuncao | \midrule |
246 | 389115e3 | Marcos Assuncao | PDU message signed & 1 thread per PDU & Enabled\\ |
247 | 46564e42 | Marcos Assuncao | \midrule |
248 | 389115e3 | Marcos Assuncao | PDU message unsigned & 1 thread per PDU & Disabled\\ |
249 | 46564e42 | Marcos Assuncao | \bottomrule |
250 | 9d39d328 | François Rossigneux | \end{tabular} |
251 | 9d39d328 | François Rossigneux | \end{table} |
252 | 9d39d328 | François Rossigneux | |
253 | 389115e3 | Marcos Assuncao | Moreover, to observe the scalability of the framework, we varied the number of IPMI cards from 500 to 5000 and the number of PDUs from 50 to 500. |
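
The aggregate message rate implied by these ranges can be worked out as a quick sanity check, assuming every emulated device reports at the nominal rate stated above (one value per second per IPMI card, ten values per second per PDU). Writing $N_{\mathrm{IPMI}}$ and $N_{\mathrm{PDU}}$ for the number of emulated cards and PDUs, and $R$ for the resulting rate (symbols introduced here for illustration only), the rates at the upper end of each range are
\begin{align*}
R_{\mathrm{IPMI}} &= N_{\mathrm{IPMI}} \times 1\ \text{value/s} = 5000\ \text{values/s}, \\
R_{\mathrm{PDU}} &= N_{\mathrm{PDU}} \times 10\ \text{values/s} = 500 \times 10 = 5000\ \text{values/s}.
\end{align*}
Under this assumption, both device types place between 500 and 5000 values per second on the bus across their ranges, and differ only in the number of driver threads producing them.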
254 | 389115e3 | Marcos Assuncao | |
255 | 389115e3 | Marcos Assuncao | % This is going to change... |
256 | 46564e42 | Marcos Assuncao | Figure~\ref{fig:cpu_usage} shows the CPU usage results. Under the evaluated scenarios, the socket type and the number of driver threads do not seem to have a distinguishable impact on CPU usage. On the test machine, the Kwapi drivers with message signing disabled (\textit{i.e.}, IPMI cards unsigned and PDUs unsigned) consumed on average 20\% of the total CPU power. The Kwapi API consumed around 10\% with message signing disabled and 16\% when making one request per second querying the last measurements of all probes. Overall, message signing increases CPU usage by 30\% (see IPMI cards signed and PDUs signed). |
257 | 9d39d328 | François Rossigneux | |
258 | 46564e42 | Marcos Assuncao | \begin{figure}[!ht] |
259 | 46564e42 | Marcos Assuncao | \centering |
260 | 46564e42 | Marcos Assuncao | \includegraphics[width=1.0\columnwidth]{figs/cpu_usage.pdf} |
261 | 46564e42 | Marcos Assuncao | \caption{CPU usage under the evaluated scenarios.} |
262 | 46564e42 | Marcos Assuncao | \label{fig:cpu_usage} |
263 | 46564e42 | Marcos Assuncao | \end{figure} |
264 | 46564e42 | Marcos Assuncao | |
265 | 46564e42 | Marcos Assuncao | Although the CPU usage often depends on the drivers, plug-ins, and their complexity, and whether message signing is enabled, the experiments show that a large number of probes can be managed by a single machine. In our environment, a management machine per site is more than enough to accommodate the monitoring needs of users. The drivers and API can reuse a machine that already serves other monitoring purposes. |
266 | 46564e42 | Marcos Assuncao | |
267 | 46564e42 | Marcos Assuncao | \begin{figure}[!ht] |
268 | 46564e42 | Marcos Assuncao | \centering |
269 | 46564e42 | Marcos Assuncao | \includegraphics[width=1.0\columnwidth]{figs/packet_size.pdf} |
270 | 46564e42 | Marcos Assuncao | \caption{Packet sizes under the evaluated scenarios.} |
271 | 46564e42 | Marcos Assuncao | \label{fig:packet_size} |
272 | 46564e42 | Marcos Assuncao | \end{figure} |
273 | 9d39d328 | François Rossigneux | |
274 | 46564e42 | Marcos Assuncao | When measuring network usage, our experiments showed a transfer rate of around 230\,KB/s with message signing enabled and around 135\,KB/s otherwise. Overall, message signing introduces an overhead of 70\%. Sending larger packets could be explored to decrease the per-packet overhead. If several drivers send measurements simultaneously, ZeroMQ provides an optimisation mechanism that aggregates the data into a single TCP segment. Figure~\ref{fig:packet_size} shows the packet sizes under the evaluated scenarios. We noticed that certain packets contain up to forty measurements. |
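
The overhead figure follows from the two measured transfer rates, assuming both were obtained under the same load (the comparison is only meaningful in that case):
\begin{equation*}
\frac{230 - 135}{135} = \frac{95}{135} \approx 0.70,
\end{equation*}
that is, signed traffic amounts to roughly $1.7$ times the unsigned volume.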
275 | 9d39d328 | François Rossigneux | |
276 | 46564e42 | Marcos Assuncao | As mentioned earlier, plug-ins can subscribe to and select the probes from which they want to receive information. If multiple plug-ins select a node, information from that node is sent only once through the network. The architecture also allows a hierarchy of plug-ins to be established, where a plug-in can be deployed on a site to summarise or compute average values that are placed on the bus to be consumed by higher-level plug-ins. |
277 | 9d39d328 | François Rossigneux | |
278 | e542267e | Marcos Assuncao | % ---------------------------------------------------------------------------------------- |
279 | 46564e42 | Marcos Assuncao | |
280 | 46564e42 | Marcos Assuncao | \section{Conclusion} |
281 | e542267e | Marcos Assuncao | \label{sec:conclusion} |
282 | 46564e42 | Marcos Assuncao | |
283 | 610b40cd | Laurent Lefevre | In this paper, we described a framework for monitoring the power consumed by resources of a data centre. Based on lessons learned by monitoring the power consumption of a large distributed infrastructure, we described the main user requirements and how they are met by the proposed architecture. The framework works in tandem with OpenStack's Ceilometer. Experimental results demonstrate that the overhead posed by the monitoring framework is small, allowing us to serve the users' monitoring needs in our infrastructure. |
285 | 46564e42 | Marcos Assuncao | |
286 | 46564e42 | Marcos Assuncao | As future work, we intend to explore means to increase the monitoring granularity and the number of measured devices by applying a hierarchy of plug-ins and a stream processing system\footnote{https://storm.incubator.apache.org}$^,$\footnote{http://incubator.apache.org/s4/} for processing streams of measurement tuples. |
287 | 46564e42 | Marcos Assuncao | |
288 | e542267e | Marcos Assuncao | % ---------------------------------------------------------------------------------------- |
289 | 46564e42 | Marcos Assuncao | |
290 | 46564e42 | Marcos Assuncao | \section*{Acknowledgment} |
291 | 46564e42 | Marcos Assuncao | |
292 | 46564e42 | Marcos Assuncao | This research is supported by the French FSN (Fonds national pour la Soci\'et\'e Num\'erique) XLcloud project. Some experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the Inria ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). The authors wish to thank Julien Danjou for his help during the integration of Kwapi with OpenStack and Ceilometer. |
293 | 46564e42 | Marcos Assuncao | |
294 | e542267e | Marcos Assuncao | \bibliographystyle{IEEEtran} |
295 | 389115e3 | Marcos Assuncao | %\balance |
296 | 46564e42 | Marcos Assuncao | \bibliography{references} |
297 | 46564e42 | Marcos Assuncao | |
298 | 46564e42 | Marcos Assuncao | \end{document} |