12 |
12 |
\usepackage{color}
|
13 |
13 |
\usepackage{xcolor}
|
14 |
14 |
\usepackage{balance}
|
|
15 |
\usepackage{algorithm2e}
|
15 |
16 |
|
16 |
17 |
\acrodef{KWAPI}{KiloWatt API}
|
17 |
18 |
\acrodef{KWRanking}{KiloWatt Ranking}
|
... | ... | |
181 |
182 |
\section{Reservation Strategies}
|
182 |
183 |
\label{sec:strategies}
|
183 |
184 |
|
184 |
|
To benefit from adding support for reservation of bare-metal resources, we discuss simple strategies that a private cloud could implement towards reducing the amount of energy spent with computational resources. These strategies can complement the ranking system described above.
|
|
185 |
To benefit from adding support for reservation of bare-metal resources, we discuss strategies that a private cloud could implement to reduce the energy consumed by compute resources. These strategies could complement the ranking system described above by, for instance, selecting the least power-efficient resources to be switched off.
|
185 |
186 |
|
186 |
187 |
\subsection{Power-off Idle Resources}
|
187 |
188 |
|
188 |
|
This strategy...
|
|
189 |
The strategy considered here consists of periodically checking which resources are idle. Once a resource has remained idle for a number of consecutive intervals and, when reservation is enabled, is not committed to serve a reservation over a given time horizon, it is powered off. Previous work \cite{OrgerieSaveWatts:2008} has evaluated the impact of the interval chosen for measuring idleness and of the horizon chosen for switching off resources committed to reservations. This work sets the measurement interval, idleness time, and reservation horizon to 1, 5 and 15 minutes, respectively. Algorithm~\ref{algo:alloc_policy} summarises the strategy.
|
|
190 |
|
|
191 |
\IncMargin{-0.6em}
|
|
192 |
\RestyleAlgo{ruled}\LinesNumbered
|
|
193 |
\begin{algorithm}[ht]
|
|
194 |
\caption{Sample resource allocation policy.}
|
|
195 |
\label{algo:alloc_policy}
|
|
196 |
\DontPrintSemicolon
|
|
197 |
\SetAlgoLined
|
|
198 |
\SetAlgoVlined
|
|
199 |
\footnotesize{
|
|
200 |
|
|
201 |
\label{algo:check_idle_start}\textbf{procedure} checkIdleNodes()\;
|
|
202 |
\Begin{
|
|
203 |
$Res\_idle_t \leftarrow $ get list of idle resources at interval $t$\;
|
|
204 |
$Res\_idle_{t-1} \leftarrow $ get list of idle resources at interval $t-1$\;
|
|
205 |
\If{$success = $ \textbf{false}}{
|
|
206 |
$success \leftarrow enqueueReq(r)$\;
|
|
207 |
}
|
|
208 |
|
|
209 |
\ForEach{resource $r \in Res\_idle_t$}{
|
|
210 |
\eIf{$r \in Res\_idle_{t-1}$}{
|
|
211 |
// increase number of idle intervals of $r$\;
|
|
212 |
$r.idle\_intervals \leftarrow r.idle\_intervals + 1$\;
|
|
213 |
}{
|
|
214 |
$r.idle\_intervals \leftarrow 1$\;\label{algo:check_idle_end}
|
|
215 |
}
|
|
216 |
}
|
|
217 |
}
|
|
218 |
|
|
219 |
\BlankLine
|
|
220 |
\label{algo:switch_start}\textbf{procedure} switchResourcesOnOff()\;
|
|
221 |
\Begin{
|
|
222 |
$Res\_on_{t} \leftarrow $ list of resources switched on\;
|
|
223 |
$Res\_off_{t} \leftarrow $ list of resources switched off\;
|
|
224 |
$Res\_reserv_{t,h} \leftarrow $ resources reserved until horizon $h$\;
|
|
225 |
$nres\_reserv_{t,h} \leftarrow $ number of resources in $Res\_reserv_{t,h}$\;
|
|
226 |
$nres\_fcast_{t+1} \leftarrow $ forecast number of resources required at $t+1$ \;
|
|
227 |
$nres\_req_{t+1} \leftarrow \max(nres\_fcast_{t+1},nres\_reserv_{t,h})$\;
|
|
228 |
|
|
229 |
\While{$nres\_req_{t+1} > sizeof(Res\_on_{t})$} {
|
|
230 |
$r \leftarrow $ pop resource from $Res\_off_{t}$\;
|
|
231 |
switch resource $r$ on\;
|
|
232 |
add $r$ to $Res\_on_{t}$\;
|
|
233 |
}
|
|
234 |
|
|
235 |
$Res\_idle_t \leftarrow $ get list of idle resources at interval $t$\;
|
|
236 |
\While{$nres\_req_{t+1} < sizeof(Res\_on_{t})$} {
|
|
237 |
\ForEach{resource $r \in Res\_idle_t$}{
|
|
238 |
\If{$r.idle\_intervals \geq 5$ and $r \notin Res\_reserv_{t,h}$}{
|
|
239 |
remove $r$ from $Res\_on_{t}$\;
|
|
240 |
switch resource $r$ off\;
|
|
241 |
add $r$ to $Res\_off_{t}$\;
|
|
242 |
}
|
|
243 |
\If{$nres\_req_{t+1} = sizeof(Res\_on_{t})$}{
|
|
244 |
\textbf{break}\; \label{algo:switch_end}
|
|
245 |
}
|
|
246 |
}
|
|
247 |
}
|
|
248 |
}
|
|
249 |
}
|
|
250 |
|
|
251 |
\While{system is running} {
|
|
252 |
every minute call checkIdleNodes()\;
|
|
253 |
every 5 minutes call switchResourcesOnOff()\;
|
|
254 |
}
|
|
255 |
\end{algorithm}
|
|
256 |
\IncMargin{0.6em}
|
|
257 |
|
|
258 |
Lines~\ref{algo:check_idle_start} to \ref{algo:check_idle_end} contain the pseudo-code to identify idle resources, whereas lines~\ref{algo:switch_start} to \ref{algo:switch_end} determine the resources that need to be switched on or off.
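The two procedures can also be read as the following minimal Python sketch. It is an illustration under stated assumptions, not the simulator's implementation: the \texttt{Resource} class, the function names and the use of resource names to test membership are hypothetical, and the sketch assumes that resources are powered on when the required count exceeds the powered-on pool and powered off when the pool exceeds the required count. Only the 5-interval idleness threshold comes from the text.

```python
from dataclasses import dataclass

IDLE_THRESHOLD = 5  # consecutive idle intervals before power-off (value from the text)


@dataclass
class Resource:
    """Hypothetical resource record; only the idle counter matters here."""
    name: str
    idle_intervals: int = 0


def check_idle_nodes(idle_now, idle_before):
    """Update idle counters for resources idle at interval t.

    idle_now: list of Resource idle at t; idle_before: set of names idle at t-1.
    """
    for r in idle_now:
        if r.name in idle_before:
            r.idle_intervals += 1  # idle again: one more consecutive interval
        else:
            r.idle_intervals = 1   # idle for the first time


def switch_resources(on, off, idle_now, reserved, n_forecast):
    """Power resources on or off to match the demand expected at t+1.

    on/off: lists of Resource; reserved: set of names reserved until the
    horizon; n_forecast: forecast number of resources required at t+1.
    """
    n_required = max(n_forecast, len(reserved))

    # Power on while demand exceeds the powered-on pool (and spares exist).
    while n_required > len(on) and off:
        on.append(off.pop())

    # Power off surplus resources that are idle long enough and unreserved.
    for r in idle_now:
        if len(on) <= n_required:
            break
        if (r in on and r.idle_intervals >= IDLE_THRESHOLD
                and r.name not in reserved):
            on.remove(r)
            off.append(r)
    return on, off
```

In this sketch a caller would invoke \texttt{check\_idle\_nodes} every minute and \texttt{switch\_resources} every 5 minutes, mirroring the main loop of Algorithm~\ref{algo:alloc_policy}. Unlike the pseudo-code, the power-off pass iterates over the idle list once, so it terminates even when no idle resource qualifies.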
|
189 |
259 |
|
190 |
260 |
% ----------------------------------------------------------------------------------------
|
191 |
261 |
|
... | ... | |
198 |
268 |
|
199 |
269 |
An in-house discrete-event simulator is used to model and simulate resource allocation and request scheduling in a private cloud setting. We resort to simulation as it enables controlled, repeatable and large-scale experiments. Both infrastructure capacity and resource requests are expressed in number of CPU cores. As traces of cloud workloads are very difficult to obtain, we use request logs gathered from Grid'5000 sites and adapt them to model cloud users' resource demands. Under normal operation, Grid'5000 enables resource reservations, but users' requests are conditioned by the available resources. For instance, a user willing to allocate resources for an experiment will often check a site's agenda, see what resources are available, and eventually make a reservation during a convenient time frame. If the user cannot find enough resources, she will either adapt her requirements to resource availability (\textit{e.g.} change the number of required resources, or the reservation start and/or finish time) or choose another site with available capacity. The request traces, however, do not capture what users' initial requirements were before they made their requests.
|
200 |
270 |
|
201 |
|
In order to obtain a workload trace on provisioning of bare-metal resources that is more cloud oriented, we adapt the request traces and infrastructure capacity of Grid'5000 by making the following changes:
|
|
271 |
To obtain a workload trace on provisioning of bare-metal resources that is more cloud-oriented, we adapt the request traces and infrastructure capacity of Grid'5000 by making the following changes to reservation requests:
|
202 |
272 |
|
203 |
273 |
\begin{enumerate}
|
204 |
|
\item \label{enum:cond1} Advance reservation requests whose original submission time is within working hours and start time lies outside these hours are considered immediate reservations starting at their original submission time.
|
205 |
|
\item \label{enum:cond2} Requests whose original submission and start times are on different days of the week are also turned into immediate reservations, both submitted and starting at their original start time.
|
|
274 |
\item \label{enum:cond1} Requests whose original submission time is within working hours and whose start time lies outside these hours are considered on-demand requests starting at their original submission time.
|
|
275 |
\item \label{enum:cond2} Remaining requests are considered as on-demand requests both submitted and starting at their original start time.
|
206 |
276 |
\item \label{enum:capacity} The resource capacity of a site is modified to the maximum number of CPU cores required to honour all requests, plus a safety factor.
|
207 |
277 |
\end{enumerate}
|
208 |
278 |
|
209 |
|
Change \ref{enum:cond1} modifies the behaviour of users who today exploit resources during off-peak periods, whereas \ref{enum:cond2} alters the current practice of planning experiments in advance and reserving resources before they are taken by other users. Although the changes may seem extreme at first, they allow us to evaluate what we consider to be our \textit{worst-case scenario}. Moreover, as mentioned earlier, we believe that under the model adopted by existing clouds, where short-term advance reservations are generally not allowed and prices of on-demand instances do not vary over time, users would have little incentive to exploit off-peak periods or plan their demand in advance. Change \ref{enum:capacity} reflects the industry practice of provisioning resources to handle peak demand and including a margin of safety.
|
|
279 |
The characteristics of best-effort requests are not changed. Change \ref{enum:cond1} modifies the behaviour of users who today exploit resources during off-peak periods, whereas \ref{enum:cond2} alters the current practice of planning experiments in advance and reserving resources before they are taken by other users. Although the changes may seem extreme at first, they allow us to evaluate what we consider to be our \textit{worst-case scenario}, where reservation is not enabled. Moreover, as mentioned earlier, we believe that under the model adopted by existing clouds, where short-term advance reservations are generally not allowed and prices of on-demand instances do not vary over time, users would have little incentive to exploit off-peak periods or plan their demand in advance. Change \ref{enum:capacity} reflects the industry practice of provisioning resources to handle peak demand and including a margin of safety.
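As an illustration of the three changes, the following Python sketch converts reservation requests into on-demand requests and derives a site capacity. It is a sketch under assumptions that the text does not fix: the working-hours window (09:00 to 17:00 on weekdays), the 10\% safety margin, the preservation of each request's duration, and all names (\texttt{Request}, \texttt{to\_on\_demand}, \texttt{site\_capacity}) are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime

WORK_START, WORK_END = 9, 17   # assumed working hours (not given in the text)
SAFETY_MARGIN_PERCENT = 10     # assumed safety factor (not given in the text)


@dataclass(frozen=True)
class Request:
    """Hypothetical reservation request, sized in CPU cores."""
    submit: datetime
    start: datetime
    end: datetime
    cores: int


def in_working_hours(t):
    """True on weekdays between WORK_START and WORK_END."""
    return t.weekday() < 5 and WORK_START <= t.hour < WORK_END


def to_on_demand(req):
    """Apply changes 1 and 2: the request becomes an on-demand request."""
    duration = req.end - req.start  # assumption: duration is preserved
    if in_working_hours(req.submit) and not in_working_hours(req.start):
        new_start = req.submit      # change 1: start at submission time
    else:
        new_start = req.start       # change 2: submit at original start time
    return replace(req, submit=new_start, start=new_start,
                   end=new_start + duration)


def peak_cores(requests):
    """Maximum number of cores in use at any instant."""
    events = []
    for r in requests:
        events.append((r.start, r.cores))
        events.append((r.end, -r.cores))
    events.sort()  # at equal times, releases (negative) sort before allocations
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def site_capacity(requests):
    """Change 3: peak demand plus a safety margin, rounded up to whole cores."""
    peak = peak_cores(requests)
    return peak + -(-peak * SAFETY_MARGIN_PERCENT // 100)
```

Integer arithmetic is used for the margin to avoid floating-point rounding when the peak is scaled.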
|
210 |
280 |
|
211 |
281 |
\subsection{Performance Metrics}
|
212 |
282 |
|