From 186d45cb4ac0229dfbca8ebe786fd8d2dd411632 Mon Sep 17 00:00:00 2001
From: John Garbutt
Date: Tue, 24 Nov 2015 16:46:44 +0000
Subject: [PATCH] docs: add clarifications to move servers

This adds in some titles, to replace the bold text that simulated
headings. It also changes the text into use cases rather than actions.

It includes some additions around:
* resource optimization use cases for live-migration
* data loss during evacuate
* downtime during resize
* clearer description of why a user would call shelve

This does not complete this doc, but should help improve the
readability.

blueprint complete-todo-in-api-concept-doc

Change-Id: Ib2cf479c14a1c2f3b119ba7f549a00c9bd541cf9
---
 api-guide/source/server_concepts.rst | 65 +++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/api-guide/source/server_concepts.rst b/api-guide/source/server_concepts.rst
index 6910ff5334..e426fc9d56 100644
--- a/api-guide/source/server_concepts.rst
+++ b/api-guide/source/server_concepts.rst
@@ -399,15 +399,20 @@ compute host to another including shelve, resize, migrations and evacuate.
 The following use cases demonstrate the intention of the actions and the
 consequence for operational procedures.
 
-**Shelving**
+User doesn't want to be charged when not using a server
+-------------------------------------------------------
 
 Sometimes a user does not require a server to be active for a while,
-perhaps over a weekend or at certain times of day. This gives
-the cloud operator an opportunity to make better use of resources by
-freeing resources and rebalancing workloads across the infrastructure.
+perhaps over a weekend or at certain times of day.
+Ideally they don't want to be billed for those resources.
+Just powering down a server does not free up any resources,
+but shelving a server does free up resources to be used by other users.
+This makes it feasible for a cloud operator to offer a discount when
+a server is shelved.
 
 When the user shelves a server the operator can choose to remove it
-from the compute hosts. When it is unshelved it is scheduled to a new
+from the compute hosts, i.e. the operator can offload the shelved server.
+When the user's server is unshelved, it is scheduled to a new
 host according to the operators policies for distributing work loads
 across the compute hosts, including taking disabled hosts into account.
 This will contribute to increased overall capacity, freeing hosts that
@@ -418,14 +423,30 @@ Shelving a server is not normally a choice that is available to
 the cloud operator because it affects the availability of the server
 being provided to the user.
 
-**Resize**
+User resizes server to get more resources
+-----------------------------------------
 
 Sometimes a user may want to change the flavor of a server, e.g. change
 the quantity of cpus, disk, memory or any other resource. This is done
 by rebuilding the server with a new flavor. As the server is being
-rebuilt it is normal to reschedule the server to another host
+moved, it is normal to reschedule the server to another host
 (although resize to the same host is an option for the operator).
 
+Resize involves shutting down the server, finding a host that has
+the correct resources for the new flavor size, moving the current
+server (including all storage) to the new host. Once the server
+has been given the appropriate resources to match the new flavor,
+the server is started again.
+
+After the resize operation, when the user is happy their server is
+working correctly after the resize, the user calls Confirm Resize.
+This deletes the backup server that was kept on the source host.
+Alternatively, the user can call Revert Resize to delete the new
+resized server, and restore the backup that was stored on the source
+host. If the user does not manually confirm the resize within a
+configured time period, the resize is automatically confirmed, to
+free up the space the backup is using on the source host.
+
 As with shelving, resize provides the cloud operator with an
 opportunity to redistribute work loads across the cloud according
 to the operators scheduling policy, providing the same benefits as
@@ -435,7 +456,8 @@ Resizing a server is not normally a choice that is available to
 the cloud operator because it changes the nature of the server
 being provided to the user.
 
-**Migration (including cold and live migration)**
+Cloud operator needs to move a server
+-------------------------------------
 
 Sometimes a cloud operator may need to redistribute work loads for
 operational purposes. For example, the operator may need to remove
@@ -459,7 +481,7 @@ As a result it can take considerably longer than cold migration.
 During the action the server is online and accessible, but only a
 limited set of management actions are available to the user.
 
-The following are two common patterns for employing migrations in
+The following are common patterns for employing migrations in
 a cloud:
 
 - **Host maintenance**
@@ -478,8 +500,8 @@ a cloud:
 
   Often it is necessary to perform an update on all compute hosts
   that requires them to be rebooted. In this case it is not
-  strictly necessary to move inactive instances because they
-  will be available after the reboot. However, active instances would
+  strictly necessary to move inactive servers because they
+  will be available after the reboot. However, active servers would
   be impacted by the reboot. Live migration will allow them to
   continue operation.
 
@@ -494,12 +516,22 @@ a cloud:
   This process can be repeated until the whole cloud has been updated,
   usually using a pool of empty hosts instead of just one.
 
+- **Resource Optimization**
+
+  To reduce energy usage, some users will try and move servers so
+  they fit into the minimum number of hosts, allowing some servers
+  to be turned off.
+
+  Sometimes higher performance might be wanted, so servers are
+  spread out between the hosts to minimize resource contention.
+
 Migrating a server is not normally a choice that is available to
 the cloud user because the user is not normally aware of compute
 hosts. Management of the cloud and how servers are provisioned in
 it is the sole responsibility of the cloud operator.
 
-**Evacuate**
+Recover from a failed compute host
+----------------------------------
 
 Sometimes a compute host may fail. This is a rare occurrence, but when
 it happens during normal operation the servers running on the host may
@@ -519,6 +551,15 @@ hosts without worry of the old incarnations reappearing and trying to
 access shared resources. It is usual to redistribute the servers from a
 failed host by rescheduling them.
 
+Please note, this operation can result in data loss for the user's server.
+As there is no access to the original server, if there were any disks stored
+on local storage, that data will be lost. Evacuate does the same operation
+as a rebuild. It downloads any images from glance and creates new
+blank ephemeral disks. Any disks that were volumes, or on shared storage,
+are reconnected. There should be no data loss for those disks.
+This is why fencing the host is important, to ensure volumes and shared
+storage are not corrupted by two servers writing simultaneously.
+
 Evacuating a server is solely in the domain of the cloud operator because
 it must be performed in coordination with other operational procedures to
 be safe. A user is not normally aware of compute hosts but is adversely