
Reading notes on ceph-cookbook-second-edition: Operating and Managing a Ceph Cluster

Operating and Managing a Ceph Cluster

In this chapter, we will cover the following recipes:

  • Understanding Ceph service management
  • Managing the cluster configuration file
  • Running Ceph with systemd
  • Scale-up versus scale-out
  • Scaling out your Ceph cluster
  • Scaling down your Ceph cluster
  • Replacing a failed disk in the Ceph cluster
  • Upgrading your Ceph cluster
  • Maintaining a Ceph cluster

Introduction

At this point, I am sure you are confident with the deployment, configuration, and monitoring of a Ceph cluster. In this chapter, we will cover standard topics such as Ceph service management. We will also cover advanced topics such as scaling out your cluster with ceph-ansible by adding OSD and MON nodes, and finally upgrading your Ceph cluster, followed by some maintenance operations.

Understanding Ceph service management

Every component of Ceph, whether it is a MON, OSD, MDS, or RGW, runs as a service on top of the underlying operating system. As a Ceph storage administrator, you should know the Ceph services and how to operate them. On Red Hat-based distributions, Ceph daemons are managed as traditional systemd services. Each time you start, restart, or stop a Ceph daemon (or your entire cluster), you must specify at least one option and one command. You can also specify a daemon type or a daemon instance. The general syntax is as follows:

systemctl [options...] command [service name...]

The systemctl options include the following:

  • --help or -h: Prints a short help text
  • --all or -a: When listing units, show all loaded units, regardless of their state
  • --signal or -s: When used with kill, chooses which signal to send to the selected processes
  • --force or -f: When used with enable, overwrites any existing conflicting symlinks
  • --host or -H: Execute an operation on a remote host

The systemctl commands include the following:

  • status: Shows status of the daemon
  • start: Starts the daemon
  • stop: Stops the daemon
  • restart: Stops and then starts the daemon
  • kill: Kills the specified daemon
  • reload: Reloads the config file without interrupting pending operations
  • list-units: Lists known units managed by systemd
  • condrestart: Restarts if the service is already running
  • enable: Turns the service on for the next boot or other triggering event
  • disable: Turns the service off for the next boot or other triggering event
  • is-enabled: Used to check whether a service is configured to start or not in the current environment

systemctl can target the following Ceph service types:

  • ceph-mon
  • ceph-osd
  • ceph-mds
  • ceph-radosgw
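As a quick illustration of this syntax, restarting a single monitor instance and checking whether it is enabled at boot could look like the following (ceph-node1 is the monitor host used throughout these notes):

# systemctl restart ceph-mon@ceph-node1
# systemctl is-enabled ceph-mon@ceph-node1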

Managing the cluster configuration file

If you are managing a large cluster, it is best to keep your cluster configuration file (/etc/ceph/ceph.conf) updated with information about the cluster's MON, OSD, MDS, and RGW nodes. With these entries in place, you can manage all cluster services from a single node.

How to do it...

ceph-ansible manages all aspects of the Ceph configuration file, and we will use it to update the cluster configuration. To achieve this, we will update the Ceph configuration file via the ceph_conf_overrides section of the /etc/ansible/group_vars/all.yml file, adding the details of all MON, OSD, and MDS nodes. Ansible supports the same sections as the Ceph configuration file: [global], [mon], [osd], [mds], [rgw], and so on.

Adding monitor nodes to the Ceph configuration file

Since we have three monitor nodes, let's add their details to the ceph_conf_overrides section of the all.yml file:

  1. In ceph-node1 in the /usr/share/ceph-ansible/group_vars directory, edit the ceph_conf_overrides section of the all.yml to reflect the three monitors in the cluster:
  2. Save the updated all.yml file and re-run the playbook from the /usr/share/ceph-ansible directory:
# ansible-playbook site.yml
  3. Validate that the Ceph configuration file has properly updated the monitor nodes in the cluster by viewing the /etc/ceph/ceph.conf file.
The spacing and formatting of the all.yml file need to be exactly as shown in the book's screenshots and examples; otherwise, running ansible-playbook will error out because of the improper formatting.
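As a reference for the layout and indentation only, a minimal sketch of what the monitor entries under ceph_conf_overrides might look like is shown below; the hostnames match this book's test cluster, while the addresses are placeholder assumptions you would replace with your own:

ceph_conf_overrides:
  global:
    mon_initial_members: ceph-node1,ceph-node2,ceph-node3
    mon_host: 192.168.1.101,192.168.1.102,192.168.1.103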

Adding an MDS node to the Ceph configuration file

Just as with the monitors, let's add the MDS node details to the /etc/ceph/ceph.conf file from ceph-node1 using Ansible:

  1. On ceph-node1 in the /usr/share/ceph-ansible/group_vars directory, edit the ceph_conf_overrides section of the all.yml to reflect the MDS nodes details. As with the monitors, please be careful with the formatting of the file or the running of the playbook will fail with a formatting error:
  2. Save the updated all.yml file and re-run the playbook from the /usr/share/ceph-ansible directory:
 # ansible-playbook site.yml
  3. Validate that the Ceph configuration file updated the MDS node in the cluster by viewing the /etc/ceph/ceph.conf file.
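A hedged sketch of one possible MDS entry; ceph_conf_overrides accepts any valid ceph.conf section name, and the section and host shown here are assumptions for illustration rather than the book's exact content:

ceph_conf_overrides:
  mds.ceph-node2:
    host: ceph-node2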

Adding OSD nodes to the Ceph configuration file

Now, let's add the OSD node details to the /etc/ceph/ceph.conf file from ceph-node1 using Ansible:

  1. On ceph-node1 in the /usr/share/ceph-ansible/group_vars directory, edit the ceph_conf_overrides section of the all.yml to reflect the OSD nodes details. As with the monitors, please be careful with the formatting of the file or the running of the playbook will fail with a formatting error:
  2. Save the updated all.yml file and re-run the playbook from the /usr/share/ceph-ansible directory:
# ansible-playbook site.yml
  3. Validate that the Ceph configuration file properly updated the OSD nodes in the cluster by viewing the /etc/ceph/ceph.conf file.
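Similarly, a purely illustrative sketch of OSD entries; the OSD IDs and host assignments are assumptions used only to show the nesting:

ceph_conf_overrides:
  osd.0:
    host: ceph-node1
  osd.3:
    host: ceph-node2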

Running Ceph with systemd

Ceph process management is done through the systemd service manager. systemd is the replacement for the UNIX System V init system (SysVinit). The general syntax for managing Ceph daemons with systemd is systemctl [options] {command} {service/target}.

How to do it...

Let's look in detail at managing Ceph daemons using systemd:

Starting and stopping all daemons

To start or stop all Ceph daemons, execute the following sets of commands.

Let's see how to start and stop all the Ceph daemons:

  1. To start all Ceph services on a particular node, execute the systemd manager for the Ceph unit with the start command. This command will start all Ceph services that you have deployed for this node:
# systemctl start ceph.target
  2. To stop all Ceph services on one particular node, execute the systemd manager for the Ceph unit using the stop command. This command will stop all Ceph services that you have deployed for this node:
# systemctl stop ceph\*.service ceph\*.target
  3. To start/stop all Ceph services on a remote host, execute the systemd manager with the -H option (specifying the remote hostname) with the start or stop command on the Ceph unit.
  4. To start all Ceph services for ceph-node2 from ceph-node1, use the following command:
root@ceph-node1 # systemctl -H ceph-node2 start ceph.target
  5. To stop all Ceph services for ceph-node2 from ceph-node1, use the following command:
root@ceph-node1 # systemctl -H ceph-node2 stop ceph\*.service ceph\*.target
Since your ceph.conf file has all of your Ceph hosts defined, and your current node can SSH to all of those other nodes, you can use the -H option to start and stop all Ceph services for a particular host from another remote host. The ceph.conf file should be identical on all the nodes.

Querying systemd units on a node

To list the Ceph systemd units on a Ceph node, execute the following sets of commands.

Let's see how to determine which Ceph services are running on a particular node. This helps in identifying which OSD or MON services are running on which node:

  1. To list all the Ceph systemd units on a node, execute the systemd manager for the Ceph service/target using the status command. This command will display all active services/targets systemd has loaded:
# systemctl status ceph\*.service ceph\*.target
  2. To list the status of a particular Ceph service, execute the systemd manager for the specified Ceph service using the status command. To check the status of mon.0, issue:
root@ceph-node1 # systemctl status ceph-mon@ceph-node1

To check the status of osd.1, issue:

root@ceph-node1 # systemctl status ceph-osd@1
  3. To list all the systemd units on a particular node from a remote host, execute the systemd manager with the -H option (specifying the remote hostname) using the status command. This command will display all active services/targets systemd has loaded on a remote host. To check all systemd units on ceph-node2 from ceph-node1, issue:
root@ceph-node1 # systemctl -H ceph-node2 status ceph\*.service ceph\*.target
  4. To list the status of a particular Ceph service on a remote host, execute the systemd manager with the -H option (specifying the remote hostname), using the status command. To check the status of mon.1 from ceph-node1, issue:
root@ceph-node1 # systemctl -H ceph-node2 status ceph-mon@ceph-node2

Starting and stopping all daemons by type

To start or stop all Ceph daemons by type, execute the following sets of commands:

Starting daemons by type:

  1. To start the Ceph monitor daemons on localhost, execute the systemd manager with the start command followed by the daemon type:
# systemctl start ceph-mon.target
  2. To start the Ceph monitor daemon on a remote host, execute the same command with the -H option and the specified hostname. To start mon.1 from ceph-node1, issue:
root@ceph-node1 # systemctl -H ceph-node2 start ceph-mon.target
  3. Similarly, you can start daemons of the other types, that is, osds, mds, and ceph-radosgw, by issuing:
# systemctl start ceph-osd.target
# systemctl start ceph-mds.target
# systemctl start ceph-radosgw.target

Stopping daemons by type:

  1. To stop the Ceph monitor daemons on a localhost, execute the systemd manager with the stop command followed by the daemon type:
# systemctl stop ceph-mon.target
  2. To stop the Ceph monitor daemon on a remote host, execute the same command with the -H option and the specified hostname. To stop mon.1 from ceph-node1, issue:
 root@ceph-node1 # systemctl -H ceph-node2 stop ceph-mon.target
  3. Similarly, you can stop daemons of the other types, that is, osds, mds, and ceph-radosgw, by issuing:
 # systemctl stop ceph-osd.target
 # systemctl stop ceph-mds.target
 # systemctl stop ceph-radosgw.target

Starting and stopping a specific daemon

To start or stop a specific Ceph daemon, execute the following sets of commands:

Starting specific daemons by instance:

To start a specific daemon on the localhost, execute the systemd manager with the start command followed by {daemon_type}@{id/hostname}, for example:

  1. Start the mon.0 daemon:
root@ceph-node1 # systemctl start ceph-mon@ceph-node1
  2. Similarly, you can start other daemons and their instances:
root@ceph-node1 # systemctl start ceph-osd@1
root@ceph-node1 # systemctl -H ceph-node2 start ceph-mon@ceph-node2
root@rgw-node1 # systemctl start ceph-radosgw@rgw-node1

Stopping specific daemons by instance:

To stop a specific daemon on the localhost, execute the systemd manager with the stop command followed by {daemon_type}@{id/hostname}, for example:

  1. Stop the mon.0 daemon:
 root@ceph-node1 # systemctl stop ceph-mon@ceph-node1
  2. Similarly, you can stop other daemons and their instances:
root@ceph-node1 # systemctl stop ceph-osd@1
root@ceph-node1 # systemctl -H ceph-node2 stop ceph-mon@ceph-node2
root@rgw-node1 # systemctl stop ceph-radosgw@rgw-node1

Scale-up versus scale-out

When building a storage infrastructure, scalability is one of the most important design aspects. The storage solution you choose for your infrastructure should be scalable enough to meet your future data needs. Typically, a storage system starts from a small or medium capacity and grows gradually into a large storage solution.

Traditional storage systems are based on a scale-up design and are limited to a certain storage capacity. If you try to expand such storage systems beyond a certain limit, you may need to sacrifice performance, reliability, and availability. The scale-up design approach involves adding disk resources to the existing controller system, which becomes a bottleneck for performance, capacity, and manageability once it reaches a certain level.

The scale-out design, on the other hand, focuses on adding entire new nodes, containing disks, CPU, memory, and other resources, to the existing storage cluster. With this type of design, you do not face the challenges seen in a scale-up design; instead, you benefit from linear performance improvements. The following diagram explains the scale-up and scale-out designs of storage systems:

(Diagram: scale-up versus scale-out storage designs)

Ceph is a seamlessly scalable storage system based on the scale-out design, where you can add a compute node with a bunch of disks to an existing Ceph cluster and extend your storage system to a larger capacity.

Scaling out your Ceph cluster

From the very beginning, Ceph has been designed to grow from a few nodes to several hundred, and it should be able to scale on the fly without any downtime. In this recipe, we will dive into Ceph's scale-out capability by adding MON, OSD, MDS, and RGW nodes.

How to do it...

Scaling out a Ceph cluster is important as the demand for increased cluster capacity grows. Let's scale out a few areas of the Ceph cluster step by step:

Adding the Ceph OSD

Adding an OSD node to a Ceph cluster is an online process. To demonstrate this, we need a new virtual machine, named ceph-node4, with three disks that will act as OSDs. This new node will then be added to our existing Ceph cluster.

Run the following commands from ceph-node1 unless otherwise specified:

  1. Create a new node, ceph-node4, with three disks (OSD). You can follow the process of creating a new virtual machine with disks and the OS configuration, as mentioned in the Setting up a virtual infrastructure recipe in Chapter 1, Ceph – Introduction and Beyond, and make sure ceph-node1 can ssh into ceph-node4.
    Before adding the new node to the Ceph cluster, let's check the current OSD tree. As shown in the following screenshot, the cluster has three nodes and a total of nine OSDs:
# ceph osd tree
  2. Update the /etc/ansible/hosts file with ceph-node4 under the [osds] section (sketches of the files edited in this procedure follow the last step).
  3. Verify that Ansible can reach the newly added ceph-node4 mentioned in /etc/ansible/hosts:
 root@ceph-node1 # ansible all -m ping
  4. List the available devices of ceph-node4 to be used as OSDs (sdb, sdc, and sdd):
root@ceph-node4 # lsblk
  5. Review the osds.yml file on ceph-node1 and validate that it lists the specified devices corresponding to the storage devices on the OSD node ceph-node4 and that journal_collocation is set to true.
  6. Run the Ansible playbook to deploy the OSD node ceph-node4 with three OSDs from the /usr/share/ceph-ansible directory:
root@ceph-node1 ceph-ansible # ansible-playbook site.yml
  7. As soon as you add new OSDs to the Ceph cluster, you will notice that the Ceph cluster starts rebalancing the existing data to the new OSDs. You can monitor rebalancing using the following command; after a while, you will notice that your Ceph cluster becomes stable:
# watch ceph -s
  8. Once the addition of the OSDs for ceph-node4 completes successfully, you will notice the cluster's new storage capacity:
# rados df
# ceph df
  9. Check the OSD tree; it will give you a better understanding of your cluster. You should notice the new OSDs under ceph-node4, which have been recently added:
# ceph osd tree
  10. This command outputs some valuable information such as OSD weight, any reweight that may be set, primary affinity that is set, which Ceph node hosts which OSD, and the UP/DOWN status of an OSD.
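For reference, minimal sketches of the two Ansible files edited in this procedure are shown below. The hostnames follow this book's test cluster, while the device names and the exact variable layout are assumptions that can differ between ceph-ansible versions, so treat them as a starting point to adapt rather than drop-in files.

/etc/ansible/hosts, [osds] section after adding the new node:

[osds]
ceph-node1
ceph-node2
ceph-node3
ceph-node4

group_vars/osds.yml, assuming a Jewel-era ceph-ansible with collocated journals:

devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
journal_collocation: true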

We just learned how to add a new node to an existing Ceph cluster. This is a good time to note that, as the number of OSDs grows, choosing the right value for the number of PGs becomes more important, because it has a significant influence on the behavior of the cluster. Increasing the PG count on a large cluster can be an expensive operation. I encourage you to look at http://docs.ceph.com/docs/master/rados/operations/placement-groups/#choosing-the-number-of-placement-groups for any updated information on placement groups (PGs).
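To make that concrete, a commonly cited rule of thumb from the Ceph documentation (treat the numbers below as an illustration only, not tuning advice for a real cluster) is:

Total PGs = (number of OSDs x 100) / replica count, rounded up to the nearest power of two

For a cluster like this one, with 12 OSDs and a replica count of 3, that gives (12 x 100) / 3 = 400, which rounds up to 512 PGs spread across all pools.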

Adding the Ceph MON

In environments where a large Ceph cluster is deployed, you might want to increase the number of monitors. As with OSDs, adding a new monitor to a Ceph cluster is an online process. In this recipe, we will configure ceph-node4 as a monitor node.

Since this is a test Ceph cluster, we will add ceph-node4 as a fourth monitor node. However, in a production setup, you should always have an odd number of monitor nodes in your Ceph cluster in order to form a quorum:

  1. Update the /etc/ansible/hosts file with ceph-node4 under the [mons] section (a sketch of this section follows the procedure).
  2. Run the Ansible playbook to deploy the new MON on ceph-node4:
root@ceph-node1 ceph-ansible # ansible-playbook site.yml
  3. Once ceph-node4 is configured as a monitor node, check the ceph status to see the cluster status. Please note that ceph-node4 is your new monitor node.
  4. Check the Ceph monitor status and notice ceph-node4 as the new Ceph monitor.
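A sketch of what the [mons] section of /etc/ansible/hosts might look like after this change (hostnames as used throughout these notes):

[mons]
ceph-node1
ceph-node2
ceph-node3
ceph-node4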

There's more...

For object storage use cases, you must deploy the Ceph RGW component using Ansible, and to make your object storage service highly available and performant, you should deploy multiple instances of Ceph RGW. With Ansible, the Ceph object storage service can easily be scaled out from one RGW node to several.

The following diagram shows how multiple RGW instances can be deployed and scaled out to provide a highly available (HA) object storage service:

(Diagram: multiple RGW instances scaled out for a highly available object storage service)

Scaling out RGW is the same as adding additional RGW nodes with Ansible; refer to the Installing the Rados Gateway recipe in Chapter 4, Working with Ceph Object Storage, to add more RGW nodes to your Ceph environment.

Scaling down your Ceph cluster

One of the most important features of a storage system is its flexibility. A good storage solution should be flexible enough to support expansion and reduction without causing any downtime to the service. Traditional storage systems have limited flexibility; expanding and reducing such systems is a tough job. Sometimes you feel locked in to a fixed storage capacity, unable to change it as needed.

Ceph is an absolutely flexible storage system that supports changing storage capacity on the fly, whether expanding or reducing it. In the last recipe, we learned how easy it is to scale out a Ceph cluster. In this recipe, we will scale down the Ceph cluster by removing ceph-node4 from it, without any impact on data accessibility.

How to do it...

Since ceph-ansible does not currently support removing OSD nodes, let's do this manually by following the next set of steps.

Removing the Ceph OSD

Before proceeding with reducing the cluster size, scaling it down, or removing an OSD node, make sure that the cluster has enough free space to accommodate all the data from the node you are planning to move out. The cluster should not be at its full ratio, which is the percentage of used disk space in the OSDs. So, as a best practice, do not remove an OSD or OSD node without considering the impact on the full ratio. At the time of writing this book, ceph-ansible does not support scaling down the Ceph OSD nodes in a cluster; this has to be done manually.
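Before marking anything out, it is worth a quick capacity check. A minimal sketch using standard Ceph commands (ceph osd df shows per-OSD utilization and should be available on a Jewel cluster):

# ceph df
# ceph osd df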

  1. As we need to scale down the cluster, we will remove ceph-node4 and all of its associated OSDs out of the cluster. Ceph OSDs should be set out so that Ceph can perform data recovery. From any of the Ceph nodes, take the OSDs out of the cluster:
 # ceph osd out osd.9
 # ceph osd out osd.10
 # ceph osd out osd.11
  2. As soon as you mark an OSD out of the cluster, Ceph will start rebalancing the cluster by migrating the PGs from the OSDs that were marked out to other OSDs inside the cluster. Your cluster state will become unhealthy for some time, but it will still be able to serve data to clients. Based on the number of OSDs removed, there might be some drop in cluster performance until the recovery is complete. You can throttle the backfill and recovery as covered in the Throttle the backfill and recovery section later in this chapter.
    Once the cluster is healthy again, it should perform as usual:
# ceph -s

Here, you can see that the cluster is in recovery mode but is serving data to clients at the same time. You can observe the recovery process using the following command:

        # ceph -w
  3. As we have marked osd.9, osd.10, and osd.11 out of the cluster, they will not participate in storing data, but their services are still running. Let's stop these OSDs:
root@ceph-node1 # systemctl -H ceph-node4 stop ceph-osd.target

Once the OSDs are down, check the OSD tree; you will observe that the OSDs are down and out:

# ceph osd tree
  4. Now that the OSDs are no longer part of the Ceph cluster, let's remove them from the CRUSH map:
# ceph osd crush remove osd.9
# ceph osd crush remove osd.10
# ceph osd crush remove osd.11
  5. As soon as the OSDs are removed from the CRUSH map, the Ceph cluster becomes healthy. You should also observe the OSD map; since we have not removed the OSDs, it will still show 12 OSDs, 9 UP, and 9 IN:
# ceph -s
  6. Remove the OSD authentication keys:
# ceph auth del osd.9
# ceph auth del osd.10
# ceph auth del osd.11
  7. Finally, remove the OSDs and check your cluster status; you should observe 9 OSDs, 9 UP, and 9 IN, and the cluster health should be OK:
# ceph osd rm osd.9
# ceph osd rm osd.10
# ceph osd rm osd.11
  8. To keep your cluster clean, perform some housekeeping; as we have removed all the OSDs from the CRUSH map, ceph-node4 does not hold any items. Remove ceph-node4 from the CRUSH map; this will remove all the traces of this node from the Ceph cluster:
# ceph osd crush remove ceph-node4
  9. Once the OSD node has been removed from the cluster and the CRUSH map, a final validation of the Ceph status should be done to verify HEALTH_OK:
# ceph -s
  10. To complete the removal of ceph-node4 from the cluster, update the /etc/ansible/hosts file on ceph-node1 and remove ceph-node4 from the [osds] section so that the next time the playbook is run it will not redeploy ceph-node4 as an OSD node.

Removing the Ceph MON

Removing a Ceph MON is usually not a very frequent task. When you remove monitors from the cluster, keep in mind that Ceph monitors use the PAXOS algorithm to establish consensus about the master cluster map. You must have a sufficient number of monitors to establish a quorum for consensus on the cluster map. In this recipe, we will learn how to remove the ceph-node4 monitor from the Ceph cluster. At the time of writing this book, ceph-ansible does not support scaling down the Ceph MON nodes in a cluster; this has to be done manually.

  1. Check the monitor status:
# ceph mon stat
  2. Stop the monitor service on ceph-node4:
root@ceph-node1 # systemctl -H ceph-node4 stop ceph-mon.target
  3. Remove the monitor from the cluster:
# ceph mon remove ceph-node4
  4. Check to see that your monitors have left the quorum:
# ceph quorum_status --format json-pretty
  5. Update the /etc/ansible/hosts file and remove ceph-node4 from the [mons] section so that ceph-node4 is not redeployed as a mon and the ceph.conf file is properly updated.
  6. You can choose to back up the monitor data on ceph-node4 or remove it. To back it up, you can create a removed directory and move the data there:
 # mkdir /var/lib/ceph/mon/removed
 # mv /var/lib/ceph/mon/ceph-ceph-node4 /var/lib/ceph/mon/removed/ceph-ceph-node4
  7. If you choose not to back up the monitor data, then remove the monitor data on ceph-node4:
# rm -r /var/lib/ceph/mon/ceph-ceph-node4
  8. Re-run the Ansible playbook to update the ceph.conf on all the nodes in the cluster to complete the removal of monitor ceph-node4:
root@ceph-node1 # ansible-playbook site.yml
  9. Finally, check the monitor status; the cluster should have three monitors.

Replacing a failed disk in the Ceph cluster

A Ceph cluster can be made up of anywhere from ten to several thousand physical disks providing storage capacity to the cluster. As the number of physical disks in a Ceph cluster increases, so does the frequency of disk failures. Hence, replacing a failed disk drive can become a repetitive task for a Ceph storage administrator. In this recipe, we will look at the disk replacement process for a Ceph cluster.

How to do it...

These steps will guide you through the proper replacement procedure for a Ceph OSD:

  1. Let's verify cluster health; since this cluster does not have any failed disk status, it would be HEALTH_OK:
# ceph status
  2. Since we are demonstrating this exercise on virtual machines, we need to forcefully fail a disk by bringing ceph-node1 down, detaching a disk, and powering up the VM. Execute the following commands from your HOST machine:
# VBoxManage controlvm ceph-node1 poweroff
# VBoxManage storageattach ceph-node1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium none
# VBoxManage startvm ceph-node1

  3. Now ceph-node1 contains a failed disk, osd.0, which should be replaced:
# ceph osd tree
# ceph -s

You will also notice that osd.0 is down. However, in the ceph osd tree output, its weight is still 1.00000, which means it is still marked IN. As long as its status is marked IN, the Ceph cluster will not trigger data recovery for that drive. Looking further at ceph -s, you can see that the osdmap shows 9 osds: 8 up, 9 in. By default, the Ceph cluster takes 300 seconds to mark a down disk as OUT, after which it triggers data recovery. The reason for this timeout is to avoid unnecessary data movement caused by short outages, such as a server reboot. You can increase or even decrease this timeout value if you wish.
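The option behind this timeout is mon_osd_down_out_interval. A hedged sketch of adjusting it at runtime is shown below; the value of 600 seconds is arbitrary and only for illustration, and you can also set the same option in the [mon] section of ceph.conf:

# ceph tell mon.* injectargs '--mon_osd_down_out_interval 600'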

  4. You should wait 300 seconds to trigger data recovery, or else you can manually mark the failed OSD as OUT:
 # ceph osd out osd.0
  5. As soon as the OSD is marked OUT, the Ceph cluster will initiate a recovery operation for the PGs that were hosted on the failed disk. You can watch the recovery operation using the following command:
 # ceph status
  6. Let's now remove the failed disk OSD from the Ceph CRUSH map:
# ceph osd crush rm osd.0
  7. Delete the Ceph authentication keys for the OSD:
# ceph auth del osd.0
  8. Finally, remove the OSD from the Ceph cluster:
# ceph osd rm osd.0
  9. Since one of your OSDs is unavailable, the cluster health will not be OK, and the cluster will be performing recovery. Nothing to worry about here; this is a normal Ceph operation. Once the recovery operation is complete, your cluster will attain HEALTH_OK:
 # ceph -s
 # ceph osd stat
  10. At this point, you should physically replace the failed disk with the new disk on your Ceph node. These days, almost all servers and server OSes support disk hot swapping, so you will not require any downtime for disk replacement.
  11. Since we are simulating this on virtual machines, we need to power off the VM, add a new disk, and restart the VM. Once the disk is attached, make a note of its OS device ID:

# VBoxManage controlvm ceph-node1 poweroff
# VBoxManage storageattach ceph-node1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium ceph-node1_disk2.vdi
# VBoxManage startvm ceph-node1
  12. Before adding the new disk back into the cluster, we will zap the disk to validate that it is in a clean state:
root@ceph-node1 # ceph-disk zap /dev/sdb
  13. View the device to validate that the partitions were cleared with zap:
# lsblk
  14. Add the new disk into the cluster using the ceph-disk prepare command:
root@ceph-node1 # ceph-disk --setuser ceph --setgroup ceph prepare --fs-type xfs /dev/sdb

The ceph-disk prepare command does all the manual work of creating the OSD, the OSD keys, authentication, putting the OSD into the CRUSH map, and so on.
  15. Check the device after the prepare completes to validate that the OSD directory is mounted:
# lsblk
  16. Once the ceph-disk prepare command completes, the OSD will be added to the cluster successfully, and Ceph will perform a backfilling operation and start moving PGs from secondary OSDs to the new OSD. The recovery operation might take a while, but after it, your Ceph cluster will be HEALTH_OK again:
 # ceph -s
 # ceph osd stat

Upgrading your Ceph cluster

One of the several reasons Ceph is great is that almost all operations on a Ceph cluster can be performed online, which means that your Ceph cluster can remain in production, serving clients, while you perform administrative tasks on it without downtime. One of these operations is upgrading the Ceph cluster version.

Since the first chapter, we have been using the Jewel release of Ceph. We will demonstrate upgrading the Ceph cluster version from Jewel to Kraken using the Ansible rolling_update.yml playbook from the /usr/share/ceph-ansible/infrastructure-playbooks directory. The rolling_update.yml playbook fully automates the Ceph cluster upgrade process.

Ansible upgrades the Ceph nodes in the following order, one at a time:

  • Monitor nodes
  • OSD nodes
  • MDS nodes
  • Ceph RadosGW nodes
  • All other Ceph client nodes

During the upgrade, Ansible also sets the noout, noscrub, and nodeep-scrub flags on the cluster, to prevent any unnecessary data movement and the overhead of scrubbing. Ansible also has built-in checks during the upgrade: it verifies the cluster PG state and will not move forward if the cluster runs into issues.

Once you upgrade a Ceph daemon, you cannot downgrade it. It's very much recommended to refer to the release-specific sections at http://docs.ceph.com/docs/master/release-notes/ to identify release-specific procedures for upgrading the Ceph cluster.

How to do it...

In this recipe, we will upgrade our Ceph cluster running the Jewel release (10.2.9) to the latest stable Kraken release (11.2.1):

  1. On ceph-node1, navigate to the /usr/share/ceph-ansible/group_vars/all.yml file and change ceph_stable_release from Jewel to Kraken (both all.yml changes are sketched after this procedure).
  2. In the same all.yml file, uncomment upgrade_ceph_packages and change it from False to True.
  3. Copy the rolling_update.yml from the infrastructure-playbooks directory to the /usr/share/ceph-ansible directory:
# cp /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml /usr/share/ceph-ansible
  4. Run the rolling_update.yml playbook:
# ansible-playbook rolling_update.yml
  5. Once the playbook completes, validate the new running Ceph version on our Ceph nodes using ceph tell:
# ceph tell mon.* version
# ceph tell osd.* version
  6. Running ceph -v will also show the newly upgraded Kraken (11.2.1) running on the Ceph cluster:
 # ceph -v
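For reference, the two all.yml changes from steps 1 and 2 might look like the following once edited (ceph_stable_release and upgrade_ceph_packages are the standard ceph-ansible variable names; the surrounding file content is omitted):

ceph_stable_release: kraken
upgrade_ceph_packages: True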
Running the rolling_update.yml playbook prompts with the question Are you sure you want to upgrade the cluster?. Once Yes is entered as the reply, Ansible starts the upgrade; this is your last chance to abort it!

Maintaining a Ceph cluster

As a Ceph storage administrator, maintaining your Ceph cluster will be one of your top priorities. Ceph is a distributed system designed to grow from tens of OSDs to several thousand. One of the key things required to maintain a Ceph cluster is managing its OSDs. In this recipe, we will cover Ceph subcommands for OSDs and PGs that will help you during cluster maintenance and troubleshooting.

How to do it...

To better understand the need for these commands, let's assume a scenario where you want to add a new node to a production Ceph cluster. One way is to simply add the new node with several disks to the Ceph cluster, and the cluster will start backfilling and shuffling data onto the new node. This is fine for a test cluster.

However, when it comes to a production setup, things become quite critical, and you should use some of the ceph osd subcommands/flags described below, such as noin, nobackfill, and so on, before adding a new node to the cluster. This is so that your cluster does not start the backfilling process immediately when the new node comes in. You can then unset these flags during non-peak hours, and the cluster will take its own time to rebalance:

  1. Using these flags is as simple as setting and unsetting them. For example, to set a flag, use the following command lines:

# ceph osd set <flag_name>
# ceph osd set noout
# ceph osd set nodown
# ceph osd set norecover
  2. Now, to unset the same flags, use the following command lines:
# ceph osd unset <flag_name>
# ceph osd unset noout
# ceph osd unset nodown
# ceph osd unset norecover
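To confirm which flags are currently applied to the cluster, you can check the flags line of the OSD map or the cluster status; a simple sketch using standard commands:

# ceph osd dump | grep flags
# ceph -s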

How it works...

Let's now look at what these flags are and why they are used:

  • noout: This forces the Ceph cluster to not mark any OSD as out of the cluster, irrespective of its status. It makes sure all the OSDs remain inside the cluster.
  • nodown: This forces the Ceph cluster to not mark any OSD down, irrespective of its status. It makes sure all the OSDs remain UP and none of them DOWN.
  • noup: This forces the Ceph cluster to not mark any down OSD as UP. So, any OSD that is marked DOWN can only come UP after this flag is unset. This also applies to new OSDs that are joining the cluster.
  • noin: This forces the Ceph cluster to not allow any new OSD to join the cluster. This is quite useful if you are adding several OSDs at once and don't want them to join the cluster automatically.
  • norecover: This forces the Ceph cluster to not perform cluster recovery.
  • nobackfill: This forces the Ceph cluster to not perform backfilling. This is quite useful when you are adding several OSDs at once and don't want Ceph to perform automatic data placement on the new node.
  • norebalance: This forces the Ceph cluster to not perform cluster rebalancing.
  • noscrub: This forces Ceph to not perform OSD scrubbing.
  • nodeep-scrub: This forces Ceph to not perform OSD deep scrubbing.

Throttle the backfill and recovery:

If you want to add a new OSD node during peak production hours, or even during off-peak hours, and want minimal impact on client IO from the recovery and backfill IO generated by the new OSDs, you can throttle the backfill and recovery with the help of the following commands:

  • Set osd_max_backfills = 1 option to throttle the backfill threads. You can add this in ceph.conf [osd] section and you can also set it dynamically with the following command:
    # ceph tell osd.* injectargs '--osd_max_backfills 1'
  • Set osd_recovery_max_active = 1 option to throttle the recovery threads. You can add this in ceph.conf [osd] section and you can also set it dynamically with the following command:
    # ceph tell osd.* injectargs '--osd_recovery_max_active 1'
  • Set osd_recovery_op_priority = 1 option to lower the recovery priority. You can add this in ceph.conf [osd] section and you can also set it dynamically with the following command:
    # ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
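If you prefer to persist these values rather than injecting them at runtime, the equivalent entries in the [osd] section of ceph.conf (same option names, just in INI form) would look like this:

[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1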

In the Jewel release of Ceph, two additional flags are enabled by default when a Ceph cluster is installed fresh on Jewel. If the cluster has been upgraded from a pre-Jewel release, such as Hammer, these flags can be enabled:

  • sortbitwise: The sortbitwise flag indicates that objects are sorted in a bitwise fashion. The old sort order, nibblewise, was a historical artifact of filestore that is simply inefficient with the current version of Ceph. Bitwise sort order makes operations that require listing objects, like backfill and scrubbing, a bit more efficient:
# ceph osd set sortbitwise
  • require_jewel_osds: This flag prevents any pre-Jewel OSDs from joining the Ceph cluster. The purpose of this flag is to prevent an OSD from joining the cluster that will not support features that the Jewel code supports leading to possible OSD flapping and cluster issues:
# ceph osd set require_jewel_osds
Setting the sortbitwise flag is a disruptive change, as every PG has to go through peering and every client has to resend in-flight requests. There is no data movement in the cluster after setting this flag. Also note that all OSDs in the cluster must be running Jewel before this flag is set.

In addition to these flags, you can also use the following commands to repair OSDs and PGs:

  • ceph osd repair: This performs repairing on a specified OSD.
  • ceph pg repair: This performs repairing on a specified PG. Use this command with caution; based on your cluster state, this command can impact user data if not used carefully.
  • ceph pg scrub: This performs scrubbing on a specified PG.
  • ceph pg deep-scrub: This performs deep-scrubbing on a specified PG.

The Ceph CLI is very powerful for end-to-end cluster management. You can get more information at http://docs.ceph.com/docs/master/rados/man/.
