
Reading Notes on Ceph Cookbook, Second Edition: Working with Ceph Object Storage

Working with Ceph Object Storage

In this chapter, we will cover the following recipes:

  • Understanding Ceph object storage
  • RADOS Gateway standard setup, installation, and configuration
  • Creating the radosgw user
  • Accessing the Ceph object storage using the S3 API
  • Accessing the Ceph object storage using the Swift API
  • Integrating RADOS Gateway with OpenStack Keystone
  • Integrating RADOS Gateway with Hadoop S3A plugin

Introduction

As organizations seek flexibility for their massive amounts of data, object-based storage has gained wide attention in the industry. Object storage is a way of storing data in the form of objects rather than traditional files and blocks; each object stores data, metadata, and a unique identifier. In this chapter, we will look at the object storage part of Ceph and gain practical knowledge by configuring the Ceph RADOS Gateway.

Understanding Ceph object storage

Object storage cannot be accessed directly by an operating system as a disk with a filesystem on it. Instead, it can only be accessed through an application-level API. Ceph is a distributed object storage system that provides an object storage interface through the Ceph Object Gateway, also known as the RADOS Gateway (RGW), which is built on top of the Ceph RADOS layer. RGW uses librgw (the RADOS Gateway library) and librados, allowing applications to establish a connection with the Ceph object store. RGW provides applications with a RESTful S3/Swift-compatible API interface for storing data in the form of objects in the Ceph cluster. Ceph also supports multi-tenant object storage, accessible via a RESTful API. In addition, RGW supports the Ceph Admin API, which can be used to manage the Ceph storage cluster with native API calls.
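
As a concrete illustration of this RESTful interface, both API flavors are plain HTTP endpoints (a sketch only; the hostname and port below come from the RGW node that is configured later in this chapter, and unauthenticated requests simply return an anonymous bucket listing or an authentication error in XML/JSON):

        ## S3-compatible endpoint exposed by the RADOS Gateway
        # curl http://rgw-node1.cephcookbook.com:8080/
        ## Swift-compatible endpoint (the /swift/v1 path is also used in the Keystone recipe)
        # curl http://rgw-node1.cephcookbook.com:8080/swift/v1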

The librados software library is very flexible and allows user applications to access the Ceph storage cluster directly through C, C++, Java, Python, and PHP bindings. Ceph object storage also has multi-site capabilities, providing a disaster recovery solution.

The following diagram illustrates Ceph object storage:

[Diagram: Ceph object storage architecture]

RADOS Gateway standard setup, installation, and configuration

For a production environment, it is recommended that you configure RGW on dedicated physical machines. However, if your object storage workload is not too heavy, you can consider using any of the monitor machines as an RGW node. RGW is a separate service that connects to the Ceph cluster externally and provides object storage access to its clients. In a production environment, it is recommended that you run multiple instances of RGW behind a load balancer, as shown in the following diagram:

[Diagram: multiple RGW instances behind a load balancer]

Starting with the Firefly release of Ceph, a new RGW frontend has been introduced: Civetweb, which is a lightweight, standalone web server. Civetweb is embedded directly into the ceph-radosgw service, making Ceph object storage services quicker and easier to deploy.

In the following recipes, we will demonstrate the RGW configuration using Civetweb on a virtual machine that will interact with the Ceph cluster we created in Chapter 1, Ceph – Introduction and Beyond.

Setting up the RADOS Gateway node

To run the Ceph object storage service, we should have a running Ceph cluster, and the RGW node should have access to the Ceph network.

Just like the one you created in Chapter 1, Ceph – Introduction and Beyond, which has the following Ceph status:

[Screenshot: Ceph cluster status output]
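
A quick way to confirm that the cluster is in the expected state before continuing (a sketch; run it from any monitor node such as ceph-node1, and expect your own OSD counts and pool names):

        # ceph -s
        # ceph health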

How to do it…

As mentioned in previous chapters, we will use Vagrant to launch a virtual machine and configure it as our RGW node:

  1. Launch rgw-node1 using vagrantfile, as we have done for Ceph nodes in Chapter 1, Ceph – Introduction and Beyond. Make sure you are on the host machine and under the Ceph-Cookbook-Second-Edition repository before bringing up rgw-node1 using Vagrant:
        # cd Ceph-Cookbook-Second-Edition
        # vagrant up rgw-node1
  2. Once rgw-node1 is up, check the Vagrant status, and log into the node:
        $ vagrant status rgw-node1
[Screenshot: vagrant status rgw-node1 output]
        $ vagrant ssh rgw-node1
  3. To upgrade to the latest CentOS 7.4, use the following command:
        $ sudo yum update -y
  4. Check if rgw-node1 can reach the Ceph cluster nodes:
        # ping ceph-node1 -c 3
        # ping ceph-node2 -c 3
        # ping ceph-node3 -c 3
  5. Verify the /etc/hosts file entries, hostname, and FQDN for rgw-node1:
        # cat /etc/hosts | grep -i rgw
        # hostname
        # hostname -f
[Screenshot: hostname and FQDN output for rgw-node1]

Installing and configuring the RADOS Gateway

The previous recipe was about setting up a virtual machine for RGW. In this recipe, we will learn how to set up the ceph-radosgw service on this node.

How to do it…

  1. To install and configure the Ceph RGW, we will use ceph-ansible from ceph-node1, which is our ceph-ansible node and one of the monitor nodes. Log in to ceph-node1 and perform the following commands:
    1. Make sure that ceph-node1 can reach rgw-node1 over the network by using the following command:
                # ping rgw-node1 -c 1
    2. Allow ceph-node1 a password-less SSH login to rgw-node1 and test the connection.

The root password of rgw-node1 is the same as before, that is, vagrant:
        # ssh-copy-id rgw-node1
        # ssh rgw-node1 hostname

  2. Add rgw-node1 to the ceph-ansible hosts file and test the Ansible ping command:
[Screenshot: rgw entry added to the ceph-ansible hosts file]
        # ansible all -m ping
[Screenshot: ansible ping output for all nodes]
  3. Update the all.yml file to install and configure the Ceph RGW in the VM rgw-node1:
        [root@ceph-node1 ceph-ansible]# cd /usr/share/ceph-ansible/group_vars/
        [root@ceph-node1 group_vars]# vim all.yml
  4. Enable the radosgw_civetweb_port and radosgw_civetweb_bind_ip options. In this book, rgw-node1 has the IP 192.168.1.106 and we are using port 8080 (a sketch of these all.yml lines appears after this list):
[Screenshot: radosgw_civetweb_port and radosgw_civetweb_bind_ip set in all.yml]
  5. Change the directory back to /usr/share/ceph-ansible and then run the playbook; it will install and configure the RGW in rgw-node1:
        $ cd ..
        $ ansible-playbook site.yml
[Screenshot: ansible-playbook site.yml run]
  6. Once ceph-ansible finishes the installation and configuration, you will have the following recap output:
[Screenshot: ansible-playbook recap output]
  7. Once it completes, you will have the radosgw daemon running in rgw-node1:
[Screenshot: radosgw daemon running on rgw-node1]
  8. You will also notice, in the following screenshot, that we now have additional pools that were created for RGW:
[Screenshot: RGW pools created in the cluster]
  9. The Civetweb web server that is embedded into the radosgw daemon should now be running on the specified port, 8080:
[Screenshot: Civetweb listening on port 8080]
  10. You will have the following entries related to this RGW in the rgw-node1 VM's /etc/ceph/ceph.conf (a sketch appears after this list):
[Screenshot: RGW entries in /etc/ceph/ceph.conf]
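
Since the screenshots are not reproduced here, the following is a minimal sketch of what the Civetweb-related all.yml lines and the resulting ceph.conf section typically look like for this recipe, together with a quick check that Civetweb is listening. Exact key names and generated entries can differ between ceph-ansible releases, so treat this as an assumption rather than verbatim output:

        ## /usr/share/ceph-ansible/group_vars/all.yml (sketch)
        radosgw_civetweb_port: 8080
        radosgw_civetweb_bind_ip: 192.168.1.106

        ## /etc/ceph/ceph.conf on rgw-node1 (sketch)
        [client.rgw.rgw-node1]
        host = rgw-node1
        keyring = /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring
        rgw frontends = civetweb port=192.168.1.106:8080

        ## Quick verification on rgw-node1
        # systemctl status ceph-radosgw.target
        # ss -tlnp | grep 8080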

Creating the radosgw user

To use Ceph object storage, we should create an initial Ceph Object Gateway user for the S3 interface, and then create a subuser for the Swift interface.

How to do it…

The following steps will help you create a radosgw user:

  1. Make sure that rgw-node1 is able to access the Ceph cluster:
        # ceph -s -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1
  2. Create a RADOS Gateway user for S3 access:
        # radosgw-admin user create --uid=pratima \
          --display-name="Pratima Umrao" \
          [email protected] \
          -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1
[Screenshot: radosgw-admin user create output showing access_key and secret_key]
  3. The values of the access_key and the secret_key will be required later in this chapter for access validation (a sketch of how to retrieve them again appears after this list).
  4. To use Ceph object storage with the Swift API, we need to create a Swift subuser on the Ceph RGW:
        # radosgw-admin subuser create --uid=pratima \
          --subuser=pratima:swift --access=full \
          -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1
[Screenshot: radosgw-admin subuser create output showing the Swift secret key]
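
If you need to look these keys up again later, a minimal sketch using the same keyring and client name as above is:

        # radosgw-admin user info --uid=pratima \
          -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1

In the JSON output, the S3 credentials appear under keys (access_key/secret_key) and the Swift secret appears under swift_keys.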

See also…

The Accessing the Ceph object storage using the Swift API recipe in this chapter.

Accessing the Ceph object storage using S3 API

Amazon Web Services offers the Simple Storage Service (S3), which provides storage through a REST-style web interface. Ceph extends compatibility with S3 through a RESTful API. S3 client applications can access the Ceph object storage based on access and secret keys.

S3 also requires a DNS server, because it uses the virtual-hosted bucket naming convention, that is, <bucket_name>.<RGW_FQDN>. For example, if you have a bucket named jupiter, it will be accessible over HTTP via the URL http://jupiter.rgw-node1.cephcookbook.com.
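
As a quick illustration of why the wildcard DNS entry matters (a sketch; unauthenticated requests will only return an S3 error document such as AccessDenied, and port 8080 matches the Civetweb setup from the previous recipe), the same bucket can be addressed in two ways:

        ## Path-style request: the bucket name is in the URL path
        # curl http://rgw-node1.cephcookbook.com:8080/jupiter
        ## Virtual-hosted-style request: the bucket name is in the hostname,
        ## which is why *.rgw-node1.cephcookbook.com must resolve
        # curl http://jupiter.rgw-node1.cephcookbook.com:8080/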

How to do it…

Perform the following steps to configure DNS on the rgw-node1 node. If you have an existing DNS server, you can skip the DNS configuration and use your own DNS server.

Configuring DNS

  1. Install bind packages on the ceph-rgw node:
        # yum install bind* -y
  2. Edit /etc/named.conf and add information for IP addresses, IP range, and zone, which are mentioned as follows. You can match the changes from the author's version of the named.conf file provided with this book:
        listen-on port 53 { 127.0.0.1;192.168.1.106; };   ### Add DNS IP ###
        allow-query { localhost;192.168.1.0/24; };        ### Add IP Range ###
[Screenshot: named.conf listen-on and allow-query changes]
        ### Add new zone for the domain cephcookbook.com before EOF ###
        zone "cephcookbook.com" IN {
                type master;
                file "db.cephcookbook.com";
                allow-update { none; };
        };
[Screenshot: cephcookbook.com zone definition in named.conf]
  3. Create the zone file /var/named/db.cephcookbook.com with the following content:
        @ 86400 IN SOA cephcookbook.com. root.cephcookbook.com. (
        20091028 ; serial yyyy-mm-dd
        10800    ; refresh every 3 hours
        3600     ; retry every hour
        3600000  ; expire after 1 month +
        86400 )  ; min ttl of 1 day
        @ 86400 IN NS cephcookbook.com.
        @ 86400 IN A 192.168.1.106
        * 86400 IN CNAME @
[Screenshot: db.cephcookbook.com zone file contents]
  4. Edit /etc/resolv.conf and add the following content at the top of the file:
 search cephcookbook.com
nameserver 192.168.1.106
[Screenshot: resolv.conf entries on rgw-node1]
  5. Start the named service:
  # systemctl start named.service
  6. Test the DNS configuration files for any syntax errors:
        # named-checkconf /etc/named.conf
        # named-checkzone cephcookbook.com \
          /var/named/db.cephcookbook.com
[Screenshot: named-checkconf and named-checkzone output]
  7. Test the DNS server (a quick wildcard check is also sketched after this list):
        # dig rgw-node1.cephcookbook.com
        # nslookup rgw-node1.cephcookbook.com
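
A quick way to confirm that both the host record and the wildcard CNAME resolve (a sketch; given the zone file above, both lookups should return 192.168.1.106):

        # dig +short rgw-node1.cephcookbook.com
        # dig +short anything.rgw-node1.cephcookbook.com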

Configuring the s3cmd client

To access Ceph object storage via the S3 API, we should configure the client machine with s3cmd as well as with DNS client settings. Perform the following steps to configure the s3cmd client machine:

  1. Bring up the client-node1 virtual machine using Vagrant. This virtual machine will be used as a client machine for S3 object storage.

  2. Go to the Ceph-Cookbook-Second-Edition repository directory and run the following commands:
        $ vagrant up client-node1
        $ vagrant ssh client-node1
  3. Upgrade client-node1 to the latest CentOS 7.4:
        $ sudo yum update -y
        $ reboot
        $ vagrant ssh client-node1
  4. Install the bind-utils package:
        # yum install bind-utils -y
  5. On the client-node1 machine, update /etc/resolv.conf with the DNS server entries at the top of the file:
 search cephcookbook.com
nameserver 192.168.1.106
  6. Test the DNS settings on the client-node1:
        # dig rgw-node1.cephcookbook.com
        # nslookup rgw-node1.cephcookbook.com
  7. client-node1 should be able to resolve all the subdomains for rgw-node1.cephcookbook.com:
        # ping mj.rgw-node1.cephcookbook.com -c 1
        # ping anything.rgw-node1.cephcookbook.com -c 1

Configure the S3 client (s3cmd) on client-node1

The following commands are used to configure s3cmd on client-node1:

  1. Install s3cmd using the following command:
        # yum install s3cmd -y
  2. Configure s3cmd by providing the access_key and secret_key of the user, pratima, which we created earlier in this chapter. Execute the following command and follow the prompts:
        # s3cmd --configure
[Screenshot: s3cmd --configure interactive prompts]

The s3cmd --configure command will create the /root/.s3cfg file.

  3. Edit this file for the RGW host details. Modify host_base and host_bucket, as shown. Make sure these lines do not have trailing spaces at the end:
 host_base = rgw-node1.cephcookbook.com:8080
host_bucket = %(bucket).rgw-node1.cephcookbook.com:8080
[Screenshot: host_base and host_bucket entries in /root/.s3cfg]
  4. Finally, we will create buckets and put objects into them (a few more everyday s3cmd operations are sketched after this list):
        # s3cmd mb s3://first-bucket
        # s3cmd ls
        # s3cmd put /etc/hosts s3://first-bucket
        # s3cmd ls s3://first-bucket
[Screenshot: s3cmd bucket creation and object upload output]
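
Once the upload works, a few more everyday s3cmd operations may be useful (a sketch; these are standard s3cmd subcommands, and the object is named hosts because s3cmd uses the basename of the uploaded file):

        # s3cmd info s3://first-bucket/hosts                    ## show object metadata
        # s3cmd get s3://first-bucket/hosts /tmp/hosts.copy     ## download the object back
        # s3cmd la                                              ## list all objects in all buckets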

Accessing the Ceph object storage using the Swift API

Ceph supports a RESTful API that is compatible with the basic data access model of the Swift API. In the previous recipe, we covered accessing the Ceph cluster via the S3 API; in this recipe, we will learn to access it via the Swift API.

How to do it...

To use Ceph object storage with the Swift API, we need the Swift subuser and its secret key that we created earlier in this chapter. This user information will then be passed with the Swift CLI tool to access the Ceph object storage:

  1. On the client-node1 virtual machine, install the Python Swift client:
 # easy_install pip
# pip install --upgrade setuptools
# pip install --upgrade python-swiftclient
  2. Get the swift subuser and secret keys from the RGW node:
        # radosgw-admin user info --uid pratima \
          -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1
  3. Access Ceph object storage by listing the default bucket:
        # swift -A http://192.168.1.106:8080/auth/1.0 \
          -U pratima:swift \
          -K whUTYlKFeKvKO59O6wFOANoyoH37SUJEjBD9cQmH list
  4. Add a new bucket, second-bucket:
        # swift -A http://192.168.1.106:8080/auth/1.0 \
          -U pratima:swift \
          -K whUTYlKFeKvKO59O6wFOANoyoH37SUJEjBD9cQmH post second-bucket
  5. List the buckets; it should show the new second-bucket as well (a couple of further Swift operations are sketched after this list):
        # swift -A http://192.168.1.106:8080/auth/1.0 \
          -U pratima:swift \
          -K whUTYlKFeKvKO59O6wFOANoyoH37SUJEjBD9cQmH list
[Screenshot: swift list output showing first-bucket and second-bucket]
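
A couple of further Swift CLI operations, sketched with the same authentication parameters (upload and stat are standard python-swiftclient subcommands; the uploaded file here is just an example):

        # swift -A http://192.168.1.106:8080/auth/1.0 -U pratima:swift \
          -K whUTYlKFeKvKO59O6wFOANoyoH37SUJEjBD9cQmH upload second-bucket /etc/hosts
        # swift -A http://192.168.1.106:8080/auth/1.0 -U pratima:swift \
          -K whUTYlKFeKvKO59O6wFOANoyoH37SUJEjBD9cQmH stat second-bucket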

Integrating RADOS Gateway with OpenStack Keystone

Ceph can be integrated with the OpenStack identity service, Keystone. With this integration, the Ceph RGW is configured to accept Keystone tokens for user authorization. So, any user that is validated by Keystone will get access rights to the RGW.

How to do it...

Execute the following commands on your openstack-node1, unless otherwise specified:

  1. Configure OpenStack to point to the Ceph RGW by creating the service and its endpoints:
        # keystone service-create --name swift --type object-store \
          --description "ceph object store"
[Screenshot: keystone service-create output]
        # keystone endpoint-create --service-id \
          6614554878344bbeaa7fec0d5dccca7f --publicurl \
          http://192.168.1.106:8080/swift/v1 --internalurl \
          http://192.168.1.106:8080/swift/v1 --adminurl \
          http://192.168.1.106:8080/swift/v1 --region RegionOne
[Screenshot: keystone endpoint-create output]
  2. Get the Keystone admin token, which will be used for the RGW configuration:
        # cat /etc/keystone/keystone.conf | grep -i admin_token
  3. Create a directory for certificates:
        # mkdir -p /var/ceph/nss
  4. Generate OpenSSL certificates:
        # openssl x509 -in /etc/keystone/ssl/certs/ca.pem \
          -pubkey | certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
        # openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem \
          -pubkey | certutil -A -d /var/ceph/nss -n signing_cert \
          -t "P,P,P"
[Screenshot: certutil output for the imported certificates]
  5. Create the /var/ceph/nss directory on rgw-node1:
        # mkdir -p /var/ceph/nss
  6. From openstack-node1, copy OpenSSL certificates to rgw-node1. If you are logging in for the first time, you will get an SSH confirmation; type yes and then type the root password, which is vagrant for all the machines:
        # scp /var/ceph/nss/* rgw-node1:/var/ceph/nss
  7. Update /etc/ceph/ceph.conf on rgw-node1 with the following entries under the [client.rgw.rgw-node1] section:
 rgw keystone url = http://192.168.1.111:5000
rgw keystone admin token = f72adb0238d74bb885005744ce526148
rgw keystone accepted roles = admin, Member, swiftoperator
rgw keystone token cache size = 500
rgw keystone revocation interval = 60
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss

rgw keystone url must be the Keystone admin URL, which can be obtained from the # keystone endpoint-list command. rgw keystone admin token is the token value that we saved in step 2 of this recipe.

  8. Finally, restart the ceph-radosgw service:
  # systemctl restart ceph-radosgw.target
  9. Now, to test the Keystone and Ceph integration, switch back to openstack-node1 and run the basic Swift commands; it should not ask for any user keys:
        # export OS_STORAGE_URL=http://192.168.1.106:8080/swift/v1
        # swift list
        # swift post swift-test-bucket
        # swift list
[Screenshot: swift list and post output authenticated via Keystone]
  10. Let us verify whether the container swift-test-bucket got created in the RGW (one way to check is sketched after this list):
[Screenshot: verification that swift-test-bucket exists in the RGW]
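
The screenshot is not reproduced here; one way to perform this check from rgw-node1 (a sketch, not necessarily the exact command shown in the book) is to list the buckets known to the RGW:

        # radosgw-admin bucket list \
          -k /var/lib/ceph/radosgw/ceph-rgw.rgw-node1/keyring \
          --name client.rgw.rgw-node1

The returned list should now include swift-test-bucket alongside the buckets created in the earlier recipes.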

Integrating RADOS Gateway with Hadoop S3A plugin 

For data analytics applications that require Hadoop Distributed File System (HDFS) access, the Ceph Object Gateway can be accessed using the Apache S3A connector for Hadoop. The S3A connector is an open source tool that presents S3-compatible object storage as an HDFS filesystem, giving applications HDFS read and write semantics while the data is stored in the Ceph Object Gateway.

Ceph Object Gateway Jewel version 10.2.9 is fully compatible with the S3A connector that ships with Hadoop 2.7.3.

How to do it...

You can use client-node1 to configure the Hadoop S3A client.

  1. Install Java packages in the client-node1:
        # yum install java* -y
[Screenshots: Java installation output and downloading the hadoop-2.7.3.tar.gz tarball]
  2. Extract the Hadoop .tar file:
        # tar -xvf hadoop-2.7.3.tar.gz 
[Screenshot: extracted hadoop-2.7.3 directory]
  3. Add the following in the .bashrc file:
        export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
        export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/hadoop-2.7.3/bin
[Screenshot: .bashrc additions]
  4. Update the /root/hadoop-2.7.3/etc/hadoop/core-site.xml file with the following details: add the RGW node IP and port, and use the access key and secret key of the RGW user pratima (a sketch of such a core-site.xml appears after this list):
[Screenshot: core-site.xml with the S3A endpoint and credentials]
  5. You can now upload a file using the hadoop distcp command to your RGW first-bucket:
        # hadoop distcp /root/anaconda-ks.cfg s3a://first-bucket/

You will get the initial map logs on the command line:

[Screenshot: initial hadoop distcp map output]

Once the upload completes, you will have the following logs:

[Screenshot: hadoop distcp completion logs]
  6. Now you can verify whether the anaconda-ks.cfg file got uploaded to first-bucket (one way to check is sketched after this list):
[Screenshot: verification that anaconda-ks.cfg exists in first-bucket]
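
For reference, the following is a minimal sketch of the core-site.xml properties referred to in step 4. The property names are standard Hadoop S3A settings, but the exact set shown in the book's screenshot may differ; the endpoint, access key, and secret key must match your own RGW setup:

        <configuration>
          <property>
            <name>fs.s3a.endpoint</name>
            <value>http://192.168.1.106:8080</value>
          </property>
          <property>
            <name>fs.s3a.access.key</name>
            <value>ACCESS_KEY_OF_pratima</value>
          </property>
          <property>
            <name>fs.s3a.secret.key</name>
            <value>SECRET_KEY_OF_pratima</value>
          </property>
          <property>
            <name>fs.s3a.connection.ssl.enabled</name>
            <value>false</value>
          </property>
        </configuration>

And one way to do the verification in step 6 (a sketch; either the Hadoop client or s3cmd from the earlier recipe can list the bucket):

        # hadoop fs -ls s3a://first-bucket/
        # s3cmd ls s3://first-bucket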