The Go-To Tool for RocketMQ Operations and Management in the Cloud-Native Era - RocketMQ Operator
Authors | Liu Rui, Du Heng
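Introduction: RocketMQ Operator has now joined OperatorHub and officially entered the Operator community. Starting from hands-on practice and using worked examples, this article shows how to use RocketMQ Operator to quickly set up a RocketMQ cluster on Kubernetes. It also covers some of the cluster-management features the Operator provides, such as Broker scale-out.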
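This article has three parts:
First, a brief introduction to the background of RocketMQ Operator;
Second, a detailed walkthrough, with examples, of the custom resources RocketMQ Operator provides and how to use them;
Finally, the current state of the Operator community and the next steps for RocketMQ Operator.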
Background
1. RocketMQ
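Between 2012 and 2013, Alibaba's middleware team independently developed and open-sourced RocketMQ, its third-generation distributed messaging engine. RocketMQ offers high performance, low latency, and strong resistance to message backlog. These properties have reliably carried the trillion-scale data peaks of Alibaba's Double 11 shopping festival. Its cloud product, Aliware MQ, has excelled in countless scenarios such as microservices, stream computing, IoT, asynchronous decoupling, and data synchronization.
In 2016, Alibaba donated RocketMQ to the Apache Software Foundation. The following year, RocketMQ graduated from the Apache Incubator and became an Apache top-level project. It now stands alongside Apache Hadoop and Apache Spark in serving distributed-systems and big-data developers worldwide. In today's cloud-native era, however, RocketMQ is a stateful distributed system. Making it simple to operate on large-scale clusters is both a challenging and a valuable problem.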
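RocketMQ supports several deployment modes. The basic two-master, two-slave architecture is shown in the figure below.
RocketMQ two-master, two-slave architecture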
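The Quick Start below uses a Broker custom resource. In its terms, a two-master, two-slave layout corresponds to two broker groups with one replica broker each. The excerpt below is a minimal sketch based only on the field comments in that example: size is the number of broker groups, and replicaPerGroup is the number of replica brokers in each group. It is an assumption, not an official manifest. Every field it omits is assumed to match the Quick Start example.

```yaml
# Hypothetical Broker excerpt for a two-master, two-slave layout.
# Omitted fields (brokerImage, resources, storage, ...) are assumed to match the Quick Start example.
apiVersion: rocketmq.apache.org/v1alpha1
kind: Broker
metadata:
  name: broker
spec:
  # two broker groups, i.e. two master brokers
  size: 2
  # one replica (slave) broker per group, i.e. two slave brokers in total
  replicaPerGroup: 1
  # replication between each master and its replica: ASYNC or SYNC
  replicationMode: ASYNC
```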
2. Kubernetes Operator
Quick Start
- Prepare a K8s environment. You can use the K8s bundled with Docker Desktop, or minikube;
- Clone the rocketmq-operator repository onto your K8s node:
$ git clone https://github.com/apache/rocketmq-operator.git
$ cd rocketmq-operator
- Run the script to install RocketMQ Operator:
$ ./install-operator.sh
- Check that RocketMQ Operator was installed successfully:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rocketmq-operator-564b5d75d-jllzk 1/1 Running 0 108s
- Apply the Broker and NameService custom resources to create a RocketMQ cluster:
apiVersion: rocketmq.apache.org/v1alpha1
kind: Broker
metadata:
  # name of broker cluster
  name: broker
spec:
  # size is the number of broker groups; each group contains one master broker and [replicaPerGroup] replica brokers
  size: 1
  # nameServers is the [ip:port] list of name service
  nameServers: ""
  # replicationMode is the broker replica sync mode, can be ASYNC or SYNC
  replicationMode: ASYNC
  # replicaPerGroup is the number of replica brokers in each broker group
  replicaPerGroup: 1
  # brokerImage is the customized docker image repo of the RocketMQ broker
  brokerImage: apacherocketmq/rocketmq-broker:4.5.0-alpine
  # imagePullPolicy is the image pull policy
  imagePullPolicy: Always
  # resources describes the compute resource requirements and limits
  resources:
    requests:
      memory: "2048Mi"
      cpu: "250m"
    limits:
      memory: "12288Mi"
      cpu: "500m"
  # allowRestart defines whether pods are allowed to restart
  allowRestart: true
  # storageMode can be EmptyDir, HostPath, StorageClass
  storageMode: EmptyDir
  # hostPath is the directory on the host node used to store data when storageMode is HostPath
  hostPath: /data/rocketmq/broker
  # scalePodName is the source broker pod, broker-[broker group number]-master-0, whose metadata new brokers copy on scale-out
  scalePodName: broker-0-master-0
  # volumeClaimTemplates defines the PVC template used when storageMode is StorageClass
  volumeClaimTemplates:
    - metadata:
        name: broker-storage
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: rocketmq-storage
        resources:
          requests:
            storage: 8Gi
---
apiVersion: rocketmq.apache.org/v1alpha1
kind: NameService
metadata:
  name: name-service
spec:
  # size is the number of name service instances in the name service cluster
  size: 1
  # nameServiceImage is the customized docker image repo of the RocketMQ name service
  nameServiceImage: apacherocketmq/rocketmq-nameserver:4.5.0-alpine
  # imagePullPolicy is the image pull policy
  imagePullPolicy: Always
  # hostNetwork can be true or false
  hostNetwork: true
  # Set DNS policy for the pod.
  # Defaults to "ClusterFirst".
  # Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
  # DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
  # To have DNS options set along with hostNetwork, you have to specify DNS policy
  # explicitly to 'ClusterFirstWithHostNet'.
  dnsPolicy: ClusterFirstWithHostNet
  # resources describes the compute resource requirements and limits
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1024Mi"
      cpu: "500m"
  # storageMode can be EmptyDir, HostPath, StorageClass
  storageMode: EmptyDir
  # hostPath is the directory on the host node used to store data when storageMode is HostPath
  hostPath: /data/rocketmq/nameserver
  # volumeClaimTemplates defines the PVC template used when storageMode is StorageClass
  volumeClaimTemplates:
    - metadata:
        name: namesrv-storage
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: rocketmq-storage
        resources:
          requests:
            storage: 1Gi
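Note that this example sets storageMode: EmptyDir, so data is stored in an EmptyDir volume. That data is erased when the Pod is deleted, which makes this mode suitable only for development and testing. For persistent storage, HostPath or StorageClass is normally used. With HostPath, set hostPath to declare the directory on the host machine to mount. With StorageClass, set volumeClaimTemplates to declare the PVC template. See the RocketMQ Operator documentation for details.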
$ kubectl apply -f example/rocketmq_v1alpha1_rocketmq_cluster.yaml
broker.rocketmq.apache.org/broker created
nameservice.rocketmq.apache.org/name-service created
$ kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
broker-0-master-0 1/1 Running 0 27s 10.1.2.27 docker-desktop <none> <none>
broker-0-replica-1-0 1/1 Running 0 27s 10.1.2.28 docker-desktop <none> <none>
name-service-0 1/1 Running 0 27s 192.168.65.3 docker-desktop <none> <none>
rocketmq-operator-76b4b9f4db-x52mz 1/1 Running 0 3h25m 10.1.2.17 docker-desktop <none> <none>
- Exec into Pods of this RocketMQ cluster to verify that the cluster works properly:
$ kubectl exec -it broker-0-master-0 -- bash
bash-4.4# sh ./tools.sh org.apache.rocketmq.example.quickstart.Producer
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
06:56:29.145 [main] DEBUG i.n.u.i.l.InternalLoggerFactory - Using SLF4J as the default logging framework
SendResult [sendStatus=SEND_OK, msgId=0A0102CF007778308DB1206383920000, offsetMsgId=0A0102CF00002A9F0000000000000000, messageQueue=MessageQueue [topic=TopicTest, brokerName=broker-0, queueId=0], queueOffset=0]
...
06:56:51.120 [NettyClientSelector_1] INFO RocketmqRemoting - closeChannel: close the connection to remote address[10.1.2.207:10909] result: true
bash-4.4#
$ kubectl exec -it name-service-0 -- bash
bash-4.4# sh ./tools.sh org.apache.rocketmq.example.quickstart.Consumer
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
07:01:32.077 [main] DEBUG i.n.u.i.l.InternalLoggerFactory - Using SLF4J as the default logging framework
Consumer Started.
ConsumeMessageThread_1 Receive New Messages: [MessageExt [queueId=0, storeSize=273, queueOffset=19845, sysFlag=0, bornTimestamp=1596768410268, bornHost=/30.4.165.204:53450, storeTimestamp=1596768410282, storeHost=/100.81.180.84:10911, msgId=6451B45400002A9F000014F96A0D6C65, commitLogOffset=23061458676837, bodyCRC=532471758, reconsumeTimes=0, preparedTransactionOffset=0, toString()=Message{topic='TopicTest', flag=0, properties={MIN_OFFSET=19844, TRACE_ON=true, eagleTraceId=1e04a5cc15967684102641001d0db0, MAX_OFFSET=19848, MSG_REGION=DefaultRegion, CONSUME_START_TIME=1596783715858, UNIQ_KEY=1E04A5CC0DB0135FBAA421365A5F0000, WAIT=true, TAGS=TagA, eagleRpcId=9.1}, body=[72, 101, 108, 108, 111, 32, 77, 101, 116, 97, 81, 32, 48], transactionId='null'}]]
ConsumeMessageThread_4 Receive New Messages: [MessageExt [queueId=1, storeSize=273, queueOffset=19637, sysFlag=0, bornTimestamp=1596768410296, bornHost=/30.4.165.204:53450, storeTimestamp=1596768410298, storeHost=/100.81.180.84:10911, msgId=6451B45400002A9F000014F96A0D7141, commitLogOffset=23061458678081, bodyCRC=1757146968, reconsumeTimes=0, preparedTransactionOffset=0, toString()=Message{topic='TopicTest', flag=0, properties={MIN_OFFSET=19636, TRACE_ON=true, eagleTraceId=1e04a5cc15967684102961002d0db0, MAX_OFFSET=19638, MSG_REGION=DefaultRegion, CONSUME_START_TIME=1596783715858, UNIQ_KEY=1E04A5CC0DB0135FBAA421365AB80001, WAIT=true, TAGS=TagA, eagleRpcId=9.1}, body=[72, 101, 108, 108, 111, 32, 77, 101, 116, 97, 81, 32, 49], transactionId='null'}]]
...
- Delete the cluster and clean up the environment:
$ kubectl delete -f example/rocketmq_v1alpha1_rocketmq_cluster.yaml
$ ./purge-operator.sh
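The storage note in the Quick Start above says persistent data needs one of two modes: HostPath, configured through hostPath, or StorageClass, configured through volumeClaimTemplates. As a minimal sketch, the two Broker spec excerpts below show each mode using only fields from the example manifest. The path, the size, and the rocketmq-storage StorageClass are carried over from that example. They are assumptions about your cluster, and the StorageClass must already exist.

```yaml
# Option 1 (hypothetical excerpt): persist broker data in a directory on the host node.
spec:
  storageMode: HostPath
  # directory on the host machine that is mounted into the broker pods
  hostPath: /data/rocketmq/broker
---
# Option 2 (hypothetical excerpt): persist broker data through PVCs created from a template.
spec:
  storageMode: StorageClass
  volumeClaimTemplates:
    - metadata:
        name: broker-storage
      spec:
        accessModes:
          - ReadWriteOnce
        # assumed to be an existing StorageClass in the cluster
        storageClassName: rocketmq-storage
        resources:
          requests:
            storage: 8Gi
```

Either excerpt replaces the corresponding fields in the Broker spec of example/rocketmq_v1alpha1_rocketmq_cluster.yaml. The NameService resource accepts the same storage fields. A HostPath volume is local to one node, so the data stays on whichever node the pod ran on.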
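Installing RocketMQ Operator by following the instructions on the OperatorHub website
- Search for RocketMQ Operator on the OperatorHub website;
- Open the RocketMQ Operator page and click the Install button;
- Follow the instructions to install OLM and RocketMQ Operator.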
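With OLM installed, an operator is usually installed by creating an OLM Subscription that points at a catalog. The manifest below is a hedged sketch of that step, not the manifest published on OperatorHub.

The catalog name operatorhubio-catalog in the olm namespace matches the catalog pod listed later in this article. The following values are assumptions:
- the package name rocketmq-operator;
- the channel alpha;
- the namespace operators, together with the assumption that it already contains an OperatorGroup.

Whether this namespace choice works depends on the install modes the operator declares. Check the values on the RocketMQ Operator page on OperatorHub.

```yaml
# Hypothetical OLM Subscription for RocketMQ Operator.
# Assumed values: package name, channel, and target namespace (which must have an OperatorGroup).
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-rocketmq-operator
  namespace: operators
spec:
  # package and channel names are assumptions; confirm them on OperatorHub
  name: rocketmq-operator
  channel: alpha
  # catalog backed by the operatorhubio-catalog pod in the olm namespace
  source: operatorhubio-catalog
  sourceNamespace: olm
```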
Using RocketMQ Operator with a locally installed OLM
- Install and start OLM (Operator Lifecycle Manager) locally;
- Start the UI console locally:
$ make run-console-local
- Visit http://localhost:9000 to open the console;
- Search for RocketMQ, or click Streaming & Messaging under the All Items category, then find RocketMQ Operator and install it;
- After installation, RocketMQ Operator appears under Installed Operators.
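Installed Operators page
RocketMQ Operator overview page
Creating a NameService custom resource through the UI
In the UI, you can create NameService and Broker instances in a given namespace, and browse and manage the instances you have created. You can also check the status of the Pods in the current K8s cluster from the command line, for example: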
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
docker compose-78f95d4f8c-8fr5z 1/1 Running 0 32h
docker compose-api-6ffb89dc58-nv9rh 1/1 Running 0 32h
kube-system coredns-5644d7b6d9-hv6r5 1/1 Running 0 32h
kube-system coredns-5644d7b6d9-mkqb6 1/1 Running 0 32h
kube-system etcd-docker-desktop 1/1 Running 0 32h
kube-system kube-apiserver-docker-desktop 1/1 Running 0 32h
kube-system kube-controller-manager-docker-desktop 1/1 Running 1 32h
kube-system kube-proxy-snmxh 1/1 Running 0 32h
kube-system kube-scheduler-docker-desktop 1/1 Running 1 32h
kube-system storage-provisioner 1/1 Running 1 32h
kube-system vpnkit-controller 1/1 Running 0 32h
marketplace broker-0-master-0 1/1 Running 0 5h3m
marketplace broker-0-replica-1-0 1/1 Running 0 5h3m
marketplace name-service-0 1/1 Running 0 5h3m
marketplace marketplace-operator-69756457d8-42chk 1/1 Running 0 32h
marketplace rocketmq-operator-0.2.1-c9fffb5f-cztcl 1/1 Running 0 32h
marketplace rocketmq-operator-84c7bb4ddc-7rvqr 1/1 Running 0 32h
marketplace upstream-community-operators-5b79db455f-7t47w 1/1 Running 1 32h
olm catalog-operator-7b788c597d-gjz55 1/1 Running 0 32h
olm olm-operator-946bd977f-dhszg 1/1 Running 0 32h
olm operatorhubio-catalog-fvxp9 1/1 Running 0 32h
olm packageserver-789c7b448b-7ss7m 1/1 Running 0 32h
olm packageserver-789c7b448b-lfxrw 1/1 Running 0 32h
As the output shows, the corresponding name server and broker instances were also created successfully in the marketplace namespace.
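The UI creates ordinary custom resources, so the same instances can also be declared in YAML. As a sketch, setting metadata.namespace places a NameService in the marketplace namespace. This assumes the operator watches that namespace, as the pod list above suggests for this OLM setup. The spec is a subset of the Quick Start example, trimmed for brevity. In practice you would copy the remaining fields from the full example.

```yaml
# Hypothetical NameService declared in the marketplace namespace
# (the namespace where the UI-created instances appear above).
apiVersion: rocketmq.apache.org/v1alpha1
kind: NameService
metadata:
  name: name-service
  namespace: marketplace
spec:
  # subset of the Quick Start example; other fields omitted for brevity
  size: 1
  nameServiceImage: apacherocketmq/rocketmq-nameserver:4.5.0-alpine
  imagePullPolicy: Always
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  storageMode: EmptyDir
```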
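The example above installs and uses RocketMQ Operator through OperatorHub and OLM. We will keep publishing and maintaining new RocketMQ Operator versions on that platform, so users can easily get the latest updates or pick the Operator version that suits them.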
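Community
RocketMQ Operator is an open-source project in the Apache community. It serves scenarios such as SaaS-style delivery on Alibaba's dedicated cloud and product deployment in private-cloud environments. It has also received code contributions from open-source contributors at Internet companies such as iQIYI. Users are welcome to give feedback on the community project. Please leave your information at the link below to help us improve RocketMQ Operator.
Link: https://github.com/apache/rocketmq-operator/issues/20
The PR for RocketMQ Operator v0.2.1 has been merged into the community-operators repository. Now that RocketMQ Operator is on OperatorHub.io, users can install and subscribe to it through OLM (Operator Lifecycle Manager) and receive ongoing updates and support.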
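Looking Ahead
RocketMQ Operator v0.2.1 mainly supports the following features:
- automatic creation of Name Server and Broker clusters;
- seamless scale-out of the Name Server cluster, in which the Broker cluster is automatically notified to update its Name Server IP list;
- seamless scale-out of the Broker cluster for non-ordered messages, in which new Broker instances synchronize metadata, including Topic and subscription information, from the source Broker Pod specified in the Broker CRD;
- Topic migration.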
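The scale-out features listed above are driven by the size fields of the two custom resources. This is a minimal sketch resting on two assumptions. First, the operator reacts to changes in NameService.spec.size and Broker.spec.size as this section describes. Second, scalePodName names the source pod for the metadata sync, as the comment in the Quick Start manifest indicates. Only the fields relevant to scaling are shown.

```yaml
# Hypothetical excerpt: Name Server scale-out by raising NameService.spec.size.
# Per the text, the operator then tells the Broker cluster to update its Name Server IP list.
apiVersion: rocketmq.apache.org/v1alpha1
kind: NameService
metadata:
  name: name-service
spec:
  size: 2  # was 1 in the Quick Start example
---
# Hypothetical excerpt: Broker scale-out (non-ordered messages) by raising Broker.spec.size.
# New brokers copy Topic and subscription metadata from the pod named in scalePodName.
apiVersion: rocketmq.apache.org/v1alpha1
kind: Broker
metadata:
  name: broker
spec:
  size: 2  # was 1 in the Quick Start example
  scalePodName: broker-0-master-0
```

kubectl apply removes fields that a previous apply set but the new file omits. So change size in the full Quick Start manifest and re-apply it, or use kubectl edit, rather than applying these excerpts on their own.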
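Next, we hope to work with the community to further improve RocketMQ Operator in several areas:
- canary releases;
- full-lifecycle data management;
- disaster-recovery backup and restore;
- monitoring of traffic and other metrics;
- automatic elastic scaling.

The ultimate goal is for the Operator to cover the full lifecycle management of RocketMQ services.
You are welcome to try RocketMQ Operator and share your suggestions.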
Related Links
RocketMQ Operator project: https://github.com/apache/rocketmq-operator
OperatorHub: https://operatorhub.io/
Operator Framework: https://github.com/operator-framework
RocketMQ website: https://rocketmq.apache.org/
Apache RocketMQ repository: https://github.com/apache/rocketmq