New Insights | Kubernetes Will Change How Databases Are Managed

vlambda
2022-04-26

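[Preface]

As everyone knows, databases are stateful, whereas Docker-style containerization was designed for convenient stateless deployment and is mainly used to roll out applications quickly. Kubernetes is the tool for managing those containers.

When considering deploying a database on Kubernetes, the first question is: “Can Kubernetes handle stateful services?” For years the answer was “not recommended.” After all, Kubernetes was originally designed as a container orchestrator for stateless services.

Today, the technology around stateful services is quite mature, and it is time to reconsider running databases on Kubernetes.

So why can we now consider deploying databases on Kubernetes?

The article below explains why. It is reproduced in its original English.

Original article link: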

https://thenewstack.io/kubernetes-will-revolutionize-enterprise-database-management/

The article follows:

Theme: Kubernetes Will Revolutionize Enterprise Database Management

21 Oct 2021 10:00am, by Álvaro Hernández

“Is Kubernetes ready for stateful workloads?” is the first question that pops up when decision-makers consider deploying databases on Kubernetes. For years the answer was “don’t do it,” and for good reasons. Kubernetes was initially designed to handle the orchestration of stateless workloads. But the technology has matured, and it is time to reconsider running data on Kubernetes.

There are three important technical aspects to be considered:

Kubernetes maturity
Kubernetes stateful capabilities
Availability and performance characteristics of running databases in containers.

How Mature Is Kubernetes?

While assessing the maturity of any technology isn’t a straightforward process, there are solid signals that can be used. Kubernetes is a Cloud Native Computing Foundation graduated project, meaning that the technology has the “adoption, a healthy rate of changes, and committers from multiple organizations”. Their 2020 Survey Report shows that “91% of respondents using containers report using Kubernetes, 83% of them in production.”

Since November 2017, reputed analyst firm Thoughtworks has considered Kubernetes a mature technology that companies should adopt, explaining that “it has become the default solution for most of our clients when deploying containers into a cluster of machines.”

Chart courtesy of the Cloud Native Computing Foundation.

Is Kubernetes Stateful Ready?

Kubernetes stateful capabilities are often doubted, and a first-generation stateful technology named PetSet is (partially) to blame. This feature was replaced by the current stateful technology in Kubernetes: StatefulSets. Generally available (“GA”) since Kubernetes 1.9, released in December 2017, it is used today across countless solutions that provide persistent, non-ephemeral storage for Kubernetes containers. This is what makes the deployment of Vitess or other cloud native databases on Kubernetes possible.

Most notably, StatefulSets mount PersistentVolumes (“PVs”) into the containers. These PVs are generally provided by storage external to the Kubernetes node, either in the form of networked drives or software-defined storage solutions, like OpenEBS. In essence, the storage used in Kubernetes and in the cloud is the same EBS volumes you use on AWS, or the Persistent Disks you use on GCP; and we can expect the same level of maturity.
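To make the relationship between a StatefulSet and its volumes concrete, here is a minimal sketch that builds a StatefulSet manifest as a plain Python dict and prints it as JSON (which `kubectl apply -f` also accepts). The names `demo-db`, `data`, the image tag, and the `10Gi` size are illustrative assumptions, not taken from the article. A real Postgres deployment would also need credentials, a headless Service with the same name, and a storage class suited to the cluster.

```python
import json


def build_statefulset(name: str, image: str, replicas: int, storage: str) -> dict:
    """Return a StatefulSet manifest (apps/v1) as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Name of the headless Service that gives each pod a stable DNS name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Mount the claimed volume at the data directory.
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template,
            # so each replica keeps its own volume across restarts.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


# Illustrative values only; adjust to your own cluster.
manifest = build_statefulset("demo-db", "postgres:14", replicas=3, storage="10Gi")
print(json.dumps(manifest, indent=2))
```

The point to notice is that the volume name in `volumeMounts` matches the name in `volumeClaimTemplates`: that link is what ties each pod to its own persistent volume rather than to ephemeral container storage.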

Performance of Running Data on K8s

Surely, database performance suffers in Kubernetes, doesn’t it? Containers are wrongly perceived as “lightweight virtual machines.” They are rather extremely thin layers of abstraction wrapping the filesystem, process, and networking spaces, provided by the Linux kernel. There might be some overhead if you use only ephemeral, container storage for the data. But the overhead is negligible if you use external PV storage.

And what about the ephemeral nature of containers? Wouldn’t this affect high availability? Since containers are just “wrappers” around a process, their lifetime is tied to that of the process. In other words, containers will be as stable as the database process running inside of them.

Running Databases on Kubernetes Revolutionizes the Way you Run Databases

There are obvious advantages to running databases on Kubernetes: the simplicity of deployment, having the whole stack managed by the same orchestration tool, auto-healing, and automatic reprovisioning of failed containers leading to higher availability. For example, if one of the nodes running a database fails, Kubernetes will automatically self-heal, rescheduling the workload on another node. In cooperation with the database management software, it may elect a new database primary from a previously existing replica, and re-initialize the rescheduled instance as a new replica, all automatically. But there are other, more important, reasons why you want to run databases in Kubernetes.
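The self-healing sequence described above can be sketched as a toy simulation. This is not how Kubernetes or any particular database manager implements failover; the `Instance` type, the `replay_lag` field, and the "promote the freshest replica" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class Instance:
    name: str
    node: str
    role: str        # "primary" or "replica"
    replay_lag: int  # how far behind the primary; lower means fresher


def fail_over(instances: list[Instance], failed_node: str, spare_node: str) -> list[Instance]:
    """Return the cluster layout after `failed_node` is lost (input is not modified)."""
    survivors = [replace(i) for i in instances if i.node != failed_node]
    lost = [i for i in instances if i.node == failed_node]
    if not survivors:
        raise RuntimeError("no surviving instance to promote")

    # Step 1: if the primary was on the failed node, promote the freshest replica.
    if not any(i.role == "primary" for i in survivors):
        best = min(survivors, key=lambda i: i.replay_lag)
        best.role = "primary"
        best.replay_lag = 0

    # Step 2: reschedule each lost instance on the spare node as a new replica.
    for i in lost:
        survivors.append(Instance(i.name, spare_node, "replica", 0))
    return survivors


cluster = [
    Instance("db-0", "node-a", "primary", 0),
    Instance("db-1", "node-b", "replica", 120),
    Instance("db-2", "node-c", "replica", 40),
]
after = fail_over(cluster, failed_node="node-a", spare_node="node-d")
for inst in after:
    print(f"{inst.name} on {inst.node}: {inst.role}")
```

In this run `db-2` is promoted because it has the smallest lag, and `db-0` comes back on `node-d` as a replica. In a real cluster, the scheduler performs the rescheduling while the database management software decides which replica to promote.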

Most companies want to operate databases as a DBaaS (“Database-as-a-Service”): to self-provision a self-healing database, including backups and monitoring. While this is offered by most cloud providers, doing it yourself by using Kubernetes can save significant costs, and offer additional capabilities, such as multicloud and cloud portability.

These capabilities are made available via Kubernetes Operators. Operators are application-specific extensions to Kubernetes that encode deployment and operations automation while exposing simple interfaces to the users. Advanced database Kubernetes operators bring, among others, the following benefits:

A declarative approach to deployments and updates, making it 100% GitOps friendly and perfect for any company using CI/CD. Operators define CRDs (Custom Resource Definitions), high-level objects (typically interfaced as simple YAML files) that make it possible to deploy and manage complex database architectures in a simple manner.
Automation of “Day 2 Operations”: deployment, high availability, backups, and monitoring; patching, vacuuming, bloat removal, reindexing, etc. Operators can encode these operations into CRDs, so that a YAML file is enough to have them performed automatically. One example of this approach is StackGres (which I founded), an advanced operator to run Postgres on Kubernetes, which fully automates all the operations mentioned above.
Externalization of database functionality to third-party, well-known, Kubernetes components, like the Envoy proxy; Prometheus and Grafana for monitoring; or Cert Manager for SSL certificate management. The database operators may rely on these components to offload database functionality, reducing the cognitive load on the user, who is already familiar with them, and obtaining more advanced functionality.
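The declarative idea behind operators can be illustrated with a small sketch: the user states the desired outcome in a custom resource, and the operator compares it with what it observes and derives the actions to take. The `DatabaseCluster` kind, the `example.com/v1` group, and every field under `spec` are hypothetical. They do not reproduce the schema of StackGres or any other real operator.

```python
# Hypothetical custom resource, as a user might declare it in a YAML file.
desired = {
    "apiVersion": "example.com/v1",
    "kind": "DatabaseCluster",
    "metadata": {"name": "orders-db"},
    "spec": {"instances": 3, "version": "14", "backups": True},
}

# What the operator currently observes in the cluster (also made up).
observed = {"instances": 2, "version": "13", "backups": False}


def reconcile(spec: dict, state: dict) -> list[str]:
    """Compare the desired spec with the observed state and list the actions needed."""
    actions = []

    # Scale the number of database instances up or down.
    diff = spec["instances"] - state["instances"]
    if diff > 0:
        actions.append(f"add {diff} replica(s)")
    elif diff < 0:
        actions.append(f"remove {-diff} replica(s)")

    # Roll the cluster to the requested database version.
    if spec["version"] != state["version"]:
        actions.append(f"upgrade from {state['version']} to {spec['version']}")

    # Turn scheduled backups on or off.
    if spec["backups"] and not state["backups"]:
        actions.append("enable scheduled backups")
    elif not spec["backups"] and state["backups"]:
        actions.append("disable scheduled backups")

    return actions


for action in reconcile(desired["spec"], observed):
    print(action)
```

When the observed state already matches the spec, `reconcile` returns an empty list. A resource file kept in Git can therefore be applied repeatedly without side effects, which is what makes this approach GitOps friendly.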

Running databases on Kubernetes is not only the future but also the present, as shown by leading companies such as Goldman Sachs, Zalando, and Flipkart. As with any technology, careful and objective evaluation should be performed before deploying production workloads.

Unsurprisingly, the Data on Kubernetes 2021 report found that 90% of the responding companies believe that Kubernetes is ready for stateful workloads. A large majority of these organizations (70%) run stateful workloads in production with databases topping the list. Those running 75% or more of their production workloads on it report impressive productivity gains of 2x or more!

Considering all the advantages that running databases on Kubernetes offers, companies should make sure to consider it. Running data on Kubernetes is the latest frontier on the way to fully orchestrated infrastructure, and I believe that this shift will unleash considerable value for businesses.

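End of the original article.

Personal summary:

Running databases on Kubernetes has obvious advantages: simple deployment, the whole stack managed by the same orchestration tool, auto-healing, and automatic reprovisioning of failed containers, all of which improve availability.
Most companies want to operate databases as a DBaaS (Database-as-a-Service): self-provisioned, self-healing databases with backups and monitoring included. Most cloud providers also offer this, but doing it yourself with Kubernetes can save significant costs and provides additional capabilities such as multicloud and cloud portability.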
