
Reading Notes on Building Microservices with Spring: Logging and Monitoring Microservices

Chapter 13. Logging and Monitoring Microservices

Due to the distributed nature of Internet-scale microservice deployments, one of the biggest challenges is the logging and monitoring of individual microservices. It is hard to trace end-to-end transactions by correlating the logs emitted by different microservices. Unlike with a monolithic application, there is no single pane of glass for monitoring microservices. This is especially important when dealing with enterprise-grade microservices built with multiple technologies, as described in the previous chapter.

This chapter covers the necessity and importance of logging and monitoring in microservice deployments. It further examines the challenges of, and solutions for, logging and monitoring using a number of potential architectures and technologies.

By the end of this chapter, you will learn about the following:

  • The different options, tools, and technologies for log management
  • The use of Spring Cloud Sleuth for microservices
  • The different tools for end-to-end monitoring of microservices
  • The use of Spring Cloud Hystrix and Turbine for circuit monitoring
  • The use of Data Lake for enabling business data analysis

Understanding log management challenges


Logs are nothing but a stream of events coming from a running process. For traditional JEE applications, a number of frameworks and libraries are available for logging. Java Util Logging (JUL) is an off-the-shelf option from Java itself. Log4j, Logback, and SLF4J are some of the other popular logging frameworks. These frameworks support both UDP and TCP protocols for logging. Applications send log entries to the console or to the filesystem. File recycling techniques are generally employed to avoid logs filling up all the disk space.

Because of the high cost of disk I/O, one of the best practices of log handling is to switch off most of the log entries in production. Disk I/O not only slows down the application, but can also severely impact scalability. Writing logs to disk also demands high disk capacity, and an out-of-disk-space scenario can bring down the application. Logging frameworks provide options to control logging at runtime to restrict what is printed and what is not. Most of these frameworks provide fine-grained control over the logging configuration and also allow these configurations to be changed at runtime.
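
For instance, the following is a minimal sketch of changing a Logback logger level programmatically at runtime; the package name com.brownfield.search is an illustrative assumption only:

        import ch.qos.logback.classic.Level;
        import ch.qos.logback.classic.Logger;
        import org.slf4j.LoggerFactory;

        public class RuntimeLogLevel {
          public static void reduceLoggingInProduction() {
            // Cast the SLF4J logger to the Logback implementation to adjust its level on the fly
            Logger searchLogger = (Logger) LoggerFactory.getLogger("com.brownfield.search");
            searchLogger.setLevel(Level.WARN); // keep only WARN and above to limit disk I/O
          }
        }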

On the other hand, logs can carry important information and have high value if analyzed properly. Therefore, restricting log entries essentially limits our ability to understand the application's behavior.

When moving from traditional to cloud deployments, applications are no longer locked to particular, predefined machines. Virtual machines and containers are not hardwired to an application. The machines used for deployment can change from time to time. Moreover, containers such as Docker are ephemeral. This essentially means that one cannot rely on the persistent state of the disk. Logs written to disk are lost once the container is stopped and restarted. Therefore, we cannot rely on the local machine's disk to write log files.

As we discussed in Chapter 10, Related Architecture Styles and Use Cases, one of the principles of the Twelve-Factor app is to avoid routing or storing log files in the application itself. In the context of microservices, the services run on isolated physical or virtual machines, resulting in fragmented log files. In this case, it is almost impossible to trace end-to-end transactions that span multiple microservices.

[Figure: Transaction T1 spanning M1 and M3, with each instance writing logs to its own local file system]

As shown in the preceding diagram, each microservice writes its logs to the local file system. In this scenario, transaction T1 calls M1 followed by M3. Since M1 and M3 run on different physical machines, each writes its logs to a different log file. This makes it harder to correlate and understand the end-to-end transaction flow. Moreover, since two instances each of M1 and M3 are running on two different machines, log aggregation at the service level is hard to achieve.

Centralized logging solution


In order to address the challenges mentioned earlier, traditional logging solutions require serious rethinking. In addition to addressing those challenges, the new logging solution is also expected to support the capabilities summarized here:

  • Ability to collect all log messages and run analytics on top of the log messages
  • Ability to correlate and track transactions end-to-end
  • Ability to keep log information for longer time periods for trending and forecasting
  • Ability to eliminate dependency on the local disk system
  • Ability to aggregate log information coming from multiple sources, such as network devices, operating system, microservices, and so on

The solution to these concerns is to centrally store and analyze all log messages, irrespective of the source of the logs. The fundamental principle adopted by the new logging solutions is to detach log storage and processing from the service execution environment. Big data solutions are better suited to storing and processing large volumes of log messages more effectively than storing and processing them within the microservice execution environment.

In a centralized logging solution, log messages are shipped from the execution environment to a central big data store. Log analysis and processing are handled using big data solutions.

[Figure: Logical components of a centralized logging solution]

As shown in the preceding logical diagram, there are a number of components in a centralized logging solution. These are explained as follows:

  • Log streams: These are streams of log messages coming out of the source systems. The source system can be microservices, other applications, or even network devices. In typical Java-based systems, these are equivalent to streaming the Log4j log messages.
  • Log shippers: These are responsible for collecting the log messages coming from different sources or endpoints. The log shippers then send these messages to another set of endpoints, such as writing to a database, pushing to a dashboard, or sending it to a stream processing endpoint for further real-time processing.
  • Log store: This is the place where all log messages will be stored for real-time analysis, trending, and so on. Typically, the log store will be a NoSQL database, such as HDFS, capable of handling large data volumes.
  • Log stream processor: This is capable of analyzing real-time log events for quick decision making. Stream processors take actions such as sending information to a dashboard, sending alerts, and so on. In the case of self-healing systems, stream processors can even take action to correct the problems.
  • Log dashboard: This dashboard is a single pane of glass for displaying log analysis results, such as graphs and charts. These dashboards are meant for operational and management staff.

The advantage of this centralized approach is that there is no local I/O or blocking disk writes. It also does not use the local machine's disk space. This architecture is fundamentally similar to the Lambda Architecture for big data processing.

Note

Follow this link to read more about the Lambda Architecture: http://lambda-architecture.net

It is important for each log message to carry a context, a message, and a correlation ID. The context typically contains the timestamp, IP address, user information, process details (such as service, class, and function), log type, classification, and so on. The message is plain and simple free-text information. The correlation ID is used to establish the link between service calls so that calls spanning microservices can be traced.
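
A minimal sketch of attaching a correlation ID to every log line using the SLF4J MDC is shown below; the key name correlationId and the use of a %X{correlationId} token in the log pattern are assumptions for illustration. Spring Cloud Sleuth, covered later in this chapter, automates this kind of propagation:

        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;
        import org.slf4j.MDC;
        import java.util.UUID;

        public class CorrelationIdExample {
          private static final Logger logger = LoggerFactory.getLogger(CorrelationIdExample.class);

          public void handle(String incomingCorrelationId) {
            // Reuse the caller's ID when present; otherwise start a new one for this transaction
            String correlationId = (incomingCorrelationId != null)
                ? incomingCorrelationId : UUID.randomUUID().toString();
            MDC.put("correlationId", correlationId);
            try {
              logger.info("Processing search request"); // context + message + correlation ID
            } finally {
              MDC.remove("correlationId"); // avoid leaking the ID to the next request on this thread
            }
          }
        }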

Selection of logging solutions


There are a number of options available for implementing a centralized logging solution. These solutions use different approaches, architectures, and technologies. It is important to understand the capabilities required and select the right solution that meets those needs.

Cloud services

A number of cloud logging services are available as SaaS solutions. Loggly is one of the most popular cloud-based logging services. Spring Boot microservices can use Loggly's Log4j and Logback appenders to stream log messages directly into the Loggly service.

If the application or service is deployed on AWS, AWS CloudTrail can be integrated with Loggly for log analysis.

Papertrail, Logsene, Sumo Logic, Google Cloud Logging, and Logentries are examples of other cloud-based logging solutions. Some of the tools used in Security Operations Centers (SOC) are also candidates for centralized log management.

Cloud logging services remove the overhead of managing complex infrastructure and large storage solutions by providing them as easy-to-integrate services. However, latency is one of the key factors to be considered when selecting logging as a service in the cloud.

Off-the-shelf solutions

There are many purpose-built tools that provide end-to-end log management capabilities and can be installed on-premises in a local data center or in the cloud.

Graylog is one of the popular open source log management solutions. Graylog uses Elasticsearch for log storage and MongoDB as a metadata store. It also uses the GELF library for Log4j log streaming.

Splunk is one of the popular commercial tools available for log management and analysis. Whereas other solutions use log streaming to collect logs, Splunk uses a log file shipping approach.

Best-of-breed integration

The last approach is to pick best-of-breed components and build a custom logging solution.

Log shippers

There are log shipping tools that can be combined with other tools to build an end-to-end log management solution. The capabilities of the different log shipping tools vary.

Logstash is a powerful data pipeline tool that can be used to collect and ship log files. It acts as a broker, providing a mechanism to accept streaming data from different sources and sink them to different destinations. Log4j and Logback appenders can also be used to send log messages directly from Spring Boot microservices to Logstash. The other end of Logstash connects to Elasticsearch, HDFS, or any other database.

Fluentd is another tool that is very similar to Logstash, as is Logspout, but the latter is a better fit for Docker container-based environments.

Log stream processors

Stream processing technologies can be used to process log streams on the fly. For example, if a 404 error is continuously received as the response to a particular service call, it means there is something wrong with the service. Such situations have to be handled as soon as possible. Stream processors are pretty handy in such cases, as they are capable of reacting to certain streams of events, unlike traditional reactive analytics.

A typical architecture for stream processing is a combination of Flume and Kafka together with either Storm or Spark Streaming. Log4j has Flume appenders, which are useful for collecting log messages. These messages are pushed into distributed Kafka message queues. The stream processors collect data from Kafka and process it on the fly before sending it to Elasticsearch or other log stores.
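
As a rough illustration of the stream-processing step, the following sketch consumes raw log lines from a Kafka topic and reacts when too many 404 responses are seen; the topic name, group ID, and threshold are assumptions made for this example and are not part of the book's sample code:

        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        import java.util.Collections;
        import java.util.Properties;

        public class NotFoundAlertProcessor {
          public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "log-alerting");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            int notFoundCount = 0;
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("log-events"));
              while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                  if (record.value().contains(" 404 ")) { // naive check on the raw log line
                    notFoundCount++;
                  }
                }
                if (notFoundCount > 50) { // arbitrary threshold for the sketch
                  System.out.println("ALERT: repeated 404 responses detected");
                  notFoundCount = 0;
                }
              }
            }
          }
        }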

Spring Cloud Stream, Spring Cloud Stream modules, and Spring Cloud Data Flow can also be used to build log stream processing.

Log storage

Real-time log messages are typically stored in Elasticsearch, which allows clients to query based on text-based indexes. Apart from Elasticsearch, HDFS is also commonly used to store archived log messages. MongoDB or Cassandra is used to store summary data, such as transaction counts aggregated on a monthly basis. Offline log processing can be done using Hadoop MapReduce programs.

Custom logging implementation

The tools mentioned in the previous section can be used to build a custom end-to-end logging solution. The most commonly used architecture for custom log management is a combination of Logstash, Elasticsearch, and Kibana, also known as the ELK stack.

Note

The full source code of this chapter is available in the chapter8 project at https://github.com/rajeshrv/Spring5Microservice. Copy chapter7.configserver, chapter7.eurekaserver, chapter7.search, chapter7.search-apigateway, and chapter7.website into a new STS workspace and rename them chapter8.*

Note: although Spring Cloud Dalston SR1 officially supports Spring Boot 1.5.2.RELEASE, there are a few issues around Hystrix. In order to run the Hystrix samples, it is recommended to upgrade the Spring Boot version to 1.5.4.RELEASE.

The following diagram shows the log monitoring flow:

[Figure: Log monitoring flow using the ELK stack]

In this section, a simple implementation of a custom logging solution using the ELK stack will be examined.

Follow these steps to implement the ELK stack for logging:

  1. Download and install Elasticsearch, Kibana, and Logstash from https://www.elastic.co.
  2. Update the Search microservice (chapter8.search). Review and ensure that there are some log statements in Application.java of the Search microservice. The log statements are nothing special but simple log statements using slf4j as shown in the following code snippet:
        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;
        //other code goes here
        private static final Logger logger = LoggerFactory
          .getLogger(SearchRestController.class);

        //other code goes here

        logger.info("Looking to load flights...");
        for (Flight flight : flightRepository
          .findByOriginAndDestinationAndFlightDate
          ("NYC", "SFO", "22-JAN-18")) {
            logger.info(flight.toString());
        }
  3. Add the Logstash dependency to the Search service's pom.xml to integrate Logback with Logstash:
        <dependency>
          <groupId>net.logstash.logback</groupId>
          <artifactId>logstash-logback-encoder</artifactId>
          <version>4.6</version>
        </dependency>
  4. Override the default logback configuration. This can be done by adding a new logback.xml under src/main/resources. A sample log configuration is shown as follows:
        <?xml version="1.0" encoding="UTF-8"?>
        <configuration>
          <include resource="org/springframework/boot/logging /logback/defaults.xml"/>
          <include resource="org/springframework/boot/logging /logback/console-appender.xml" />
          <appender name="stash" class="net.logstash.logback.appender .LogstashTcpSocketAppender">
            <destination>localhost:4560</destination>
            <!-- encoder is required -->
            <encoder class="net.logstash.logback.encoder .LogstashEncoder" />
          </appender>
          <root level="INFO">
           <appender-ref ref="CONSOLE" />
           <appender-ref ref="stash" />
          </root>
        </configuration>

The preceding configuration overrides the default Logback configuration by adding a new TCP socket appender, which streams all log messages to a Logstash service listening on port 4560. It is important to add an encoder, as shown in the preceding configuration.

  5. Create a configuration, as shown next, and store it in a logstash.conf file. The location of this file is irrelevant, since it will be passed as an argument when starting Logstash. This configuration takes input from the socket listening on port 4560 and sends the output to Elasticsearch running on port 9200. The stdout output is optional and set for debugging:
        input {
          tcp {
            port => 4560
            host => localhost
          }
        }
        output {
          elasticsearch { hosts => ["localhost:9200"] }
          stdout { codec => rubydebug }
        }
  6. Run Logstash, Elasticsearch, and Kibana from their respective installation folders:
        ./bin/elasticsearch
        ./bin/kibana
        ./bin/logstash -f logstash.conf
  7. Run the Search microservice. This will invoke the unit test cases and result in printing the log statements mentioned earlier. Ensure that RabbitMQ, Config Server, and Eureka servers are running.
  8. Go to a browser and access Kibana:
        http://localhost:5601

Go to Settings and configure the index pattern, as shown in the following screenshot:

[Figure: Kibana index pattern configuration]
  9. Go to the Discover menu to see the logs. If everything is successful, we will see the Kibana screen that follows. Note that the log messages are displayed in the Kibana screen.

Kibana provides out-of-the-box features to build summary charts and graphs using the log messages.

The Kibana UI will look like the following screenshot:

[Figure: Kibana Discover screen showing the streamed log messages]

Distributed tracing with Spring Cloud Sleuth

The previous section addressed the fragmented and scattered logging issue of microservices by centralizing the log data. With a central logging solution, we have all the logs in central storage. However, it is still almost impossible to trace end-to-end transactions. In order to do end-to-end tracking, transactions spanning microservices need to have a correlation ID.

Twitter's Zipkin, Cloudera's HTrace, and Google's Dapper are examples of distributed tracing systems. Spring Cloud provides a wrapper component on top of these using the Spring Cloud Sleuth library.

Distributed tracing works with the concepts of span and trace. A span is a unit of work, such as calling a service, identified by a 64-bit span ID. A set of spans forms a tree-like structure called a trace. Using the trace ID, calls can be tracked end to end, as shown in the following diagram:

[Figure: Trace and span IDs propagated across microservice calls]

As shown in the preceding diagram, Microservice 1 calls Microservice 2, and Microservice 2 calls Microservice 3. In this case, the same trace ID is passed across all the microservices, which can be used to track the transaction end to end.

In order to demonstrate this, we will use the Search API Gateway and the Search microservice. A new endpoint has to be added to the Search API Gateway (chapter8.search-apigateway) that internally calls the Search service to return data. Without a trace ID, it is almost impossible to trace or link calls from the website to the search-apigateway to the Search microservice. In this case, only two or three services are involved, whereas in a complex environment there could be many interdependent services.

Follow these steps to create the example using Sleuth:

  1. Update Search and Search API Gateway. Before that, the Sleuth dependency has to be added to the respective pom files:
        <dependency>
          <groupId>org.springframework.cloud</groupId>
          <artifactId>spring-cloud-starter-sleuth</artifactId>
        </dependency>
  2. Add the Logstash dependency to the Search service as well as the logback configuration, as shown in the previous example.
  3. The next step is to add the service name property in the logback configuration of the respective microservices:
        <property name="spring.application.name" value="search-service"/>
        <property name="spring.application.name" value="search-apigateway"/>
  4. Add a new endpoint to the Search API Gateway, which will call the Search service, as follows. This is to demonstrate the propagation of the trace ID across multiple microservices. This new method in the gateway returns the operating hub of the airport by calling the Search service. Note that the RestTemplate (with @LoadBalanced) and Logger details also need to be added to the SearchAPIGateway.java class:
        @RequestMapping("/hubongw")
        String getHub(HttpServletRequest req){
          logger.info("Search Request in API gateway
           for getting Hub, forwarding to search-service ");
          String hub = restTemplate.getForObject("http://search-
           service/search/hub", String.class);
          logger.info("Response for hub received,  Hub "+ hub);
          return hub;
        }
  5. Add another endpoint in the Search service, as follows:
        @RequestMapping("/hub")
        String getHub(){
          logger.info("Searching for Hub, received from
            search-apigateway ");
          return "SFO";
        }
  6. Once added, run both the services. Hit the gateway's new /hubongw endpoint using a browser. Copy and paste the following link:

http://localhost:8095/hubongw

As mentioned earlier, the Search API Gateway service runs on port 8095 and the Search service runs on port 8090.

  7. Notice the console logs to see the trace ID and span IDs printed. The following output is from the Search API Gateway:
        2017-03-31 22:30:17.780  INFO [search-apigateway,9f698f7ebabe6b83,9f698f7ebabe6b83,false] 47158 --- [nio-8095-exec-1] c.b.p.s.a.SearchAPIGatewayController : Response for hub received,  Hub SFO

The following log is from the Search service:

        2017-03-31 22:30:17.741  INFO [search-service,9f698f7ebabe6b83,3a63748ac46b5a9d,false] 47106 --- [nio-8090-exec-1] c.b.p.s.controller.SearchRestController : Searching for Hub, received from search-apigateway

Note that the trace ID is the same in both cases.

  8. Open the Kibana console and search using the trace ID printed in the console. In this case, it is 9f698f7ebabe6b83. As shown in the following screenshot, with a trace ID, one can trace service calls that span multiple services:
    [Figure: Kibana search results filtered by trace ID 9f698f7ebabe6b83]

Monitoring microservices


Microservices are truly distributed systems with a fluid deployment topology. Without sophisticated monitoring in place, operations teams may run into trouble managing large-scale microservices. Traditional monolithic application deployments are limited to a known number of services, instances, machines, and so on. This is much easier to manage than the large number of microservice instances potentially running across different machines. To add more complexity, these services dynamically change their topologies. A centralized logging capability only addresses part of the problem. It is important for operations teams to understand the runtime deployment topology as well as the behavior of the systems. This demands more than what centralized logging can offer.

In general, application monitoring is more a collection of metrics and aggregations, and the validation of those against certain baseline values. If there is a service-level breach, the monitoring tool generates alerts and sends them to the administrators. With hundreds and thousands of interconnected microservices, traditional monitoring does not really offer true value. Achieving a one-size-fits-all monitoring approach, or monitoring everything with a single pane of glass, is not easy in large-scale microservices.

One of the main objectives of microservices monitoring is to understand the behavior of the system from a user-experience point of view. This ensures that the end-to-end behavior is consistent and in line with what users expect.

Monitoring challenges

fragmented 日志记录问题类似,监控微服务的关键挑战是微服务生态系统中有许多活动部分。

The typical issues are summarized here:

  • The statistics and metrics are fragmented across many services, instances, and machines.
  • Heterogeneous technologies may be used to implement microservices, making things even more complex. A single monitoring tool may not give all required monitoring options.
  • Microservices deployment topologies are dynamic, making it impossible to preconfigure servers, instances, and monitoring parameters.

Many traditional monitoring tools are good for monitoring monolithic applications, but fall short in monitoring large-scale, distributed, interlinked microservice systems. Many traditional monitoring systems are agent-based, and pre-install agents on the target machines or application instances. This poses the following two challenges:

  • If the agents require deep integration with the services or operating systems, then this will be hard to manage in a dynamic environment
  • If these tools impose overheads when monitoring or instrumenting the application, they can hinder performance

Many traditional tools need baseline metrics. Such systems work with pre-set rules, such as: if CPU utilization goes above 60% and remains at this level for two minutes, then an alert should be sent to the administrator. It is extremely hard to pre-configure these values in large, Internet-scale deployments.

New-generation monitoring applications learn the application's behavior by themselves and set automatic threshold values. This frees administrators from this mundane task. Automated baselines are sometimes more accurate than human forecasts.

[Figure: Key focus areas of microservices monitoring]

As shown in the preceding diagram, the key focus areas of microservices monitoring are as follows:

  • Metrics sources and data collectors: The metrics collection at the source will be done by either the server pushing metrics information to a central collector or by embedding lightweight agents to collect information. The data collectors collect monitoring metrics from different sources, such as network, physical machines, containers, software components, application, and so on. The challenge is to collect this data using auto-discovery mechanisms instead of static configurations.

This is done either by running agents on the source machines, by streaming data from the sources, or by polling at regular intervals.

  • Aggregation and correlation of metrics: The aggregation capability is required to aggregate metrics collected from different sources, such as user transactions, services, infrastructure, the network, and so on. Aggregation can be challenging, as it requires some level of understanding of the application's behavior, such as service dependencies, service grouping, and so on. In many cases, these are automatically formulated based on the metadata provided by the sources.

Typically, this is done by an intermediary that accepts the metrics.

  • Processing metrics and actionable insights: Once the data is aggregated, then the next step is to take measurements. Measurements are typically done by using set thresholds. In the new generation monitoring systems, these thresholds are automatically discovered. The monitoring tools then analyze the data and provide actionable insights.

These tools may use big data and streaming analytics solutions.

  • Alerting, actions and dashboards: As soon as issues are detected, they have to be notified to the relevant people or systems. Unlike traditional systems, the microservices monitoring systems should be capable of taking actions on a real-time basis. Proactive monitoring is essential to achieving self-healing. Dashboards are used to display SLAs, KPIs, and so on.

Dashboard and alerting tools are capable of handling these requirements.

Microservices monitoring is typically done with three approaches. Effective monitoring requires a combination of all of them:

  • Application Performance Monitoring (APM) (sometimes referred to as Digital Performance Monitoring or DPM) is more of a traditional approach to system metrics collection, processing, alerting, and rendering dashboards. These are more from the system's point of view. Application topology discovery and visualization are new capabilities implemented by many of the APM tools. The capabilities vary between different APM providers.
  • Synthetic monitoring is a technique that is used to monitor the system's behavior using end-to-end transactions with a number of test scenarios in a production or production-like environment. Data is collected to validate the system's behavior and potential hotspots. Synthetic monitoring also helps us understand system dependencies.
  • Real user monitoring (RUM) or user experience monitoring is typically browser-based software that records real user statistics, such as response time, availability, and service levels. With microservices, which have more frequent release cycles and dynamic topologies, user experience monitoring is even more important.

Monitoring tools

There are many tools available to monitor microservices, and there is also overlap between many of these tools. The selection of monitoring tools really depends upon the ecosystem that needs to be monitored. In most cases, more than one tool is required to monitor the overall microservice ecosystem.

The purpose of this section is to familiarize ourselves with a number of common, microservices-friendly monitoring tools:

  • AppDynamics, Dynatrace and New Relic are top commercial vendors in the APM space, as per Gartner magic quadrant 2015. These tools are microservice-friendly and support microservice monitoring effectively in a single console. Ruxit, Datadog, and Dataloop are other commercial offerings that are purpose-built for distributed systems that are essentially microservices-friendly. Multiple monitoring tools can feed data to Datadog using plugins.
  • Cloud vendors come with their own monitoring tools, but, in many cases, these monitoring tools alone may not be sufficient for large-scale microservices monitoring. For instance, AWS uses CloudWatch and Google Cloud Platform uses Cloud Monitoring to collect information from various sources.
  • Some of the data collecting libraries, such as Zabbix, statd, collectd, jmxtrans, and so on, operate at a lower level in collecting runtime statistics, metrics, gauges, and counters. Typically, this information will be fed into data collectors and processors, such as Riemann, Datadog, and Librato, or dashboards, such as Graphite.
  • Spring Boot Actuator is one of the good vehicles for collecting microservices metrics, gauges, and counters, as we saw in Chapter 11, Building Microservices with Spring Boot. Netflix's Servo is a metric collector similar to Actuator. QBit and Dropwizard metrics also fall in the same category of metric collectors. All these metrics collectors need an aggregator and dashboard to facilitate full-sized monitoring (a minimal collection sketch follows this list).
  • Monitoring through logging is popular, but a less effective approach in microservices monitoring. In this approach, as discussed in the previous section, log messages will be shipped from various sources, such as microservices, containers, networks, and so on, to a central location. Then, use the log files to trace transactions, identify hotspots, and so on. Loggly, ELK, Splunk, and Trace are candidates in this space.
  • Sensu is a popular choice for microservices monitoring in the open source community. Weave Scope is another tool, primarily targeting containerized deployments. SimianViz (formerly Spigo) is a purpose-built microservices monitoring system closely aligned with the Netflix stack. Cronitor is another useful tool.
  • Pingdom, New Relic synthetic, Runscope, Catchpoint, and so on, provide options for synthetic transaction monitoring and user experience monitoring on live systems.
  • Circonus is classified more towards DevOps monitoring tools, but can also do microservices monitoring. Nagios is a popular open source monitoring tool, but it falls more into the traditional monitoring systems.
  • Prometheus provides a time series database and visualization GUI useful for building custom monitoring tools.
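
The following is a minimal sketch of the kind of metric collection mentioned in the list, using the Dropwizard Metrics API; the metric names, the console reporter, and the reporting interval are illustrative assumptions:

        import com.codahale.metrics.ConsoleReporter;
        import com.codahale.metrics.Counter;
        import com.codahale.metrics.MetricRegistry;
        import com.codahale.metrics.Timer;

        import java.util.concurrent.TimeUnit;

        public class SearchMetrics {
          private static final MetricRegistry registry = new MetricRegistry();
          private static final Counter searchRequests = registry.counter("search.requests");
          private static final Timer searchLatency = registry.timer("search.latency");

          public static void main(String[] args) throws InterruptedException {
            // In a real setup the registry would feed an aggregator or dashboard;
            // a console reporter keeps this sketch self-contained.
            ConsoleReporter reporter = ConsoleReporter.forRegistry(registry).build();
            reporter.start(10, TimeUnit.SECONDS);

            searchRequests.inc();                  // count every search call
            Timer.Context context = searchLatency.time();
            Thread.sleep(100);                     // stand-in for the actual search work
            context.stop();                        // records the elapsed time

            Thread.sleep(11000);                   // give the reporter a chance to print once
          }
        }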

Monitoring microservice dependency

When there are a large number of microservices with dependencies among them, it is important to have a monitoring tool that can show the dependencies between the microservices. Statically configuring and managing these dependencies is not a scalable approach. There are many tools that are useful for monitoring microservice dependencies.

Monitoring tools such as AppDynamics, Dynatrace, and New Relic can draw dependencies among microservices. End-to-end transaction monitoring can also trace transaction dependencies. Other monitoring tools, such as Spigo, are also useful for microservices dependency management. CMDB tools such as Device42, or purpose-built tools such as Accordance, are useful in managing the dependencies of microservices. Veritas Risk Advisor (VRA) is also useful for infrastructure discovery.

A custom implementation with a graph database, such as Neo4j, is also useful. In this case, a microservice has to be pre-configured with its direct and indirect dependencies. At the time of service startup, the service publishes and cross-checks its dependencies against this Neo4j database.
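
A minimal sketch of the startup registration described above, assuming the Neo4j Java driver 1.x against a Neo4j 3.x server; the connection details, node label, and relationship name are assumptions for illustration:

        import org.neo4j.driver.v1.AuthTokens;
        import org.neo4j.driver.v1.Driver;
        import org.neo4j.driver.v1.GraphDatabase;
        import org.neo4j.driver.v1.Session;

        import static org.neo4j.driver.v1.Values.parameters;

        public class DependencyRegistrar {
          public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "secret"));
                 Session session = driver.session()) {
              // Register this service and its declared downstream dependency as graph nodes
              session.run("MERGE (s:Service {name: {svc}}) " +
                          "MERGE (d:Service {name: {dep}}) " +
                          "MERGE (s)-[:DEPENDS_ON]->(d)",
                  parameters("svc", "search-apigateway", "dep", "search-service"));
            }
          }
        }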

Spring Cloud Hystrix for fault-tolerant microservices

This section will explore Spring Cloud Hystrix as a library for implementing fault-tolerant and latency-tolerant microservices. Hystrix is based on the fail-fast and rapid-recovery principles. If there is an issue with a service, Hystrix helps isolate it. It helps to fail fast by falling back to another pre-configured fallback service. Hystrix is another battle-tested library from Netflix and is based on the Circuit Breaker pattern.

In this section, we will build a circuit breaker with Spring Cloud Hystrix. Follow these steps to update the Search API Gateway service and integrate it with Hystrix.

Add the Hystrix dependency to the service, as follows:

        <dependency>
          <groupId>org.springframework.cloud</groupId>
          <artifactId>spring-cloud-starter-hystrix</artifactId>
        </dependency>

If you are developing from scratch, select the following libraries:

[Figure: Selecting the Hystrix starter libraries for the project]

In the Spring Boot application class (SearchAPIGateway), add the @EnableCircuitBreaker annotation. This tells Spring Cloud Hystrix to enable a circuit breaker for this application. It also exposes the /hystrix.stream endpoint for metrics collection.

Add a component class to the Search API Gateway service with a method, in this case getHub, annotated with @HystrixCommand. This tells Spring that this method is prone to failure. Spring Cloud libraries wrap such methods to handle fault tolerance and latency tolerance by enabling a circuit breaker. The @HystrixCommand annotation is typically accompanied by a fallback method. In the case of a failure, Hystrix automatically enables the mentioned fallback method and diverts traffic to it.

As shown in the following code, in this case, getHub will fall back to getDefaultHub:

        @Component
        class SearchAPIGatewayComponent {
          @LoadBalanced
          @Autowired
          RestTemplate restTemplate;

          @HystrixCommand(fallbackMethod = "getDefaultHub")
          public String getHub(){
            String hub = restTemplate
              .getForObject("http://search-service/search/hub",
              String.class);
            return hub;
          }

          public String getDefaultHub(){
            return "Possibily SFO";
          }
        }

The getHub method of SearchAPIGatewayController calls the getHub method of SearchAPIGatewayComponent:

        @RequestMapping("/hubongw")
        String getHub(){
 logger.info("Search Request in API gateway for getting Hub, 
            forwarding to search-service ");
          return component.getHub();
        }

The last part of this exercise is to build a Hystrix Dashboard. For this, build another Spring Boot application. Include Hystrix, Hystrix Dashboard, and Actuator when building this application.

In the Spring Boot application class, add the @EnableHystrixDashboard annotation.

Start the Search service, the Search API Gateway, and the Hystrix Dashboard applications. Point the browser to the URL of the Hystrix Dashboard application. In this example, the Hystrix Dashboard is started on port 9999.

Open the following URL: http://localhost:9999/hystrix

A screen similar to the following screenshot will be displayed. In the Hystrix Dashboard, enter the URL of the service to be monitored.

In this case, the Search API Gateway runs on port 8095. Hence, the hystrix.stream URL will be http://localhost:8095/hystrix.stream

[Figure: Hystrix Dashboard home page with the stream URL input]

The Hystrix Dashboard will be displayed as follows:

[Figure: Hystrix Dashboard showing the getHub circuit]

Note that at least one transaction has to be executed to see the display. This can be done by hitting http://localhost:8095/hubongw.

Create a failure scenario by shutting down the Search service. Note that the fallback method will be called when hitting the URL http://localhost:8095/hubongw.

When failures occur continuously, the circuit status changes to open. This can be triggered by hitting the preceding URL a number of times. In the open state, the original service will no longer be checked. The Hystrix Dashboard shows the status of the circuit as Open, as shown in the following screenshot. Once a circuit is opened, the system periodically checks the status of the original service for recovery. When the original service is back, the circuit breaker falls back to the original service and the status is set to Closed.

[Figure: Hystrix Dashboard showing the circuit in the Open state]

Note

The following Hystrix wiki URL explains what each of these parameters means: https://github.com/Netflix/Hystrix/wiki/Dashboard
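
The thresholds behind this open/closed behavior can also be tuned per command. The following is a minimal sketch using the @HystrixProperty support of the javanica annotations already used above; the specific values are illustrative assumptions, not recommendations from the book:

        import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
        import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
        import org.springframework.beans.factory.annotation.Autowired;
        import org.springframework.cloud.client.loadbalancer.LoadBalanced;
        import org.springframework.stereotype.Component;
        import org.springframework.web.client.RestTemplate;

        @Component
        class TunedSearchAPIGatewayComponent {
          @LoadBalanced
          @Autowired
          RestTemplate restTemplate;

          @HystrixCommand(fallbackMethod = "getDefaultHub", commandProperties = {
            // Minimum number of requests in the rolling window before the circuit can trip
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "5"),
            // Error percentage at or above which the circuit opens
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
            // How long the circuit stays open before a trial request is allowed through
            @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "10000"),
            // Per-call timeout; slow calls count as failures
            @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000")
          })
          public String getHub() {
            return restTemplate.getForObject("http://search-service/search/hub", String.class);
          }

          public String getDefaultHub() {
            return "Possibly SFO";
          }
        }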

Aggregate Hystrix streams with Turbine

In the previous example, the /hystrix.stream endpoint of our microservice was given in the Hystrix Dashboard. The Hystrix Dashboard can only monitor one microservice at a time. If there are many microservices, then the microservice pointed to by the dashboard has to be changed every time we switch the microservice to monitor. Looking into one instance at a time is tedious, especially when there are many instances of a microservice or multiple microservices.

We need a mechanism to aggregate data coming from multiple /hystrix.stream instances and consolidate it into a single dashboard view. Turbine does exactly that. Turbine is another server that collects Hystrix streams from multiple instances and consolidates them into one /turbine.stream. The Hystrix Dashboard can now point to /turbine.stream to get the consolidated information. Take a look at the following diagram:

[Figure: Turbine aggregating multiple Hystrix streams into a single /turbine.stream]

Note

Turbine currently works only with different hostnames. Each instance has to run on a different host. If you are testing multiple services locally on the same host, then update the host file (/etc/hosts) to simulate multiple hosts. Once done, bootstrap.properties has to be configured as follows: eureka.instance.hostname: localdomain2.

The following example demonstrates how to use Turbine to monitor circuit breakers across multiple instances and services. In this example, we will use the Search service and the Search API Gateway. Turbine internally uses Eureka to resolve the service IDs that are configured for monitoring.

Follow these steps to build and execute this example:

  1. The Turbine server can be created as just another Spring Boot application using Spring Boot Starter. Select Turbine to include the Turbine libraries.
  2. Once the application is created, add @EnableTurbine to the main Spring Boot Application class. In this example, both the Turbine and the Hystrix dashboard are configured to run on the same Spring Boot Application. This is possible by adding the following annotations to the newly created Turbine application:
        @EnableTurbine
        @EnableHystrixDashboard
        @SpringBootApplication
        public class TurbineServerApplication {
  3. Add the following configuration to the .yaml or property file to point to the instances that we are interested in monitoring:
        spring:
          application:
            name: turbineserver
        turbine:
          clusterNameExpression: new String('default')
          appConfig: search-service,search-apigateway
        server:
          port: 9090
        eureka:
          client:
            serviceUrl:
              defaultZone: http://localhost:8761/eureka/
  4. The preceding configuration instructs the Turbine server to look up the Eureka server to resolve the search-service and search-apigateway services. The search-service and search-apigateway services are the service IDs used to register the services with Eureka. Turbine uses these names to resolve the actual service host and port by checking with the Eureka server. It then uses this information to read /hystrix.stream from each of these instances. Turbine then reads all the individual Hystrix streams, aggregates them together, and exposes them under the Turbine server's /turbine.stream URL. The cluster name expression points to the default cluster, since there is no explicit cluster configuration done in this example. If clusters are manually configured, then the following configuration has to be used:
        turbine:
          aggregator:
            clusterConfig: [comma separated clusternames]
  5. Change the Search service and SearchComponent to add another circuit breaker:
        @HystrixCommand(fallbackMethod = "searchFallback")
        public List<Flight> search(SearchQuery query){
  6. Also add @EnableCircuitBreaker to the main class in the Search service. In this example, we will run two instances of search-apigateway: one on localdomain1:8095 and another on localdomain2:8096. We will also run one instance of search-service on localdomain1:8090.
  7. Run the microservices with command-line overrides to manage different host addresses, as follows:
        java -jar -Dserver.port=8096 -Deureka.instance.hostname=localdomain2 -Dserver.address=localdomain2 target/search-apigateway-1.0.jar

        java -jar -Dserver.port=8095 -Deureka.instance.hostname=localdomain1 -Dserver.address=localdomain1 target/search-apigateway-1.0.jar

        java -jar -Dserver.port=8090 -Deureka.instance.hostname=localdomain1 -Dserver.address=localdomain1 target/search-1.0.jar
  8. Open the Hystrix dashboard by pointing the browser to the following URL: http://localhost:9090/hystrix
  9. Instead of giving /hystrix.stream, this time we will point to /turbine.stream. In this example, the Turbine stream is running on 9090. Hence, the URL to be given in the Hystrix dashboard is as follows: http://localhost:9090/turbine.stream
  10. Fire a few transactions by opening the browser window and hitting http://localhost:8095/hubongw and http://localhost:8096/hubongw.
  11. Once this is done, the dashboard page will show the getHub service.
  12. Run chapter8.website. Execute the search transaction using the following website: http://localhost:8001
  13. After executing the preceding search, the dashboard page will show search-service as well. This is shown in the following screenshot:
[Figure: Hystrix Dashboard backed by the Turbine stream, showing the getHub and search circuits]

As we can see in the dashboard, getHub is coming from the Search API Gateway. Since we have two instances of the Search API Gateway, getHub is coming from two hosts, indicated by Hosts 2. search is coming from the Search microservice. The data is provided by the two components we created: the SearchComponent in the Search microservice and the SearchAPIGatewayComponent in the Search API Gateway microservice.

Data analysis using Data Lake


Just like the scenarios of fragmented logs and monitoring, fragmented data is another challenge in the microservices architecture. Fragmented data poses challenges for data analytics. This data may be used for simple business event monitoring, data auditing, or even for deriving business intelligence out of the data.

A data lake or data hub is an ideal solution for handling such scenarios. The event-sourced architecture pattern is generally used to share the state and state changes as events with an external data store. When there is a state change, microservices publish the state change as an event. Interested parties can subscribe to these events and process them as per their requirements. A central event store can also subscribe to these events and store them in a big data store for further analysis.
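
A minimal sketch of publishing such a state-change event with a plain Kafka producer follows; the topic name, broker address, and JSON layout are assumptions made for the example:

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        import java.util.Properties;

        public class BookingEventPublisher {
          private final KafkaProducer<String, String> producer;

          public BookingEventPublisher() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
          }

          // Publish a booking state change; subscribers (Flume, Spark Streaming, the event store)
          // pick it up from the topic without this microservice knowing about them
          public void publishBookingCreated(String bookingId) {
            String event = "{\"type\":\"BOOKING_CREATED\",\"bookingId\":\"" + bookingId + "\"}";
            producer.send(new ProducerRecord<>("booking-events", bookingId, event));
          }
        }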

One of the commonly used architectures for such data processing is shown in the following diagram:

[Figure: Data lake architecture using Kafka, Flume, HDFS, and Spark Streaming]

The state-change events generated from the microservices, in our case the Search, Booking, and Check-In events, are pushed to a distributed, high-performance messaging system such as Kafka. A data ingestion tool such as Flume can subscribe to these events and update them to an HDFS cluster. In some cases, these messages are processed in real time by Spark Streaming. To handle heterogeneous event sources, Flume can also be used between the event sources and Kafka.

Spring Cloud Stream, Spring Cloud Stream modules, and Spring Cloud Data Flow can also be used as alternatives for high-velocity data ingestion.

Summary


In this chapter, we learned about the challenges around logging and monitoring when dealing with Internet-scale microservices.

We explored the various solutions for centralized logging, and also learned how to implement a custom centralized logging solution using Elasticsearch, Logstash, and Kibana (the ELK stack). In order to understand distributed tracing, we upgraded the BrownField microservices using Spring Cloud Sleuth.

In the second half of this chapter, we went deeper into the capabilities required for a microservices monitoring solution and the different approaches to monitoring. Subsequently, we examined a number of tools available for monitoring microservices.

The BrownField microservices were further enhanced with Spring Cloud Hystrix and Turbine to monitor latencies and failures in inter-service communications. The examples also demonstrated how to use the circuit breaker pattern to fall back to another service in the case of failures.

Finally, we also touched upon the importance of a data lake and how to integrate a data lake architecture in a microservices context.

Microservices management is another important challenge that we have to address when dealing with large-scale microservice deployments. The next chapter will explore how containers can help in simplifying microservices management.