Algorithm Series | Plotting Calibration Curves for Classification Models in Python

vlambda
2022-04-17

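In clinical data analysis, models such as logistic regression and Cox regression are routinely used to predict patient outcomes. Quite often, though, a prediction model gets built without anyone evaluating how valid and reliable it is.

There are many ways to evaluate the performance of a classification model, so how do we make sure a prediction model can be trusted?

At the enthusiastic request of our readers, I studied the topic myself and would like to share what I learned.

So, today's post is: "Plotting Calibration Curves for Classification Models in Python".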

Figure. Machine learning libraries in Python

Brief Introduction

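Calibration curve:

Also known as a calibration plot. In essence, a calibration curve is a scatter plot of the observed event rate against the predicted event rate. It relies on binning, that is, discretizing continuous data, and is used to check whether a classifier's predicted probabilities are close to the empirical (true) probabilities.

How a calibration curve is computed:

① Compute the predicted probability for every sample;

② Split the predicted probabilities into several bins;

Note: choose the number of bins to suit your data. Any common statistical binning method will do; the usual choices are 'uniform' (equal-width bins) and 'quantile' (bins with equal numbers of samples);

③ Compute the mean predicted probability in each bin;

④ Compute the fraction of positive samples in each bin;

⑤ Plot the mean predicted probability of each bin on the x-axis against its fraction of positives on the y-axis, and connect the points with a line.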
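The five steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming equal-width ('uniform') bins on [0, 1]; the function name `manual_calibration_curve` and the toy data are made up for this example and are not part of scikit-learn.

```python
def manual_calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean_predicted, fraction_positive) for each non-empty bin."""
    sums = [0.0] * n_bins      # sum of predicted probabilities per bin
    positives = [0] * n_bins   # number of positive samples per bin
    counts = [0] * n_bins      # number of samples per bin

    for label, p in zip(y_true, y_prob):
        # Step 2: equal-width bins; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx] += p
        positives[idx] += label
        counts[idx] += 1

    mean_predicted, fraction_positive = [], []
    for s, pos, n in zip(sums, positives, counts):
        if n == 0:
            continue  # empty bins contribute no point to the curve
        mean_predicted.append(s / n)       # Step 3: x coordinate
        fraction_positive.append(pos / n)  # Step 4: y coordinate
    return mean_predicted, fraction_positive


# Step 1 is assumed done: y_prob holds the model's predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
mean_pred, frac_pos = manual_calibration_curve(y_true, y_prob, n_bins=2)

# Step 5 would plot mean_pred (x) against frac_pos (y).
for x, y in zip(mean_pred, frac_pos):
    print(f"mean predicted = {x:.2f}, fraction of positives = {y:.2f}")
```

In practice you would let a library do the binning; the point here is only to show that each point on the curve is a pair of simple per-bin averages.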

Figure. How the calibration curve changes with the number of bins n

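So what does the ideal calibration curve look like?

When a model is perfectly calibrated, i.e. its predicted probabilities match the observed frequencies, then in every bin: mean predicted probability = fraction of positives.

In other words, the calibration curve is the diagonal.

Therefore, the closer a model's calibration curve lies to the diagonal, i.e. to the straight line with slope 1, the better calibrated its predicted probabilities are.

When the curve lies below the dashed diagonal, predicted positives > actual positives (the model overestimates);

When the curve lies above the dashed diagonal, predicted positives < actual positives (the model underestimates);

An under-confident model typically has a sigmoid-shaped calibration curve, whereas an over-confident model has a transposed-sigmoid-shaped one.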
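The reading rules above can be turned into a tiny helper that labels each point on the curve. This is a hypothetical sketch: the function name `bin_direction` and the tolerance `tol` are assumptions chosen for illustration, not an established convention.

```python
def bin_direction(mean_predicted, fraction_positive, tol=0.02):
    """Label each bin by comparing its point with the diagonal y = x."""
    labels = []
    for pred, obs in zip(mean_predicted, fraction_positive):
        if pred - obs > tol:
            labels.append("overestimates")    # point lies below the diagonal
        elif obs - pred > tol:
            labels.append("underestimates")   # point lies above the diagonal
        else:
            labels.append("well calibrated")  # point lies on (or near) the diagonal
    return labels


print(bin_direction([0.2, 0.5, 0.8], [0.05, 0.5, 0.95]))
```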

In 2017, JAMA published a guide to the discrimination and calibration of clinical prediction models:

Discrimination and Calibration of Clinical Prediction Models: Users' Guides to the Medical Literature. JAMA, 2017.

Figure. The JAMA guide

The guide also gives an example of a calibration curve, and its authors point out:

Calibration or goodness of fit is often considered the most important property of a model, and reflects the extent to which a model correctly estimates the absolute risk (ie, if the values predicted by the model agree with the observed values). Poorly calibrated models will underestimate or overestimate the outcome of interest.

Although the visual representation of the relationship between predicted and observed (Figure 2) is the best way to evaluate calibration, an alternative is to evaluate the difference between the predicted and the observed values using statistical tests (eg, the Hosmer-Lemeshow test) to determine whether chance can explain the difference between the predicted and the observed event rate. However, the Hosmer-Lemeshow test has limitations.

Figure. Example calibration curve from the JAMA guide
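For readers curious about the Hosmer-Lemeshow test mentioned in the quote, here is a rough sketch of the textbook statistic. It is not taken from the guide: the function name `hosmer_lemeshow`, the choice of 10 equal-sized risk groups, the use of `n_groups - 2` degrees of freedom, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Return (statistic, p_value) of a basic Hosmer-Lemeshow test."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Sort samples by predicted risk and cut them into roughly equal-sized groups.
    order = np.argsort(y_prob)
    groups = np.array_split(order, n_groups)

    stat = 0.0
    for idx in groups:
        n = len(idx)
        observed = y_true[idx].sum()   # observed number of events in the group
        expected = y_prob[idx].sum()   # expected number of events in the group
        mean_risk = expected / n
        stat += (observed - expected) ** 2 / (n * mean_risk * (1.0 - mean_risk))

    # Conventional choice: chi-square distribution with n_groups - 2 degrees of freedom.
    p_value = chi2.sf(stat, n_groups - 2)
    return stat, p_value


# Simulated data: outcomes are drawn from the predicted risks themselves,
# so these predictions are well calibrated by construction.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.95, size=2000)
y_true = rng.binomial(1, y_prob)

stat, p_value = hosmer_lemeshow(y_true, y_prob)
print(f"calibrated predictions:    H = {stat:.2f}, p = {p_value:.3f}")

# Distorting the probabilities makes the predictions badly calibrated.
bad_stat, bad_p = hosmer_lemeshow(y_true, y_prob ** 3)
print(f"miscalibrated predictions: H = {bad_stat:.2f}, p = {bad_p:.3g}")
```

A small p-value suggests the gap between predicted and observed event rates is larger than chance would explain. As the guide notes, the test has limitations, so it is best read alongside the calibration plot rather than instead of it.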

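Hands-on: Plotting Calibration Curves in Python

Next, let's use Python to plot some calibration curves in practice.

In Python, scikit-learn's calibration_curve() function returns the x and y coordinates of the curve directly, ready for plotting.

That is: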

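# Basic usage of calibration_curve()
from sklearn.calibration import calibration_curve

# calibration_curve computes the calibration curve, which shows the gap
# between the predicted probabilities and the true classes.
# First argument: the true class of each sample.
# Second argument: the predicted probability of each sample. Note: despite its
# name, prob_true must hold predicted probabilities, not class labels.
# n_bins: the number of intervals the probabilities are split into, from low to high.
fraction_of_positives, mean_predicted_value = calibration_curve(y_train, prob_true, n_bins=20)
# First returned array: the fraction of positive samples in each bin.
# Second returned array: the mean predicted probability in each bin.
# Curve below the dashed line: predicted positives > actual positives.
# Curve above the dashed line: predicted positives < actual positives.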
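The snippet above relies on y_train and prob_true being defined elsewhere. As a self-contained sketch of the same call, the example below uses a synthetic dataset from `make_classification` instead of the article's data; the dataset, sample size, and `max_iter` value are assumptions chosen only so that the code runs on its own.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for a real dataset).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# Returns (fraction of positives, mean predicted probability) per non-empty bin.
fraction_of_positives, mean_predicted_value = calibration_curve(
    y_test, y_prob, n_bins=10
)

for pred, obs in zip(mean_predicted_value, fraction_of_positives):
    print(f"mean predicted = {pred:.2f}, fraction of positives = {obs:.2f}")
```

Bins that contain no samples are dropped, so the returned arrays can be shorter than n_bins.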

Now let's walk through two examples:

Example 1: Plotting the calibration curve of a single logistic regression model

# Dataset
# Import libraries
import pandas as pd

# Load the data
data = pd.read_csv(r"C:\Users\lenovo\Desktop\04_16_disease.csv")

# Summary statistics
print(data.describe())

# Check whether any column contains missing values
print(data.isnull().any())
# Show the rows that contain at least one missing value
print(data[data.isnull().any(axis=1)])
# Number of missing values in each column
n_missings_each_col = data.isnull().sum()
print(n_missings_each_col)

# Select the features and the target
X = data.iloc[:, 34:].values
y = data.iloc[:, 0].values

# Split into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

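# Import and fit the logistic regression model
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver='lbfgs')
lr.fit(X_train, y_train)  # the model must be fitted before predict_proba is called

# Plot the calibration curve
from sklearn.calibration import calibration_curve

n_bins = 10
# predict_proba returns (Pr(False), Pr(True)) for each sample; keep Pr(True).
# Despite its name, prob_true holds the predicted probabilities.
prob_true = lr.predict_proba(X_train)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_train, prob_true, n_bins=n_bins)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(mean_predicted_value, fraction_of_positives, marker='o')
ax.set_xlabel('Average Predicted Value')
ax.set_ylabel('Proportion of Positive Records')
ax.plot([0, 1], [0, 1], linestyle='--')  # dashed diagonal = perfect calibration
plt.show()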

Figure. Calibration curve of a single logistic regression model

# Plot a histogram of the binned predicted probabilities
fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(prob_true, bins=10)
plt.show()

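Figure. Histogram of predicted probabilities for the single logistic regression model

Note: when you only have one or two models, you can tune n_bins in calibration_curve to get a better-looking result. If the calibration curve is too jagged, lowering n_bins somewhat makes it smoother, because each bin then averages over more samples.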
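To get a feel for how n_bins and the binning strategy interact, the sketch below runs calibration_curve on simulated probabilities with several settings. The simulated data (beta-distributed probabilities, outcomes drawn from them) and the "mean absolute gap" summary are assumptions made up for this illustration, not part of the article's dataset or of scikit-learn.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Simulated predictions that are calibrated by construction.
rng = np.random.default_rng(0)
y_prob = rng.beta(0.5, 0.5, size=500)
y_true = rng.binomial(1, y_prob)

results = {}
for strategy in ("uniform", "quantile"):
    for n_bins in (5, 10, 20):
        frac_pos, mean_pred = calibration_curve(
            y_true, y_prob, n_bins=n_bins, strategy=strategy
        )
        # Average vertical distance between the curve and the diagonal.
        gap = float(np.mean(np.abs(frac_pos - mean_pred)))
        results[(strategy, n_bins)] = (len(frac_pos), gap)
        print(f"{strategy:>8}, n_bins={n_bins:>2}: "
              f"{len(frac_pos)} points, mean absolute gap = {gap:.3f}")
```

With more bins, each point is based on fewer samples, so the curve tends to wobble more around the diagonal even though the underlying predictions have not changed.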

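Example 2: Plotting calibration curves to compare multiple models

Below we compare the calibration curves of four models: logistic regression, naive Bayes, a linear support vector classifier, and a random forest.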

# Dataset
# Import libraries
import pandas as pd

# Load the data
data = pd.read_csv(r"C:\Users\lenovo\Desktop\04_16_disease.csv")

# Summary statistics
print(data.describe())

# Check whether any column contains missing values
print(data.isnull().any())
# Show the rows that contain at least one missing value
print(data[data.isnull().any(axis=1)])
# Number of missing values in each column
n_missings_each_col = data.isnull().sum()
print(n_missings_each_col)

# Select the features and the target
X = data.iloc[:, 34:].values
y = data.iloc[:, 0].values

# Split into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Calibration curves (binary classification)
import numpy as np
from sklearn.svm import LinearSVC


class NaivelyCalibratedLinearSVC(LinearSVC):
    """LinearSVC with `predict_proba` method that naively scales
    `decision_function` output."""

    def fit(self, X, y):
        super().fit(X, y)
        df = self.decision_function(X)
        self.df_min_ = df.min()
        self.df_max_ = df.max()
        return self  # follow the scikit-learn convention of returning the estimator

    def predict_proba(self, X):
        """Min-max scale output of `decision_function` to [0,1]."""
        # Squeeze the scores into [0, 1] so that they share the same range as
        # the other models' probabilities and can be compared side by side.
        # decision_function returns a confidence score for each sample,
        # derived from its distance to the separating hyperplane.
        df = self.decision_function(X)
        calibrated_df = (df - self.df_min_) / (self.df_max_ - self.df_min_)
        proba_pos_class = np.clip(calibrated_df, 0, 1)
        proba_neg_class = 1 - proba_pos_class
        proba = np.c_[proba_neg_class, proba_pos_class]
        return proba

from sklearn.calibration import CalibrationDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Create classifiers
# Instantiate the four classifiers: logistic regression, naive Bayes,
# linear support vector classifier, and random forest
lr = LogisticRegression(solver='lbfgs')
gnb = GaussianNB()
svc = NaivelyCalibratedLinearSVC(C=1.0)
rfc = RandomForestClassifier()

clf_list = [
    (lr, "Logistic"),
    (gnb, "Naive Bayes"),
    (svc, "SVC"),
    (rfc, "Random forest"),
]

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(15, 15))
gs = GridSpec(4, 2)  # grid used to lay out the subplots
colors = plt.get_cmap("Dark2")  # plt.cm.get_cmap was removed in recent Matplotlib releases

ax_calibration_curve = fig.add_subplot(gs[:2, :2])
calibration_displays = {}
for i, (clf, name) in enumerate(clf_list):
    clf.fit(X_train, y_train)
    display = CalibrationDisplay.from_estimator(
        clf,
        X_test,
        y_test,
        n_bins=10,
        name=name,
        ax=ax_calibration_curve,
        color=colors(i),
    )
    calibration_displays[name] = display

ax_calibration_curve.grid()
ax_calibration_curve.set_ylabel("fraction of positives")
ax_calibration_curve.set_ylim([-0.05, 1.05])
ax_calibration_curve.legend(loc="lower right")
ax_calibration_curve.set_title('Calibration plots (Reliability curve)')
# Each curve shows the gap between predicted probabilities and true classes:
# x is the mean predicted probability in a bin, y the fraction of positives in it.
# Curve below the dashed line: predicted positives > actual positives.
# Curve above the dashed line: predicted positives < actual positives.


# Add histogram
grid_positions = [(2, 0), (2, 1), (3, 0), (3, 1)]
for i, (_, name) in enumerate(clf_list):
    row, col = grid_positions[i]
    ax = fig.add_subplot(gs[row, col])

    # Split [0, 1] into 10 equal intervals and plot the sample count in each
    ax.hist(
        calibration_displays[name].y_prob,
        range=(0, 1),
        bins=10,
        label=name,
        color=colors(i),
    )
    ax.set(title=name, xlabel="Mean predicted probability", ylabel="Count")

# Use a tight layout so that no subplot spills outside the figure area
plt.tight_layout()
plt.show()

Figure. Calibration curves comparing multiple models

Next, let's look at some other evaluation metrics for these models,
such as precision, recall, F1, and AUC.

# Other evaluation metrics
from collections import defaultdict

import pandas as pd
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    brier_score_loss,
    log_loss,
    roc_auc_score,
)

scores = defaultdict(list)
for i, (clf, name) in enumerate(clf_list):
    clf.fit(X_train, y_train)
    y_prob = clf.predict_proba(X_test)
    y_pred = clf.predict(X_test)
    scores["Classifier"].append(name)

    # Metrics computed from the predicted probability of the positive class
    for metric in [brier_score_loss, log_loss]:
        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
        scores[score_name].append(metric(y_test, y_prob[:, 1]))

    # Metrics computed from the predicted labels
    # (note: ROC AUC is computed from hard labels here, not from probabilities)
    for metric in [precision_score, recall_score, f1_score, roc_auc_score]:
        score_name = metric.__name__.replace("_", " ").replace("score", "").capitalize()
        scores[score_name].append(metric(y_test, y_pred))

# Build the table once, after all classifiers have been scored
score_df = pd.DataFrame(scores).set_index("Classifier").round(decimals=3)
print(score_df)

Table. Other evaluation metrics for the models
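Of the metrics used in the code above, brier_score_loss is the one most directly related to calibration: it is the mean squared difference between the predicted probability and the actual outcome (0 or 1), so lower is better. The plain-Python sketch below shows the arithmetic; the helper name `brier_score` and the toy numbers are made up for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


y_true = [1, 0, 1]
confident_and_right = [0.9, 0.1, 0.8]
hedging = [0.5, 0.5, 0.5]

print(f"confident and right: {brier_score(y_true, confident_and_right):.3f}")
print(f"always predicts 0.5: {brier_score(y_true, hedging):.3f}")
```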
