R语言线性回归+自然样条+Nomogram
载入我们用到的数据集
pollution <- read.table("https://www4.stat.ncsu.edu/~boos/var.select/pollution.data.txt",header = T,stringsAsFactors = F)
本实例变量说明
Y 死亡率
x13 氮氧化物
x14 二氧化硫
载入程序包,并且打包数据,这一步是必须的
library(rms)
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
d <- datadist(pollution)
options(datadist = "d")
用ols函数建立模型,指定x13为5个节点的自然样条(限制性三次样条)函数形式,在此不具体介绍样条,大家只需要知道它类似于分段多项式即可。
当指定一个变量为节点个数为k
的自然三次样条形式,除截距外,它需要估计的参数个数为k-1
个
mod <- ols(y ~ rcs(x13, 5)+x14,data = pollution, x = TRUE, y = TRUE)
mod
## Linear Regression Model
##
## ols(formula = y ~ rcs(x13, 5) + x14, data = pollution, x = TRUE,
## y = TRUE)
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 60 LR chi2 28.36 R2 0.377
## sigma51.3356 d.f. 5 R2 adj 0.319
## d.f. 54 Pr(> chi2) 0.0000 g 43.868
##
## Residuals
##
## Min 1Q Median 3Q Max
## -114.5099 -27.3521 -0.2096 25.4952 172.4004
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 861.9767 31.6919 27.20 <0.0001
## x13 11.3924 9.2105 1.24 0.2215
## x13' -334.5967 964.1308 -0.35 0.7299
## x13'' 350.5024 1498.6819 0.23 0.8160
## x13''' 46.0302 574.8216 0.08 0.9365
## x14 0.5136 0.1651 3.11 0.0030
##
从以上输出可以看出,参数个数为 截距+(5-1)+1 = 6个
打印模型,模型形式如下,跟普通线性回归方程一个意思,带入x13和x14便可求得y
options(prType='html')
latex(mod)
诺曼图
plot(nomogram(mod))