[R] Using pophelper, an R package for visualizing population structure
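This tutorial covers the R package pophelper: installation, input files, output files, and the main functions.
pophelper can be installed directly from CRAN or from GitHub.
It is best to install the dependencies first: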
install.packages(c("devtools","ggplot2","gridExtra","gtable","label.switching","tidyr"),dependencies=T)
- Install from CRAN
# install pophelper package from CRAN
install.packages('pophelper')
- Install from GitHub
# install pophelper package from GitHub
devtools::install_github('royfrancis/pophelper')
Once installation is complete, load the package:
# load library
library(pophelper)
# check version
packageDescription("pophelper", fields="Version")
pophelper provides the following functions:
# convert q-matrix run files (structure, tess 2.3, baps, basic, clumpp)
# to R qlist object
readQ()
# convert TESS3 R list object to qlist object
readQTess3()
# collate/tabulate a qlist
tabulateQ()
# summarise an output from tabulateQ()
summariseQ()
# Align clusters
alignK()
# create single-line barplots from qlist
plotQ()
# create multi-line barplots from qlist
plotQMultiline()
# export files for DISTRUCT from qlist
distructExport()
# Run and plot the Evanno method for STRUCTURE data
evannoMethodStructure()
# collect TESS output from multiple directories into one
collectRunsTess()
# wrapper function to tabulate, summarise, perform evanno method and
# generate barplots from filenames/paths.
analyseQ()
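As a rough illustration of how the first few functions chain together, the sketch below reads the STRUCTURE example files bundled with the package, tabulates the runs, and summarises them by K. It assumes pophelper is installed and that the calls accept the arguments shown, which may differ between package versions; the variable names tr and sr are invented for this example.

```r
# Sketch only: runs the pophelper steps when the package is available.
if (requireNamespace("pophelper", quietly = TRUE)) {
  # Example STRUCTURE files shipped with the package
  sfiles <- list.files(path = system.file("files/structure", package = "pophelper"),
                       full.names = TRUE)
  slist <- pophelper::readQ(files = sfiles)

  # One row per run
  tr <- pophelper::tabulateQ(qlist = slist)
  # One row per K, built from the tabulated runs
  sr <- pophelper::summariseQ(tr)

  print(head(tr))
  print(sr)
} else {
  message("pophelper is not installed; skipping the example.")
}
```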
Five types of input data are supported:
- STRUCTURE result files;
- TESS 2.3 result files;
- BAPS files;
- BASIC files;
- CLUMPP files.
The CLUMPP data format looks like this:
[Figure: example of the CLUMPP data format]
The group labels used by the two plotting functions, plotQ and plotQMultiline, must be supplied as a data.frame; the labels used by distructExport must be character strings.
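The label requirement above can be sketched in base R as follows. The column names and values are made up for illustration; the point is simply that the label sets are character columns in a data.frame with one row per individual, while a single label set can be pulled out as a plain character vector.

```r
# Hypothetical group labels for six individuals (invented values).
grplabs <- data.frame(
  loc = c("Brazil", "Brazil", "Greece", "Greece", "Greece", "Brazil"),
  pop = c("PopA", "PopA", "PopB", "PopB", "PopC", "PopC"),
  stringsAsFactors = FALSE  # keep the labels as character, not factor
)

# Check the structure: a data.frame whose columns are all character
str(grplabs)
stopifnot(is.data.frame(grplabs),
          all(vapply(grplabs, is.character, logical(1))))

# A single label set as a plain character vector
grpvec <- grplabs$loc
```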
Once read into R, the data are stored as a qlist object.
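To give a feel for what such an object holds, here is a toy stand-in built in base R: a list of data.frames, one per run, with one row per individual and one column per cluster. The values and names are invented, and a real qlist produced by readQ may carry additional attributes that this sketch does not reproduce.

```r
# Toy run with K = 2: four individuals, membership proportions per cluster
run1 <- data.frame(Cluster1 = c(0.9, 0.8, 0.1, 0.3),
                   Cluster2 = c(0.1, 0.2, 0.9, 0.7))

# Toy run with K = 3 for the same four individuals
run2 <- data.frame(Cluster1 = c(0.7, 0.6, 0.1, 0.2),
                   Cluster2 = c(0.2, 0.3, 0.8, 0.1),
                   Cluster3 = c(0.1, 0.1, 0.1, 0.7))

# A qlist-like container: a named list of runs
toy_qlist <- list(run1 = run1, run2 = run2)

# K for each run is the number of cluster columns
k_per_run <- sapply(toy_qlist, ncol)
print(k_per_run)

# Each individual's proportions should sum to 1 (within rounding)
rows_ok <- sapply(toy_qlist, function(q) all(abs(rowSums(q) - 1) < 1e-8))
print(rows_ok)
```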
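collectRunsTess works only with TESS 2.3. TESS writes the result files of each run to a separate folder, so the data would otherwise have to be read from many different folders. collectRunsTess reads the files from these folders, renames them, and gathers them into a single directory.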
# basic usage
collectRunsTess(runsdir="path-to-tess-runs-root-dir")
# another usage
path <- "path-to-tess-runs-root-dir"
collectRunsTess(runsdir=path)
# another usage: pick the folder interactively (choose.dir() is Windows-only)
collectRunsTess(runsdir=choose.dir())
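The function automatically searches for files ending in TR.txt, renames them, and copies them to another folder. Do not rename the original result files yourself.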
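The search-rename-copy idea described above can be imitated in base R. This is not pophelper's implementation, only a sketch of the same pattern on throwaway files in a temporary directory; the folder names, file names, and the renaming scheme (run folder name prefixed to the file name) are assumptions made for the example.

```r
# Build a fake TESS-style layout: one sub-folder per run
root <- file.path(tempdir(), "tess-runs-demo")
unlink(root, recursive = TRUE)  # start from a clean slate
for (run in c("run1", "run2")) {
  dir.create(file.path(root, run), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, run, "sampleTR.txt"))   # the file of interest
  file.create(file.path(root, run, "otherfile.txt"))  # should be ignored
}

# Find every file whose name ends in TR.txt, in any sub-folder
trfiles <- list.files(root, pattern = "TR\\.txt$", recursive = TRUE,
                      full.names = TRUE)

# Prefix each file with the name of the run folder it came from
newnames <- paste0(basename(dirname(trfiles)), "_", basename(trfiles))

# Copy the renamed files into one common folder
outdir <- file.path(root, "collected")
dir.create(outdir, showWarnings = FALSE)
copied <- file.copy(trfiles, file.path(outdir, newnames))

print(list.files(outdir))
```

Because the new names are derived from the original folder and file names, renaming the originals by hand would break the link between a collected file and its run.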
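readQ reads the results of STRUCTURE, TESS 2.3, BASIC, CLUMPP, and other programs into R and converts them to a qlist object. Use the filetype argument to specify the type of the input files; the default is auto.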
readQ() # automatically detects input filetype
readQ(filetype="auto") # automatically detects input filetype
readQ(filetype="structure") # Convert STRUCTURE run files to qlist
readQ(filetype="tess") # Convert TESS2 run files to qlist
readQ(filetype="baps") # Convert BAPS run files to qlist
readQ(filetype="basic") # Convert delimited numeric text files to qlist
readQ(filetype="clumpp") # Convert CLUMPP format files to qlist
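To make the basic file type more concrete, the sketch below writes a made-up, space-delimited Q-matrix to a temporary file and, if pophelper is installed, reads it back with readQ. The file name and values are invented for illustration, and the exact layout readQ accepts (for example, no header row) is an assumption that should be checked against the package documentation.

```r
# Invented Q-matrix: three individuals, K = 2, rows sum to 1
q <- data.frame(c(0.9, 0.2, 0.5),
                c(0.1, 0.8, 0.5))

# Write it as plain space-delimited numbers, without row or column names
f <- file.path(tempdir(), "toy_basic_K2.txt")
write.table(q, f, sep = " ", row.names = FALSE, col.names = FALSE, quote = FALSE)

# Inspect the raw file contents
raw_lines <- readLines(f)
print(raw_lines)

# Read it with pophelper when the package is available
if (requireNamespace("pophelper", quietly = TRUE)) {
  toy <- pophelper::readQ(files = f, filetype = "basic")
  print(head(toy[[1]]))
} else {
  message("pophelper is not installed; skipping readQ().")
}
```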
The package ships with matching example data sets, which can be read as follows:
# STRUCTURE files (do not use this command to read local files)
sfiles <- list.files(path=system.file("files/structure",package="pophelper"), full.names=T)
# basic usage
slist <- readQ(files=sfiles)
readQ(files=sfiles,filetype="structure")
# select files interactively
# readQ(files=choose.files(multi=TRUE))
# check class of output
class(slist)
# view head of first converted file
head(slist[[1]])
# TESS files (do not use this command to read local files)
tfiles <- list.files(path=system.file("files/tess",package="pophelper"), full.names=T)
tlist <- readQ(files=tfiles)
# select files interactively
# readQ(files=choose.files(multi=TRUE))
# use BAPS files (do not use this command to read local files)
bfiles <- list.files(path=system.file("files/baps",package="pophelper"), full.names=T)
blist <- readQ(files=bfiles)
# use ADMIXTURE files (do not use this command to read local files)
afiles <- list.files(path=system.file("files/admixture",package="pophelper"), full.names=T)
alist <- readQ(files=afiles)
# use FASTSTRUCTURE files (do not use this command to read local files)
ffiles <- list.files(path=system.file("files/faststructure",package="pophelper"), full.names=T)
flist <- readQ(files=ffiles)
# use space-delimited text files (do not use this command to read local files)
msfiles <- list.files(path=system.file("files/basic/space",package="pophelper"), full.names=T)
mslist <- readQ(files=msfiles)
# use tab-delimited text files (do not use this command to read local files)
mtfiles <- list.files(path=system.file("files/basic/tab",package="pophelper"), full.names=T)
mtlist <- readQ(files=mtfiles)
# use comma-separated text files (do not use this command to read local files)
mcfiles <- list.files(path=system.file("files/basic/comma",package="pophelper"), full.names=T)
mclist <- readQ(files=mcfiles)
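When each K is run several times, cluster labels can switch between runs, so the same cluster does not always get the same color. Use alignK to correct this. Without correction: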
sfiles <- list.files(path=system.file("files/structure",package="pophelper"), full.names=T)
slist <- readQ(sfiles)
p1 <- plotQ(slist[c(3,4,10)],imgoutput="join",returnplot=T,exportplot=F,quiet=T,basesize=11)
# grid.arrange() comes from gridExtra, which library(pophelper) does not attach
gridExtra::grid.arrange(p1$plot[[1]])
With alignK correction:
slist1 <- alignK(slist[c(3,4,10)])
p1 <- plotQ(slist1,imgoutput="join",returnplot=T,exportplot=F,quiet=T,basesize=11)
# grid.arrange() comes from gridExtra, which library(pophelper) does not attach
gridExtra::grid.arrange(p1$plot[[1]])
The colors are now consistent across the replicate runs of each K.
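The idea behind this kind of alignment can be illustrated with a small brute-force sketch in base R. It is not how alignK is implemented; it simply tries every column ordering of one run and keeps the ordering closest to a reference run. The data and the helper names perms and align_to_ref are invented for this example, and the exhaustive search is only practical for small K.

```r
# Reference run and a second run whose cluster columns are swapped (invented values)
ref <- data.frame(Cluster1 = c(0.9, 0.8, 0.1, 0.2),
                  Cluster2 = c(0.1, 0.2, 0.9, 0.8))
run <- data.frame(Cluster1 = c(0.15, 0.25, 0.85, 0.75),
                  Cluster2 = c(0.85, 0.75, 0.15, 0.25))

# Hypothetical helper: all orderings of the elements of v
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Hypothetical helper: reorder the columns of `run` to best match `ref`
align_to_ref <- function(run, ref) {
  cand <- perms(seq_len(ncol(run)))
  # Sum of squared differences to the reference for each column ordering
  cost <- vapply(cand, function(p) {
    sum((as.matrix(run[, p, drop = FALSE]) - as.matrix(ref))^2)
  }, numeric(1))
  best <- cand[[which.min(cost)]]
  aligned <- run[, best, drop = FALSE]
  names(aligned) <- names(ref)  # relabel so column names match the reference
  aligned
}

aligned <- align_to_ref(run, ref)
print(aligned)
```

After alignment, the first column of the second run describes the same cluster as the first column of the reference, so a plotting function that assigns colors by column position would color the two runs consistently.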
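The plotQ function visualizes a qlist object. Each individual (sample) is drawn as one column. All of its parameters are shown in the figure below.
[Figure: plotQ parameters]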
The code for the figure above is:
sfiles <- list.files(path=system.file("files/structure",package="pophelper"), full.names=T)
slist <- readQ(files=sfiles,indlabfromfile=T)
threelabset <- read.delim(system.file("files/metadata.txt", package="pophelper"), header=T,stringsAsFactors=F)
twolabset <- threelabset[,2:3]
plotQ(slist[2:3],imgoutput="join",showindlab=T,grplab=twolabset,
      subsetgrp=c("Brazil","Greece"),selgrp="loc",ordergrp=T,showlegend=T,
      showtitle=T,showsubtitle=T,titlelab="The Great Structure",
      subtitlelab="The amazing population structure of your favourite organism.",
      height=2,indlabsize=2.3,indlabheight=0.08,indlabspacer=-1,
      barbordercolour="white",barbordersize=0,outputfilename="figures/plotq",imgtype="png")
For the other plotting parameters, see: http://www.royfrancis.com/pophelper/articles/index.html