工具 | 比较基因组 | WGDI
2024-04-10 15:10:30  阅读数 1792

看我不如看【参考】

参考:
WGDI | https://github.com/SunPengChuan/wgdi
WGDI | https://wgdi.readthedocs.io/en/latest/Introduction.html
bilibili | WGDI的简单使用(一)
bilibili | WGDI的简单使用(二)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(上)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(中)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(下)
简书 | xuzhougeng | WGDI之深入理解blockinfo输出结果
简书 | xuzhougeng | 使用jcvi基于wgdi的共线性结果(-icl)绘制点图
简书 | 生信石头 | 点阵图 | 比较基因组分析之基石 - 手把手入门教程
简书 | 生信石头 | WGDI | 比较基因组分析神器,要啥有啥!
公众号 | 生信媛 | 使用WGDI分析全基因组复制事件

1. 下载

$ conda create -c bioconda -c conda-forge -n wgdi wgdi
$ conda activate wgdi

2. 所需文件

  • blast 结果
    -outfmt 6
  • gff
    wgdi自己定义的gff,注意提取最长转录本
  • lens
    注意排序
  • cds.fa
    将所选物种序列cat到一起
  • pep.fa

3. 运行

  • 生成配置文件
$ wgdi -d help >> total.conf
  • 修改手动total.conf文件
  • 结果运行
$ wgdi -d total.conf

每个参数都是【三步走】方式,运行简单

4. 逐个参数

原图地址:https://wgdi.readthedocs.io/en/latest/Introduction.html
  • -d | [dotplot]
    依赖:blast 结果、gff、lens
    功能:显示同源基因dotplot
  • -icl | [collinearity]
    依赖:blast 结果、gff、lens
    功能:ColinearScan 的改进版,动态规划算法提取共线性
  • -ks | [ks]
    依赖:-icl结果、pep.fa、cds.fa
    功能:yn00 计算共线性块上同源基因间的ka、ks
  • -bi | [blockinfo]
    依赖:-icl结果、ks结果
    功能:结果整合

  • -c | [correspondence]
    依赖:-bi结果
    功能:提取

  • -bk | [blockks]
    依赖:-bi结果
    功能:绘制dotplot 展示共线性块上的 Ks

  • -kp | [kspeaks]
    依赖:-bi结果
    功能:ks分布图

  • -pf | [peaksfit]
    依赖:-bi结果
    功能:ks分布的高斯拟合

  • -kf | [ksfigure]
    依赖:-kp结果
    功能:ks分布图(正态)

后面古基因组和核型分析这块,我还不是很明白。

4. 直接生成全部配置文件

可以把关键信息传参导入,不用每次修改文件名了。

$ python3 createWGDIconf.py -h
usage: createWGDIconf.py [-h] [--prefix PREFIX] --blast BLAST --gff1 GFF1 [--gff2 [GFF2]] --len1 LEN1 [--len2 [LEN2]] [--genome_name1 GENOME_NAME1] [--genome_name2 GENOME_NAME2] [--cds CDS]
                         [--pep PEP]

OK

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX
  --blast BLAST
  --gff1 GFF1
  --gff2 [GFF2]
  --len1 LEN1
  --len2 [LEN2]
  --genome_name1 GENOME_NAME1
  --genome_name2 GENOME_NAME2
  --cds CDS
  --pep PEP
#!python3

import argparse


def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--prefix", default="test")
    parser.add_argument("--blast", required=True)
    parser.add_argument("--gff1", required=True)
    parser.add_argument("--gff2", nargs='?')
    parser.add_argument("--len1", required=True)
    parser.add_argument("--len2", nargs='?')
    parser.add_argument("--genome_name1", default="genome1")
    parser.add_argument("--genome_name2", default="genome2")
    parser.add_argument("--cds", default="cds.fa")
    parser.add_argument("--pep", default="pep.faa")
    return parser

def create_conf():
    d = """[dotplot]
blast = {blast}
gff1 =  {gff1}
gff2 =  {gff2}
lens1 = {len1}
lens2 = {len2}
genome1_name =  {genome_name1}
genome2_name =  {genome_name2}
multiple  = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = {prefix}.pdf

[dotplot]
blast = {blast}
gff1 =  {gff1}
gff2 =  {gff2}
lens1 = {len1}
lens2 = {len2}
genome1_name =  {genome_name1}
genome2_name =  {genome_name2}
multiple  = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = {prefix}.pdf

[collinearity]
gff1 = {gff1}
gff2 = {gff2}
lens1 = {len1}
lens2 = {len2}
blast = {blast}
blast_reverse = false
multiple  = 1
process = 8
evalue = 1e-5
score = 100
grading = 50,25,10
mg = 25,25
pvalue = 1
repeat_number = 10
positon = order
savefile = {prefix}.icl.collinearity.txt

[ks]
cds_file = {cds}
pep_file = {pep}
align_software = muscle
pairs_file = {prefix}.icl.collinearity.txt
ks_file = {prefix}.ks

[blockinfo]
blast = {blast}
gff1 =  {gff1}
gff2 =  {gff2}
lens1 = {len1}
lens2 = {len2}
collinearity = {prefix}.icl.collinearity.txt
score = 100
evalue = 1e-5
repeat_number = 20
position = order
ks = {prefix}.ks
ks_col = ks_NG86
savefile = {prefix}.block.csv

[correspondence]
blockinfo =  {prefix}.block.csv
lens1 = {len1}
lens2 = {len2}
tandem = false
tandem_length = 200
pvalue = 0.2
block_length = 5
multiple  = 1
homo = -1,1
savefile = {prefix}.block.c.csv

[blockks]
lens1 = len1
lens2 = {len2}
genome1_name =  {genome_name1}
genome2_name =  {genome_name2}
blockinfo = {prefix}.block.c.csv
pvalue = 0.2
tandem = false
tandem_length = 200
markersize = 1
area = 0,2
block_length =  5
figsize = 8,8
savefig = {prefix}.ks.c.pdf

[kspeaks]
blockinfo = {prefix}.block.c.csv
pvalue = 0.2
tandem = false
block_length = 5
ks_area = 0,10
multiple  = 1
homo = 0,1
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = {prefix}.kp.pdf
savefile = {prefix}.kp.medain.ks

[peaksfit]
blockinfo = {prefix}.block.c.csv
mode = median
bins_number = 200
ks_area = 0,10
fontsize = 9
area = 0,3
figsize = 10,6.18
shadow = true
savefig = {prefix}.pf.pdf

""".format(blast=args.blast,
    gff1=args.gff1,
    gff2=args.gff2,
    len1=args.len1,
    len2=args.len2,
    genome_name1=args.genome_name1,
    genome_name2=args.genome_name2,
    prefix=args.prefix,
    cds=args.cds,
    pep=args.pep)

    return(d)

if __name__ == "__main__":
    parser = create_parser()
    args = parser.parse_args()
    if args.len2 is None:
        args.len2=args.len1
    if args.gff2 is None:
        args.gff2=args.gff1
    conf = create_conf()
    print(conf)