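Is there a package that lets me write a .ped file from my R dataset, with a suitable header, for use with EPACTS?
I have not found one by googling; I can only find ways to read such files.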
Answer 0 (score: 0)
A web search turns up no tool that does this. You may want to consider using the VCF format instead, since EPACTS appears to accept it:
http://genome.sph.umich.edu/wiki/EPACTS#VCF_file_for_Genotypes
You can convert PED to VCF with plink as follows:
plink --file prefix --recode vcf --out prefix
You may need to fiddle with other options to get the output you want; see https://www.cog-genomics.org/plink2/data#recode, which states specifically:
The 'vcf', 'vcf-fid', and 'vcf-iid' modifiers result in production of a
VCFv4.2 file. 'vcf-fid' and 'vcf-iid' cause family IDs and within-family IDs
respectively to be used for the sample IDs in the last header row, while
'vcf' merges both IDs and puts an underscore between them (in this case, a
warning will be given if an ID already contains an underscore).
If the 'bgz' modifier is added, the VCF file is block-gzipped. (Gzipping
of other --recode output files is not currently supported.)
The A2 allele is saved as the reference and normally flagged as not
based on a real reference genome ('PR' INFO field value). When it is
important for reference alleles to be correct, you'll usually also want to
include --a2-allele and --real-ref-alleles in your command.
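As a rough illustration of the modifiers quoted above, the following sketch builds a plink command that uses within-family IDs as sample IDs and block-gzips the output. The prefix `mydata` is a hypothetical placeholder, and the script only prints the command unless plink and the input files are actually present.

```shell
#!/bin/sh
set -eu

# Hypothetical fileset prefix: plink would expect mydata.ped and mydata.map.
prefix="mydata"

# Modifiers taken from the documentation quoted above:
#   vcf-iid -> use within-family IDs as the VCF sample IDs
#   bgz     -> block-gzip the VCF output
cmd="plink --file $prefix --recode vcf-iid bgz --out $prefix"

if command -v plink >/dev/null 2>&1 && [ -f "$prefix.ped" ] && [ -f "$prefix.map" ]; then
    # Run for real only when plink and both input files exist.
    $cmd
else
    # Dry run: show the command without executing it.
    echo "would run: $cmd"
fi
```

If reference alleles matter for your analysis, the quoted text suggests adding --a2-allele and --real-ref-alleles as well; those are left out here because they need an allele file that this sketch does not have.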
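Answer 1 (score: 0)
EPACTS needs both a VCF file and a PED file as input for association analysis. Unlike the PED file described in the PLINK documentation, the PED file used by EPACTS contains no genotype data. Its purpose is to hold your phenotype data and covariates, and it needs a .ped extension to be recognized by EPACTS.
To export a data frame from R as a PED file, you only need to give the output file a .ped extension; you can use the following command: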
write.table(df, "filename.ped", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)
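EPACTS also requires the header line containing the column names to be commented out. I usually just do this step by hand, since adding a '#' is very quick and I always open my files to check them anyway. Alternatively, you can set col.names=FALSE and use a .dat file, as shown in the EPACTS documentation: https://genome.sph.umich.edu/wiki/EPACTS#PED_file_for_Phenotypes_and_Covariates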
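If you would rather not edit the file by hand, commenting out the header can be scripted. This is a minimal sketch, assuming the file is tab-separated with the column names on the first line; the file name `pheno_example.ped`, the column names, and the values are made-up placeholders standing in for whatever write.table() produced.

```shell
#!/bin/sh
set -eu

# Hypothetical file name; stands in for the output of write.table().
ped="pheno_example.ped"

# Made-up example content: header on line 1, one sample per following line.
printf 'FAM_ID\tIND_ID\tFAT_ID\tMOT_ID\tSEX\tPHENO\tAGE\n' > "$ped"
printf 'F1\tS1\t0\t0\t1\t1.2\t34\n' >> "$ped"
printf 'F2\tS2\t0\t0\t2\t0.7\t51\n' >> "$ped"

# Prefix only line 1 with '#'. Writing to a temporary file and renaming it
# avoids sed -i, whose syntax differs between GNU and BSD sed.
sed '1s/^/#/' "$ped" > "$ped.tmp" && mv "$ped.tmp" "$ped"

# Show the commented-out header line.
head -n 1 "$ped"
```

Only the first line is changed, so the data rows stay exactly as R wrote them.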