I have a very large report file, and I want to check one of its columns, named Sample_id, for unique values. My data looks like this:
[1] "[Header]"
[2] "GSGT Version\t1.9.4"
[3] "Processing Date\t7/6/2012 11:41 AM"
[4] "Content\t\tGS0005701-OPA.opa\tGS0005702-OPA.opa\tGS0005703-OPA.opa\tGS0005704-OPA.opa"
[5] "Num SNPs\t5858"
[6] "Total SNPs\t5858"
[7] "Num Samples\t132"
[8] "Total Samples\t132"
[9] "[Data]"
[10] "SNP Name\tSample ID\tGC Score\tAllele1 - AB\tAllele2 - AB\tChr\tPosition\tGT Score\tX Raw\tY Raw"
[11] "rs1867749\t106N\t0.8333\tB\tB\t2\t120109057\t0.8333\t301\t378"
[12] "rs1397354\t106N\t0.6461\tA\tB\t2\t215118936\t0.6461\t341\t192"
[13] "rs2840531\t106N\t0.5922\tB\tB\t1\t2155821\t0.6091\t296\t391"
[14] "rs649593\t106N\t0.8709\tA\tB\t1\t37635225\t0.8709\t357\t200"
[15] "rs1517342\t106N\t0.4839\tA\tB\t2\t169218217\t0.4839\t316\t210"
[16] "rs1517343\t106N\t0.5980\tA\tB\t2\t169218519\t0.5980\t312\t165"
[17] "rs1868071\t106N\t0.5518\tA\tB\t2\t30219358\t0.5518\t355\t229"
[18] "rs761162\t106N\t0.6923\tA\tB\t1\t13733834\t0.6923\t315\t257"
[19] "rs911903\t106N\t0.6053\tA\tA\t1\t46982589\t0.6096\t383\t158"
[20] "rs753646\t106N\t0.6676\tA\tB\t1\t208765509\t0.6688\t341\t169"
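So my question is: how can I use R to check the unique values in the Sample_ID column? I already know a bit about unique, but how do I get the right column out of a tab-delimited file?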
Answer 0 (score: 3)
First, read the file:
sample_data <- read.table(file = "filename", sep = "\t", skip = 9, header = TRUE)
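Then run the following (spaces in the column names are automatically converted to dots, so "Sample ID" becomes Sample.ID):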
unique(sample_data[, "Sample.ID"])
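Since the file is described as very large, here is a minimal sketch of a variant that finds the "[Data]" marker instead of hard-coding skip = 9 and loads only the Sample ID column. It assumes the marker appears within the first 50 lines; the temporary file, the variable names, and the "107N" sample are made up for illustration and are not part of the original answer.

```r
# Small stand-in for the report file so the example runs on its own.
# The rows are abbreviated from the question; "107N" is invented for the demo.
report_lines <- c(
  "[Header]",
  "GSGT Version\t1.9.4",
  "Num Samples\t132",
  "[Data]",
  "SNP Name\tSample ID\tGC Score",
  "rs1867749\t106N\t0.8333",
  "rs1397354\t106N\t0.6461",
  "rs2840531\t107N\t0.5922"
)
report_file <- tempfile(fileext = ".txt")
writeLines(report_lines, report_file)

# Locate the "[Data]" marker (assumed to sit within the first 50 lines).
head_lines <- readLines(report_file, n = 50)
data_start <- which(trimws(head_lines) == "[Data]")[1]

# The header row follows the marker; use it to find the "Sample ID" column.
col_names <- strsplit(head_lines[data_start + 1], "\t", fixed = TRUE)[[1]]

# "NULL" tells read.table to skip a column, so only Sample ID is kept in memory.
col_classes <- rep("NULL", length(col_names))
col_classes[col_names == "Sample ID"] <- "character"

sample_data <- read.table(
  file = report_file, sep = "\t", skip = data_start, header = TRUE,
  colClasses = col_classes, check.names = FALSE,
  quote = "", comment.char = ""
)

sample_ids <- unique(sample_data[["Sample ID"]])
print(sample_ids)
```

With check.names = FALSE the column keeps its original name, which is why it is accessed as "Sample ID" here rather than Sample.ID.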
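Answer 1 (score: 1)
If the data is already in an R object, say one named "Lines", then you need to apply the sensible solution offered by cafe876 to a textConnection call, or use the text = argument that was added to R in recent versions: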
samp_dat <- read.table(file = textConnection(Lines), sep = "\t", skip = 9, header=TRUE)
OR:
samp_dat <- read.table(text = Lines, sep = "\t", skip = 9, header = TRUE)
Here is a test case:
Lines <-
c("[Header] ",
"GSGT Version\t1.9.4 ",
"Processing Date\t7/6/2012 11:41 AM ",
"Content\t\tGS0005701-OPA.opa\tGS0005702-OPA.opa\tGS0005703-OPA.opa\tGS0005704-OPA.opa ",
"Num SNPs\t5858 ",
"Total SNPs\t5858 ",
"Num Samples\t132 ",
"Total Samples\t132 ",
"[Data] ",
"SNP Name\tSample ID\tGC Score\tAllele1 - AB\tAllele2 - AB\tChr\tPosition\tGT Score\tX Raw\tY Raw",
"rs1867749\t106N\t0.8333\tB\tB\t2\t120109057\t0.8333\t301\t378 ",
"rs1397354\t106N\t0.6461\tA\tB\t2\t215118936\t0.6461\t341\t192 ",
"rs2840531\t106N\t0.5922\tB\tB\t1\t2155821\t0.6091\t296\t391 ",
"rs649593\t106N\t0.8709\tA\tB\t1\t37635225\t0.8709\t357\t200"
)
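To round off the test case, the following sketch reads an abbreviated copy of Lines with the text = argument and then pulls the unique sample IDs. The shortened vector (fewer columns and rows, no trailing spaces) and the name sample_ids are assumptions made to keep the example small; the read.table call is the one given above.

```r
# Abbreviated copy of the test case: 9 preamble lines, a header row, two data rows.
Lines <- c(
  "[Header]",
  "GSGT Version\t1.9.4",
  "Processing Date\t7/6/2012 11:41 AM",
  "Content\t\tGS0005701-OPA.opa",
  "Num SNPs\t5858",
  "Total SNPs\t5858",
  "Num Samples\t132",
  "Total Samples\t132",
  "[Data]",
  "SNP Name\tSample ID\tGC Score",
  "rs1867749\t106N\t0.8333",
  "rs1397354\t106N\t0.6461"
)

# skip = 9 drops everything up to and including "[Data]"; line 10 becomes the header.
samp_dat <- read.table(text = Lines, sep = "\t", skip = 9, header = TRUE)

# Spaces in the header were converted to dots by the default check.names = TRUE.
print(names(samp_dat))

# as.character() guards against older R versions that read strings as factors.
sample_ids <- unique(as.character(samp_dat$Sample.ID))
print(sample_ids)
```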