有条件地插入行

时间:2012-09-06 15:58:10

标签: r insert formatting rows genetics

我有一个独特的数据集,其中一部分可以使用:

进行复制
data <- textConnection("SNP_Pres,Chr_N,BP_A1F,A1_Beta,A2_SE,ForSortSNP,SortOrder
rs122,13,100461219,C,T,rs122,6
1,16362,0.8701,-0.0048,0.0056,rs122,7
1,19509,0.546015137607046,-0.0033,0.0035,rs122,8
1,17218,0.1539,-0.004,0.013,rs122,9
rs142,13,61952115,G,T,rs142,6
1,16387,0.1295,0.0044,0.0057,rs142,7
1,17218,0.8454,0.006,0.013,rs142,9
rs160,13,100950452,C,T,rs160,6
1,16387,0.549,-0.0021,0.0035,rs160,7
1,19509,0.519102731537216,0.003,0.0027,rs160,8
rs298,13,66664221,C,G,rs298,6
1,19509,0.308290808358246,-0.0032,0.0033,rs298,8
1,17218,0.7227,0.022,0.01,rs298,9")
mydata <- read.csv(data, header = T, sep = ",", stringsAsFactors=FALSE)

它被格式化以用于需要保存丢失数据条目点的程序。在这种情况下,Sort Order列中的数字跳过表示缺少的条目。如果列下降6 - 7 - 8 - 9,则条目完成,新条目将以6开始。

我需要一种方法来读取数据文件,并为每个缺少的条目插入一行零,以便文件如下所示:

data <- textConnection("SNP_Pres,Chr_N,BP_A1F,A1_Beta,A2_SE,ForSortSNP,SortOrder
rs122,13,100461219,C,T,rs122,6
1,16362,0.8701,-0.0048,0.0056,rs122,7
1,19509,0.546015137607046,-0.0033,0.0035,rs122,8
1,17218,0.1539,-0.004,0.013,rs122,9
rs142,13,61952115,G,T,rs142,6
1,16387,0.1295,0.0044,0.0057,rs142,7
0,0,0,0,0,rs142,8
1,17218,0.8454,0.006,0.013,rs142,9
rs160,13,100950452,C,T,rs160,6
1,16387,0.549,-0.0021,0.0035,rs160,7
1,19509,0.519102731537216,0.003,0.0027,rs160,8
0,0,0,0,0,rs160,9
rs298,13,66664221,C,G,rs298,6
0,0,0,0,0,rs289, 7
1,19509,0.308290808358246,-0.0032,0.0033,rs298,8
1,17218,0.7227,0.022,0.01,rs298,9")
mydata <- read.csv(data, header = T, sep = ",", stringsAsFactors=FALSE)

最终,最后两列ForSortSNPSortOrder将从数据文件中删除,但为方便起见,现在将它们包括在内。 任何建议都有很大的意义。

1 个答案:

答案 0 :(得分:2)

以下是使用expand.gridmerge函数的解决方案。

grid <- with(mydata, expand.grid(ForSortSNP=unique(ForSortSNP), SortOrder=unique(SortOrder)))
complete <- merge(mydata, grid, all=TRUE, sort=FALSE)
complete[is.na(complete)] <- 0 # replace NAs with 0's
complete <- complete[order(complete$ForSortSNP, complete$SortOrder), ] # re-sort