I have two data frames:
DF1
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA
DF2
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
Desired output:
E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
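As you can see, DF1 is updated with columns F, G and H from DF2, where column E is my unique index. I tried merge(), but that function does not update my rows; it adds DF2's columns to DF1. I also tried updating with data.table and tidyverse: my rows were updated, but the other rows turned into NAs. In the end I decided to use a simple ifelse() inside a nested lapply(). However, I don't know how to update all three columns at once, and this is very slow for data with more than 50000 rows in each DF.
What I have done so far: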
DF1$F <- sapply(seq_len(nrow(DF1)), function(i) ifelse(DF1[i, 1] == DF2[i, 1] & is.na(DF1[i, 2]), DF2[i, 2], DF1[i, 2]))
Answer 0 (score: 4)
You can do this in base R:
as.data.frame(Map(function(x,y) ifelse(is.na(x),y,x),DF1,DF2))
With the purrr library you can have a prettier, more compact form (see Soto's answer for an even more compact one using dplyr):
library(purrr)
map2_df(DF1,DF2,~ifelse(is.na(.x),.y,.x))
In both cases (technically a data.frame in the first case and a tibble in the second):
Output
E F G H
1 chr1_100203723_100203724 <NA> <NA> <NA>
2 chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
3 chr1_10032235_10032236 <NA> <NA> <NA>
4 chr1_100327060_100327061 <NA> <NA> <NA>
5 chr1_100346889_100346890 <NA> <NA> <NA>
6 chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
7 chr1_100357190_100357191 <NA> <NA> <NA>
8 chr1_100358057_100358058 <NA> <NA> <NA>
9 chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
10 chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
11 chr2_203760838_203760839 <NA> <NA> <NA>
12 chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
13 chr2_220354644_220354645 <NA> <NA> <NA>
14 chr2_234749403_234749404 <NA> <NA> <NA>
15 chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
16 chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
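To make the element-wise rule behind both calls easier to see, here is a minimal sketch on toy values. The vectors x and y and the data frames d1 and d2 are hypothetical and not part of the question's data; the sketch only illustrates that ifelse(is.na(x), y, x) keeps x where it is present and falls back to y otherwise, and that Map() applies this column by column.

```r
# Toy vectors (hypothetical values, not from the question's data)
x <- c("a", NA, NA, "d")
y <- c(NA, "b", NA, "z")

# Keep x where it is present, otherwise fall back to y
filled <- ifelse(is.na(x), y, x)
print(filled)

# Map() applies the same rule to each pair of columns, matched by position
d1 <- data.frame(id = 1:2, v = c(NA, "p"), stringsAsFactors = FALSE)
d2 <- data.frame(id = 1:2, v = c("q", "r"), stringsAsFactors = FALSE)
res <- as.data.frame(Map(function(a, b) ifelse(is.na(a), b, a), d1, d2),
                     stringsAsFactors = FALSE)
print(res)
```

Values already present in the first input are never overwritten, and a cell stays NA only when it is missing in both inputs.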
Data
DF1 <- read.table(text="E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 NA NA NA
chr2_202492077_202492078 NA NA NA
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 NA NA NA
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 NA NA NA
chr2_31167747_31167748 NA NA NA",header=T,stringsAsFactors=F)
DF2 <- read.table(text="E F G H
chr1_100203723_100203724 NA NA NA
chr1_100212951_100212952 NA NA NA
chr1_10032235_10032236 NA NA NA
chr1_100327060_100327061 NA NA NA
chr1_100346889_100346890 NA NA NA
chr1_100347237_100347238 NA NA NA
chr1_100357190_100357191 NA NA NA
chr1_100358057_100358058 NA NA NA
chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
chr2_203760838_203760839 NA NA NA
chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
chr2_220354644_220354645 NA NA NA
chr2_234749403_234749404 NA NA NA
chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,",header=T,stringsAsFactors=F)
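The approach above pairs rows by position, so it relies on DF1 and DF2 listing the keys in column E in the same order, as they do in this data. If that cannot be guaranteed, one possible base R sketch is to align the second data frame to the first with match() before filling. The small data frames a and b below are hypothetical, and the sketch assumes the key column is unique in both.

```r
# Hypothetical small inputs; b's rows are deliberately in a different order
a <- data.frame(E = c("k1", "k2", "k3"),
                F = c(NA, "rs1", NA),
                stringsAsFactors = FALSE)
b <- data.frame(E = c("k3", "k1", "k2"),
                F = c("rs3", NA, "rs9"),
                stringsAsFactors = FALSE)

# Position in b of each key in a (NA when a key is absent from b)
idx <- match(a$E, b$E)

# Reorder b to a's row order, then fill only the missing cells of a
b_aligned <- b[idx, , drop = FALSE]
value_cols <- setdiff(names(a), "E")
a[value_cols] <- Map(function(x, y) ifelse(is.na(x), y, x),
                     a[value_cols], b_aligned[value_cols])
print(a)
```

Keys of a that do not occur in b yield an all-NA row in b_aligned, so the corresponding rows of a are left unchanged.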
Answer 1 (score: 4)
The coalesce function from dplyr does exactly this. I am sure we could use purrr functions to map over the two data frames, but here is one using base R mapply:
DF1[-1] <- mapply(dplyr::coalesce, DF1[-1], DF2[-1])
which gives:
E F G H
1 chr1_100203723_100203724 <NA> <NA> <NA>
2 chr1_100212951_100212952 rs760764323 A,G, 0.000008,0.999992,
3 chr1_10032235_10032236 <NA> <NA> <NA>
4 chr1_100327060_100327061 <NA> <NA> <NA>
5 chr1_100346889_100346890 <NA> <NA> <NA>
6 chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
7 chr1_100357190_100357191 <NA> <NA> <NA>
8 chr1_100358057_100358058 <NA> <NA> <NA>
9 chr2_182852606_182852607 rs773426830 C,T, 0.999967,0.000033,
10 chr2_202492077_202492078 rs750583431 C,G, 0.000013,0.999987,
11 chr2_203760838_203760839 <NA> <NA> <NA>
12 chr2_215976351_215976352 rs113648834 C,T, 0.999934,0.000066,
13 chr2_220354644_220354645 <NA> <NA> <NA>
14 chr2_234749403_234749404 <NA> <NA> <NA>
15 chr2_11802110_11802111 rs371327070 A,G, 0.000044,0.999956,
16 chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,
Note: as @Moody_Mudskipper points out, it is also possible to generate a new data frame without changing DF1, or to write a purrr version of the same call.
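A minimal sketch of what a purrr variant that builds a new data frame and leaves its input untouched could look like. Pairing purrr::map2 with dplyr::coalesce is an assumption here, not necessarily the exact call from this answer, and the toy data frames d1 and d2 are hypothetical.

```r
library(purrr)
library(dplyr)

# Hypothetical toy inputs with the same key order in both data frames
d1 <- data.frame(E = c("k1", "k2"), F = c(NA, "rs1"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(E = c("k1", "k2"), F = c("rs2", "rs9"),
                 stringsAsFactors = FALSE)

# map2() walks both data frames column by column; coalesce() takes the
# first non-missing value at each position. The result goes into a new
# object, so d1 itself is not modified.
d3 <- as.data.frame(map2(d1, d2, coalesce), stringsAsFactors = FALSE)
print(d3)
```

Because the result is assigned to d3, the original d1 still has its NA afterwards, which is the difference from the in-place DF1[-1] <- ... form.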
Answer 2 (score: 0)
Another naive approach is to use paste0:
> df1 <- data.frame(E = c('A','B','C'), F=c('0.9,1',NA,NA), G=c(NA,'0.98,0.34',NA), H=c(NA,'0.98,0.34',NA), stringsAsFactors = F)
> df2 <- data.frame(E = c('A','B','C'), F=c(NA,'1,3',NA), G=c(NA,NA,'5,6,7'), H=c(NA,NA,NA), stringsAsFactors = F)
> df1[is.na(df1)] <- ''
> df2[is.na(df2)] <- ''
>
> mapply(paste, df1[-1], df2[-1])
F G H
[1,] "0.9,1 " " " " "
[2,] " 1,3" "0.98,0.34 " "0.98,0.34 "
[3,] " " " 5,6,7" " "
Based on mapply