R - Fill columns in a df

Date: 2016-07-03 16:47:09

Tags: r fill na

                a                      b                  P116 P127 P125 P107 P101 P220 P135                                 
1 P116,P115,P113,P120,P112,  P128,P125,P127,P123,P126,    NA   NA   NA   NA   NA   NA   NA
2 P116,P115,P113,P120,P112,  P128,P125,P127,P123,P126,    NA   NA   NA   NA   NA   NA   NA 
3 P120,P117,P116,P115,P119,      P98,P94,P96,P99,P93,     NA   NA   NA   NA   NA   NA   NA
4      P34,P36,P40,P39,P37,  P108,P106,P107,P110,P109,    NA   NA   NA   NA   NA   NA   NA
5 P123,P127,P125,P118,P198,  P135,P132,P134,P138,P131,    NA   NA   NA   NA   NA   NA   NA
6 P142,P148,P149,P140,P150,      P80,P81,P89,P87,P86,     NA   NA   NA   NA   NA   NA   NA

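I have a data frame in which some of the values in columns a and b match the names of other columns. I want to replace the NAs with numbers: 1 if a value in column "a" of that row matches the name of one of columns 3:9, -1 if a value in column "b" of that row matches it, and 0 if no value in "a" or "b" matches.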

The result should look like this.

              a                          b               P116 P127 P125 P107 P101 P220 P135                          
1 P116,P115,P113,P120,P112,  P128,P125,P127,P123,P126,    1    -1   -1   0    0    0    0
2 P116,P115,P113,P120,P112,  P128,P125,P127,P123,P126,    1    -1   -1   0    0    0    0 
3 P120,P117,P116,P115,P119,      P98,P94,P96,P99,P93,     1     0    0   0    0    0    0
4      P34,P36,P40,P39,P37,  P108,P106,P107,P110,P109,    0     0    0  -1    0    0    0
5 P123,P127,P125,P118,P198,  P135,P132,P134,P138,P131,    0     1    1   0    0    0   -1
6 P142,P148,P149,P140,P150,      P80,P81,P89,P87,P86,     0     0    0   0    0    0    0

3 answers:

Answer 0: (score: 2)

We can try

df[-(1:2)] <- Reduce(`+`,Map(`*`, lapply(c("a", "b"), function(nm) 
       do.call(rbind, lapply(strsplit(df[[nm]], ","), function(x)
         +(names(df)[-(1:2)] %in% x)))), c(1, -1)))
 df
 #                          a                         b P116 P127 P125 P107 P101 P220 P135
 #1 P116,P115,P113,P120,P112, P128,P125,P127,P123,P126,    1   -1   -1    0    0    0    0
 #2 P116,P115,P113,P120,P112, P128,P125,P127,P123,P126,    1   -1   -1    0    0    0    0
 #3 P120,P117,P116,P115,P119,      P98,P94,P96,P99,P93,    1    0    0    0    0    0    0
 #4      P34,P36,P40,P39,P37, P108,P106,P107,P110,P109,    0    0    0   -1    0    0    0
 #5 P123,P127,P125,P118,P198, P135,P132,P134,P138,P131,    0    1    1    0    0    0   -1
 #6 P142,P148,P149,P140,P150,      P80,P81,P89,P87,P86,    0    0    0    0    0    0    0
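
The one-liner above packs several steps together, so a step-by-step sketch may help. It builds one 0/1 indicator matrix per source column, then weights "a" by 1 and "b" by -1 and adds them. The two-row data frame and the helper name `indicator` are made up for illustration; they are not part of the original answer.

```r
# Hypothetical two-row example, not the asker's full data
df <- data.frame(
  a = c("P116,P115,", "P34,P36,"),
  b = c("P127,P125,", "P107,P110,"),
  P116 = NA, P127 = NA, P107 = NA,
  stringsAsFactors = FALSE
)

# Names of the columns to fill (everything after a and b)
targets <- names(df)[-(1:2)]

# Build a 0/1 matrix: one row per data frame row, one column per target name
indicator <- function(nm) {
  tokens <- strsplit(df[[nm]], ",")
  do.call(rbind, lapply(tokens, function(x) +(targets %in% x)))
}

in_a <- indicator("a")  # 1 where a value in "a" equals the column name
in_b <- indicator("b")  # 1 where a value in "b" equals the column name

# Same as Reduce(`+`, Map(`*`, list(in_a, in_b), c(1, -1)))
df[-(1:2)] <- in_a - in_b
df
```

Note that a name found in both "a" and "b" of the same row would sum to 0 under this scheme.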

Answer 1: (score: 1)

I haven't tested this thoroughly, and it may be slow on larger datasets, but here is my loop-based attempt:

Assuming your data frame is named df:

for (row in 1:nrow(df)) {
  for (col in 3:ncol(df)) {
    if (grepl(colnames(df)[col], df[row, "a"])) {
      df[row, col] <- 1       # column name found in "a"
    } else if (grepl(colnames(df)[col], df[row, "b"])) {
      df[row, col] <- -1      # column name found in "b"
    } else {
      df[row, col] <- 0       # found in neither
    }
  }
}

This loops over every cell and uses grepl, which returns a logical match, to check whether the column name occurs in the string in a (and otherwise in b).
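
One caveat with grepl is that it matches substrings, so a column named P11 would also match the value P116. With the column names in the question this may never come up, but a sketch that compares whole comma-separated tokens avoids it. The small data frame and the helper `has_token` are hypothetical, introduced only for this example.

```r
# Hypothetical example: the column name "P11" is a substring of the value "P116"
df <- data.frame(
  a = c("P116,P115,", "P11,P36,"),
  b = c("P127,P125,", "P107,P110,"),
  P11 = NA, P127 = NA,
  stringsAsFactors = FALSE
)

# Exact-token test: split on commas, trim spaces, compare whole tokens
has_token <- function(cell, name) {
  name %in% trimws(strsplit(cell, ",", fixed = TRUE)[[1]])
}

for (row in seq_len(nrow(df))) {
  for (col in 3:ncol(df)) {
    nm <- colnames(df)[col]
    df[row, col] <- if (has_token(df[row, "a"], nm)) {
      1
    } else if (has_token(df[row, "b"], nm)) {
      -1
    } else {
      0
    }
  }
}
df
```

Here grepl("P11", "P116,P115,") would be TRUE, whereas the token comparison leaves row 1 of P11 at 0.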

Answer 2: (score: 0)

Here is a tested functional approach.

Given your data frame:

df=data.frame(a=c(
    "P116,P115,P113,P120,P112,", 
    "P116,P115,P113,P120,P112,", 
    "P120,P117,P116,P115,P119,", 
    "     P34,P36,P40,P39,P37,", 
    "P123,P127,P125,P118,P198,", 
    "P142,P148,P149,P140,P150,"    
  ),
  b=c(
    "P128,P125,P127,P123,P126,", 
    "P128,P125,P127,P123,P126,", 
    "     P98,P94,P96,P99,P93,", 
    "P108,P106,P107,P110,P109,", 
    "P135,P132,P134,P138,P131,",    
    "     P80,P81,P89,P87,P86," 
  ),    
  P116=NA, P127=NA, P125=NA, P107=NA, P101=NA, P220=NA, P135=NA,
  stringsAsFactors=FALSE)

The solution is:

sel=lapply(as.list(df[, 1:2]), function(col)
    t(sapply(col, function(x) match(strsplit(x, "," )[[1]], names(df)[-(1:2)], nomatch=0))))
dfm=as.matrix(df[, -(1:2)])
k=-1
lapply(sel, function(selr){
    i<<-0;  k<<-k*-1
    apply(selr, 1, function(j) {
        i <<- i+1
        dfm[cbind(i,j)]<<- k
    })}
    )
dfm[is.na(dfm)]=0     
df[, -(1:2)]=dfm

You get:

df
                            a                         b  P116 P127 P125 P107 P101 P220 P135  
## 1 P116,P115,P113,P120,P112, P128,P125,P127,P123,P126,    1   -1   -1    0    0    0    0 
## 2 P116,P115,P113,P120,P112, P128,P125,P127,P123,P126,    1   -1   -1    0    0    0    0 
## 3 P120,P117,P116,P115,P119,      P98,P94,P96,P99,P93,    1    0    0    0    0    0    0 
## 4      P34,P36,P40,P39,P37, P108,P106,P107,P110,P109,    0    0    0   -1    0    0    0 
## 5 P123,P127,P125,P118,P198, P135,P132,P134,P138,P131,    0    1    1    0    0    0   -1 
## 6 P142,P148,P149,P140,P150,      P80,P81,P89,P87,P86,    0    0    0    0    0    0    0 
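
The same matrix-indexing idea can probably be written without the global assignments (`<<-`). The sketch below is a variant, not the answer's own code: it starts from a zero matrix and fills it through a two-column (row, column) index. The two-row data frame and the helper `fill` are assumptions made for illustration. As in the answer above, a match in "b" overwrites a match in "a" for the same cell.

```r
# Hypothetical two-row example, not the asker's full data
df <- data.frame(
  a = c("P116,P115,", "P34,P36,"),
  b = c("P127,P125,", "P107,P110,"),
  P116 = NA, P127 = NA, P107 = NA,
  stringsAsFactors = FALSE
)

targets <- names(df)[-(1:2)]

# Start from all zeros, so no NA clean-up is needed afterwards
m <- matrix(0, nrow(df), length(targets), dimnames = list(NULL, targets))

# Write `value` into every (row, column) pair where a token equals a target name
fill <- function(m, col, value) {
  tokens <- strsplit(col, ",", fixed = TRUE)
  rows <- rep(seq_along(tokens), lengths(tokens))   # row index of each token
  cols <- match(trimws(unlist(tokens)), targets)    # NA where nothing matches
  keep <- !is.na(cols)
  m[cbind(rows[keep], cols[keep])] <- value
  m
}

m <- fill(m, df$a, 1)
m <- fill(m, df$b, -1)
df[-(1:2)] <- m
df
```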

Next time, please use dput(<your dataframe>) to make your question easier to answer.
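
As a brief illustration of that advice, dput prints an R expression that recreates an object, so readers can paste it straight into their session. The tiny data frame here is made up for the example.

```r
# Hypothetical small data frame
df <- data.frame(
  a = c("P116,P115,", "P34,P36,"),
  P116 = NA,
  stringsAsFactors = FALSE
)

# Prints a structure(...) call that can be pasted into a question
dput(df)

# Round trip: capture the printed text and evaluate it to rebuild the object
txt <- capture.output(dput(df))
df2 <- eval(parse(text = txt))
all.equal(df, df2)
```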