How to return multiple columns while ignoring NA values, and group by the other column names, in R?

Time: 2019-04-02 13:28:24

Tags: r dataframe variables

Given

mexico <- c(1,2,5,1,NA,1)
argentina <- c(2,2,2,2,NA,2)
italy <- c(NA,10,10,10,NA,10)
spain <- c(NA,NA,11,11,11,11)
england <- c(5,NA,10,NA,NA,12)
germany <- c(1,NA,NA,NA,NA,10)

# the vectors above are lowercase, so the data frame must use the same names
Data_Risk = data.frame(mexico, argentina, italy, spain, england, germany)

Data_Risk

  mexico argentina italy spain england germany
1      1         2    NA    NA       5       1
2      2         2    10    NA      NA      NA
3      5         2    10    11      10      NA
4      1         2    10    11      NA      NA
5     NA        NA    NA    11      NA      NA
6      1         2    10    11      12      10

In this case I do not need to consider the NA values, so I tried this:

library(data.table)

Data_Risk <- as.data.table(Data_Risk)
my_c <- !apply(Data_Risk, 1, is.na)[,1]
my_L <- Data_Risk[1]
as.data.frame(my_L)[my_c]

Result:

  mexico argentina england germany
1      1         2       5       1

However, I need to consider all the rows, not just one.
In addition, the grouping by row needs to go into new columns, ignoring the NA values, in the final table.

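3 Answers:

Answer 0 (score: 1)

It is not entirely clear what is wanted, but if the aim is to replace each NA with the next non-NA value in the same row, then the following gives a matrix of that form: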

library(zoo)
t(apply(Data_Risk, 1, na.locf0, fromLast = TRUE))

giving:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    5    5    5    1
[2,]    2    2   10   NA   NA   NA
[3,]    5    2   10   11   10   NA
[4,]    1    2   10   11   NA   NA
[5,]   11   11   11   11   NA   NA
[6,]    1    2   10   11   12   10

or, if you want to move the NAs in each row to the end:

t(apply(Data_Risk, 1, function(x) c(na.omit(x), rep(NA, sum(is.na(x))))))

giving:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    5    1   NA   NA
[2,]    2    2   10   NA   NA   NA
[3,]    5    2   10   11   10   NA
[4,]    1    2   10   11   NA   NA
[5,]   11   NA   NA   NA   NA   NA
[6,]    1    2   10   11   12   10

or equivalently:

t(apply(Data_Risk, 1, function(x) "length<-"(na.omit(x), length(x))))
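
To see why the `length<-` form works, here is a minimal sketch on a single vector. The names `x` and `kept` are hypothetical and used only for this illustration, and `x[!is.na(x)]` stands in for `na.omit(x)`. Assigning a longer length to a vector pads its tail with NA, so dropping the NAs and then restoring the original length pushes them to the end.

```r
# x is a hypothetical example vector with the same values as row 1 of Data_Risk
x <- c(1, 2, NA, NA, 5, 1)

# keep only the non-NA values (stand-in for na.omit(x))
kept <- x[!is.na(x)]

# growing the length back to the original pads the tail with NA
length(kept) <- length(x)

kept
```

The result has the same length as `x`, with the non-NA values first and the NAs last.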

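Answer 1 (score: 0)

We can use apply row-wise, find the indices of the non-NA values, replace them with the corresponding column names, and pad the remaining positions with NA.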

t(apply(Data_Risk, 1, function(x) {
    inds <- which(!is.na(x))
    c(names(Data_Risk)[inds], rep(NA, ncol(Data_Risk) - length(inds)))
}))

#        [,1]         [,2]     [,3]      [,4]      [,5]      [,6]     
#[1,] "mexico" "argentina" "england" "germany" NA        NA       
#[2,] "mexico" "argentina" "italy"   NA        NA        NA       
#[3,] "mexico" "argentina" "italy"   "spain"   "england" NA       
#[4,] "mexico" "argentina" "italy"   "spain"   NA        NA       
#[5,] "spain"  NA          NA        NA        NA        NA       
#[6,] "mexico" "argentina" "italy"   "spain"   "england" "germany"

If you want the final output as a data frame, wrap data.frame() around the apply call.
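
A minimal sketch of that wrapping, rebuilding Data_Risk from the question so the block runs on its own. The object name `res` and the `var1` to `var6` column names are assumptions chosen for illustration; they are not part of the original answer.

```r
# Rebuild the example data from the question
mexico    <- c(1, 2, 5, 1, NA, 1)
argentina <- c(2, 2, 2, 2, NA, 2)
italy     <- c(NA, 10, 10, 10, NA, 10)
spain     <- c(NA, NA, 11, 11, 11, 11)
england   <- c(5, NA, 10, NA, NA, 12)
germany   <- c(1, NA, NA, NA, NA, 10)
Data_Risk <- data.frame(mexico, argentina, italy, spain, england, germany)

# Per row: names of the non-NA columns, padded with NA, wrapped in data.frame()
res <- data.frame(
  t(apply(Data_Risk, 1, function(x) {
    inds <- which(!is.na(x))
    c(names(Data_Risk)[inds], rep(NA, ncol(Data_Risk) - length(inds)))
  })),
  stringsAsFactors = FALSE
)

# Hypothetical column names, chosen only for readability
names(res) <- paste0("var", seq_along(res))

res
```

Passing `stringsAsFactors = FALSE` keeps the column names as character values regardless of the R version in use.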

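Answer 2 (score: 0)

One option is to take which(!is.na(Data_Risk), arr.ind = T) and spread it to wide format: replace the col variable with order(col), and add a colnm column to serve as value.var when casting to wide (dcast).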

library(data.table)
library(magrittr)

nms <- as.data.table(which(!is.na(Data_Risk), arr.ind = T))

nms[, .(colnm = names(Data_Risk)[col], col = paste0('var', order(col)))
    , by = row] %>% 
  dcast(row ~ col, value.var = 'colnm')

#    row   var1      var2    var3    var4    var5    var6
# 1:   1 mexico argentina england germany    <NA>    <NA>
# 2:   2 mexico argentina   italy    <NA>    <NA>    <NA>
# 3:   3 mexico argentina   italy   spain england    <NA>
# 4:   4 mexico argentina   italy   spain    <NA>    <NA>
# 5:   5  spain      <NA>    <NA>    <NA>    <NA>    <NA>
# 6:   6 mexico argentina   italy   spain england germany

Equivalent dplyr code (spread() comes from tidyr, so that package must be loaded as well):

library(dplyr)
library(tidyr)

nms <- as.data.frame(which(!is.na(Data_Risk), arr.ind = T))

nms %>% 
  group_by(row) %>% 
  mutate(colnm = names(Data_Risk)[col],
         col = paste0('var', order(col))) %>% 
  spread(col, value = colnm) %>% 
  ungroup
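
The same arr.ind idea can also be sketched in base R only, without data.table or dplyr. This is an illustrative variant, not the answer's own code: the names `idx`, `pos` and `out` are hypothetical, and the result is a character matrix rather than a table.

```r
# Rebuild the example data from the question
mexico    <- c(1, 2, 5, 1, NA, 1)
argentina <- c(2, 2, 2, 2, NA, 2)
italy     <- c(NA, 10, 10, 10, NA, 10)
spain     <- c(NA, NA, 11, 11, 11, 11)
england   <- c(5, NA, 10, NA, NA, 12)
germany   <- c(1, NA, NA, NA, NA, 10)
Data_Risk <- data.frame(mexico, argentina, italy, spain, england, germany)

# (row, col) positions of every non-NA cell, sorted by row and then by column
idx <- which(!is.na(Data_Risk), arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# position of each non-NA cell within its own row: 1, 2, 3, ...
pos <- ave(idx[, "col"], idx[, "row"], FUN = seq_along)

# start from an all-NA character matrix and fill in the column names
out <- matrix(NA_character_, nrow(Data_Risk), ncol(Data_Risk),
              dimnames = list(NULL, paste0("var", seq_len(ncol(Data_Risk)))))
out[cbind(idx[, "row"], pos)] <- names(Data_Risk)[idx[, "col"]]

out
```

Because the matrix is allocated with one row per row of Data_Risk, a row made up entirely of NA values would still appear in `out`, as a row of NAs.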