How to select columns conditional on the value of another column? (dplyr)

Time: 2018-03-02 20:25:30

Tags: r dplyr

Name1    Col1    Col2    Col3    Name2    Col4    Col5    Col6    Col7
John     A       A       A       Alex       B       B       B      1
Alex     B       B       B       John       A       A       A      0

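Looking at the data frame above, I want to select data based on the value of Col7. Specifically, if Col7 = 1, I want to select Col1, Col2 and Col3 (together with Name1). If Col7 = 0, I want to select Col4, Col5 and Col6 (together with Name2). Col4 to Col6 hold the same variables as Col1 to Col3, only associated with Alex rather than John (in row 1). As a result, John's data is selected twice, and it is the same within each pair.

I was thinking that some form of select in dplyr would work, but I'm having trouble with the conditional selection.

My final data frame would look like this: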

Name1    Col1    Col2    Col3
John      A       A       A
John      A       A       A

4 answers:

Answer 0: (score: 1)

Hi, try something very basic (combining filter and select_at):

df1 <- df %>% 
  filter(Col7 == 1) %>% 
  select_at(vars(Name = Name1, Col1, Col2, Col3))
df2 <- df %>% 
  filter(Col7 == 0) %>% 
  select_at(vars(Name = Name2, Col1 = Col4, Col2 = Col5, Col3 = Col6))
df <- bind_rows(df1, df2)

This gives you the desired data frame:

> df
  Name Col1 Col2 Col3
1 John    A    A    A
2 John    A    A    A
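
The same result can also be sketched as a single pipeline that picks each value row by row with if_else, which keeps the original row order instead of stacking the two filtered subsets. This is only a minimal sketch: it assumes the question's data frame is rebuilt as below, that Col7 only takes the values 0 and 1, and the name result is introduced here for illustration.

```r
library(dplyr)

# Assumed reconstruction of the data frame from the question
df <- tibble(
  Name1 = c("John", "Alex"),
  Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
  Name2 = c("Alex", "John"),
  Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
  Col7 = c(1L, 0L)
)

# For each row, take the first column set when Col7 == 1,
# otherwise the second set; only the four new columns are kept
result <- df %>%
  transmute(
    Name = if_else(Col7 == 1, Name1, Name2),
    Col1 = if_else(Col7 == 1, Col1, Col4),
    Col2 = if_else(Col7 == 1, Col2, Col5),
    Col3 = if_else(Col7 == 1, Col3, Col6)
  )

print(result)
```

With only two column sets this stays readable; with many sets, a reshape-based approach is likely to scale better.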

Answer 1: (score: 0)

You can use melt from data.table or reshape2, and then join against a table of the conditions to keep:

library(data.table)
setDT(d)

d[, row := .I]
md = melt(d, id=c("row", "Col7"), 
  meas = Map(c, 1:4, 5:8), 
  variable.factor = FALSE,
  variable.name = "colset",
  value.name = names(d)[1:4])
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   1    1      1  John    A    A    A
# 2:   2    0      1  Alex    B    B    B
# 3:   1    1      2  Alex    B    B    B
# 4:   2    0      2  John    A    A    A

cond = data.table(Col7 = 0:1, colset = c("2", "1"))
#    Col7 colset
# 1:    0      2
# 2:    1      1

res = md[cond, on=names(cond), nomatch=0]
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   2    0      2  John    A    A    A
# 2:   1    1      1  John    A    A    A

This approach extends to more than two sets of columns, e.g. meas=Map(c, 1:4, 5:8, 9:12)
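
To illustrate the three-set case, here is a minimal sketch on made-up data. The table d3, its column names (Name_a, V1_a, ...), and the selector column pick are hypothetical and not from the question; pick = 1, 2 or 3 says which column set to keep for that row.

```r
library(data.table)

# Hypothetical data: three column sets (positions 1:4, 5:8, 9:12) plus a selector
d3 <- data.table(
  Name_a = c("John", "Alex", "Mary"),
  V1_a = c("A", "B", "C"), V2_a = c("A", "B", "C"), V3_a = c("A", "B", "C"),
  Name_b = c("Alex", "John", "Alex"),
  V1_b = c("B", "A", "B"), V2_b = c("B", "A", "B"), V3_b = c("B", "A", "B"),
  Name_c = c("Mary", "Mary", "John"),
  V1_c = c("C", "C", "A"), V2_c = c("C", "C", "A"), V3_c = c("C", "C", "A"),
  pick = 1:3
)
d3[, row := .I]

# Each list element groups the columns that feed one output column
md3 <- melt(d3, id.vars = c("row", "pick"),
  measure.vars = Map(c, 1:4, 5:8, 9:12),
  variable.factor = FALSE,
  variable.name = "colset",
  value.name = c("Name", "V1", "V2", "V3"))

# Keep, for every value of pick, only the matching column set
cond3 <- data.table(pick = 1:3, colset = c("1", "2", "3"))
res3 <- md3[cond3, on = names(cond3), nomatch = 0]
print(res3)
```

Only the measure.vars list and the lookup table cond3 change when more sets are added; the join itself stays the same.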

Answer 2: (score: 0)

In base R:

create table TableC as select 
a.ID, 
case when b.Field1=1000 and a.Field1=50 then 20 else 0 end as FieldA,
case when b.Field2=15 and a.Field2=100 then 100 else 0 end as FieldB
from TableA a, TableB b
where a.ID=b.ID
order by 1
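
The snippet above is SQL and does not refer to the columns in the question. As a hedged illustration of what a base R approach could look like, here is a minimal sketch. It is not the answerer's code; it assumes the question's data frame is rebuilt as below, and the helper names use_first, first_set, second_set and result are introduced here for illustration.

```r
# Assumed reconstruction of the data frame from the question
df <- data.frame(
  Name1 = c("John", "Alex"),
  Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
  Name2 = c("Alex", "John"),
  Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
  Col7 = c(1L, 0L),
  stringsAsFactors = FALSE
)

# TRUE where the first column set should be kept
use_first <- df$Col7 == 1

first_set  <- df[, c("Name1", "Col1", "Col2", "Col3")]
second_set <- df[, c("Name2", "Col4", "Col5", "Col6")]
names(second_set) <- names(first_set)

# Start from the first set and overwrite the rows where Col7 is not 1
result <- first_set
result[!use_first, ] <- second_set[!use_first, ]

print(result)
```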

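Answer 3: (score: 0)

Here is a tidyverse approach (more tidyr than dplyr). It is fairly verbose because your original data is not very tidy, so most of the code just reshapes to long form, cleans up, and spreads back to wide form.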

library(tidyverse)

df <- tibble(Name1 = c("John", "Alex"), 
                 Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"), 
                 Name2 = c("Alex", "John"), 
                 Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"), 
                 Col7 = c(1L, 0L))

df %>% 
    # reshape to long form
    gather(col, col_val, num_range('Col', 1:6)) %>% 
    gather(name_var, name, contains('Name')) %>% 
    # clean, subset, clean for spreading
    mutate(col = parse_number(col), 
           name_var = parse_number(name_var)) %>% 
    filter(ifelse(Col7 == 1, 
                  col %in% 1:3 & name_var == 1, 
                  col %in% 4:6 & name_var == 2)) %>% 
    mutate(col = paste0('Col', (col - 1) %% 3 + 1),  # map 1:3 and 4:6 both onto Col1..Col3
           name_var = 'Name') %>% 
    # reshape back to wide form
    spread(name_var, name) %>% 
    spread(col, col_val) %>% 
    # clean
    select(-Col7)
#> # A tibble: 2 x 4
#>   Name  Col1  Col2  Col3 
#>   <chr> <chr> <chr> <chr>
#> 1 John  A     A     A    
#> 2 John  A     A     A