How to fill in column C based on the initial values of columns A, B and C

Time: 2017-11-13 19:58:45

Tags: r data.table

I have the following table:

A     B       C
food  fruit   apple
food  fruit   
food  drink
food  fruit   
car   suv     ford
car   sedan   bmw
car   suv
car   sedan

Desired result:

A     B       C
food  fruit   apple
food  fruit   apple
food  drink
food  fruit   apple 
car   suv     ford
car   sedan   bmw
car   suv     ford
car   sedan   bmw

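How can I fill in column C based on the values in columns A and B? For example, if the value in column A is food and the value in column B is fruit, column C should be filled in with apple. Ideally, I'd like to do this without manually entering each A, B pair and its corresponding C value, because my table has thousands of such combinations.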

Any help is much appreciated!
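A base R sketch of the same idea, for readers not using data.table or dplyr: ave() applies a function within each (A, B) group and writes the result back in the original row order, so no pairs have to be typed in by hand. The helper first_nonempty is hypothetical (it does not come from any of the answers), and the sketch assumes empty cells in C are empty strings rather than NA.

```r
# The question's table, with empty cells in C stored as ""
d <- data.frame(
  A = c("food", "food", "food", "food", "car", "car", "car", "car"),
  B = c("fruit", "fruit", "drink", "fruit", "suv", "sedan", "suv", "sedan"),
  C = c("apple", "", "", "", "ford", "bmw", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the first non-empty value of a group, or "" if there is none
first_nonempty <- function(x) {
  v <- x[nzchar(x)]
  if (length(v) > 0) v[1] else ""
}

# ave() splits C by every (A, B) combination, applies the helper to each piece,
# and recycles the length-1 result over that group's rows
d$C <- ave(d$C, d$A, d$B, FUN = first_nonempty)
print(d)
```

Because the grouping is driven by the data itself, the same line works unchanged for thousands of (A, B) combinations.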

3 Answers:

Answer 0 (score: 2)

Two options using data.table:

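library(data.table)
# d1 is the question's table, with empty strings ("") in C
# Option 1: assign the group's non-empty C value to every row of the group
setDT(d1)[, C := C[C != ''], by = .(A,B)][]
# Option 2: same, but return "" explicitly when the group has no value at all
setDT(d1)[, C := ifelse(all(C == ''), '', C[C != '']), by = .(A,B)][]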

Both give:

> d1
      A     B     C
1: food fruit apple
2: food fruit apple
3: food drink      
4: food fruit apple
5:  car   suv  ford
6:  car sedan   bmw
7:  car   suv  ford
8:  car sedan   bmw
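The two options differ only in what the right-hand side of := evaluates to inside each group. Evaluating those expressions on plain vectors for two of the question's groups (the variable names are illustrative) shows that option 2 always yields exactly one value, while option 1 yields a zero-length vector for the food/drink group and so relies on how data.table treats a zero-length RHS there. It also shows that ifelse() quietly keeps only the first value if a group ever holds two different non-empty entries.

```r
# C values of two groups from the question, as plain character vectors
fruit_c <- c("apple", "", "")  # food / fruit
drink_c <- ""                  # food / drink

# Option 1: keep only the non-empty values
opt1_fruit <- fruit_c[fruit_c != ""]  # "apple", a single value that := recycles over the group
opt1_drink <- drink_c[drink_c != ""]  # character(0), i.e. zero length

# Option 2: ifelse() returns a result as long as its test, i.e. one value
opt2_fruit <- ifelse(all(fruit_c == ""), "", fruit_c[fruit_c != ""])  # "apple"
opt2_drink <- ifelse(all(drink_c == ""), "", drink_c[drink_c != ""])  # ""

# With two different non-empty values, ifelse() silently keeps the first one
opt2_multi <- ifelse(FALSE, "", c("apple", "pear"))  # "apple"
```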

An alternative using dplyr:

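library(dplyr)
d1 %>% 
  group_by(A, B) %>% 
  # one row per (A, B): the first non-empty C, or "" if there is none
  summarise(C = ifelse(all(C == ''), '', C[C != ''])) %>% 
  # join the per-group values back onto every row of d1
  right_join(., d1, by = c('A','B')) %>% 
  # C.x is the filled value from summarise(), C.y the original column
  select(A, B, C = C.x)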

This gives a similar result.
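The dplyr version follows a "summarise, then join back" pattern: build one row per (A, B) pair, then look each original row up in that table. With thousands of combinations, the lookup table is also handy to inspect on its own. A base R sketch of the same pattern, assuming empty cells are "" and that a tab never occurs inside A or B (the tab key separator is an arbitrary choice made here):

```r
d <- data.frame(
  A = c("food", "food", "food", "food", "car", "car", "car", "car"),
  B = c("fruit", "fruit", "drink", "fruit", "suv", "sedan", "suv", "sedan"),
  C = c("apple", "", "", "", "ford", "bmw", "", ""),
  stringsAsFactors = FALSE
)

# Step 1: one row per (A, B) pair holding the first non-empty C, or "" if none
lookup <- aggregate(C ~ A + B, data = d, FUN = function(x) {
  v <- x[nzchar(x)]
  if (length(v) > 0) v[1] else ""
})

# Step 2: join back by matching each row's key against the lookup keys;
# indexing with match() keeps d's original row order
row_key <- paste(d$A, d$B, sep = "\t")
lookup_key <- paste(lookup$A, lookup$B, sep = "\t")
d$C <- lookup$C[match(row_key, lookup_key)]
print(lookup)
print(d)
```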

Answer 1 (score: 1)

Here is a solution using data.table.

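library(data.table)
setDT(dx)[, id := 1:.N]  ## create variable to conserve original order

dx[, C := {
  val <- unique(C[nzchar(C)])         ## distinct non-empty values in the group
  if (length(val) == 0) val <- ""     ## case: C empty for the whole group
  if (length(val) > 1) val <- val[1]  ## case: multiple values, keep the first
  rep(val, length(C))
}, by = "A,B"]

## reorder dx by reference and drop the helper column from dx itself
## (chaining [order(id)] would only modify a copy)
setorder(dx, id)[, id := NULL][]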

#       A     B     C
# 1: food fruit apple
# 2: food fruit apple
# 3: food drink      
# 4: food fruit apple
# 5:  car   suv  ford
# 6:  car sedan   bmw
# 7:  car   suv  ford
# 8:  car sedan   bmw

where dx is:

dx <- read.table(text="A     B       C
food  fruit   apple
food  fruit   
food  drink
food  fruit   
car   suv     ford
car   sedan   bmw
car   suv
car   sedan",header=TRUE,fill=TRUE,stringsAsFactors=FALSE)
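This answer (like the ifelse() variants above) silently keeps the first value when a group has more than one distinct non-empty C. With thousands of (A, B) combinations it may be worth listing such ambiguous pairs before filling. A base R sketch on a small made-up table; the "pear" row is invented purely to create a conflict and is not part of the question's data:

```r
# Illustrative data: food/fruit has two different non-empty values in C
d <- data.frame(
  A = c("food", "food", "food", "car", "car"),
  B = c("fruit", "fruit", "fruit", "suv", "suv"),
  C = c("apple", "pear", "", "ford", ""),
  stringsAsFactors = FALSE
)

# Keep only rows that already carry a value in C
filled <- d[nzchar(d$C), ]

# Count the distinct non-empty C values of each (A, B) pair
n_values <- aggregate(C ~ A + B, data = filled, FUN = function(x) length(unique(x)))

# Pairs with more than one distinct value cannot be filled unambiguously
conflicts <- n_values[n_values$C > 1, c("A", "B")]
print(conflicts)
```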

Answer 2 (score: 0)

A solution using fill() from tidyr:

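library(dplyr)
library(tidyr)

# df is the question's table, with empty strings in C
df %>%
  mutate(C = ifelse(C == "", NA, C)) %>%  # fill() only fills NA, so turn "" into NA
  group_by(A, B) %>%
  fill(C)                                 # carry the last non-NA C down within each group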

Result:

# A tibble: 8 x 3
# Groups:   A, B [4]
      A     B     C
  <chr> <chr> <chr>
1   car sedan   bmw
2   car sedan   bmw
3   car   suv  ford
4   car   suv  ford
5  food drink  <NA>
6  food fruit apple
7  food fruit apple
8  food fruit apple

To get the original row order:

df %>%
  mutate(C = ifelse(C == "", NA, C),
         ID = row_number()) %>%  # remember each row's original position
  group_by(A, B) %>%
  fill(C) %>%
  arrange(ID) %>%                # restore the original row order
  select(-ID)

Result:

# A tibble: 8 x 3
# Groups:   A, B [4]
      A     B     C
  <chr> <chr> <chr>
1  food fruit apple
2  food fruit apple
3  food drink  <NA>
4  food fruit apple
5   car   suv  ford
6   car sedan   bmw
7   car   suv  ford
8   car sedan   bmw
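Unlike the first two answers, which give every row of a group the same value, fill() carries the last non-missing value downward within each group, so a row that comes before the first non-empty C of its group stays NA. That does not happen with the question's data (each group's value sits in its first row), but it can with other row orders; tidyr 1.0.0 and later accept .direction = "downup" for that case. The base R sketch below only mimics the two behaviours; fill_down is a hypothetical helper, not part of tidyr.

```r
# Hypothetical helper mimicking a downward fill (last observation carried forward)
fill_down <- function(x) {
  # position of the most recent non-NA element at each index (0 if none yet)
  last <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  x[replace(last, last == 0L, NA)]
}

# One group whose value is NOT in its first row
d <- data.frame(
  A = c("food", "food", "food"),
  B = c("fruit", "fruit", "fruit"),
  C = c(NA, "apple", NA),
  stringsAsFactors = FALSE
)

# Downward only: the leading NA stays NA
down <- ave(d$C, d$A, d$B, FUN = fill_down)
# Down, then up (filling the reversed vector): the leading NA is filled too
downup <- ave(d$C, d$A, d$B, FUN = function(x) rev(fill_down(rev(fill_down(x)))))
print(down)    # c(NA, "apple", "apple")
print(downup)  # c("apple", "apple", "apple")
```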