Replace rows in data frame 1 based on the ID in data frame 2

Time: 2018-02-15 08:26:47

Tags: r dataframe

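I want to replace rows in the first df of the example below with the rows from data frame 2, matched on the ID column. For example: suppose person X has 100 items in data frame 1, but data frame 2 shows that he actually has only 50 items and the other 50 belong to Z. The final result should then have one row for person X with 50 items and another row for person Z with 50 items, both with the same ID.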

Data frame 1

ID      Name        Status  Items
16      Amy B       Closed  100
10      Erik C      Closed  80
14      Paul R      Closed  20
17      Chris K     Closed  40
19      Ali I       Closed  60
22      Jenny A     Closed  40

Data frame 2

ID  Name    Items
14  Paul R  10
14  Sarah K 10
22  Jenny A 30
22  Brian L 10

Result

ID  Name    Status  Items
16  Amy B   Closed  100
10  Erik C  Closed  80
14  Paul R  Closed  10
14  Sarah K Closed  10
17  Chris K Closed  40
19  Ali I   Closed  60
22  Jenny A Closed  30
22  Brian L Closed  10

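2 answers:

Answer 0: (score: 1)

It looks like you are doing a merge here, with the "Items" values in data frame 2 taking priority.

Try the code below, which uses left_join() and full_join() from the dplyr package.

Load the data...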

df1 <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
'ID      Name        Status  Items
16      Amy_B       Closed  100
10      Erik_C      Closed  80
14      Paul_R      Closed  20
17      Chris_K     Closed  40
19      Ali_I        Closed   60
22      Jenny_A     Closed  40')


df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"ID  Name    Items
14  Paul_R  10
14  Sarah_K 10
22  Jenny_A 30
22  Brian_L 10")

Merge the tables

library(dplyr)

# add the status column to df2
df <- left_join(df2, df1 %>% select(ID, Status), by = 'ID')
# ID    Name Items Status
# 14  Paul_R    10 Closed
# 14 Sarah_K    10 Closed
# 22 Jenny_A    30 Closed
# 22 Brian_L    10 Closed

# combine both data frames by merging on ID, Name and Status
df <- full_join(df, df1, 
                by = c('ID', 'Name', 'Status'),
                suffix = c('.1', '.2'))
# ID    Name Items.1 Status Items.2
# 14  Paul_R      10 Closed      20
# 14 Sarah_K      10 Closed      NA
# 22 Jenny_A      30 Closed      40
# 22 Brian_L      10 Closed      NA
# 16   Amy_B      NA Closed     100
# 10  Erik_C      NA Closed      80
# 17 Chris_K      NA Closed      40
# 19   Ali_I      NA Closed      60

# create a new column which selects the df2 value if that exists, otherwise uses df1 value
df <- df %>% 
    mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>% 
    select(-Items.1, -Items.2)
# ID    Name Status Items
# 14  Paul_R Closed    10
# 14 Sarah_K Closed    10
# 22 Jenny_A Closed    30
# 22 Brian_L Closed    10
# 16   Amy_B Closed   100
# 10  Erik_C Closed    80
# 17 Chris_K Closed    40
# 19   Ali_I Closed    60
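
As a variation on the last step, dplyr's coalesce() can stand in for the ifelse(is.na(...)) pattern, since it returns the first non-missing value element by element. A minimal sketch, assuming the same sample data as above (re-entered here so the snippet runs on its own; the names joined and result are only for illustration):

```r
library(dplyr)

df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Name Status Items
16 Amy_B   Closed 100
10 Erik_C  Closed 80
14 Paul_R  Closed 20
17 Chris_K Closed 40
19 Ali_I   Closed 60
22 Jenny_A Closed 40")

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Name Items
14 Paul_R  10
14 Sarah_K 10
22 Jenny_A 30
22 Brian_L 10")

# Same two joins as above: attach Status to df2, then bring in all df1 rows
joined <- df2 %>%
  left_join(select(df1, ID, Status), by = "ID") %>%
  full_join(df1, by = c("ID", "Name", "Status"), suffix = c(".1", ".2"))

# Prefer the df2 value (Items.1); fall back to the df1 value (Items.2) when it is NA
result <- joined %>%
  mutate(Items = coalesce(Items.1, Items.2)) %>%
  select(-Items.1, -Items.2)

print(result)
```

Both Items columns are integer here, so coalesce() has nothing to reconcile; with mixed column types it may be stricter than ifelse().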

Putting it all together...

left_join(df2, df1 %>% select(ID, Status), by = 'ID') %>%
full_join(df1,
          by = c('ID', 'Name', 'Status'), 
          suffix = c('.1', '.2')) %>% 
    mutate(Items = ifelse(is.na(Items.1), Items.2, Items.1)) %>% 
    select(-Items.1, -Items.2)

This gives the following table as output:

ID    Name Status Items
14  Paul_R Closed    10
14 Sarah_K Closed    10
22 Jenny_A Closed    30
22 Brian_L Closed    10
16   Amy_B Closed   100
10  Erik_C Closed    80
17 Chris_K Closed    40
19   Ali_I Closed    60
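
The table above holds the right rows, but not in the order of the question's expected result, which follows the ID order of data frame 1. One way to restore that order is sketched below with base R's order() and match(). The names id_order and ordered are illustrative, and the output table is re-entered by hand so the snippet is self-contained:

```r
# Output of the pipeline above, typed in again so this runs on its own
result <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Name Status Items
14 Paul_R  Closed 10
14 Sarah_K Closed 10
22 Jenny_A Closed 30
22 Brian_L Closed 10
16 Amy_B   Closed 100
10 Erik_C  Closed 80
17 Chris_K Closed 40
19 Ali_I   Closed 60")

# IDs in the order they appear in data frame 1 (with df1 in scope, df1$ID would do)
id_order <- c(16, 10, 14, 17, 19, 22)

# match() gives each row the position of its ID in id_order;
# order() leaves ties in their original order, so Paul_R stays ahead of Sarah_K
ordered <- result[order(match(result$ID, id_order)), ]
rownames(ordered) <- NULL

print(ordered)
```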

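Answer 1: (score: 0)

Assuming your real data is as regular as your sample data, you have redundant information. The important pieces are:

  • the unsplit item counts per ID, in df1
  • the split item counts, in df2
  • the Status associated with each ID, in df1

What we do is first add the Status information to df2 (merge(df2, df1[c(1,3)])), and then rbind that onto the rows of df1 whose ID does not appear in df2.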

rbind(df1[!df1$ID %in% df2$ID, ], merge(df2, df1[c(1, 3)]))

#    ID    Name Status Items
# 1  16 Amy B   Closed   100
# 2  10 Erik C  Closed    80
# 4  17 Chris K Closed    40
# 5  19 Ali I   Closed    60
# 11 14 Paul R  Closed    10
# 21 14 Sarah K Closed    10
# 3  22 Jenny A Closed    30
# 41 22 Brian L Closed    10
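
The same steps can be spelled out with named intermediate results and column names instead of positions, followed by a check of the regularity assumption stated above: that each ID's split items in df2 add up to its original count in df1. This is a sketch in base R. The names untouched, split_rows, totals_before and totals_after are illustrative, and the sample data is re-entered without the trailing spaces in the names:

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Name Status Items
16 'Amy B'   Closed 100
10 'Erik C'  Closed 80
14 'Paul R'  Closed 20
17 'Chris K' Closed 40
19 'Ali I'   Closed 60
22 'Jenny A' Closed 40")

df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Name Items
14 'Paul R'  10
14 'Sarah K' 10
22 'Jenny A' 30
22 'Brian L' 10")

# Rows of df1 whose ID is never split in df2
untouched <- df1[!df1$ID %in% df2$ID, ]

# Split rows from df2, with Status looked up from df1 by ID
split_rows <- merge(df2, df1[c("ID", "Status")], by = "ID")

# rbind() on data frames matches columns by name, so the differing column order is fine
result <- rbind(untouched, split_rows)

# If df2 only redistributes items, the per-ID totals are unchanged
totals_before <- tapply(df1$Items, df1$ID, sum)
totals_after <- tapply(result$Items, result$ID, sum)
print(all(totals_before == totals_after[names(totals_before)]))
```

If the check prints FALSE on real data, df2 is doing more than splitting existing counts, and replacing the df1 rows wholesale may lose or invent items.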

Data

df1 <- read.table(text="ID      Name        Status  Items
16      'Amy B  '     Closed  100
10      'Erik C '     Closed  80
14      'Paul R '     Closed  20
17      'Chris K'     Closed  40
19      'Ali I  '      Closed   60
22      'Jenny A'     Closed  40", header = TRUE, stringsAsFactors = FALSE)

df2<- read.table(text="ID  Name    Items
14  'Paul R ' 10
14  'Sarah K' 10
22  'Jenny A' 30
22  'Brian L' 10", header = TRUE, stringsAsFactors = FALSE)