Combine or sum rows based on a partial match and other rules

Asked: 2014-07-23 05:07:35

Tags: r

I have a data frame df1:

df1 <- data.frame(
  Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B"),
  Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19"),
  Step = c("A","A","B","B","C","C","C"),
  kg = c(31,32,14,16,10,11,10))

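Sometimes, at a particular 'Step', a 'Lot' is split into parts, indicated by a trailing A, B, or C. I want to sum these and get a data frame that tells me the total kg for each lot at each step.

For example, the output should look like this: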

df2 <- data.frame(
  Lot = c("13VC011","13VC011","13VC018","13VC018"),
  Step = c("A","B","A","C"),
  kg = c(31,30,32,31))

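So there are two requirements. If 'Lot' matches (regardless of the trailing letter) and 'Step' matches, the rows are summed. If those two conditions are not both met, the row is simply carried over to df2 as is.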


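Part 2: I would like to introduce a third requirement. In some cases a Lot is split into two or three parts, but not all of the data are present. In that case, the solutions below mask the gap and make a lot's kg appear much lower than it really is.

What I would like is a way to flag when the data set contains 13VC011A but no 13VC011B, or when we see a 'B' without an 'A', or a 'C' without a 'B' or an 'A'.

So now the original data frame is: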

df1 <- data.frame(
  Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B","13VC020B"),
  Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19","2013-07-22"),
  Step = c("A","A","B","B","C","C","C","B"),
  kg = c(31,32,14,16,10,11,10,18))

The resulting df2 should look something like:

df2 <- data.frame(
  Lot = c("13VC011","13VC011","13VC018","13VC018","13VC020B"),
  Step = c("A","B","A","C","B"),
  kg = c(31,30,32,31,18),
  Partial = c(FALSE,FALSE,FALSE,FALSE,TRUE))

2 answers:

Answer 0 (score: 2)

   # Strip a single trailing letter from Lot (note: this overwrites df1$Lot)
   df1$Lot <- gsub("[[:alpha:]]$", "", df1$Lot)
   aggregate(kg ~ Lot + Step, df1, FUN = sum)
   #      Lot Step kg
   #1 13VC011    A 31
   #2 13VC011    B 30
   #3 13VC018    A 32
   #4 13VC018    C 31

Or using dplyr

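  library(stringr)
  library(dplyr)
  # perl() has been deprecated since stringr 1.0.0; the default (ICU)
  # regex engine supports the lookahead directly.
  df1 %>%
    group_by(Lot = str_extract(Lot, ".*\\d(?=[A-Z]?$)"), Step) %>%
    summarize(kg = sum(kg))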
   #Source: local data frame [4 x 3]
   #Groups: Lot

   #     Lot Step kg
   #1 13VC011    A 31
   #2 13VC011    B 30
   #3 13VC018    A 32
   #4 13VC018    C 31

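Explanation

regex

.*: matches any number of characters

\\d: followed by a digit

(?=[A-Z]?$): a lookahead asserting that an uppercase letter, or nothing (the ? makes the letter optional), sits at the end of the string ($). The lookahead itself is not included in the match.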
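
To see the pattern in isolation, here is a minimal base R sketch that applies it to a few lot ids. The vector `lots` is made up for illustration, and `regexpr(..., perl = TRUE)` is used in place of stringr so the snippet has no package dependencies.

```r
# Hypothetical sample ids, not taken from a real data set
lots <- c("13VC011", "13VC011A", "13VC018C", "13VC020B")

# perl = TRUE switches base R to PCRE, which supports lookahead
m <- regexpr(".*\\d(?=[A-Z]?$)", lots, perl = TRUE)

# regmatches() returns the matched text; the trailing letter is left out
# because the lookahead is not part of the match
base <- regmatches(lots, m)
base
```

Each id should come back with its trailing letter removed, and an id without a trailing letter should come back unchanged.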


Answer 1 (score: 1)

> aggregate(kg ~Lot + Step, data=df1, FUN=sum)
       Lot Step kg
1  13VC011    A 31
2  13VC018    A 32
3 13VC011A    B 14
4 13VC011B    B 16
5 13VC018A    C 10
6 13VC018B    C 10
7 13VC018C    C 11

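At that point I finally understood what you meant by "regardless of the trailing letter", and wondered whether the formula method of aggregate could handle an R function in one of its terms: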

> aggregate(kg ~substr(Lot,1,7) + Step, data=df1, FUN=sum)
  substr(Lot, 1, 7) Step kg
1           13VC011    A 31
2           13VC018    A 32
3           13VC011    B 30
4           13VC018    C 31
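
Neither snippet above covers the Part 2 requirement from the question. One possible way to sketch it in base R is shown below. It rests on assumptions that are not stated in the thread: every base lot id ends in a digit, splits are labelled with a single uppercase letter, and a group counts as partial when its letters do not form an unbroken run starting at "A". The helper `is_partial` and the columns `Base` and `Suffix` are names invented for this illustration.

```r
df1 <- data.frame(
  Lot  = c("13VC011","13VC018","13VC011A","13VC011B",
           "13VC018A","13VC018C","13VC018B","13VC020B"),
  Step = c("A","A","B","B","C","C","C","B"),
  kg   = c(31,32,14,16,10,11,10,18),
  stringsAsFactors = FALSE)

# Separate the base lot id from the optional trailing split letter
has_letter <- grepl("[A-Z]$", df1$Lot)
df1$Base   <- sub("[A-Z]$", "", df1$Lot)
df1$Suffix <- ifelse(has_letter,
                     substr(df1$Lot, nchar(df1$Lot), nchar(df1$Lot)), "")

# Assumed rule: letters present must be exactly A, B, ... up to their count
is_partial <- function(suffix) {
  s <- sort(unique(as.character(suffix)))
  s <- s[s != ""]
  if (length(s) == 0) return(FALSE)   # unsplit lot, nothing to check
  !identical(s, LETTERS[seq_along(s)])
}

groups  <- df1[c("Base", "Step")]
totals  <- aggregate(df1["kg"], by = groups, FUN = sum)
partial <- aggregate(df1["Suffix"], by = groups, FUN = is_partial)
names(partial)[names(partial) == "Suffix"] <- "Partial"

df2 <- merge(totals, partial, by = c("Base", "Step"))
df2
```

On the sample data this should flag only the 13VC020 / step B group. Two caveats: the sketch reports the base id (13VC020) rather than keeping 13VC020B as in the asker's expected output, and it cannot detect a missing final part (for example A and B present but C never recorded), since nothing in the data says how many parts a lot was split into.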