I have a data frame df1:
df1 <- data.frame(
Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B"),
Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19"),
Step = c("A","A","B","B","C","C","C"),
kg = c(31,32,14,16,10,11,10))
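Sometimes, at a particular 'Step', a 'Lot' is split into parts designated A, B, or C. I would like to sum these parts and get a data frame that tells me the total kg for each lot at each step.
For example, the output should look like this: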
df2 <- data.frame(
Lot = c("13VC011","13VC011","13VC018","13VC018"),
Step = c("A","B","A","C"),
kg = c(31,30,32,31))
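So there are two requirements. If the 'Lot' matches (ignoring the trailing letter) and the 'Step' matches, the rows are summed. If both conditions are not met, the line item is simply carried over to df2 as is.
Part 2: I would like to introduce a third requirement. In some cases a Lot is split into two or three parts, but not all of the data are present. When that happens, the solutions here mask the gap and make a lot's kg appear much lower than it actually is.
What I would like is a way to flag when the data set contains 13VC011A but not 13VC011B, or when we see a 'B' but no 'A', or a 'C' but no 'B' or 'A'.
So now the original data frame is: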
df1 <- data.frame(
Lot = c("13VC011","13VC018","13VC011A","13VC011B","13VC018A","13VC018C","13VC018B","13VC020B"),
Date = c("2013-07-12","2013-07-11","2013-07-13","2013-07-14","2013-07-16","2013-07-18","2013-07-19","2013-07-22"),
Step = c("A","A","B","B","C","C","C","B"),
kg = c(31,32,14,16,10,11,10,18))
The resulting df2 should look something like:
df2 <- data.frame(
Lot = c("13VC011","13VC011","13VC018","13VC018","13VC020B"),
Step = c("A","B","A","C","B"),
kg = c(31,30,32,31,18),
Partial = c(FALSE,FALSE,FALSE,FALSE,TRUE))
Answer 0 (score: 2)
df1$Lot <- gsub("[[:alpha:]]$","",df1$Lot) #replace the character element at the end of string with `""`
aggregate(kg~Lot+Step,df1, FUN=sum)
# Lot Step kg
#1 13VC011 A 31
#2 13VC011 B 30
#3 13VC018 A 32
#4 13VC018 C 31
Or using dplyr:
library(stringr)
library(dplyr)
df1%>%
group_by(Lot=str_extract(Lot, regex('.*\\d(?=[A-Z]?$)')), Step) %>%
summarize(kg=sum(kg))
#Source: local data frame [4 x 3]
#Groups: Lot
# Lot Step kg
#1 13VC011 A 31
#2 13VC011 B 30
#3 13VC018 A 32
#4 13VC018 C 31
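Regex explanation:
.* : matches any number of characters
\\d : followed by a digit
(?=[A-Z]?$) : a lookahead asserting that the digit is followed by an optional (?) uppercase letter and then the end of the string ($). The letter is checked but not included in the match.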
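To make the two patterns concrete, here is a small sketch in base R that applies both to a few of the Lot values from the question. The vector name `lots` is made up for illustration, and `regexpr(..., perl = TRUE)` with `regmatches()` stands in for `str_extract()` so the snippet needs no packages.

```r
# Hypothetical sample of Lot values taken from the question
lots <- c("13VC011", "13VC011A", "13VC018C")

# Approach 1: delete a single trailing letter, if there is one
stripped <- sub("[[:alpha:]]$", "", lots)

# Approach 2: keep everything up to the last digit, using a lookahead
# that allows one optional uppercase letter before the end of the string
extracted <- regmatches(lots, regexpr(".*\\d(?=[A-Z]?$)", lots, perl = TRUE))

print(stripped)
print(extracted)
```

Both approaches should give the same base lot for every element here. They would differ for values that do not end in a digit plus at most one letter, which the sample data does not contain.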
Answer 1 (score: 1)
> aggregate(kg ~Lot + Step, data=df1, FUN=sum)
Lot Step kg
1 13VC011 A 31
2 13VC018 A 32
3 13VC011A B 14
4 13VC011B B 16
5 13VC018A C 10
6 13VC018B C 10
7 13VC018C C 11
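At that point I finally understood what you meant by "regardless of the trailing letter", and wondered whether the formula method of aggregate could handle an R function call in one of its terms: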
> aggregate(kg ~substr(Lot,1,7) + Step, data=df1, FUN=sum)
substr(Lot, 1, 7) Step kg
1 13VC011 A 31
2 13VC018 A 32
3 13VC011 B 30
4 13VC018 C 31
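The grouping above handles the summing but not the Part 2 request for a Partial flag. One possible sketch in base R follows. It rests on assumptions that the question does not spell out: a split is treated as complete only when its suffix letters run A, B, ... with no gaps and there are at least two of them, so a lone A or a lone B both count as partial. The helper `is_partial` and the columns `Base` and `Suffix` are names invented here. Unlike the desired df2 in the question, this version reports the base lot (13VC020) rather than the original suffixed name.

```r
df1 <- data.frame(
  Lot  = c("13VC011","13VC018","13VC011A","13VC011B",
           "13VC018A","13VC018C","13VC018B","13VC020B"),
  Step = c("A","A","B","B","C","C","C","B"),
  kg   = c(31,32,14,16,10,11,10,18),
  stringsAsFactors = FALSE)

# Split each Lot into its base name and its optional trailing letter
df1$Base   <- sub("[A-Z]$", "", df1$Lot)
df1$Suffix <- substring(df1$Lot, nchar(df1$Base) + 1)

# Assumption: a group is partial when it has suffix letters that are
# not exactly A, B, ... up to the number of letters seen (minimum two)
is_partial <- function(suffix) {
  s <- sort(unique(suffix[suffix != ""]))
  if (length(s) == 0) return(FALSE)   # lot was never split
  length(s) < 2 || !identical(s, LETTERS[seq_along(s)])
}

# Sum kg per base lot and step, then look up the flag for each group
df2  <- aggregate(kg ~ Base + Step, data = df1, FUN = sum)
flag <- tapply(df1$Suffix, paste(df1$Base, df1$Step), is_partial)
df2$Partial <- as.vector(flag[paste(df2$Base, df2$Step)])
names(df2)[1] <- "Lot"

print(df2)
```

With the Part 2 data, only the 13VC020 / B row should come out flagged. A missing final part (for example A and B present but C absent) cannot be detected this way, since nothing in the data says how many parts a lot was split into.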