df_workingFile:
ID | GroupID | Sales | Date
v | a1 | 1 | 2011
w | a1 | 3 | 2010
x | b1 | 8 | 2007
y | b1 | 3 | 2006
z | c3 | 2 | 2006
df_groupIDs:
GroupID | numIDs | MaxSales
a1 | 2 | 3
b1 | 2 | 8
c3 | 1 | 2
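For df_groupIDs, I want to get the ID and Date of the event with the highest sales in each group. For example, group "a1" has 2 events in df_workingFile, "v" and "w". I want to identify that event "w" has the max sales and bring its information into df_groupIDs. The final output should look like this: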
GroupID | numIDs | MaxSales | ID | Date
a1 | 2 | 3 | w | 2010
b1 | 2 | 8 | x | 2007
c3 | 1 | 2 | z | 2006
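i <- 1
for (groupID in df_groupIDs$GroupID) {
  # Events in df_workingFile that belong to the current group
  groupEvents <- subset(df_workingFile, df_workingFile$GroupID == groupID)
  # Position of the first event whose Sales equals the group's MaxSales
  index <- match(df_groupIDs$MaxSales[i], groupEvents$Sales)
  df_groupIDs$ID[i] <- as.character(groupEvents$ID[index])
  df_groupIDs$Date[i] <- groupEvents$Date[index]
  i <- i + 1
}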
Answer 0 (score: 4)
library(dplyr)

df_workingFile %>%
  group_by(GroupID) %>%                        # for each group id
  arrange(desc(Sales)) %>%                     # sort by Sales (descending)
  slice(1) %>%                                 # keep the top row
  inner_join(df_groupIDs, by = "GroupID") %>%  # join to df_groupIDs
  select(GroupID, numIDs, MaxSales, ID, Date)
  # keep the columns you want in the order you want
Another, simpler approach, if Sales
inner_join(df_groupIDs, df_workingFile,
by = c("GroupID" = "GroupID", "MaxSales" = "Sales"))
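If dplyr is not available, the sort-then-keep-the-first-row idea can also be sketched in base R. This is only an illustration under stated assumptions: the data frames are rebuilt inline from the question's tables, and the helper names `ordered`, `top_rows` and `result` are made up for this example. When two events in a group tie on Sales, only the one that comes first in df_workingFile is kept.

```r
# Example data copied from the tables in the question
df_workingFile <- data.frame(
  ID      = c("v", "w", "x", "y", "z"),
  GroupID = c("a1", "a1", "b1", "b1", "c3"),
  Sales   = c(1, 3, 8, 3, 2),
  Date    = c(2011, 2010, 2007, 2006, 2006),
  stringsAsFactors = FALSE
)
df_groupIDs <- data.frame(
  GroupID  = c("a1", "b1", "c3"),
  numIDs   = c(2, 2, 1),
  MaxSales = c(3, 8, 2),
  stringsAsFactors = FALSE
)

# Sort so that the highest Sales comes first within each group
ordered <- df_workingFile[order(df_workingFile$GroupID, -df_workingFile$Sales), ]

# Keep only the first (top-selling) row of each group
top_rows <- ordered[!duplicated(ordered$GroupID), c("GroupID", "ID", "Date")]

# Attach ID and Date of the top event to the group summary
result <- merge(df_groupIDs, top_rows, by = "GroupID")
print(result)
```

The `merge()` call plays the role of `inner_join()` in the pipeline above, and `!duplicated()` on the sorted rows plays the role of `slice(1)`.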
Answer 1 (score: 1)
library(sqldf)

# Relies on SQLite returning w.ID and w.Date from the row that holds max(w.Sales)
sqldf("select g.GroupID, g.numIDs, max(w.Sales) MaxSales, w.ID, w.Date
from df_groupIDs g left join df_workingFile w using(GroupID)
group by GroupID")
GroupID numIDs MaxSales ID Date
1 a1 2 3 w 2010
2 b1 2 8 x 2007
3 c3 1 2 z 2006
Lines1 <- "
ID | GroupID | Sales | Date
v | a1 | 1 | 2011
w | a1 | 3 | 2010
x | b1 | 8 | 2007
y | b1 | 3 | 2006
z | c3 | 2 | 2006"
df_workingFile <- read.table(text = Lines1, header = TRUE, sep = "|", strip.white = TRUE)
Lines2 <- "
GroupID | numIDs | MaxSales
a1 | 2 | 3
b1 | 2 | 8
c3 | 1 | 2"
df_groupIDs <- read.table(text = Lines2, header = TRUE, sep = "|", strip.white = TRUE)