Material DocDate Name Address Unit Price
1258486 3/17/2017 FEHLIG BROS BOX asd 8.95
1258486 5/11/2017 FEHLIG BROS BOX asd 9.5
1258486 12/11/2017 FEHLIG BROS_BOX asd 10.5
1250000 12/20/2017 Krones ALPHA afg 11.5
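I have the data frame above. I need a frame like the one below, based on the date (3/17/2017) on which each entry occurs. So I need the following output: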
Material Name/address/Unit Price
1258486 FEHLIG BROS BOX/asd/8.95/9.5/10.5
1250000 Krones/ALPHA/afg/11.5
Answer 0 (score: 1)
You can try this using data.table:
df <- read.table(stringsAsFactors = FALSE, header = TRUE,
text ="Material DocDate Name Address Unit Price
1258486 3/17/2017 FEHLIG BROS_BOX asd 8.95
1258486 5/11/2017 FEHLIG BROS_BOX asd 9.5
1258486 12/11/2017 FEHLIG BROS_BOX asd 10.5
1250000 12/20/2017 Krones ALPHA afg 11.5
")
df$DocDate <- as.Date(df$DocDate,'%m/%d/%Y')
library(data.table)
setDT(df)[,.(newVar = paste(Name, Address, Unit, paste(.SD$Price,collapse = "/"), sep = "/") )
,by = Material][,.(newVar = newVar[1]), Material]
#returns
Material newVar
1: 1258486 FEHLIG/BROS_BOX/asd/8.95/9.5/10.5
2: 1250000 Krones/ALPHA/afg/11.5
Answer 1 (score: 1)
Here is an alternative approach using dplyr. First, the sample data:
data <- data.frame(stringsAsFactors=FALSE,
Material = c(1258486L, 1258486L),
DocDate = c("3/17/2017", "5/11/2017"),
Name = c("FEHLIG BROS BOX", "FEHLIG BROS BOX"),
Address = c("asd", "asd"),
Unit_Price = c(8.95, 9.5))
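Then a set of steps to get your answer. (By the way, I believe all the solutions provided so far will give you multiple rows of output if several rows for a Material share the same "earliest date". If there is a reasonable tie-break here, you could add a further filter condition such as Unit_Price == min(Unit_Price).)
(Edit: fixed a typo in the code.)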
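The steps described above are not preserved in this copy of the answer, so the following is only a minimal sketch of what such a pipeline might look like, not the answerer's original code. It assumes the `data` frame defined above, converts `DocDate` with base `as.Date`, and uses the lowest `Unit_Price` as the tie-break; the name `earliest` is introduced here purely for illustration.

```r
library(dplyr)

# Sample data, repeated from the answer so the sketch runs on its own
data <- data.frame(stringsAsFactors = FALSE,
                   Material = c(1258486L, 1258486L),
                   DocDate = c("3/17/2017", "5/11/2017"),
                   Name = c("FEHLIG BROS BOX", "FEHLIG BROS BOX"),
                   Address = c("asd", "asd"),
                   Unit_Price = c(8.95, 9.5))

earliest <- data %>%
  # parse the month/day/year strings so that min() compares real dates
  mutate(DocDate = as.Date(DocDate, format = "%m/%d/%Y")) %>%
  group_by(Material) %>%
  # keep the row(s) with the earliest date for each Material
  filter(DocDate == min(DocDate)) %>%
  # assumed tie-break: among equally early rows, keep the lowest price
  filter(Unit_Price == min(Unit_Price)) %>%
  ungroup()
```

If two rows share both the earliest date and the lowest price, this still returns both of them, so a stricter tie-break (for example `slice(1)`) may be needed.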
Answer 2 (score: 0)
A complete edit, based on your changes to the question:
# create example data (notice this differs slightly from your table above)
# strip.white = TRUE drops the spaces that follow the commas in the values
df <- read.csv(stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE,
text ="Material, DocDate, Name, Address, UnitPrice
1258486, 3/17/2017, FEHLIG BROS BOX, asd, 8.95
1258486, 5/11/2017, FEHLIG BROS BOX, asd, 9.50
1258486, 12/11/2017, FEHLIG BROS_BOX, asd, 10.5
1250000, 12/20/2017, Krones ALPHA, afg, 11.5")
# let's use data.table
library(data.table)
df_orig <- as.data.table(df)
df_orig[ , DocDate := as.Date(DocDate, format="%m/%d/%Y")]
# sort in place by date; a chained [order(DocDate)] would only print a sorted copy
setorder(df_orig, DocDate)
# create one string per Name-Material pair
df_intermed <- df_orig[ , .(newvar = paste(Name[1], Address[1], paste(UnitPrice, collapse="/"), sep="/")), by=.(Material, Name)]
# aggregate those strings across Names, so one row per Material
df_final <- df_intermed[ , .(newvar = paste(newvar, collapse=",")), by=Material]
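For comparison, the same collapsing idea can be sketched in base R with `aggregate`. This is an illustrative alternative, not part of the answer above: it assumes the Name is spelled identically on every row of a Material (unlike the example data, where one row reads `FEHLIG BROS_BOX`), and the names `res` and `newvar` are chosen only for this sketch.

```r
# Illustrative data with a consistent Name per Material (an assumption)
df <- data.frame(
  stringsAsFactors = FALSE,
  Material  = c(1258486L, 1258486L, 1258486L, 1250000L),
  Name      = c("FEHLIG BROS BOX", "FEHLIG BROS BOX", "FEHLIG BROS BOX", "Krones ALPHA"),
  Address   = c("asd", "asd", "asd", "afg"),
  UnitPrice = c(8.95, 9.5, 10.5, 11.5)
)

# Collapse all prices of each Material/Name/Address group into one string
res <- aggregate(UnitPrice ~ Material + Name + Address, data = df,
                 FUN = function(p) paste(p, collapse = "/"))

# Join name, address and the collapsed prices with "/" separators
res$newvar <- paste(res$Name, res$Address, res$UnitPrice, sep = "/")
```

The prices are joined in the order the rows appear in `df`, so sort by date first if chronological order matters.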