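I am trying to convert rows into columns. The rows share one key, but they are duplicated because the Change Period Start and Change Period End dates changed several times. I think converting them into columns would remove the duplicate values. I tried a pivot in Python, but because the values would come from a date column, I could not get it to work.
Here is what I have:
PS: I have records for millions of orders, so I need a solution that can be automated.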
Answer 0 (score: 1)
Python solution:
import pandas as pd
df = pd.DataFrame({"Change Period Start":["2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019"],
"Change Period End":["9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019"],
"Change Period Supplier":["1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019"]})
df.drop_duplicates(subset=['Change Period Supplier'])
  Change Period Start Change Period End Change Period Supplier
0            2/2/2019         9/11/2019               1/1/2020
2            2/2/2019          5/5/2019               1/1/2025
3           9/11/2019         9/11/2019              9/11/2019
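The call above compares rows on a single column only. As a minimal sketch of a variant, assuming that a row should count as a duplicate only when every column matches an earlier row, `drop_duplicates` can be called without `subset`. The variable name `deduped` is made up for illustration; the result has to be assigned because `drop_duplicates` returns a new DataFrame rather than modifying `df`.

```python
import pandas as pd

df = pd.DataFrame({
    "Change Period Start": ["2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019"],
    "Change Period End": ["9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019"],
    "Change Period Supplier": ["1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019"],
})

# Without `subset`, a row is dropped only if all of its columns
# match a row that appeared earlier.
# `deduped` is a hypothetical name; the original df is left unchanged.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

On this sample the full-row comparison and the single-column comparison keep the same three rows; on real data they can differ, so the choice of `subset` depends on what counts as a duplicate order record.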
R solution:
Change.Period.Start <- c("2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019")
Change.Period.End <- c("9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019")
Change.Period.Supplier <- c("1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019")
df = data.frame(Change.Period.Start, Change.Period.End, Change.Period.Supplier)
df[!duplicated(df$Change.Period.Supplier), ]
  Change.Period.Start Change.Period.End Change.Period.Supplier
1            2/2/2019         9/11/2019               1/1/2020
3            2/2/2019          5/5/2019               1/1/2025
4           9/11/2019         9/11/2019              9/11/2019
Updated R version based on the OP's comment:
GR.Key <- c("A", "A", "A", "B")
Change.Period.Start <- c("2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019")
Change.Period.End <- c("9/11/2019", "9/11/2019", "5/5/2019", "9/11/2019")
Change.Period.Supplier <- c("1/1/2020", "1/1/2020", "1/1/2025", "9/11/2019")
df = data.frame(GR.Key, Change.Period.Start, Change.Period.End, Change.Period.Supplier)
library(data.table)
dcast(df, GR.Key ~ paste0("Change.Period.Start", rowid(GR.Key)), value.var = "Change.Period.Start")
  GR.Key Change.Period.Start1 Change.Period.Start2 Change.Period.Start3
1      A             2/2/2019             2/2/2019             2/2/2019
2      B            9/11/2019                 <NA>                 <NA>
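Since the question asked about pivoting in Python, here is a rough pandas sketch of the same reshape. It is not part of the original answer. It assumes the key column is named `GR Key` and uses a helper column `n` (both names are illustrative) to number the rows within each key, which plays the role of `rowid()` in the R code. `DataFrame.pivot` is used rather than `pivot_table` because it only rearranges values and does not aggregate them, so date strings are not a problem.

```python
import pandas as pd

df = pd.DataFrame({
    "GR Key": ["A", "A", "A", "B"],
    "Change Period Start": ["2/2/2019", "2/2/2019", "2/2/2019", "9/11/2019"],
})

# Number the rows within each key: 1, 2, 3, ...
# `n` is a hypothetical helper column, analogous to rowid(GR.Key) in R.
df["n"] = df.groupby("GR Key").cumcount() + 1

# pivot() needs each (index, columns) pair to be unique, which `n` guarantees.
wide = df.pivot(index="GR Key", columns="n", values="Change Period Start")

# Rename the numbered columns to Change Period Start1, Change Period Start2, ...
wide.columns = [f"Change Period Start{n}" for n in wide.columns]
wide = wide.reset_index()
print(wide)
```

Keys with fewer rows get missing values in the extra columns, as in the R output. Calling `df.drop_duplicates()` before numbering the rows would collapse the repeated start dates for key A into a single column, if that is the goal.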