I have a transaction dataset that I want to transform based on the customer ID. A sample is shown below.
CustomerID Description
17850 WHITE HANGING HEART T-LIGHT HOLDER
17850 WHITE METAL LANTERN
13047 ASSORTED COLOUR BIRD ORNAMENT
13047 POPPY'S PLAYHOUSE BEDROOM
13047 POPPY'S PLAYHOUSE KITCHEN
I want the dataset to be arranged like this:
17850 WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LANTERN
13047 ASSORTED COLOUR BIRD ORNAMENT,POPPY'S PLAYHOUSE BEDROOM, POPPY'S PLAYHOUSE KITCHEN
The dataset is in CSV format, with each value in a separate cell. Can anyone suggest a way to do this in Excel, R, or Python?
Answer 0 (score: 0)
In Python, you can use pandas. Install it, then try:
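import pandas as pd

# Read the CSV file
df = pd.read_csv('yourFileName.csv')
# Group by CustomerID (sort=False keeps customers in first-appearance order)
# and join each customer's Descriptions with commas
result = df.groupby('CustomerID', sort=False)['Description'].apply(','.join).reset_index()
# Save the grouped result (not the original df) to a CSV file
result.to_csv('resultFileName.csv', index=False)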
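If installing pandas is not an option, the same grouping can be sketched with only the standard library's csv module. This is a minimal sketch, assuming the file has a header row with CustomerID and Description columns as in the sample; the helper names group_descriptions and convert are hypothetical. Customers keep the order in which they first appear.

```python
import csv
import io


def group_descriptions(rows):
    """Collect Description values per CustomerID, keeping first-appearance order."""
    groups = {}  # plain dicts preserve insertion order (Python 3.7+)
    for row in rows:
        groups.setdefault(row["CustomerID"], []).append(row["Description"])
    return groups


def convert(src, dst, sep=","):
    """Read a CustomerID/Description CSV from src; write one row per customer to dst."""
    groups = group_descriptions(csv.DictReader(src))
    writer = csv.writer(dst)
    writer.writerow(["CustomerID", "Description"])
    for customer_id, descriptions in groups.items():
        # The joined text contains commas, so csv.writer quotes it automatically
        writer.writerow([customer_id, sep.join(descriptions)])


if __name__ == "__main__":
    # In-memory copy of the sample data from the question
    sample = io.StringIO(
        "CustomerID,Description\n"
        "17850,WHITE HANGING HEART T-LIGHT HOLDER\n"
        "17850,WHITE METAL LANTERN\n"
        "13047,ASSORTED COLOUR BIRD ORNAMENT\n"
        "13047,POPPY'S PLAYHOUSE BEDROOM\n"
        "13047,POPPY'S PLAYHOUSE KITCHEN\n"
    )
    out = io.StringIO()
    convert(sample, out)
    print(out.getvalue(), end="")
```

For real files, open both with newline='' as the csv module documentation recommends: with open('yourFileName.csv', newline='') as src, open('resultFileName.csv', 'w', newline='') as dst: convert(src, dst). Because each joined list is quoted, it stays in a single cell when the result is opened in Excel.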
Answer 1 (score: 0)
You can use the aggregate() function. I created my own data; you can do the same for your data frame above. The Texts are concatenated according to the Customer number.
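R's aggregate() usually returns one row per group, ordered by the grouping key. For readers working in Python, a comparable sketch uses itertools.groupby; the rows list and the aggregate_join helper are hypothetical stand-ins for the sample data. Because groupby only merges adjacent equal keys, the pairs are sorted by CustomerID first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical in-memory version of the sample: (CustomerID, Description) pairs
rows = [
    (17850, "WHITE HANGING HEART T-LIGHT HOLDER"),
    (17850, "WHITE METAL LANTERN"),
    (13047, "ASSORTED COLOUR BIRD ORNAMENT"),
    (13047, "POPPY'S PLAYHOUSE BEDROOM"),
    (13047, "POPPY'S PLAYHOUSE KITCHEN"),
]


def aggregate_join(pairs, sep=","):
    """Concatenate descriptions per customer, returning groups sorted by ID."""
    # groupby only merges *adjacent* equal keys, so sort by the key first;
    # sorted() is stable, so descriptions keep their input order within a group.
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (customer_id, sep.join(desc for _, desc in group))
        for customer_id, group in groupby(ordered, key=itemgetter(0))
    ]


for customer_id, text in aggregate_join(rows):
    print(customer_id, text)
```

Run as-is, it prints 13047 before 17850, the key-sorted order, rather than the order of the input rows.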
Answer 2 (score: 0)
Other approaches include using plyr and data.table. data.table is likely to be faster on large data, is concise, and gives you more control.
library(plyr)
ddply(df, .(ID), summarize, Text = paste(Text, collapse = ","))
or
library(data.table)
DT <- data.table(df)
# Group the table by ID and build a Text column by pasting each
# group's values together; the result has one row per ID.
DT[, list(Text = paste(Text, collapse = ",")), by = ID]
ID Text
1: 17850 WHITE HANGING HEART T-LIGHT HOLDER,WHITE METAL LANTERN
2: 13047 ASSORTED COLOUR BIRD ORNAMENT,POPPY'S PLAYHOUSE BEDROOM,POPPY'S PLAYHOUSE KITCHEN
Data
df <- data.frame(ID = c(17850,17850,13047,13047,13047),
Text = c("WHITE HANGING HEART T-LIGHT HOLDER","WHITE METAL LANTERN",
"ASSORTED COLOUR BIRD ORNAMENT","POPPY'S PLAYHOUSE BEDROOM",
"POPPY'S PLAYHOUSE KITCHEN"))