Question

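I have a list of items along with the number of times each was purchased. Most of these items belong to more than one category: some belong to a single category, others to two, and some to more than two.

Now I want to produce a ranking for each category showing the items that converted. It is fine for an item to appear in more than one category.

Categories within the same string are separated by the > character.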

ItemId           Category                        PCC
5063660193       Go to Gifts                     2
24154563660193   Go to Gifts>All Gifts           1

I would like it to become:

ItemId          Category      PCC
5063660193      Go to Gifts   2
24154563660193  Go to Gifts   1
24154563660193  All Gifts     1

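After that, I would only need the rank() over() function in SQL to rank them. If this is not possible in SQL, I can use R; perhaps the reshape function would come in handy in that case.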

Answer 1

Here is a solution using R:

# Your data
df <- read.table(text = "ItemId           Category                        PCC
             5063660193       'Go to Gifts'                     2
             24154563660193   'Go to Gifts>All Gifts'           1",
            header = TRUE, stringsAsFactors = FALSE)

# Split Category at each ">"
s <- strsplit(df$Category, ">", fixed = TRUE)

# Number of categories per row (1 if there was no ">")
l <- lengths(s)

# Repeat each row once per category it contains
new.df <- df[rep(seq_len(nrow(df)), l), ]

# Replace Category with the individual components separated by ">"
new.df$Category <- unlist(s)
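
The question also asks for a per-category ranking once the rows are split. A minimal base R sketch of that step follows, roughly analogous to `RANK() OVER (PARTITION BY Category ORDER BY PCC DESC)` in SQL. The names `long` and `CategoryRank` are illustrative, and ranking the highest PCC first is an assumption about what the asker wants. ItemId is stored as character here only so the long identifiers print in full.

```r
# Sample data from the question, with ">" as the category separator
df <- data.frame(
  ItemId   = c("5063660193", "24154563660193"),
  Category = c("Go to Gifts", "Go to Gifts>All Gifts"),
  PCC      = c(2, 1),
  stringsAsFactors = FALSE
)

# One row per (item, category), using the same approach as above
parts <- strsplit(df$Category, ">", fixed = TRUE)
long <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
long$Category <- unlist(parts)
rownames(long) <- NULL

# ave() applies rank() separately within each Category group;
# negating PCC ranks the highest count first (an assumption)
long$CategoryRank <- ave(-long$PCC, long$Category,
                         FUN = function(x) rank(x, ties.method = "min"))

# Sort by category, then by rank within the category
long <- long[order(long$Category, long$CategoryRank), ]
print(long)
```

With `ties.method = "min"`, items with equal PCC share the same rank, which matches how `RANK()` behaves in SQL.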

Answer 2

Here is a slightly simpler way using data.table.

library(data.table)
dt <- data.table(df)
result <- dt[, strsplit(as.character(Category), ">", fixed = TRUE), by = list(ItemId, PCC)]
setnames(result,"V1","Category")
result
#            ItemId                    PCC           Category
# 1:     5063660193                      2        Go to Gifts
# 2: 24154563660193                      1        Go to Gifts
# 3: 24154563660193                      1          All Gifts
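
The ranking step the question mentions can also be done in data.table. The following is a self-contained sketch, not part of the original answer: `long` and `CategoryRank` are illustrative names, and ranking the highest PCC first is an assumption. It uses data.table's `frank()` grouped by Category.

```r
library(data.table)

# Sample data from the question; ItemId kept as character so it prints in full
dt <- data.table(
  ItemId   = c("5063660193", "24154563660193"),
  Category = c("Go to Gifts", "Go to Gifts>All Gifts"),
  PCC      = c(2L, 1L)
)

# One row per (item, category)
long <- dt[, .(Category = unlist(strsplit(Category, ">", fixed = TRUE))),
           by = .(ItemId, PCC)]

# Rank within each category; negating PCC puts the highest count first (an assumption)
long[, CategoryRank := frank(-PCC, ties.method = "min"), by = Category]

# Sort by category, then by rank within the category
setorder(long, Category, CategoryRank)
print(long)
```

Grouping with `by = Category` plays the role of `PARTITION BY` in the SQL window function.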

Split a string into multiple rows to build a dataset for further ranking by category
