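I'm building a product recommendation engine using collaborative filtering (in R). To put the more profitable items at the top of the recommendations, we've developed a flexible business rule, shown in Figure 1. The business rule is applied to sort the recommender's output.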
+---------------+----------+-----------------+
| Sort Priority | Level 1 | Level 2 |
+---------------+----------+-----------------+
| 1 | Brand | Versatile Foods |
+---------------+----------+-----------------+
| | | Agro |
+---------------+----------+-----------------+
| | | Specialty Foods |
+---------------+----------+-----------------+
| | | |
+---------------+----------+-----------------+
| 2 | Category | Dairy |
+---------------+----------+-----------------+
| | | Produce |
+---------------+----------+-----------------+
| | | Seafood |
+---------------+----------+-----------------+
| | | |
+---------------+----------+-----------------+
| 3 | Seasonal | Y |
+---------------+----------+-----------------+
| | | N |
+---------------+----------+-----------------+
Figure 1
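Business rule: when sorting the table, the Brand column takes precedence over Category, which takes precedence over Seasonal. This is determined by the value in the Sort Priority column.
When sorting within the Brand column, Versatile Foods comes before Agro, and Agro comes before Specialty Foods. Values in the Brand column that do not appear in the rule must be placed after the listed values and sorted alphabetically.
The same sorting logic should be applied to every entry in the rule definition.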
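The per-column logic described above (listed values first, in rule order, then any unlisted values alphabetically) can be sketched as a small R helper. `rule_levels()` and `rule_rank()` are hypothetical names used only for illustration, not functions from any package:

```r
# Hypothetical helper: the full level order for one rule column.
# Values listed in the rule come first, in rule order; any other values
# present in the data follow in alphabetical order.
rule_levels <- function(x, preferred) {
  unlisted <- sort(setdiff(unique(as.character(x)), preferred))
  c(preferred, unlisted)
}

# Hypothetical helper: an integer sort key for each element of x.
rule_rank <- function(x, preferred) {
  match(as.character(x), rule_levels(x, preferred))
}

brands <- c("USA Bread", "Agro", "Versatile Foods", "Specialty Foods", "Agro")
rule_rank(brands, c("Versatile Foods", "Agro", "Specialty Foods"))
# 4 2 1 3 2
```

USA Bread is not in the rule, so it gets the first rank after the three listed brands.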
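As the recommendation algorithm evolves, the business rule may be changed to have fewer or more levels. For example, an additional Level 1 entry such as Type (Kosher, Vegan, Halal) may be added in the future. The rule would then look like this: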
+---------------+----------+-----------------+
| Sort Priority | Level 1 | Level 2 |
+---------------+----------+-----------------+
| 1 | Brand | Versatile Foods |
+---------------+----------+-----------------+
| | | Agro |
+---------------+----------+-----------------+
| | | Specialty Foods |
+---------------+----------+-----------------+
| | | |
+---------------+----------+-----------------+
| 2 | Category | Dairy |
+---------------+----------+-----------------+
| | | Produce |
+---------------+----------+-----------------+
| | | Seafood |
+---------------+----------+-----------------+
| | | |
+---------------+----------+-----------------+
| 3 | Type | Kosher |
+---------------+----------+-----------------+
| | | Halal |
+---------------+----------+-----------------+
| | | Vegan |
+---------------+----------+-----------------+
| | | |
+---------------+----------+-----------------+
| 4 | Seasonal | Y |
+---------------+----------+-----------------+
| | | N |
+---------------+----------+-----------------+
Figure 2
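I need help building a script in R that sorts the recommender output (loaded into a data frame) according to the business rule above. The real problem I'm trying to solve is that I don't want to change the code every time a new entry is added to the rule.
The input data (the output of the recommendation engine) will look like this (Figure 3):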
+-----+-----------------+----------+----------+
| SKU | Brand | Category | Seasonal |
+-----+-----------------+----------+----------+
| 1 | Versatile Foods | Dairy | Y |
+-----+-----------------+----------+----------+
| 2 | Agro | Produce | Y |
+-----+-----------------+----------+----------+
| 3 | Specialty Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 4 | Agro | Produce | N |
+-----+-----------------+----------+----------+
| 5 | Specialty Foods | Organic | Y |
+-----+-----------------+----------+----------+
| 6 | Agro | Meat | N |
+-----+-----------------+----------+----------+
| 7 | Versatile Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 8 | USA Bread | Bakery | Y |
+-----+-----------------+----------+----------+
| 9 | Specialty Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 10 | Versatile Foods | Seafood | N |
+-----+-----------------+----------+----------+
Figure 3
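Using the rule definition in Figure 1, the output of the script should look like the table below (Figure 4).
Note how Brand = USA Bread, which does not appear in the business rule, is placed at the bottom of the sorted list.
Also, for SKUs 4 and 6, the record with Category = 'Produce' is listed above the record with Category = 'Meat', because 'Meat' is not found in the business rule but 'Produce' is.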
+-----+-----------------+----------+----------+
| SKU | Brand | Category | Seasonal |
+-----+-----------------+----------+----------+
| 1 | Versatile Foods | Dairy | Y |
+-----+-----------------+----------+----------+
| 7 | Versatile Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 10 | Versatile Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 2 | Agro | Produce | Y |
+-----+-----------------+----------+----------+
| 4 | Agro | Produce | N |
+-----+-----------------+----------+----------+
| 6 | Agro | Meat | N |
+-----+-----------------+----------+----------+
| 3 | Specialty Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 9 | Specialty Foods | Seafood | N |
+-----+-----------------+----------+----------+
| 5 | Specialty Foods | Organic | Y |
+-----+-----------------+----------+----------+
| 8 | USA Bread | Bakery | Y |
+-----+-----------------+----------+----------+
Figure 4
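To meet the "no code change when the rule changes" requirement, one option is to store the rule itself as data. This is a minimal sketch under that assumption: the long-format `rule` table (columns `priority`, `column`, `value`) and the function `sort_by_rule()` are hypothetical, and rule columns that are missing from the data are silently skipped, which is a design choice of this sketch rather than part of the stated rule.

```r
# Hypothetical rule table for Figure 1: one row per listed value.
# Row order within a column is that column's preferred order.
rule <- data.frame(
  priority = c(1, 1, 1, 2, 2, 2, 3, 3),
  column   = c(rep("Brand", 3), rep("Category", 3), rep("Seasonal", 2)),
  value    = c("Versatile Foods", "Agro", "Specialty Foods",
               "Dairy", "Produce", "Seafood", "Y", "N"),
  stringsAsFactors = FALSE
)

# Sort df by every rule column, highest priority first. Within a column,
# listed values come first (in rule order), unlisted values follow alphabetically.
sort_by_rule <- function(df, rule) {
  cols <- unique(rule$column[order(rule$priority)])  # order() keeps ties stable
  cols <- intersect(cols, names(df))                 # skip columns the data lacks
  keys <- lapply(cols, function(col) {
    preferred <- rule$value[rule$column == col]
    vals <- as.character(df[[col]])
    match(vals, c(preferred, sort(setdiff(unique(vals), preferred))))
  })
  df[do.call(order, keys), , drop = FALSE]
}

# Input data from Figure 3
df <- data.frame(
  SKU      = 1:10,
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods", "Agro", "Specialty Foods",
               "Agro", "Versatile Foods", "USA Bread", "Specialty Foods", "Versatile Foods"),
  Category = c("Dairy", "Produce", "Seafood", "Produce", "Organic",
               "Meat", "Seafood", "Bakery", "Seafood", "Seafood"),
  Seasonal = c("Y", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

sorted <- sort_by_rule(df, rule)
sorted$SKU  # 1 7 10 2 4 6 3 9 5 8, as in Figure 4

# Figure 2: insert Type at priority 3 and move Seasonal to 4.
# Only the rule data changes; sort_by_rule() is untouched.
rule2 <- rbind(
  rule[rule$column != "Seasonal", ],
  data.frame(priority = 3, column = "Type", value = c("Kosher", "Halal", "Vegan"),
             stringsAsFactors = FALSE),
  data.frame(priority = 4, column = "Seasonal", value = c("Y", "N"),
             stringsAsFactors = FALSE)
)
```

In practice the rule table could be read from a file or database, so adding a level such as Type becomes a data edit. Because the Figure 3 data has no Type column, `sort_by_rule(df, rule2)` gives the same order as `rule`. If you would rather fail loudly when the rule names a missing column, replace the `intersect()` line with a `stopifnot()` check.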
Answer 0 (score: 0)
You can use factors to encode the order you want. For example:
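> # Brands listed in the rule first, in rule order, then any other brands alphabetically
> lvl <- c('Versatile Foods', 'Agro', 'Specialty Foods')
> lvl <- append(lvl, sort(setdiff(unique(df$Brand), lvl)))
> df$Brand <- factor(df$Brand, levels=lvl)
>
> # Same idea for Category
> lvl <- c("Dairy", "Produce", "Seafood")
> lvl <- append(lvl, sort(setdiff(unique(df$Category), lvl)))
> df$Category <- factor(df$Category, levels=lvl)
>
> # Seasonal: 'Y' before 'N'
> df$Seasonal <- factor(df$Seasonal, levels=c('Y', 'N'))
>
> # order() sorts factors by level position, so the result follows the rule
> df[order(df$Brand, df$Category, df$Seasonal), ]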
SKU Brand Category Seasonal
1 1 Versatile Foods Dairy Y
7 7 Versatile Foods Seafood N
10 10 Versatile Foods Seafood N
2 2 Agro Produce Y
4 4 Agro Produce N
6 6 Agro Meat N
3 3 Specialty Foods Seafood N
9 9 Specialty Foods Seafood N
5 5 Specialty Foods Organic Y
8 8 USA Bread Bakery Y
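The factor approach above hard-codes each column. A sketch of the same technique driven by configuration, assuming a hypothetical named list `prefs` whose element order is the sort priority, so a new rule level only needs a new list entry:

```r
# Hypothetical configuration: preferred values per column, in sort priority.
prefs <- list(
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods"),
  Category = c("Dairy", "Produce", "Seafood"),
  Seasonal = c("Y", "N")
)

# Input data from Figure 3 of the question
df <- data.frame(
  SKU      = 1:10,
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods", "Agro", "Specialty Foods",
               "Agro", "Versatile Foods", "USA Bread", "Specialty Foods", "Versatile Foods"),
  Category = c("Dairy", "Produce", "Seafood", "Produce", "Organic",
               "Meat", "Seafood", "Bakery", "Seafood", "Seafood"),
  Seasonal = c("Y", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Turn each rule column into a factor: listed values first, then the rest alphabetically
for (col in names(prefs)) {
  vals <- as.character(df[[col]])
  lvl  <- c(prefs[[col]], sort(setdiff(unique(vals), prefs[[col]])))
  df[[col]] <- factor(vals, levels = lvl)
}

# order() sorts factors by level position; unname() stops the column names
# from being matched against order()'s own arguments
sorted <- df[do.call(order, unname(as.list(df[names(prefs)]))), ]
sorted$SKU  # 1 7 10 2 4 6 3 9 5 8
```

A side benefit of keeping the columns as factors is that the rule order is preserved in later steps, for example in `table()` output or plot legends.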
Answer 1 (score: 0)
This approach defines a rank table, joins it to the main table, and then sorts on the new rank column.
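library(dplyr)
# Rank table for the Brand rule; brands not in the table get NA after the join
brand_rank <- tibble(Brand = c('Versatile Foods', 'Agro', 'Specialty Foods'),
                     Brand_rank = c(1, 2, 3))
# arrange() puts NA ranks last; Category and Seasonal are sorted alphabetically
df <- left_join(df, brand_rank, by = "Brand") %>%
  arrange(Brand_rank, Brand, Category, Seasonal) %>%
  select(-Brand_rank)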
df
# A tibble: 10 × 4
# SKU Brand Category Seasonal
# <dbl> <chr> <chr> <chr>
#1 1 Versatile Foods Dairy Y
#2 7 Versatile Foods Seafood N
#3 10 Versatile Foods Seafood N
#4 6 Agro Meat N
#5 4 Agro Produce N
#6 2 Agro Produce Y
#7 5 Specialty Foods Organic Y
#8 3 Specialty Foods Seafood N
#9 9 Specialty Foods Seafood N
#10 8 USA Bread Bakery Y