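I have a data.frame named transactions with a single field named items, so that the i-th row holds a vector with the items of the i-th transaction. It looks like this:
> head(transactions)
               items
1        Cake, Fudge
2        Coffee, Tea
3 Coffee, Choco, Tea
4             Coffee
5 Bread, Muffin, Jam
6             Coffee
I would like to convert it into a binary matrix in which each element states whether a given item was bought in a given transaction. It should look like this:
  Cake Fudge Coffee Tea Choco Bread Muffin Jam
1    1     1      0   0     0     0      0   0
2    0     0      1   1     0     0      0   0
3    0     0      1   1     1     0      0   0
4    0     0      1   0     0     0      0   0
5    0     0      0   0     0     1      1   1
6    0     0      1   0     0     0      0   0
I can't find a way to do this without shady nested for loops. All of this is in order to apply apriori from the arules package. If any of you could help me, it would be much appreciated.
Thanks!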
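Answer 0 (score: 3)
We can create two new columns: row, which identifies each original row so the items can be grouped back together, and spread_value, which holds the value (1) that marks an item as present. We use separate_rows to split each comma-separated value into its own row. Then we spread the values from long to wide format, setting fill to 0 for every item that does not occur in a row.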
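library(tidyverse)
df %>%
  # tag each transaction with its row number and a presence flag
  mutate(row = row_number(), spread_value = 1) %>%
  # one row per item, then strip the space left after each comma
  separate_rows(items, sep = ",") %>%
  mutate(items = trimws(items)) %>%
  # long to wide: one column per item, 0 where the item is absent
  spread(items, spread_value, fill = 0) %>%
  select(-row)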
#  Bread Cake Choco Coffee Fudge Jam Muffin Tea
#1     0    1     0      0     1   0      0   0
#2     0    0     0      1     0   0      0   1
#3     0    0     1      1     0   0      0   1
#4     0    0     0      1     0   0      0   0
#5     1    0     0      0     0   1      1   0
#6     0    0     0      1     0   0      0   0
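In current tidyr releases spread() is superseded by pivot_wider(). The following is a minimal sketch of the same long-to-wide idea with the newer verb, assuming tidyr 1.0.0 or later. The column name present and the distinct() step (a guard against an item listed twice in one transaction) are additions for illustration and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  items = c(
    "Cake, Fudge", "Coffee, Tea", "Coffee, Choco, Tea",
    "Coffee", "Bread, Muffin, Jam", "Coffee"
  ),
  stringsAsFactors = FALSE
)

binary <- df %>%
  # row keeps track of the transaction; present is the flag to spread
  mutate(row = row_number(), present = 1L) %>%
  # sep is a regular expression: a comma plus any following whitespace
  separate_rows(items, sep = ",\\s*") %>%
  # assumption: drop repeated items within a transaction
  distinct() %>%
  # one column per item, 0 where the item does not occur
  pivot_wider(
    names_from = items,
    values_from = present,
    values_fill = list(present = 0L)
  ) %>%
  select(-row)

binary
```

With pivot_wider() the item columns are ordered by first appearance rather than alphabetically, which happens to match the column order requested in the question.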
Answer 1 (score: 2)
The splitstackshape package has the cSplit_e function for this.
df1 <- splitstackshape::cSplit_e(
  data = df,
  split.col = "items",
  sep = ", ",
  mode = "binary",
  fixed = TRUE,
  type = "character",
  fill = 0L,
  drop = TRUE
)
names(df1) <- sub("^items_", "", names(df1))
df1
#  Bread Cake Choco Coffee Fudge Jam Muffin Tea
#1     0    1     0      0     1   0      0   0
#2     0    0     0      1     0   0      0   1
#3     0    0     1      1     0   0      0   1
#4     0    0     0      1     0   0      0   0
#5     1    0     0      0     0   1      1   0
#6     0    0     0      1     0   0      0   0
Data
df <- structure(
  list(items = c("Cake, Fudge", "Coffee, Tea", "Coffee, Choco, Tea",
                 "Coffee", "Bread, Muffin, Jam", "Coffee")),
  .Names = "items",
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6")
)
Answer 2 (score: 0)
A non-dplyr option:
library(magrittr)
library(stringr)
uniq_words <- df[["items"]] %>%
  strsplit(", ") %>%
  unlist() %>%
  unique()
sol <- outer(df[["items"]], uniq_words, str_detect) * 1L
colnames(sol) <- uniq_words
sol
     Cake Fudge Coffee Tea Choco Bread Muffin Jam
[1,]    1     1      0   0     0     0      0   0
[2,]    0     0      1   1     0     0      0   0
[3,]    0     0      1   1     1     0      0   0
[4,]    0     0      1   0     0     0      0   0
[5,]    0     0      0   0     0     1      1   1
[6,]    0     0      1   0     0     0      0   0
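One caveat with the approach above: str_detect() treats each item name as a regular expression and matches substrings, so an item such as Tea would also be flagged for a transaction that contains a hypothetical item like Iced Tea. The following is a base R sketch that compares whole item names instead. The variable names item_list, all_items and binary are chosen here for illustration, and the sketch assumes at least two distinct items overall.

```r
# Same sample data as in the question
df <- data.frame(
  items = c(
    "Cake, Fudge", "Coffee, Tea", "Coffee, Choco, Tea",
    "Coffee", "Bread, Muffin, Jam", "Coffee"
  ),
  stringsAsFactors = FALSE
)

# Split every transaction into a character vector of item names
item_list <- strsplit(df$items, ",\\s*")

# Distinct items, in order of first appearance
all_items <- unique(unlist(item_list))

# For each transaction, flag which of the distinct items it contains.
# vapply() returns an items-by-transactions matrix, so transpose it.
binary <- t(vapply(
  item_list,
  function(x) as.integer(all_items %in% x),
  integer(length(all_items))
))
colnames(binary) <- all_items

binary
```

Because %in% tests for exact equality, item names that contain regex metacharacters or that are substrings of other item names are handled without special treatment.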
Data
df <- data.frame(
  items = c(
    "Cake, Fudge", "Coffee, Tea", "Coffee, Choco, Tea",
    "Coffee", "Bread, Muffin, Jam", "Coffee"
  ),
  stringsAsFactors = FALSE
)
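Since the question says the binary matrix is only a step toward running apriori from the arules package, it may be possible to skip the manual conversion. The following is a hedged sketch, assuming arules is installed: a list of character vectors can be coerced to the transactions class, and the incidence matrix can be recovered from it if still needed. The support and confidence thresholds below are arbitrary values picked for this tiny sample, not recommendations.

```r
library(arules)

# Same sample data as in the question
df <- data.frame(
  items = c(
    "Cake, Fudge", "Coffee, Tea", "Coffee, Choco, Tea",
    "Coffee", "Bread, Muffin, Jam", "Coffee"
  ),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction
item_list <- strsplit(df$items, ",\\s*")

# Coerce the list to an arules transactions object
trans <- as(item_list, "transactions")

# Logical incidence matrix (transactions in rows, items in columns)
inc <- as(trans, "matrix")

# Mine association rules; thresholds are illustrative assumptions
rules <- apriori(
  trans,
  parameter = list(support = 0.3, confidence = 0.8, target = "rules")
)
inspect(rules)
```

Multiplying inc by 1L turns the logical matrix into the 0/1 form shown in the question.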