I have data in the following format:
|id|genre1|genre2 |genre3 |
|1 |action|comedy |romance|
|2 |comedy|romance| |
|3 |romance| | |
I want to convert my data to the following format:
|id|action|comedy|romance|
|1 |1 |1 |1 |
|2 |0 |1 |1 |
|3 |0 |0 |1 |
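What is the best way to do this?
Answer 0 (score: 2)
Assuming the empty elements are empty strings (that is, they contain no whitespace), you can first replace them with NA and then reshape the data with the reshape2 package.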
is.na(df) <- df == ""
library(reshape2)
dcast(melt(df, 1, na.rm = TRUE), id ~ value, length)
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1
Or, as a fun one-liner that leaves the original data unchanged:
dcast(melt(replace(df, df == "", NA), 1, na.rm = TRUE), id ~ value, length)
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1
Original data used:
df <- structure(list(id = 1:3, genre1 = c("action", "comedy", "romance"
), genre2 = c("comedy", "romance", ""), genre3 = c("romance",
"", "")), .Names = c("id", "genre1", "genre2", "genre3"), class = "data.frame", row.names = c(NA,
-3L))
Answer 1 (score: 1)
You can reshape the data with dplyr and tidyr.
library(dplyr)
library(tidyr)
df %>%
  gather(number, genre, genre1:genre3) %>%
  filter(genre != "") %>%
  select(-number) %>%
  mutate(one = 1) %>%
  spread(genre, one, fill = 0)
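In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same pipeline with the newer functions, assuming the data frame df from Answer 0 (recreated here so the snippet is self-contained) and tidyr 1.1.0 or later, where values_fill accepts a single value. The name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in Answer 0; empty strings mark missing genres.
df <- data.frame(
  id = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", ""),
  genre3 = c("romance", "", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Stack the three genre columns into one (id, number, genre) row per cell.
  pivot_longer(genre1:genre3, names_to = "number", values_to = "genre") %>%
  # Drop the empty cells and the column that only recorded the source column.
  filter(genre != "") %>%
  select(-number) %>%
  # Mark each remaining (id, genre) pair with 1, then spread genres to columns.
  mutate(one = 1) %>%
  pivot_wider(names_from = genre, values_from = one, values_fill = 0)

print(result)
```

Unlike spread(), pivot_wider() orders the new columns by first appearance rather than alphabetically; for this data the two orders happen to coincide.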
Answer 2 (score: 1)
Using base R, you can use reshape and table:
mydf <-data.frame(id=1:3,
genre1=c("action","comedy","romance"),
genre2=c("comedy","romance",NA),
genre3=c("romance",NA,NA))
colnames(mydf)[2:4] <- paste0("genre.",colnames(mydf)[2:4])
m_data <- reshape(mydf,direction="long", varying=2:4)
with(m_data, table(id, genre))
#    genre
# id  action comedy romance
#   1      1      1       1
#   2      0      1       1
#   3      0      0       1
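Note that table() returns a contingency table rather than a data frame with an id column, as in the format asked for in the question. The following is a minimal base R sketch of one way to get there, assuming the same mydf as above (with NA for missing genres). It stacks the genre columns with unlist() instead of reshape(), and the names long, counts, and wide are only illustrative.

```r
# Same data as above; NA marks missing genres.
mydf <- data.frame(id = 1:3,
                   genre1 = c("action", "comedy", "romance"),
                   genre2 = c("comedy", "romance", NA),
                   genre3 = c("romance", NA, NA),
                   stringsAsFactors = FALSE)

# Stack the genre columns into one vector; unlist() goes column by column,
# so the id vector is repeated once per genre column to stay aligned.
long <- data.frame(id = rep(mydf$id, times = ncol(mydf) - 1),
                   genre = unlist(mydf[-1], use.names = FALSE),
                   stringsAsFactors = FALSE)

# table() ignores NA by default, so missing genres are not counted.
counts <- with(long, table(id, genre))

# Turn the contingency table into a plain data frame with an id column.
wide <- cbind(id = as.integer(rownames(counts)), as.data.frame.matrix(counts))
rownames(wide) <- NULL

print(wide)
```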