I have the following data frames:
df1 (original data)
X | Y | Z |
apple | rest | town |
town | map | guide |
rest | full | down |
df2 (same column names, but with the unique values added on the left)
uniquevalue | X | Y | Z |
apple | 0 | 0 | 0 |
rest | 0 | 0 | 0 |
town | 0 | 0 | 0 |
map | 0 | 0 | 0 |
guide | 0 | 0 | 0 |
full | 0 | 0 | 0 |
down | 0 | 0 | 0 |
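I want to produce the following, where each value/column combination is 1 if that value occurs in that column of the original data, and 0 otherwise: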
uniquevalue | X | Y | Z |
apple | 1 | 0 | 0 |
rest | 1 | 1 | 0 |
town | 1 | 0 | 1 |
map | 0 | 1 | 0 |
guide | 0 | 0 | 1 |
full | 0 | 1 | 0 |
down | 0 | 0 | 1 |
Answer 0 (score: 2)
Use is.element, which is the same as using %in%:
unqval <- unique(unlist(df1))
data.frame(unqval, sapply(df1, is.element, el=unqval)+0)
# unqval X Y Z
#1 apple 1 0 0
#2 town 1 0 1
#3 rest 1 1 0
#4 map 0 1 0
#5 full 0 1 0
#6 guide 0 0 1
#7 down 0 0 1
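To check the claim that is.element and %in% are interchangeable here, the sketch below builds the indicator matrix both ways and compares them. It rebuilds df1 with character columns (an assumption; the answer does not show how df1 was created), and the names via_is_element and via_in are only illustrative. Both functions reduce to match(x, table, 0L) > 0L, so the two matrices should be identical.

```r
# Rebuild the example data with character (not factor) columns
df1 <- data.frame(X = c("apple", "town", "rest"),
                  Y = c("rest", "map", "full"),
                  Z = c("town", "guide", "down"),
                  stringsAsFactors = FALSE)

unqval <- unique(unlist(df1))

# is.element(el, table): each column of df1 is passed as `table`
via_is_element <- sapply(df1, is.element, el = unqval) + 0

# The same test written with %in%
via_in <- sapply(df1, function(col) unqval %in% col) + 0

identical(via_is_element, via_in)
```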
Answer 1 (score: 2)
Here is an option using mtabulate from qdapTools:
library(qdapTools)
t(mtabulate(df1))
# X Y Z
#apple 1 0 0
#down 0 0 1
#full 0 1 0
#guide 0 0 1
#map 0 1 0
#rest 1 1 0
#town 1 0 1
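If you would rather not depend on qdapTools, a base-R sketch with table() produces the same 0/1 matrix for this data. This is an alternative, not a description of how mtabulate works internally; the (tab > 0) step is an addition so that a value repeated within one column would still count as 1 rather than 2. The variable names are illustrative.

```r
df1 <- data.frame(X = c("apple", "town", "rest"),
                  Y = c("rest", "map", "full"),
                  Z = c("town", "guide", "down"),
                  stringsAsFactors = FALSE)

# One entry per cell: the value, and the name of the column it came from
# (unlist walks the columns in order, so the names repeat nrow times each)
vals <- unlist(df1, use.names = FALSE)
cols <- rep(names(df1), each = nrow(df1))

# Cross-tabulate value by column; rows come out sorted alphabetically
tab <- table(vals, cols)

# Clamp counts to 0/1 in case a value repeats within a column
presence <- (tab > 0) + 0L
presence
```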
Answer 2 (score: 1)
This may well be the ugliest answer ever, but here goes...
df1 <- read.table(text = "X|Y|Z
apple|rest|town
town|map|guide
rest|full|down", header = TRUE, sep = "|", stringsAsFactors = FALSE)
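library(reshape2)
library(tidyr)
library(dplyr)
# Stack all columns into long format: one row per (column name, value) pair
gathered <- gather(df1, value = uniquevalue)
# Cast back to wide: one row per unique value, one column per original column;
# cells with no match become NA
casted <- dcast(gathered, uniquevalue ~ key, value.var = "uniquevalue")
# Replace NA with 0 and any present value with 1, then sum within each uniquevalue
final <- casted %>%
  mutate_at(c("X", "Y", "Z"), .funs = function(m) 1 * !is.na(m)) %>%
  group_by(uniquevalue) %>%
  summarise_all(.funs = sum)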
> final
# A tibble: 7 × 4
uniquevalue X Y Z
<chr> <dbl> <dbl> <dbl>
1 apple 1 0 0
2 down 0 0 1
3 full 0 1 0
4 guide 0 0 1
5 map 0 1 0
6 rest 1 1 0
7 town 1 0 1
To explain, the gather call produces this data frame:
> gathered
key uniquevalue
1 X apple
2 X town
3 X rest
4 Y rest
5 Y map
6 Y full
7 Z town
8 Z guide
9 Z down
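For comparison, base R's stack() performs the same wide-to-long step as gather() here without loading tidyr. This is not part of the original answer; note that stack() names its columns values and ind rather than uniquevalue and key. The sketch assumes character columns (stringsAsFactors = FALSE).

```r
df1 <- data.frame(X = c("apple", "town", "rest"),
                  Y = c("rest", "map", "full"),
                  Z = c("town", "guide", "down"),
                  stringsAsFactors = FALSE)

# values: every cell, column by column; ind: factor naming the source column
long <- stack(df1)
long
```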
We then dcast that into the following result:
> casted
uniquevalue X Y Z
1 apple apple <NA> <NA>
2 down <NA> <NA> down
3 full <NA> full <NA>
4 guide <NA> <NA> guide
5 map <NA> map <NA>
6 rest rest rest <NA>
7 town town <NA> town
The last call simply replaces missing values with 0 and non-missing values with 1, then sums (squishes) the rows within each uniquevalue. We get:
> final
# A tibble: 7 × 4
uniquevalue X Y Z
<chr> <dbl> <dbl> <dbl>
1 apple 1 0 0
2 down 0 0 1
3 full 0 1 0
4 guide 0 0 1
5 map 0 1 0
6 rest 1 1 0
7 town 1 0 1
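The cast-then-convert idea of this answer can also be sketched in base R with tapply(), which, like dcast, keeps the value itself in each matching cell and leaves NA where a value does not occur in a column. The same !is.na() trick then gives the 0/1 matrix. This is a swapped-in technique under the assumption of character columns; casted_base and presence are illustrative names. Because each (value, column) pair maps to one cell here, no separate group-and-sum step is needed.

```r
df1 <- data.frame(X = c("apple", "town", "rest"),
                  Y = c("rest", "map", "full"),
                  Z = c("town", "guide", "down"),
                  stringsAsFactors = FALSE)

long <- stack(df1)  # columns: values (character), ind (factor X/Y/Z)

# Cross-classify by value and source column, keeping the value as dcast does;
# combinations that never occur come back as NA
casted_base <- tapply(long$values, list(long$values, long$ind), function(v) v[1])

# Same idea as 1 * !is.na(m): present -> 1, missing -> 0
presence <- +!is.na(casted_base)
presence
```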