Flagging data frame intersections based on a list of unique values [R]

Date: 2017-01-10 22:55:03

Tags: r dataframe

I have the following data frames:

df1 (the original data)

  X   |   Y   |    Z   |
apple | rest  | town   |
town  | map   | guide  |
rest  | full  | down   |

df2 (same column names, with a column of unique values added on the left)

uniquevalue |  X   |   Y   |    Z   |
apple       |  0   |   0   |    0   |
rest        |  0   |   0   |    0   |
town        |  0   |   0   |    0   |
map         |  0   |   0   |    0   |
guide       |  0   |   0   |    0   |
full        |  0   |   0   |    0   |
down        |  0   |   0   |    0   |

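I want to create the following: for each combination of unique value and column, the cell should be 1 if that value occurs in that column of the original data, and 0 otherwise.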

uniquevalue |  X   |   Y   |    Z   |
apple       |  1   |   0   |    0   |
rest        |  1   |   1   |    0   |
town        |  1   |   0   |    1   |
map         |  0   |   1   |    0   |
guide       |  0   |   0   |    1   |
full        |  0   |   1   |    0   |
down        |  0   |   0   |    1   |

3 Answers:

Answer 0: (score: 2)

Use is.element, which is the same as using %in%:

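# Collect every distinct value that appears anywhere in df1
unqval <- unique(unlist(df1))
# For each column, test which unique values occur in it; +0 turns TRUE/FALSE into 1/0
data.frame(unqval, sapply(df1, is.element, el=unqval)+0)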
#  unqval X Y Z
#1  apple 1 0 0
#2   town 1 0 1
#3   rest 1 1 0
#4    map 0 1 0
#5   full 0 1 0
#6  guide 0 0 1
#7   down 0 0 1
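
For comparison, here is a minimal sketch of the same idea written with %in% directly. It assumes df1 holds the values shown in the question as character columns, and it takes the row order from df2 rather than from unique(); the names uniquevalue, flags and result are only illustrative.

```r
# df1 rebuilt from the question (assumed to be character columns)
df1 <- data.frame(
  X = c("apple", "town", "rest"),
  Y = c("rest", "map", "full"),
  Z = c("town", "guide", "down"),
  stringsAsFactors = FALSE
)

# Row order copied from df2 in the question
uniquevalue <- c("apple", "rest", "town", "map", "guide", "full", "down")

# uniquevalue %in% column is equivalent to is.element(uniquevalue, column)
flags <- sapply(df1, function(column) as.integer(uniquevalue %in% column))

result <- data.frame(uniquevalue, flags, stringsAsFactors = FALSE)
result
```

Fixing the row order up front keeps the result aligned with df2, which unique(unlist(df1)) does not guarantee.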

Answer 1: (score: 2)

Here is an option using mtabulate from qdapTools:

library(qdapTools)
t(mtabulate(df1))
#      X Y Z
#apple 1 0 0
#down  0 0 1
#full  0 1 0
#guide 0 0 1
#map   0 1 0
#rest  1 1 0
#town  1 0 1
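
If installing qdapTools is not an option, a similar layout can probably be obtained in base R with table(). This is a sketch rather than a drop-in replacement: it assumes df1 as shown in the question, and the names tab and flags are made up for illustration.

```r
# df1 rebuilt from the question (assumed to be character columns)
df1 <- data.frame(
  X = c("apple", "town", "rest"),
  Y = c("rest", "map", "full"),
  Z = c("town", "guide", "down"),
  stringsAsFactors = FALSE
)

# unlist() walks df1 column by column, so repeating each column name
# nrow(df1) times pairs every value with the column it came from
tab <- table(
  value  = unlist(df1, use.names = FALSE),
  column = rep(names(df1), each = nrow(df1))
)

# Cap counts at 1 so a value repeated within a column still yields a 0/1 flag
flags <- (tab > 0) + 0
flags
```

As with the mtabulate output, the rows come back sorted alphabetically by value rather than in the order used in df2.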

Answer 2: (score: 1)

This may be the ugliest answer ever, but here goes...

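df1 <- read.table(text = "X|Y|Z
apple|rest|town
town|map|guide
rest|full|down", header = TRUE, sep = "|")


library(reshape2)
library(tidyr)
library(dplyr)
# Long format: one row per (column name, value) pair
gathered <- gather(df1, value = uniquevalue)

# Wide format: one row per unique value, one column per original column
casted <- dcast(gathered, uniquevalue ~ key, value.var = "uniquevalue")

# Turn present/missing into 1/0, then collapse to one row per unique value
final <- casted %>%
  mutate_at(c("X", "Y", "Z"), .funs = function(m) 1 * !is.na(m)) %>%
  group_by(uniquevalue) %>%
  summarise_all(.funs = sum)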

> final
# A tibble: 7 × 4
  uniquevalue     X     Y     Z
        <chr> <dbl> <dbl> <dbl>
1       apple     1     0     0
2        down     0     0     1
3        full     0     1     0
4       guide     0     0     1
5         map     0     1     0
6        rest     1     1     0
7        town     1     0     1

To explain, the gather call produces this data frame:

> gathered
  key uniquevalue
1   X       apple
2   X        town
3   X        rest
4   Y        rest
5   Y         map
6   Y        full
7   Z        town
8   Z       guide
9   Z        down

We then dcast it into the following result:

> casted
  uniquevalue     X    Y     Z
1       apple apple <NA>  <NA>
2        down  <NA> <NA>  down
3        full  <NA> full  <NA>
4       guide  <NA> <NA> guide
5         map  <NA>  map  <NA>
6        rest  rest rest  <NA>
7        town  town <NA>  town

The last call simply replaces missing values with 0 and non-missing values with 1, then sums (squishes) the rows by the uniquevalue column. We get:

> final
# A tibble: 7 × 4
  uniquevalue     X     Y     Z
        <chr> <dbl> <dbl> <dbl>
1       apple     1     0     0
2        down     0     0     1
3        full     0     1     0
4       guide     0     0     1
5         map     0     1     0
6        rest     1     1     0
7        town     1     0     1
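
The gather and dcast steps above can likely be expressed with tidyr alone in current versions of the package. The sketch below is not part of the original answer: it assumes a tidyr release that provides pivot_longer and pivot_wider with a scalar values_fill, and the helper column present is invented for illustration.

```r
library(tidyr)

# df1 rebuilt from the question (assumed to be character columns)
df1 <- data.frame(
  X = c("apple", "town", "rest"),
  Y = c("rest", "map", "full"),
  Z = c("town", "guide", "down"),
  stringsAsFactors = FALSE
)

# Long format: one row per (column name, value) pair, like gather()
long <- pivot_longer(
  df1,
  cols = c("X", "Y", "Z"),
  names_to = "key",
  values_to = "uniquevalue"
)

# Hypothetical helper column marking that the pair exists
long$present <- 1

# Wide format: pairs that never occurred are filled with 0
wide <- pivot_wider(
  long,
  names_from = key,
  values_from = present,
  values_fill = 0
)
wide
```

Filling with 0 during the reshape avoids the separate is.na step, at the cost of depending on a newer tidyr.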