Question

I'm looking for some ideas on how to better illustrate the relationships between categorical variables.

Here is some reproducible data:

t1 <- data.frame(A = c("Apple", "Rose, Apple", "Country"), 
                 B = c("Fruit", "Plant", "Peru, Japan"))

Output

            A           B
1       Apple       Fruit
2 Rose, Apple       Plant
3     Country Peru, Japan

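You can see that Apple is related to both Fruit and Plant. Is there a good graphical way to show the individual variables in color, in a heatmap-style format?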
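Cells such as "Rose, Apple" hold several values, so before any heatmap can be drawn the data probably needs to be reshaped into one A-B pair per row. A minimal base-R sketch, assuming a comma is the only separator used inside a cell; the helper `split_items` and the names `pairs` and `tab` are hypothetical:

```r
# Reproducible data from the question
t1 <- data.frame(A = c("Apple", "Rose, Apple", "Country"),
                 B = c("Fruit", "Plant", "Peru, Japan"))

# Hypothetical helper: split one comma-separated cell into trimmed items
split_items <- function(x) trimws(strsplit(x, ",")[[1]])

# For each row, pair every item in A with every item in B
pairs <- do.call(rbind, lapply(seq_len(nrow(t1)), function(i) {
  expand.grid(A = split_items(t1$A[i]),
              B = split_items(t1$B[i]),
              stringsAsFactors = FALSE)
}))

# Cross-tabulate the pairs: rows are A items, columns are B items
tab <- table(pairs$A, pairs$B)
print(tab)
```

`tab` is an ordinary contingency table, so it can be passed to a heatmap function such as base R's `heatmap()` or converted with `as.data.frame()` for `ggplot2::geom_tile()`.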

Answer 1

I'd go with something like this:

library(data.table)
library(ggplot2)  # needed for the plot further down

dt <- data.table(type = as.factor(c("Apple", "Rose", "Apple", "Rose", "Apple")),
                 type2 = as.factor(c("Fruit", "Plant", "Plant", "Tree", "Tree")))

First, we get a table with the different combinations:

dt 
    type type2
1: Apple Fruit
2:  Rose Plant
3: Apple Plant
4:  Rose  Tree
5: Apple  Tree
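For comparison, the same combinations can be viewed as a wide two-way table in base R. This sketch uses plain character vectors instead of factors; unlike the long listing above, it also shows the combination that never occurs (Rose/Fruit) as 0:

```r
# Same toy data as in the answer, as plain character vectors
type  <- c("Apple", "Rose", "Apple", "Rose", "Apple")
type2 <- c("Fruit", "Plant", "Plant", "Tree", "Tree")

# Two-way contingency table: one row per type, one column per type2
counts <- table(type, type2)
print(counts)
```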

Then we compute some statistics (counts and relative percentages within each type):

dt2 <- dt[ , .(count = .N), by = .(type, type2)]

dt2[ , percentage.count := count / sum(count) * 100 , by = "type"]

dt2

    type type2 count percentage.count
1: Apple Fruit     1         33.33333
2:  Rose Plant     1         50.00000
3: Apple Plant     1         33.33333
4:  Rose  Tree     1         50.00000
5: Apple  Tree     1         33.33333

Here we can see that Apple is associated with Fruit 1/3 of the time, with Plant 1/3 of the time, and with Tree 1/3 of the time.
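These row-wise percentages can be cross-checked in base R with `prop.table()`. A small sketch on the same toy data (character vectors assumed instead of factors): `margin = 1` divides each cell by its row total, which is the same "share within each type" that `by = "type"` produces above:

```r
# Same toy data as in the answer
type  <- c("Apple", "Rose", "Apple", "Rose", "Apple")
type2 <- c("Fruit", "Plant", "Plant", "Tree", "Tree")
counts <- table(type, type2)

# margin = 1: normalize within each row (each type), then scale to percent
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 2))
```

Each row of `pct` sums to 100, so Apple gets one third for each of Fruit, Plant and Tree, while Rose splits 50/50 between Plant and Tree.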

This can be plotted as follows:

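# dt2 is already aggregated, so map the count column to y
# (geom_col) instead of letting geom_bar count rows of dt2
ggplot(data = dt2,
       aes(x = type, y = count, fill = type2)) +
  geom_col(position = "fill")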

This is essentially a "pie" of how many rows share each type-type2 combination, but at least it shows which types are more strongly related than others.
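Since the question asks for a heatmap specifically, the same `dt2` statistics could also be drawn as tiles, one per type/type2 combination, colored by percentage. A sketch, assuming data.table and ggplot2 are installed; plain character columns replace the factors for simplicity, the names `grid`, `dt3` and `p` are illustrative, and filling combinations that never occur with 0 is my own choice rather than part of the answer above:

```r
library(data.table)
library(ggplot2)

dt <- data.table(type  = c("Apple", "Rose", "Apple", "Rose", "Apple"),
                 type2 = c("Fruit", "Plant", "Plant", "Tree", "Tree"))

# Count each combination and its share within each type (as in the answer)
dt2 <- dt[, .(count = .N), by = .(type, type2)]
dt2[, percentage.count := count / sum(count) * 100, by = "type"]

# Cross join of all type x type2 values, so that combinations that never
# occur (e.g. Rose/Fruit) still get a tile instead of a hole in the grid
grid <- CJ(type = unique(dt$type), type2 = unique(dt$type2))
dt3 <- dt2[grid, on = .(type, type2)]
dt3[is.na(count), `:=`(count = 0L, percentage.count = 0)]

# Heatmap: one tile per combination, colored and labelled by percentage
p <- ggplot(dt3, aes(x = type2, y = type, fill = percentage.count)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = round(percentage.count, 1))) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "type2", y = "type", fill = "% of type")
print(p)
```

Swapping `fill = percentage.count` for `fill = count` would color by raw frequency instead of by share within each type.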

Graphical representation of relationships between categorical variables
