Visualizing the association between two sets of data

Time: 2009-09-21 22:26:01

Tags: r

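Each data point is a pairing of A and B, and there are multiple entries in A and multiple entries in B, i.e. multiple syndromes and multiple diagnoses, although each data point has exactly one syndrome-diagnosis pair.

Examples, suggestions, or ideas would be much appreciated.

This is what the data looks like. I would like to see the associations between the values of A and B (how many GG are associated with TT, and so on). Both are nominal data types.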

ID,A,B
1,GG,TT
2,AA,SS
3,BB,XX
4,DD,SS
5,DD,TT
6,CC,XX
7,HH,ZZ
8,AA,TT
9,CC,RR
10,DD,ZZ
11,AA,XX
12,AA,TT
13,DD,SS
14,DD,XX
15,AA,YY
16,CC,ZZ
17,FF,SS
18,FF,XX
19,BB,VV
20,GG,VV
21,GG,SS
22,AA,RR
23,AA,TT
24,AA,SS
25,CC,VV
26,CC,TT
27,FF,RR
28,GG,UU
29,CC,TT
30,BB,ZZ
31,II,TT
32,FF,RR
33,BB,SS
34,GG,YY
35,FF,RR
36,BB,VV
37,II,RR
38,CC,YY
39,FF,VV
40,AA,XX
41,AA,ZZ
42,GG,VV
43,BB,UU
44,II,UU
45,II,SS
46,DD,SS
47,AA,UU
48,BB,VV
49,GG,TT
50,BB,TT
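
The counting part of the question (how many GG are associated with TT, and so on) can be answered with a plain contingency table before any plotting. The following is a minimal sketch, assuming the data is available as CSV text; it uses only the first ten rows listed above for brevity, and the names `csv`, `dat`, and `tab` are illustrative.

```r
# First ten rows of the example data (a subset, used here for brevity).
csv <- "ID,A,B
1,GG,TT
2,AA,SS
3,BB,XX
4,DD,SS
5,DD,TT
6,CC,XX
7,HH,ZZ
8,AA,TT
9,CC,RR
10,DD,ZZ"
dat <- read.csv(text = csv, stringsAsFactors = TRUE)

# Cross-tabulate A against B: each cell counts how often that pair occurs.
tab <- table(A = dat$A, B = dat$B)
print(tab)

# Look up the count for one specific pair, here GG with TT.
print(tab["GG", "TT"])
```

With the full file, replacing `text = csv` by the file path should give the complete 8 x 8 table of counts.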

4 Answers:

Answer 0 (score: 7)

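Since your data is bipartite, I would suggest plotting the points of the first factor along one side and the points of the other factor along the opposite side, with lines drawn between them.

The code I used to generate such a plot is: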

## Make up data.
data <- data.frame(X1=sample(state.region, 10),
                   X2=sample(state.region, 10))

## Set up plot window.
plot(0, xlim=c(0,1), ylim=c(0,1),
     type="n", axes=FALSE, xlab="", ylab="")

factor.to.int <- function(f) {
  (as.integer(f) - 1) / (length(levels(f)) - 1)
}

segments(factor.to.int(data$X1), 0, factor.to.int(data$X2), 1,
         col=data$X1)
axis(1, at = seq(0, 1, by = 1 / (length(levels(data$X1)) - 1)),
     labels = levels(data$X1))
axis(3, at = seq(0, 1, by = 1 / (length(levels(data$X2)) - 1)),
     labels = levels(data$X2))
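
To make the role of `factor.to.int` clearer, here is a small sketch of what it returns: the levels of a factor mapped to evenly spaced positions between 0 and 1, which is why the axis ticks are placed with `seq(0, 1, by = ...)`. The example factor `f` is made up for illustration.

```r
# Same helper as in the plotting code: level k of n becomes (k - 1) / (n - 1).
factor.to.int <- function(f) {
  (as.integer(f) - 1) / (length(levels(f)) - 1)
}

# Hypothetical factor with three levels: AA, BB, CC.
f <- factor(c("AA", "BB", "CC", "AA"))

# The first level maps to 0, the last level to 1, the rest in between.
pos <- factor.to.int(f)
print(pos)
```

Note that a factor with a single level would lead to a division by zero here, so the helper assumes at least two levels.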

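Answer 1 (score: 5)

Here is what I would do. A darker colour indicates a more frequent, and hence more important, combination of A and B.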

dataset <- data.frame(A = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE), B = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
library(ggplot2)
ggplot(Counts, aes(x = A, y = B, fill = Freq)) + geom_tile() + scale_fill_gradient(low = "white", high = "black")
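
As a sketch of what `Counts` looks like before it reaches `ggplot()`: `as.data.frame()` on a table gives one row per combination of A and B, with the count in a `Freq` column, which is what `fill = Freq` refers to. The tiny `dataset` below is made up for illustration.

```r
# Hypothetical five-row dataset with two levels in each column.
dataset <- data.frame(
  A = c("a", "a", "b", "b", "b"),
  B = c("x", "y", "x", "x", "y")
)

# Long format: one row per (A, B) combination plus its frequency.
Counts <- as.data.frame(with(dataset, table(A, B)))
print(Counts)
```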

Or, if you prefer lines:

library(ggplot2)
dataset <- data.frame(A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), B = sample(letters[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend, size = Freq)) +
geom_segment() + scale_x_continuous(breaks = 0:1, labels = c("A", "B")) + 
scale_y_continuous(breaks = 1:5, labels = letters[1:5])

This third option adds labels to the data points using geom_text().

library(ggplot2)
dataset <- data.frame(
    A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), 
    B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
)
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend)) + 
geom_segment(aes(size = Freq)) + 
scale_x_continuous(breaks = 0:1, labels = c("A", "B")) + 
scale_y_continuous(breaks = -1) + 
geom_text(aes(x = X, y = Y, label = A), colour = "red", size = 7, hjust = 1, vjust = 1) + 
geom_text(aes(x = Xend, y = Yend, label = B), colour = "red", size = 7, hjust = 0, vjust = 0)
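
One possible refinement, which is not part of the answer above: `as.data.frame(table(...))` keeps combinations that never occur, with `Freq` equal to 0, so those rows are passed to `geom_segment()` like any other. The sketch below, on a made-up `dataset`, drops them first; the name `Observed` is illustrative.

```r
# Hypothetical dataset: 3 levels of A, 2 levels of B, only 3 observed pairs.
dataset <- data.frame(
  A = c("a", "a", "b", "c"),
  B = c("T", "T", "U", "T")
)
Counts <- as.data.frame(with(dataset, table(A, B)))

# All 3 x 2 combinations are present, including those with Freq == 0.
print(nrow(Counts))

# Keep only the combinations that actually occur in the data.
Observed <- subset(Counts, Freq > 0)
print(Observed)
```

`Observed` can then be used in place of `Counts` in the plotting calls, so that only links present in the data are drawn.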

Answer 2 (score: 3)

Maybe a mosaicplot:

X <- structure(list(
  ID = 1:50,
  A = structure(c(6L, 1L, 2L, 4L, 4L, 3L, 7L, 1L, 3L, 4L, 1L, 1L, 4L, 4L, 1L, 3L, 5L, 5L, 2L, 6L, 6L, 1L, 1L, 1L, 3L, 3L, 5L, 6L, 3L, 2L, 8L, 5L, 2L, 6L, 5L, 2L, 8L, 3L, 5L, 1L, 1L, 6L, 2L, 8L, 8L, 4L, 1L, 2L, 6L, 2L), .Label = c("AA","BB", "CC", "DD", "FF", "GG", "HH", "II"), class = "factor"),
  B = structure(c(3L, 2L, 6L, 2L, 3L, 6L, 8L, 3L, 1L, 8L, 6L, 3L, 2L, 6L, 7L, 8L, 2L, 6L, 5L, 5L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 4L, 3L, 8L, 3L, 1L, 2L, 7L, 1L, 5L, 1L, 7L, 5L, 6L, 8L, 5L, 4L, 4L, 2L, 2L, 4L, 5L, 3L, 3L), .Label = c("RR", "SS", "TT", "UU", "VV", "XX", "YY", "ZZ"), class = "factor")
  ), .Names = c("ID", "A", "B"), class = "data.frame", row.names = c(NA, -50L)
)

mosaicplot(with(X,table(A,B)))

For your example dataset:

[Figure: mosaicplot]
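
The tile areas in a mosaic plot are proportional to the cell counts of the table, so the same information can be read numerically with `prop.table()`. A minimal sketch on a made-up four-row table (the values are not taken from the example dataset):

```r
# Hypothetical table with two levels per variable.
tab <- table(A = c("AA", "AA", "BB", "BB"),
             B = c("SS", "TT", "TT", "TT"))

# Share of all observations that falls in each cell.
overall <- prop.table(tab)
print(overall)

# Row-wise proportions: the distribution of B within each level of A.
by_row <- prop.table(tab, 1)
print(by_row)
```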

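Answer 3 (score: 2)

Thanks! I think the connectivity between the elements of each class is best shown by the link-graph examples given by Jonathon and Thierry. Thierry's second one, which also shows magnitude, is definitely where I am going to start.

Update: Thanks, everyone, for your ideas and tips!

I came across the bipartite package, which has functions for visualizing this kind of data. I think it gives a clean visualization of the relationships I want to show.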

For example:

    library(bipartite)
    dataset <- data.frame(
         A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), 
         B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
     )
    datamat <- as.matrix(table(dataset$A, dataset$B))
    visweb(datamat, text = "interaction", textsize = .8)

which gives: [Figure: visweb result]

I can't post images as a new user :(