Count all values in a column based on string in another column in R for a Venn diagram

Date: 2015-07-28 15:41:05

Tags: r replace plot dataframe

I have a file that I converted to a dataframe that looks as follows:

D <- data.frame(
    V1 =c("B", "A_B", "A_B_C", "C_D", "A_C", "C_B_D", "C", "C_A_B_D", "B_C", "C_A_D", "A_D", "D", "A", "B_D", "A_B_D"), 
    V2 = c(15057, 5, 9, 1090, 4, 1250, 3943, 11, 2517, 5, 5, 2280, 5, 1735, 4))

I need to convert this data frame into a list of numbers that I can use to create a 4-way Venn plot. The values in the example below are what you get when the sums are done correctly. I computed them by hand, but since I need to create several similar plots, I would like a more efficient way to do this.

library("VennDiagram")
venn.plot <- draw.quad.venn(
  area1 = 48,
  area2 = 20588,
  area3 = 8829,
  area4 = 6380,
  n12 = 29,
  n13 = 29,
  n14 = 25,
  n23 = 3787,
  n24 = 3000,
  n34 = 2356,
  n123 = 20,
  n124 = 15,
  n134 = 16,
  n234 = 1261,
  n1234 = 11,
  category = c("A", "B", "C", "D"),
  fill = c("orange", "red", "green", "blue"),
  lty = "dashed",
  cex = 2,
  cat.cex = 2,
  cat.col = c("orange", "red", "green", "blue")
)

In this case I would need to sum all values in D$V2 whose V1 entry contains an "A", and so on for the other letters and their combinations. Then I would need to put the sums in the order the Venn plot function expects.

1 answer:

Answer 0 (score: 8)

Here's what I would do:

# setup (assumes D and library("VennDiagram") from the question are loaded)
myset = LETTERS[1:4]

# create dummies: one logical column per set, TRUE where V1 contains that letter
D[,myset] <- lapply(myset, grepl, D$V1)

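# construct counts: for each combination of 1..myn sets, sum V2 over the rows
# that belong to every set in the combination; combn() yields the combinations
# in the same order as draw.quad.venn's arguments (area1..area4, n12, ..., n1234)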
myn    <- length(myset)
mynums <- unlist(sapply(seq(myn), function(n) 
    apply(if (n==myn) matrix(seq(myn)) else combn(myn,n), 2, function(x)
        with(D, sum( V2[Reduce("&", mget(myset[x]))] ))
)))

# pass counts to plotter
do.call(draw.quad.venn, c(as.list(unname(mynums)), list(
  category = myset,
  fill = c("orange", "red", "green", "blue"),
  lty = "dashed",
  cex = 2,
  cat.cex = 2,
  cat.col = c("orange", "red", "green", "blue")
)))
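
As a sanity check, here is a minimal base-R sketch of the same idea that labels each sum with its combination, so the result can be compared against the hand-computed values in the question. The helper name `overlap_sum` and the `"A&B"`-style labels are made up for illustration and are not part of the answer above. The sketch rebuilds `D` from the question so that it runs on its own, and it does not need VennDiagram.

```r
# Data frame copied from the question.
D <- data.frame(
  V1 = c("B", "A_B", "A_B_C", "C_D", "A_C", "C_B_D", "C", "C_A_B_D",
         "B_C", "C_A_D", "A_D", "D", "A", "B_D", "A_B_D"),
  V2 = c(15057, 5, 9, 1090, 4, 1250, 3943, 11, 2517, 5, 5, 2280, 5, 1735, 4)
)

sets <- LETTERS[1:4]

# Logical membership matrix: one column per set, TRUE where V1 contains the letter.
member <- sapply(sets, grepl, x = D$V1, fixed = TRUE)

# Hypothetical helper: sum V2 over the rows that belong to every set in `idx`.
overlap_sum <- function(idx) {
  sum(D$V2[rowSums(member[, idx, drop = FALSE]) == length(idx)])
}

# All combinations of size 1..4, in the order A, B, C, D, A&B, A&C, ..., A&B&C&D.
combos <- unlist(
  lapply(seq_along(sets), function(n) combn(length(sets), n, simplify = FALSE)),
  recursive = FALSE
)

counts <- vapply(combos, overlap_sum, numeric(1))
names(counts) <- vapply(combos, function(idx) paste(sets[idx], collapse = "&"),
                        character(1))
print(counts)
```

The resulting sums match the values typed by hand in the question (48, 20588, 8829, 6380, 29, ...). `unname(counts)` could be passed to `draw.quad.venn` through `do.call` in the same way as `mynums` above.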
