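Admittedly, I'm new to R, but I managed to take a large dataset, extract the data I wanted, and put it into a data frame using plyr. I have been trying to combine (and count) the duplicated rows and columns.
For example, I have...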
> df
   X x.APPLES x.BANANAS x.PEARS x.ORANGES x.GRAPES x.KIWIS x.APPLES.1 x.ORANGES.1
1  A APPLES
2  B APPLES
3  C APPLES
4  D          BANANAS
5  E          BANANAS
6  F          BANANAS
7  G          BANANAS
8  H                    PEARS   ORANGES   GRAPES
9  I                    PEARS   ORANGES   GRAPES
10 C                    PEARS   ORANGES   GRAPES
11 C                    PEARS   ORANGES   GRAPES
12 R                    PEARS   ORANGES   GRAPES
13 A                                               KIWIS
14 B                                                       APPLES
15 Y                                                       APPLES
16 A                                                                  ORANGES
17 J                                                                  ORANGES
I would like...
   X     x.APPLES   x.BANANAS   x.PEARS   x.ORANGES   x.GRAPES   x.KIWIS   COUNT
1  A     APPLES (1)                       ORANGES (1)            KIWIS (1) 3
2  B     APPLES (2)                                                        2
3  C     APPLES (1)             PEARS (1) ORANGES (2) GRAPES (2)           3
4  D                BANANAS (1)                                            1
5  E                BANANAS (1)                                            1
6  F                BANANAS (1)                                            1
7  G                BANANAS (1)                                            1
8  H                            PEARS (1) ORANGES (1) GRAPES (1)           1
9  I                            PEARS (1) ORANGES (1) GRAPES (1)           1
10 R                            PEARS (1) ORANGES (1) GRAPES (1)           1
11 Y     APPLES (1)                                                        1
12 J                                      ORANGES (1)                      1
13 COUNT 5          4           4         7           5          1         NA
Here is my actual code:
library("jsonlite")
library("plyr")
anom <- fromJSON("https://api.fda.gov/drug/event.json?search=_exists_:seriousnesscongenitalanomali&limit=25")
reactions <- anom$results$patient$reaction
drugs <- llply(anom$results$patient$drug, function(x) x$medicinalproduct)
l <- mapply(c, reactions, drugs, SIMPLIFY=FALSE)
df <- ldply (l, data.frame)
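Answer 0 (score: 1)
I downloaded your actual data and converted it into a two-column data.frame, which you can then turn into your desired output using the example further down.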
require(jsonlite)
anom <- fromJSON("https://api.fda.gov/drug/event.json?search=_exists_:seriousnesscongenitalanomali&limit=5")
## Extract the reactions and drugs as character vectors
reactions <- lapply(anom$results$patient$reaction,
                    function(x) as.character(unlist(x)))
drugs <- lapply(anom$results$patient$drug,
                function(x) as.character(unlist(x$medicinalproduct)))
## Use expand.grid to make subset data.frames with all drug/reaction
## combinations for every patient
l <- mapply(expand.grid, reactions, drugs, SIMPLIFY = FALSE)
## Collapse all the subset data.frames into one
two_col <- do.call(rbind, l)
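As a rough sketch of what the expand.grid step does, here is the same idea on a tiny made-up input that needs no network access. The reaction and drug names (R1, D1, ...) and the column names reaction and drug are placeholders chosen for illustration, not values from the FDA response.

```r
## Hypothetical stand-ins for the per-patient reaction and drug vectors
reactions <- list(c("R1", "R2"), "R3")
drugs     <- list("D1", c("D2", "D3"))

## One data.frame per patient holding every reaction/drug pairing
l <- mapply(expand.grid, reactions, drugs,
            SIMPLIFY = FALSE,
            MoreArgs = list(stringsAsFactors = FALSE))

## Stack the per-patient pieces into one two-column data.frame
two_col <- do.call(rbind, l)
names(two_col) <- c("reaction", "drug")
two_col
```

The first patient contributes two rows (two reactions, one drug) and the second contributes two rows (one reaction, two drugs), so every reaction ends up paired with every drug reported for the same patient.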
If we assume you have a two-column data.frame to start with:
require(reshape2)
fruits <- c("Bannana", "Apple", "Orange", "Grape", "Kiwi")
example <- data.frame(ID = sample(LETTERS[1:6], 25, replace = TRUE),
                      Fruit = sample(fruits, 25, replace = TRUE))
# > example
# ID Fruit
# 1 F Kiwi
# 2 A Apple
# 3 F Kiwi
# ...
dcast(example, ID~Fruit, length, value.var = "Fruit")
## Label each cell with the fruit name and the number of occurrences
more_complex <- function(x) {
  x_len <- length(x)
  x <- paste0(unique(x), " (", x_len, ")")
  x
}
dcast(example, ID~Fruit, more_complex, value.var = "Fruit")
# > dcast(example, ID~Fruit, more_complex, value.var = "Fruit")
#   ID     Apple     Bannana     Grape     Kiwi     Orange
# 1  A Apple (2) Bannana (2) Grape (2)      (0) Orange (2)
# 2  B Apple (1)         (0)       (0) Kiwi (1) Orange (2)
# 3  C       (0) Bannana (2)       (0) Kiwi (1) Orange (1)
# 4  D       (0) Bannana (1)       (0)      (0) Orange (1)
# 5  E       (0)         (0) Grape (1) Kiwi (1)        (0)
# 6  F       (0) Bannana (1) Grape (1) Kiwi (2) Orange (1)
## Same labelling, but return NA instead of "(0)" for empty cells
another_option <- function(x) {
  x_len <- length(x)
  if (x_len == 0) return(NA_character_)
  x <- paste0(unique(x), " (", x_len, ")")
  x
}
dcast(example, ID~Fruit, another_option, value.var = "Fruit")
# > dcast(example, ID~Fruit, another_option, value.var = "Fruit")
#   ID     Apple     Bannana     Grape     Kiwi     Orange
# 1  A Apple (2) Bannana (2) Grape (2)     <NA> Orange (2)
# 2  B Apple (1)        <NA>      <NA> Kiwi (1) Orange (2)
# 3  C      <NA> Bannana (2)      <NA> Kiwi (1) Orange (1)
# 4  D      <NA> Bannana (1)      <NA>     <NA> Orange (1)
# 5  E      <NA>        <NA> Grape (1) Kiwi (1)       <NA>
# 6  F      <NA> Bannana (1) Grape (1) Kiwi (2) Orange (1)
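If you also want totals like the COUNT row and column in your desired output, one possible approach in base R (no reshape2 needed) is sketched below. It assumes a two-column data.frame shaped like example; the fixed values here are made up so that the result is reproducible. The totals count occurrences, which may not match your COUNT column exactly.

```r
## Made-up two-column data, fixed so the result is reproducible
example <- data.frame(ID    = c("A", "A", "B", "C", "C", "C"),
                      Fruit = c("Apple", "Kiwi", "Apple", "Kiwi", "Kiwi", "Orange"),
                      stringsAsFactors = FALSE)

## "Fruit (n)" label per ID/Fruit cell; combinations that never occur stay NA
labels <- tapply(example$Fruit, list(example$ID, example$Fruit),
                 function(x) paste0(x[1], " (", length(x), ")"))

## Occurrence counts with a "Sum" row and a "Sum" column appended
totals <- addmargins(table(example$ID, example$Fruit))

labels
totals
```

Here labels is a character matrix comparable to the dcast output, and totals is a table whose last row and last column hold the per-fruit and per-ID sums.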