I have a data frame that I read from a .csv file, and it looks like this:
   job            name    `phone number`
   <chr>          <chr>            <int>
 1 developer      john               654
 2 developer      mike               321
 3 developer      albert             987
 4 manager        dana               741
 5 manager        guy                852
 6 manager        anna               936
 7 developer      dan                951
 8 developer      shean              841
 9 administrative rebeca             357
10 administrative krissy             984
11 administrative hilma              651
12 administrative otis               325
13 administrative piper              654
14 manager        mendy              984
15 manager        corliss            321
DT = structure(list(job = c("developer", "developer", "developer",
"manager", "manager", "manager", "developer", "developer", "administrative",
"administrative", "administrative", "administrative", "administrative",
"manager", "manager"), name = c("john", "mike", "albert", "dana",
"guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis",
"piper", "mendy", "corliss"), phone = c(654L, 321L, 987L, 741L,
852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L
)), .Names = c("job", "name", "phone"), row.names = c(NA, -15L
), class = "data.frame")
I want to convert it into a list of lists so that, for example:
myList$developer
would give me a list of all the developers, and then
myList$developer$john
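would give me a list of the phone numbers associated with the developer named John. Is there a simple way to do this?
In case you are curious why I want this: the actual data frame I am working with is huge, so finding a specific entry by 4 parameters (in this example a specific entry can be found with 2 parameters: job and name) using filter takes too much time. I think a hash-table structure built from nested lists might take a long time to construct but could then be searched in O(1), which would definitely work for me. If I am wrong and you have a better approach, I would be happy to hear that too.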
Answer 0 (score: 5)
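"The actual data frame I am working with is huge, so finding a specific entry by 4 parameters (in this example a specific entry can be found with 2 parameters: job and name) using filter takes too much time. I think a hash-table structure built from nested lists might take a long time to construct but could then be searched in O(1), which would definitely work for me. If I am wrong and you have a better approach, I would be happy to hear that too."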
Apparently, looking up list elements by name behaves like O(n), not O(1).
A possibly better approach is to use data.table, which uses binary search on keyed columns.
library(data.table)
setDT(DT, key = c("job", "name"))
get_phones = function(..., d = DT) d[list(...), phone]
Usage examples:
get_phones("developer", "john")
# [1] 654
get_phones("administrative")
# [1] 651 984 325 654 357
See vignette("datatable-keys-fast-subset") or the (possibly outdated) copy online.
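If a hash-style lookup is still what you are after, one base R option is an environment, which is hashed by default. The following is only a minimal sketch, not part of the data.table approach above: `build_index`, the `people` sample data, and the `"job|name"` key format are all hypothetical names chosen for illustration, and it assumes the separator never occurs inside a key value.

```r
# Small sample in the same shape as the question's data (illustrative only).
people <- data.frame(
  job   = c("developer", "developer", "manager"),
  name  = c("john", "mike", "dana"),
  phone = c(654L, 321L, 741L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index `value_col` by a composite key built from
# `key_cols`, stored in a hashed environment.
build_index <- function(df, key_cols, value_col, sep = "|") {
  # One composite key per row, e.g. "developer|john".
  keys <- do.call(paste, c(df[key_cols], sep = sep))
  # Group the values that share the same composite key.
  groups <- split(df[[value_col]], keys)
  idx <- new.env(hash = TRUE)
  for (k in names(groups)) assign(k, groups[[k]], envir = idx)
  idx
}

idx <- build_index(people, c("job", "name"), "phone")

idx[["developer|john"]]    # 654
idx[["developer|nobody"]]  # NULL, the key does not exist
```

The same helper would take four key columns instead of two. Whether it actually beats a keyed data.table on your data is something to benchmark rather than assume.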
Answer 1 (score: 3)
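You can use split twice, combined with lapply, and the drop = TRUE argument. Using drop = TRUE discards factor levels that do not occur, which prevents empty list elements from being created.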
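To see what drop = TRUE changes, here is a small sketch on made-up data. It assumes the columns are factors (as they would be when read with stringsAsFactors = TRUE); the `toy` and `dev` names are hypothetical.

```r
# Two-row toy data with factor columns (illustrative only).
toy <- data.frame(
  job  = c("developer", "manager"),
  name = c("john", "dana"),
  stringsAsFactors = TRUE
)

# Subsetting keeps all factor levels, so "dana" is still a level of dev$name.
dev <- toy[toy$job == "developer", ]

names(split(dev, dev$name))               # "dana" "john" ("dana" is a 0-row data frame)
names(split(dev, dev$name, drop = TRUE))  # "john"
```

With plain character columns the unused names never appear, so drop = TRUE makes no difference there, but it is harmless to keep.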
Using:
l <- split(dat, dat$job, drop = TRUE)
nestedlist <- lapply(l, function(x) split(x, x[['name']], drop = TRUE))
Or in one go:
nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
function(x) split(x, x[['name']], drop = TRUE))
gives:
> nestedlist
$administrative
$administrative$hilma
              job  name phonenumber
11 administrative hilma         651

$administrative$krissy
              job   name phonenumber
10 administrative krissy         984

$administrative$otis
              job name phonenumber
12 administrative otis         325

$administrative$piper
              job  name phonenumber
13 administrative piper         654

$administrative$rebeca
             job   name phonenumber
9 administrative rebeca         357


$developer
$developer$albert
        job   name phonenumber
3 developer albert         987

$developer$dan
        job name phonenumber
7 developer  dan         951

$developer$john
        job name phonenumber
1 developer john         654

$developer$mike
        job name phonenumber
2 developer mike         321

$developer$shean
        job  name phonenumber
8 developer shean         841


$manager
$manager$anna
      job name phonenumber
6 manager anna         936

$manager$corliss
       job    name phonenumber
15 manager corliss         321

$manager$dana
      job name phonenumber
4 manager dana         741

$manager$guy
      job name phonenumber
5 manager  guy         852

$manager$mendy
       job  name phonenumber
14 manager mendy         984
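Each leaf in the result above is a one-row data frame. If you only want the phone numbers at the leaves, as the question describes, a variant along these lines may be closer. This is a sketch on a reduced sample, and `small` and `phones` are hypothetical names.

```r
# Reduced sample with the same columns as dat (illustrative only).
small <- data.frame(
  job         = c("developer", "developer", "manager"),
  name        = c("john", "mike", "dana"),
  phonenumber = c(654L, 321L, 741L),
  stringsAsFactors = FALSE
)

# Split by job first, then split only the phone numbers by name.
phones <- lapply(split(small, small$job, drop = TRUE),
                 function(x) split(x$phonenumber, x$name, drop = TRUE))

names(phones$developer)            # "john" "mike"
phones$developer$john              # 654
phones[["manager"]][["dana"]]      # 741, [[ ]] suits lookups with variable keys
```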
Data used:
dat <- structure(list(job = c("developer", "developer", "developer", "manager", "manager", "manager", "developer", "developer", "administrative", "administrative", "administrative", "administrative", "administrative", "manager", "manager"),
name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
phonenumber = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L)),
.Names = c("job", "name", "phonenumber"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))