I have a data frame like the one below. I want to collapse it so that each unique coordinate maps to a list of its subIDs.
       subID                  latlon
1  S20298920 29.2178694, -94.9342990
2  S35629295 26.7063982, -80.7168961
3  S35844314 26.7063982, -80.7168961
4  S35833936 26.6836236, -80.3512144
7  S30634757 42.4585456, -76.5146989
8  S35834082 26.4330582, -80.9416786
9  S35857972 26.4330582, -80.9416786
10 S35833885 26.7063982, -80.7168961
So here, I want (26.7063982, -80.7168961) to be a list containing (S35629295, S35844314, S35833885), and (29.2178694, -94.9342990) to be a list containing only (S20298920). I think a list of lists makes the most sense.
Answer 0 (score: 1)
Use aggregate:
out <- aggregate(data=df,subID~latlon,FUN = function(t) list(sort(paste(t))))
Because your data set is large and unwieldy, the example below uses simplified data that is easier to read.
out <- aggregate(data=df,name~ID,FUN = function(t) list(sort(paste(t))))
out
  ID          name
1  1 apple, orange
2  2        orange
3  3 apple, orange
Data:
df <- data.frame(ID=c(1,1,2,3,3),
name=c('apple', 'orange', 'orange', 'orange', 'apple'))
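To connect the simplified example back to the question, here is a minimal sketch that applies the same aggregate call to the question's data. The data frame is rebuilt by hand from the values in the question, and the name ids is only illustrative. It assumes latlon is stored as a single character column.

```r
# The question's data, rebuilt by hand (values copied from the question).
df <- data.frame(
  subID = c("S20298920", "S35629295", "S35844314", "S35833936",
            "S30634757", "S35834082", "S35857972", "S35833885"),
  latlon = c("29.2178694, -94.9342990", "26.7063982, -80.7168961",
             "26.7063982, -80.7168961", "26.6836236, -80.3512144",
             "42.4585456, -76.5146989", "26.4330582, -80.9416786",
             "26.4330582, -80.9416786", "26.7063982, -80.7168961"),
  stringsAsFactors = FALSE
)

# One row per unique coordinate; subID becomes a list column of sorted IDs.
out <- aggregate(subID ~ latlon, data = df,
                 FUN = function(x) list(sort(as.character(x))))

# Name the list elements by coordinate so they can be looked up directly.
ids <- setNames(out$subID, out$latlon)
ids[["26.7063982, -80.7168961"]]
```

Naming the list by coordinate is a convenience for lookup; the out data frame itself already holds the collapsed result.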
Answer 1 (score: 0)
with(data,tapply(subID,latlon,as.list))
Output:
$`26.4330582 -80.9416786`
$`26.4330582 -80.9416786`[[1]]
[1] "S35834082"
$`26.4330582 -80.9416786`[[2]]
[1] "S35857972"
$`26.6836236 -80.3512144`
$`26.6836236 -80.3512144`[[1]]
[1] "S35833936"
:
:
:
Data:
data=read.table(text="subID latlon
S20298920 '29.2178694 -94.9342990'
S35629295 '26.7063982 -80.7168961'
S35844314 '26.7063982 -80.7168961'
S35833936 '26.6836236 -80.3512144'
S30634757 '42.4585456 -76.5146989'
S35834082 '26.4330582 -80.9416786'
S35857972 '26.4330582 -80.9416786'
S35833885 '26.7063982 -80.7168961' ",h=T,stringsAsFactors=F)
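If a plain character vector per coordinate is enough (rather than the nested one-element lists that as.list produces), base R's split() is a possible alternative to tapply. This is a sketch of a different technique, not part of the answer above; by_coord is an illustrative name.

```r
# Same data as in the answer, read from inline text.
data <- read.table(text = "subID latlon
S20298920 '29.2178694 -94.9342990'
S35629295 '26.7063982 -80.7168961'
S35844314 '26.7063982 -80.7168961'
S35833936 '26.6836236 -80.3512144'
S30634757 '42.4585456 -76.5146989'
S35834082 '26.4330582 -80.9416786'
S35857972 '26.4330582 -80.9416786'
S35833885 '26.7063982 -80.7168961'", header = TRUE, stringsAsFactors = FALSE)

# split() returns a named list: one character vector of subIDs per coordinate.
by_coord <- split(data$subID, data$latlon)
by_coord[["26.4330582 -80.9416786"]]

# If a true list of lists is wanted, convert each vector afterwards.
nested <- lapply(by_coord, as.list)
```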
Answer 2 (score: 0)
In the tidyverse, you can use tidyr::nest to nest the data frame:
library(tidyverse)
df <- tibble(subID = c("S20298920", "S35629295", "S35844314", "S35833936", "S30634757", "S35834082", "S35857972", "S35833885"),
latlon = c("29.2178694, -94.934299", "26.7063982, -80.7168961", "26.7063982, -80.7168961", "26.6836236, -80.3512144", "42.4585456, -76.5146989", "26.4330582, -80.9416786", "26.4330582, -80.9416786", "26.7063982, -80.7168961"))
df %>% nest(data = subID)
#> # A tibble: 5 x 2
#> latlon data
#> <chr> <list>
#> 1 29.2178694, -94.934299 <tibble [1 x 1]>
#> 2 26.7063982, -80.7168961 <tibble [3 x 1]>
#> 3 26.6836236, -80.3512144 <tibble [1 x 1]>
#> 4 42.4585456, -76.5146989 <tibble [1 x 1]>
#> 5 26.4330582, -80.9416786 <tibble [2 x 1]>
Or just summarise with list to make a list column of vectors:
df %>%
group_by(latlon) %>%
summarise_all(list)
#> # A tibble: 5 x 2
#> latlon subID
#> <chr> <list>
#> 1 26.4330582, -80.9416786 <chr [2]>
#> 2 26.6836236, -80.3512144 <chr [1]>
#> 3 26.7063982, -80.7168961 <chr [3]>
#> 4 29.2178694, -94.934299 <chr [1]>
#> 5 42.4585456, -76.5146989 <chr [1]>
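To get from the list column to the named list the question asks for, one option is to name the summarised column explicitly and pass the two-column result to tibble::deframe(). This is a sketch that assumes dplyr and tibble are installed; by_coord is an illustrative name, and summarise(subID = list(subID)) is used here in place of summarise_all(list).

```r
library(dplyr)
library(tibble)

df <- tibble(
  subID = c("S20298920", "S35629295", "S35844314", "S35833936",
            "S30634757", "S35834082", "S35857972", "S35833885"),
  latlon = c("29.2178694, -94.934299", "26.7063982, -80.7168961",
             "26.7063982, -80.7168961", "26.6836236, -80.3512144",
             "42.4585456, -76.5146989", "26.4330582, -80.9416786",
             "26.4330582, -80.9416786", "26.7063982, -80.7168961")
)

# Collapse to one row per coordinate, then turn the (latlon, subID) pair
# into a named list: names are coordinates, values are subID vectors.
by_coord <- df %>%
  group_by(latlon) %>%
  summarise(subID = list(subID)) %>%
  deframe()

by_coord[["26.7063982, -80.7168961"]]
```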