Question

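I have a list of boroughs and a list of localities (such as this one). Each locality lies in exactly one borough. What is the best way to store this kind of hierarchy in R, given that I want a convenient and readable way to access the entries, and that I want to use the list to accumulate locality-level data at the borough level?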

I came up with the following:

localities <- list("Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen"),
                   "Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg")
                  )

But I am not sure whether this is the most elegant and accessible way.

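If I wanted to attach additional information at the locality level, I could do that by replacing c(...) with some other call, e.g. rbind(c('0201', '0202'), c("Friedrichshain", "Kreuzberg")). But how would I do it if I wanted to add information at the borough level (such as an abbreviated name and a full name for each list)?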

Edit: For example, I would like to condense a table like this one into a borough-level version.

Answer 1

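It is hard to say what is best without knowing how you intend to use it, but I would strongly recommend moving from a nested list structure to a data frame structure: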

library(reshape2)
loc.df <- melt(localities)

This is what the melted data looks like:

           value                       L1
1          Mitte                    Mitte
2         Moabit                    Mitte
3   Hansaviertel                    Mitte
4     Tiergarten                    Mitte
5        Wedding                    Mitte
6  Gesundbrunnen                    Mitte
7 Friedrichshain Friedrichshain-Kreuzberg
8      Kreuzberg Friedrichshain-Kreuzberg
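
If you would rather not depend on reshape2, base R's stack() produces a comparable long table. The following is only a minimal sketch, assuming the localities list from the question with its missing parenthesis restored; the name loc.base and the column names locality and borough are my own choices, not anything from the question.

```r
# The nested list from the question (closing parenthesis restored)
localities <- list(
  "Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen"),
  "Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg")
)

# stack() turns a named list of vectors into a two-column data frame
# with the columns `values` and `ind`
loc.base <- stack(localities)
names(loc.base) <- c("locality", "borough")          # illustrative column names
loc.base$locality <- as.character(loc.base$locality)
loc.base$borough  <- as.character(loc.base$borough)

# Going back from the long table to a nested list
split(loc.base$locality, loc.base$borough)
```

The long table keeps one row per locality, so any further locality-level columns can simply be added next to it.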

You can then use all the standard data frame operations and other computations:

loc.df$population <- sample(100:500, nrow(loc.df))    # make up population
tapply(loc.df$population, loc.df$L1, mean)            # population by borough

This gives the mean population by borough (the figures come from a random sample, so yours will differ):

Friedrichshain-Kreuzberg                    Mitte 
                278.5000                 383.8333

For more complex calculations, you can use data.table or dplyr.
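
As for the borough-level information asked about in the question, one possible approach is a second data frame keyed by borough name that is joined in when needed. This is a rough base-R sketch rather than a definitive design: the population figures and the abbreviations are made up, and the names loc, boroughs, abbrev and by.borough are assumptions for illustration.

```r
# Locality-level table: one row per locality (populations are made-up numbers)
loc <- data.frame(
  locality   = c("Mitte", "Moabit", "Friedrichshain", "Kreuzberg"),
  borough    = c("Mitte", "Mitte", "Friedrichshain-Kreuzberg", "Friedrichshain-Kreuzberg"),
  population = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Borough-level table: one row per borough (abbreviations are hypothetical)
boroughs <- data.frame(
  borough = c("Mitte", "Friedrichshain-Kreuzberg"),
  abbrev  = c("MI", "FK"),
  stringsAsFactors = FALSE
)

# Accumulate the locality data at the borough level ...
by.borough <- aggregate(population ~ borough, data = loc, FUN = sum)

# ... and attach the borough-level attributes by joining on the borough name
by.borough <- merge(by.borough, boroughs, by = "borough")
by.borough
```

Keeping the two levels in separate tables avoids repeating borough attributes on every locality row, at the cost of an explicit merge() whenever both are needed.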

Answer 2

You can use the XML library to pull all of this data directly into a data.frame.

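library(XML)
theurl <- "http://en.wikipedia.org/wiki/Boroughs_and_localities_of_Berlin#List_of_localities"
tables <- readHTMLTable(theurl)        # one data frame per HTML table on the page

boroughs <- tables[[1]]$Borough        # borough names from the first table
localities <- tables[c(3:14)]          # the per-borough locality tables
names(localities) <- as.character(boroughs)
all <- do.call("rbind", localities)    # stack them into a single data frame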
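
After do.call("rbind", ...) on a named list, the borough name survives only as part of the row names, so it can help to store it in a proper column. The sketch below shows the idea offline on a small stand-in list; the columns Locality and Area, their values, and the name combined are invented for illustration and are not taken from the Wikipedia page.

```r
# Stand-in for the scraped tables: a named list with one data frame per borough
# (columns and values are invented, not taken from the Wikipedia page)
localities <- list(
  "Mitte" = data.frame(Locality = c("Moabit", "Wedding"), Area = c(1, 2)),
  "Friedrichshain-Kreuzberg" = data.frame(Locality = "Kreuzberg", Area = 3)
)

# Repeat each borough name once per row of its table
borough <- rep(names(localities), times = vapply(localities, nrow, integer(1)))

# Stack the tables and record the borough in an explicit column
combined <- do.call("rbind", localities)
combined$Borough <- borough
rownames(combined) <- NULL
combined
```

With the borough in its own column, the combined table can be aggregated with tapply() or aggregate() in the same way as the melted list in Answer 1.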

Answer 3

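@Roland, I think you will find a data frame preferable to a list for the reasons mentioned above, but also because there is other data on the web page you referenced that you can easily load into a data frame if you wish. For example, comparisons by population density, or by any of the other items provided "for free" on the page, would be a snap with a data frame.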

How to store a tree/nested list in R?
