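I am trying to convert the Census Bureau's "adjacency list" of FIPS codes (unique county-level identifiers) into a real adjacency list or edge list, and ultimately into an adjacency matrix. The Census FIPS code data is here: http://www2.census.gov/geo/docs/reference/county_adjacency.txt.
The problem is that it is not an "adjacency list" in any conventional sense of the phrase. I am new to R, so please forgive any mistakes or lack of best practices...
My instinct is to loop over the list, subset the data into individual adjacency lists, convert each one into a matrix, and then bind the matrices into one large binary matrix. I have searched online for how to do this, but every example uses very simple, clean data. :(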
The Census file lists the FIPS codes like this:
"Bullock County, AL" 01011 "Barbour County, AL" 01005
"Bullock County, AL" 01011
"Macon County, AL" 01087
"Montgomery County, AL" 01101
"Pike County, AL" 01109
"Russell County, AL" 01113
"Butler County, AL" 01013 "Butler County, AL" 01013
"Conecuh County, AL" 01035
"Covington County, AL" 01039
"Crenshaw County, AL" 01041
"Lowndes County, AL" 01085
"Monroe County, AL" 01099
"Wilcox County, AL" 01131
When I read the URL into R, the text file data looks like this:
[1] "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001" "\t\t\"Chilton County, AL\"\t01021" "\t\t\"Dallas County, AL\"\t01047"
[4] "\t\t\"Elmore County, AL\"\t01051" "\t\t\"Lowndes County, AL\"\t01085" "\t\t\"Montgomery County, AL\"\t01101"
[7] "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003" "\t\t\"Clarke County, AL\"\t01025" "\t\t\"Escambia County, AL\"\t01053"
[10] "\t\t\"Mobile County, AL\"\t01097"
I used regular expressions from the stringr package to extract the actual codes. The data now looks like this:
> str(cleaner)
List of 100
$ : chr [1:2] "01001" "01001"
$ : chr "01021"
$ : chr "01047"
$ : chr "01051"
$ : chr "01085"
$ : chr "01101"
$ : chr [1:2] "01003" "01003"
$ : chr "01025"
$ : chr "01053"
$ : chr "01097"
$ : chr "01099"
$ : chr "01129"
$ : chr "12033"
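The extraction step can be sketched roughly as follows. This is only an illustration: it uses base R (regmatches and gregexpr) rather than stringr, it assumes every FIPS code is a run of exactly five digits and that county names contain no digits, and raw_lines is a hypothetical name for what readLines returns.

```r
# A few lines as readLines() would return them (copied from the output above)
raw_lines <- c(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\t\t\"Dallas County, AL\"\t01047"
)

# Extract every run of five digits from each line; the result is a list
# with one character vector per input line
cleaner <- regmatches(raw_lines, gregexpr("[0-9]{5}", raw_lines))
str(cleaner)
```

Lines that start a new adjacency list yield two codes (the county itself and its first listed neighbour), while continuation lines yield one.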
I can group the elements that follow the "first" entry of each adjacency list, like this:
# Group FIPS codes by the index range of each adjacency list
reduce_fips <- function(locations, vect) {
  out <- list()
  for (i in seq_along(locations)) {
    if (i == length(locations)) {
      # the last list runs to the end of the vector
      out[[i]] <- locations[i]:length(vect)
    } else {
      # every other list stops just before the next start position
      out[[i]] <- locations[i]:(locations[i + 1] - 1)
    }
  }
  out
}
out <- reduce_fips(adj_list_start, fips_codes) # index ranges, one per adjacency list
# problem: some adjacency-list start lines contain 2 different FIPS codes
fips_adj_df <- data.frame(cleaner = sapply(out, function(x) x[1]))
fips_adj_df$adjacent <- out
fips_adj_df
# problem: how to transform this into a matrix or connected nodes
This produces the output shown below. However, it is logically incorrect, and it would be expensive to search in terms of memory.
cleaner adjacent
1 1 1, 2, 3, 4, 5, 6
2 7 7, 8, 9, 10, 11, 12, 13
3 14 14, 15, 16, 17, 18, 19, 20, 21, 22
4 23 23, 24, 25, 26, 27, 28, 29
5 30 30, 31, 32, 33, 34, 35, 36
6 37 37, 38, 39, 40, 41, 42
7 43 43, 44, 45, 46, 47, 48, 49
8 50 50, 51, 52, 53, 54, 55
9 56 56, 57, 58, 59, 60, 61
10 62 62, 63, 64, 65, 66, 67, 68, 69
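One possible way around the index-range problem is to go straight from the extracted codes to an edge list. The sketch below is an assumption-laden illustration, not a tested solution for the full file: it assumes a list shaped like cleaner above, where a line with two codes starts a new adjacency list (first code is the focal county, second is its first neighbour) and a line with one code is a further neighbour. The sample values are hypothetical.

```r
# Hypothetical sample in the same shape as `cleaner`
cleaner <- list(c("01001", "01001"), "01021", "01047",
                c("01003", "01003"), "01025", "01053")

# A line with two codes starts a new adjacency list
is_start <- lengths(cleaner) == 2
group    <- cumsum(is_start)  # which adjacency list each line belongs to

# Focal county: first code of each start line, repeated for its whole group
from <- vapply(cleaner[is_start], `[`, character(1), 1)[group]
# Neighbour: the last code on every line
to   <- vapply(cleaner, function(x) x[length(x)], character(1))

edges <- data.frame(from = from, to = to, stringsAsFactors = FALSE)
edges
```

Because the focal county is taken from the first code and the neighbour from the last, start lines that carry two different codes are handled the same way as any other line.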
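Ultimately, I want a binary matrix like the one below that shows whether FIPS codes are geographically adjacent to each other. For example, if 100, 101, and 102 are adjacent to one another and 103 is adjacent only to 102, I would like the matrix to show that information.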
FIPS
FIPS 100 101 102 103
102 1 1 1 1
101 1 1 1 0
100 1 1 1 0
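For the toy example, a matrix of this kind could be built from an edge list along the following lines. This is a minimal sketch in base R; the edges data frame and the names codes and adj are made up for illustration, and each county is treated as adjacent to itself, as in the table above.

```r
# Edge list for the example: 100, 101 and 102 are mutually adjacent,
# and 103 is adjacent to 102 only
edges <- data.frame(
  from = c("100", "100", "101", "102"),
  to   = c("101", "102", "102", "103"),
  stringsAsFactors = FALSE
)

codes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, nrow = length(codes), ncol = length(codes),
              dimnames = list(FIPS = codes, FIPS = codes))

# A two-column character matrix indexes by (row name, column name) pairs
adj[as.matrix(edges)] <- 1L
adj[as.matrix(edges[, c("to", "from")])] <- 1L  # mirror to make it symmetric
diag(adj) <- 1L                                 # each county adjacent to itself
adj
```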
Answer 0 (score: 0)
There is a lot going on in this question, so I will try to break it down.
First, you can use read.csv to get the information from the text file.
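# Read every column as character so leading zeros (e.g. "01001") are kept;
# blank fields become NA
df <- read.csv("county_adjacency.txt", sep="\t", stringsAsFactors = FALSE, header = FALSE, colClasses = "character", na.strings = "")
# Drop the names for the counties, you don't need them
df <- df[,c("V2","V4")]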
Use na.locf from the zoo package to fill in the NA values.
library(zoo)
df$V2 <- na.locf(df$V2)
List your FIPS codes and use them to build the matrix.
fips <-unique(df$V2)
fips.matrix <- matrix(data=0, nrow = length(fips), ncol = length(fips), dimnames = list(fips,fips))
Fill the matrix with 1s at the coordinates given by the pairs in the text file.
# Convert each column to character so the pairs can index the matrix by its dimnames
df[] <- lapply(df, as.character)
fips.matrix[as.matrix(df)] <-1
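To see the steps above work together, here is a small end-to-end sketch on inline data. Several things are assumptions made for illustration: the sample lines are an abbreviated, hypothetical excerpt in the file's tab-separated layout (not the counties' full adjacency), the columns are read as character to keep leading zeros, and a base R fill-down is swapped in for zoo's na.locf so the example needs no extra packages.

```r
# Abbreviated sample lines in the file's layout (not the full adjacency data)
txt <- c(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\"Chilton County, AL\"\t01021\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003"
)

df <- read.csv(text = txt, sep = "\t", header = FALSE,
               colClasses = "character", na.strings = "")
df <- df[, c("V2", "V4")]

# Base R stand-in for na.locf: carry the last non-NA value forward
# (assumes the first value is not NA)
df$V2 <- df$V2[!is.na(df$V2)][cumsum(!is.na(df$V2))]

fips <- unique(df$V2)
fips.matrix <- matrix(0, nrow = length(fips), ncol = length(fips),
                      dimnames = list(fips, fips))

# Every code in V4 must also appear in V2, otherwise this indexing fails
fips.matrix[as.matrix(df)] <- 1
fips.matrix
```

In this sample, 01001 and 01021 end up adjacent to each other, while 01003 is adjacent only to itself.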