Running an analysis on multiple geojson files

Date: 2018-01-07 19:24:29

Tags: r for-loop gis geojson lapply

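I have about 113 geojson files, which I have previously worked with mainly in QGIS. My goal now is to import all of them into R at once and run an analysis on the underlying attribute table attached to each layer. I have already worked out how to import a single file and run whatever analysis I need after converting it to a data frame. The files in my folder all look like this: 0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson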

The code I use for this step is:

library(rgdal)
map1 <- readOGR(dsn = "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles/maps/sampled_maps/0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson", layer = "0cfb16c1-90c2-412d-bb60-2fec34c75e9a")
summary(map1)
map1 <- as.data.frame(map1)

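I want to run the same analysis I did on that map on all of the geojson files, without having to do them one by one. The analysis involves redistricting metrics and includes: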

cfbdata$reptotal <- (cfbdata$surveyed_republican_percentage/100)*cfbdata$surveyed_total
cfbdata$demtotal <- (cfbdata$surveyed_democrat_percentage/100)*cfbdata$surveyed_total
cfbdata$NAME <- NULL
aggdata <-aggregate(cfbdata, by=list(cfbdata$cluster), 
                    FUN=sum, na.rm=TRUE)
# Rep district victory is 1 and Dem district victory is 0
aggdata$result <- ifelse(aggdata$reptotal > aggdata$demtotal,1, ifelse(aggdata$demtotal > aggdata$reptotal,0, NA))

EffGapCalc <- subset(aggdata, select=c("cluster","reptotal","demtotal","surveyed_total", "result"))

# Step 1: Calculate Dem Wasted, Rep Wasted, and Net Wasted

EffGapCalc$repwasted <- ifelse(EffGapCalc$result == 1, EffGapCalc$reptotal - (.51*EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 0, EffGapCalc$reptotal, NA))

EffGapCalc$demwasted <- ifelse(EffGapCalc$result == 0, EffGapCalc$demtotal - (.51 * EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 1, EffGapCalc$demtotal, NA))

EffGapCalc$netwasted <- abs(EffGapCalc$repwasted - EffGapCalc$demwasted)

# Step 2: Sum Total Wasted Rep and Dem Votes
totrepwasted <- sum(EffGapCalc$repwasted)
totdemwasted <- sum(EffGapCalc$demwasted)
netwaste <- ifelse(totrepwasted>totdemwasted, totrepwasted-totdemwasted, ifelse(totrepwasted<totdemwasted, totdemwasted-totrepwasted))
netwaste
# Democrats had a net waste (more wasted votes) of 74289.6

# Step 3: Divide Net Wasted by Total Number of Votes Case
sum(EffGapCalc$surveyed_total)
totalsurvtot <- sum(EffGapCalc$surveyed_total)
netwaste/totalsurvtot
# Efficiency Gap = .0359 [3.60%]

The goal is to run the same analysis on all 113 GeoJSON files and get a list of 113 "efficiency gap" figures, like the .0359 above.

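I have searched Stack Overflow and elsewhere, but have not found a suitable solution. I originally thought a for loop would be best for this, but from what I have read elsewhere, lapply() may actually be the better choice. The challenge I am facing is making sure the import works correctly as part of lapply().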

The code I tried, which failed:

library(rgdal)
fileNames <- list.files(path = "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles/maps/sampled_maps", pattern="*.geojson", full.names = TRUE)

lapply(fileNames, function(x) {
  map1 <- readOGR(dsn = x, layer = x)
  map1 <- as.data.frame(map1)
  out <- map1$reptotal <- (map1$surveyed_republican_percentage/100)*map1$surveyed_total;
  map1$demtotal <- (map1$surveyed_democrat_percentage/100)*map1$surveyed_total;
  map1$NAME <- NULL;
  aggdata <-aggregate(map1, by=list(map1$cluster), 
                      FUN=sum, na.rm=TRUE);
  aggdata$result <- ifelse(aggdata$reptotal > aggdata$demtotal,1, ifelse(aggdata$demtotal > aggdata$reptotal,0, NA));

  EffGapCalc <- subset(aggdata, select=c("cluster","reptotal","demtotal","surveyed_total", "result"));
  # Step 1: Calculate Dem Wasted, Rep Wasted, and Net Wasted
  EffGapCalc$repwasted <- ifelse(EffGapCalc$result == 1, EffGapCalc$reptotal - (.51*EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 0, EffGapCalc$reptotal, NA));

  EffGapCalc$demwasted <- ifelse(EffGapCalc$result == 0, EffGapCalc$demtotal - (.51 * EffGapCalc$surveyed_total), ifelse(EffGapCalc$result == 1, EffGapCalc$demtotal, NA));

  EffGapCalc$netwasted <- abs(EffGapCalc$repwasted - EffGapCalc$demwasted);

  # Step 2: Sum Total Wasted Rep and Dem Votes
  totrepwasted <- sum(EffGapCalc$repwasted);
  totdemwasted <- sum(EffGapCalc$demwasted);
  netwaste <- ifelse(totrepwasted>totdemwasted, totrepwasted-totdemwasted, ifelse(totrepwasted<totdemwasted, totdemwasted-totrepwasted));
  netwaste

  # Step 3: Divide Net Wasted by Total Number of Votes Case
  totalsurvtot <- sum(EffGapCalc$surveyed_total);
  netwaste/totalsurvtot;

  write.table(out, "/Users/chris/Documents/GeorgetownMPPMSFS/McCourtMPP/BIGWork/BIGDataFiles", sep="\t", quote=F, row.names=F, col.names=T)
})

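At this point I have been trying to figure this out for two days, and I am only getting more confused. Any help would be greatly appreciated!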

1 Answer:

Answer 0: (score: 1)

Try a simple test:

lapply(fileNames, function(x) {
  map1 <- readOGR(dsn = x, layer = x)
})

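Assuming that fails in your case, we know the problem is in that one line. That makes it much easier for people here to see the simpler problem. Please always try to minimize your problem: it helps us help you, and in many cases it lets you solve it yourself. Onwards...

For GeoJSON, readOGR needs the file path and the layer name, and that code supplies the file path as the layer name. Using a test file from the geojson package, like so: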

> testfile <- list.files(path = path, pattern="*.geojson", full.names = TRUE)[5]

A quick check that we have got it:

> file.exists(testfile)
[1] TRUE
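As an aside on the file listing itself: the pattern argument of list.files is a regular expression rather than a shell glob, so "*.geojson" may match more than intended. A minimal sketch of a stricter listing follows, using a throwaway folder (demo_dir and the file names in it are made up for illustration):

```r
# Build a throwaway folder with two .geojson files and one decoy (all hypothetical)
demo_dir <- file.path(tempdir(), "geojson_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.geojson", "b.geojson", "notes_geojson.txt")))

# Escape the dot and anchor the extension to the end of the file name
files <- list.files(demo_dir, pattern = "\\.geojson$", full.names = TRUE)

# Fail early if anything listed is not actually on disk
stopifnot(all(file.exists(files)))
basename(files)
```

Checking the whole vector once before the loop means a bad path shows up immediately rather than somewhere inside lapply.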

Then try to read it:

> d = readOGR(dsn=testfile, layer=testfile)
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv,  : 
  Cannot open layer

So how do we get the layer name from the file path? We have ogrListLayers:

> ogrListLayers(testfile)
[1] "OGRGeoJSON"
attr(,"driver")
[1] "GeoJSON"
attr(,"nlayers")
[1] 1

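That looks odd, but it is a vector of layer names plus some extra attributes that you can ignore. The layer name for this test file is "OGRGeoJSON". Assuming you know the GeoJSON has only one layer, you can do this: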

> d = readOGR(dsn=testfile, layer=ogrListLayers(testfile))
OGR data source with driver: GeoJSON 
Source: "/home/rowlings/R/x86_64-pc-linux-gnu-library/3.4/geojson/examples/linestring_one.geojson", layer: "OGRGeoJSON"
with 1 features
It has 2 fields
Warning message:
In readOGR(dsn = testfile, layer = ogrListLayers(testfile)) :
  Z-dimension discarded
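In the question, the single-file call used the file name without its extension as the layer, while this test file reports "OGRGeoJSON", so the layer name appears to depend on the GDAL setup. For the first case, here is a rough sketch of deriving that name from a path with base R only (layer_from_path and the folder in example_path are hypothetical):

```r
library(tools)  # ships with R; provides file_path_sans_ext()

# Hypothetical helper: file name without its directory or extension
layer_from_path <- function(path) file_path_sans_ext(basename(path))

# Made-up folder, file name taken from the question
example_path <- "/some/folder/0cfb16c1-90c2-412d-bb60-2fec34c75e9a.geojson"
layer_name <- layer_from_path(example_path)
layer_name
```

Asking ogrListLayers for the name, as above, avoids having to guess which convention applies.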

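Now I think GeoJSONs can only have one layer, or readOGR defaults to the first layer, so if you know there is only one layer in your GeoJSONs you can omit the layer= argument and get an identical object back: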

> d2 = readOGR(dsn=testfile)
OGR data source with driver: GeoJSON 
Source: "/home/rowlings/R/x86_64-pc-linux-gnu-library/3.4/geojson/examples/linestring_one.geojson", layer: "OGRGeoJSON"
with 1 features
It has 2 fields
Warning message:
In readOGR(dsn = testfile) : Z-dimension discarded
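Once the import works, the rest of the question comes down to returning one number per file. The following is only a sketch of how the question's steps might be wrapped in a function, not a checked implementation of the efficiency gap: efficiency_gap is a hypothetical name, the column names are taken from the question, and the toy table uses made-up numbers in place of a real attribute table. It aggregates only the columns it needs, and a tied district makes the result NA.

```r
# Hypothetical helper: one efficiency-gap figure from one attribute table
efficiency_gap <- function(df) {
  # Estimated votes per row, from the surveyed percentages
  df$reptotal <- df$surveyed_republican_percentage / 100 * df$surveyed_total
  df$demtotal <- df$surveyed_democrat_percentage / 100 * df$surveyed_total

  # Sum only the needed columns within each district (cluster)
  agg <- aggregate(cbind(reptotal, demtotal, surveyed_total) ~ cluster,
                   data = df, FUN = sum)

  rep_wins <- agg$reptotal > agg$demtotal
  dem_wins <- agg$demtotal > agg$reptotal
  threshold <- 0.51 * agg$surveyed_total  # same threshold as in the question

  # Winner wastes votes above the threshold, loser wastes all; ties give NA
  repwasted <- ifelse(rep_wins, agg$reptotal - threshold,
                      ifelse(dem_wins, agg$reptotal, NA))
  demwasted <- ifelse(dem_wins, agg$demtotal - threshold,
                      ifelse(rep_wins, agg$demtotal, NA))

  abs(sum(repwasted) - sum(demwasted)) / sum(agg$surveyed_total)
}

# Toy data standing in for one file's attribute table (made-up numbers)
toy <- data.frame(
  cluster = c(1, 1, 2, 2),
  surveyed_republican_percentage = c(60, 60, 45, 45),
  surveyed_democrat_percentage = c(40, 40, 55, 55),
  surveyed_total = c(100, 100, 100, 100)
)
toy_gap <- efficiency_gap(toy)

# With real files (not run here), assuming readOGR(dsn = f) works as shown above:
# gaps <- vapply(fileNames,
#                function(f) efficiency_gap(as.data.frame(readOGR(dsn = f))),
#                numeric(1))
```

Returning the value from the function, rather than calling write.table inside the loop, leaves a single vector of results that can be written out once at the end.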