I have a column of coordinates that I am splitting with strsplit() after removing unwanted characters with gsub(). Note that it has 3034 rows.
> head(bike_parking$Geom)
[1] "(37.7606289177, -122.410647009)" "(37.752476948, -122.410625009)"
[3] "(37.7871729481, -122.402401009)" "(37.7776039475, -122.422764009)"
[5] "(37.7658325695, -122.46649784)" "(37.7693399479, -122.432820008)"
> length(bike_parking$Geom)
[1] 3034
> sum(is.na(bike_parking$Geom))
[1] 0
For some reason, after I run
dat <- data.frame(do.call(rbind, strsplit(as.vector(gsub("[()]", "", bike_parking$Geom)), split = ",")))
I am left with 3033 rows. How did this happen, and what steps can I take to find out what went wrong?
> head(dat)
X1 X2
1 37.7606289177 -122.410647009
2 37.752476948 -122.410625009
3 37.7871729481 -122.402401009
4 37.7776039475 -122.422764009
5 37.7658325695 -122.46649784
6 37.7693399479 -122.432820008
> nrow(dat)
[1] 3033
Answer 0 (score: 0)
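It looks like your strings do not all have the same structure. You need to know which structure they have in common in order to split them correctly. From the comments under the question, I infer that some strings may not contain the comma that separates the coordinates. You can remove all commas and split the strings on whitespace instead. I will post one solution in base R and one using the stringr package.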
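To find the rows that do not fit the expected structure, one approach is to count how many pieces each string splits into. The sketch below uses a made-up vector, geom, in place of bike_parking$Geom; the empty string and the comma-less entry are invented for illustration and are not taken from the real data. It also shows one way a row can vanish: strsplit() returns character(0) for an empty string, and rbind() ignores zero-length vectors. Whether that is what happened in the real data is an assumption.

```r
# Hypothetical stand-in for bike_parking$Geom; the irregular entries
# (an empty string and a string without a comma) are invented.
geom <- c("(37.7606289177, -122.410647009)",
          "",
          "(37.752476948 -122.410625009)")

# Same cleaning and splitting as in the question
parts <- strsplit(gsub("[()]", "", geom), split = ",")

# Number of pieces per input string; every well-formed row should give 2
n_parts <- lengths(parts)
table(n_parts)

# Positions and contents of the rows that did not split into two pieces
bad <- which(n_parts != 2)
geom[bad]

# rbind() ignores the zero-length element, so one row disappears
nrow(do.call(rbind, parts))
```

On the real column, running the same lengths() check on bike_parking$Geom should point to the offending row or rows.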
Option 1: base R:
We can use gsub() to remove the parentheses and commas from the strings, then use strsplit() to split each string at the space. The result will be:
splitted <- strsplit(gsub("[(),]", "", bike_parking$Geom), " ")
# [[1]]
# [1] "37.7606289177" "-122.410647009"
# [[2]]
# [1] "37.752476948" "-122.410625009"
# [[3]]
# [1] "37.7871729481" "-122.402401009"
# [[4]]
# [1] "37.7776039475" "-122.422764009"
# [[5]]
# [1] "37.7658325695" "-122.46649784"
# [[6]]
# [1] "37.7693399479" "-122.432820008"
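We have to reorganize these results a little so that you end up with two columns. The call below returns a character matrix, which you can pass to data.frame():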
sapply(1:2, function(x) sapply(splitted, `[[`, x))
# [,1] [,2]
# [1,] "37.7606289177" "-122.410647009"
# [2,] "37.752476948" "-122.410625009"
# [3,] "37.7871729481" "-122.402401009"
# [4,] "37.7776039475" "-122.422764009"
# [5,] "37.7658325695" "-122.46649784"
# [6,] "37.7693399479" "-122.432820008"
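The matrix above still holds character strings. A minimal sketch of the remaining step, converting it to a data.frame with numeric columns, could look like this. The vector geom is a made-up sample standing in for bike_parking$Geom, and the column names lat and lon are assumptions, not names from the original data.

```r
# Hypothetical sample standing in for bike_parking$Geom
geom <- c("(37.7606289177, -122.410647009)",
          "(37.752476948, -122.410625009)",
          "(37.7871729481, -122.402401009)")

# Remove parentheses and commas, then split at the space
splitted <- strsplit(gsub("[(),]", "", geom), " ")

# Collect the first and second piece of every string into a two-column
# character matrix (same reorganizing step as above)
m <- sapply(1:2, function(x) sapply(splitted, `[[`, x))

# Convert to numeric columns; lat and lon are assumed names
dat <- data.frame(lat = as.numeric(m[, 1]),
                  lon = as.numeric(m[, 2]))
str(dat)
```

Because `[[` raises an error when a string has fewer than two pieces, this version stops at a malformed row instead of silently dropping it.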
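Option 2: stringr: this package provides the function str_split() (not strsplit()!). With simplify = TRUE it lets you skip the last step of the base R solution, because it returns a character matrix directly instead of a list of vectors: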
library(stringr)
str_split(gsub("[(),]", "", bike_parking$Geom), " ", simplify = TRUE)