Rows disappearing after splitting strings

Time: 2017-09-29 16:41:57

Tags: r dplyr gsub strsplit

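I have a column of coordinates that I am splitting with strsplit() after removing the unwanted characters with gsub(). Note that the column has 3034 rows: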

> head(bike_parking$Geom)
[1] "(37.7606289177, -122.410647009)" "(37.752476948, -122.410625009)" 
[3] "(37.7871729481, -122.402401009)" "(37.7776039475, -122.422764009)"
[5] "(37.7658325695, -122.46649784)"  "(37.7693399479, -122.432820008)"

> length(bike_parking$Geom)
[1] 3034

> sum(is.na(bike_parking$Geom))
[1] 0

For some reason, after I run

dat <- data.frame(do.call(rbind, strsplit(as.vector(gsub("[()]", "", bike_parking$Geom)), split = ",")))

I am left with 3033 rows. How did this happen, and what steps can I take to figure out what went wrong?

> head(dat)
             X1              X2
1 37.7606289177  -122.410647009
2  37.752476948  -122.410625009
3 37.7871729481  -122.402401009
4 37.7776039475  -122.422764009
5 37.7658325695   -122.46649784
6 37.7693399479  -122.432820008

> nrow(dat)
[1] 3033

1 Answer:

Answer 0: (score: 0)

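It looks like your strings do not all have the same structure. You need to know which structure they share in order to split them correctly. From the comments below the question, I infer that some strings may not contain the comma that separates the two coordinates. You could remove all commas and split the strings on whitespace instead. I will post one solution in base R and one using the stringr package.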
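To see which rows break the expected structure, one option is to count the pieces each string splits into. The following is a minimal sketch in base R; the vector `geom` is a hypothetical stand-in for `bike_parking$Geom`, and its last two entries are made-up examples of malformed values, not data from the question.

```r
# Hypothetical sample standing in for bike_parking$Geom; the last two
# entries are made-up examples of malformed values.
geom <- c("(37.7606289177, -122.410647009)",
          "(37.752476948, -122.410625009)",
          "(37.7871729481 -122.402401009)",  # comma missing
          "")                                # empty string, not NA

# Same cleaning and splitting as in the question.
parts <- strsplit(gsub("[()]", "", geom), split = ",")

# Number of pieces each string was split into; well-formed rows give 2.
n_parts <- lengths(parts)

# Positions and original values of the rows that did not split into two.
bad <- which(n_parts != 2)
data.frame(row = bad, n_parts = n_parts[bad], value = geom[bad])
```

A check with is.na() does not catch values like these, because an empty string is not NA. One possible cause of the lost row, which nothing in the question confirms, is an empty string: strsplit() returns a zero-length vector for "", and rbind() ignores zero-length vectors.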

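Option 1: base R. We can use gsub() to remove the parentheses and commas from the strings, and then use strsplit() to split the strings on the space. The result will be: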

splitted <- strsplit(gsub("[(),]", "", bike_parking$Geom), " ")
# [[1]]
# [1] "37.7606289177"  "-122.410647009"
# [[2]]
# [1] "37.752476948"   "-122.410625009"
# [[3]]
# [1] "37.7871729481"  "-122.402401009"
# [[4]]
# [1] "37.7776039475"  "-122.422764009"
# [[5]]
# [1] "37.7658325695" "-122.46649784"
# [[6]]
# [1] "37.7693399479"  "-122.432820008"

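We have to reorganize these results a little so that you end up with two columns (here a character matrix, which as.data.frame() can convert to a data.frame):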

sapply(1:2, function(x) sapply(splitted, `[[`, x))
#      [,1]            [,2]            
# [1,] "37.7606289177" "-122.410647009"
# [2,] "37.752476948"  "-122.410625009"
# [3,] "37.7871729481" "-122.402401009"
# [4,] "37.7776039475" "-122.422764009"
# [5,] "37.7658325695" "-122.46649784" 
# [6,] "37.7693399479" "-122.432820008"
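The matrix above holds character values, so the coordinates still need converting before they can be used as numbers. The following is a minimal sketch of one way to build a numeric data.frame from the `splitted` list; the sample list and the column names `lat` and `lon` are assumptions made for illustration.

```r
# Hypothetical sample standing in for the `splitted` list built above.
splitted <- list(c("37.7606289177", "-122.410647009"),
                 c("37.752476948", "-122.410625009"),
                 "37.7871729481")  # made-up malformed entry with one piece

# `[` returns NA for a missing position, where `[[` would raise an error.
coords <- data.frame(
  lat = as.numeric(vapply(splitted, `[`, character(1), 1)),
  lon = as.numeric(vapply(splitted, `[`, character(1), 2))
)
```

With this approach a malformed row stays in the result with NA in the missing column, so the row count matches the input and the problem rows can be found with is.na().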

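Option 2: stringr. This package contains the function str_split() (not strsplit()!), which lets you skip the last step of the base R solution, because with simplify = TRUE you immediately get a character matrix instead of a list of vectors: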

library(stringr)
str_split(gsub("[(),]", "", bike_parking$Geom), " ", simplify = TRUE)