I am working with a dataset in which one column (Place) consists of location sentences.
library(tidyverse)
example <- tibble(Datum = c("October 1st 2017",
"October 2st 2017",
"October 3rd 2017"),
Place = c("Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
"Abu Kamal, Deir Ezzor Governorate, Syria",
"شارع القطار al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria"))
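I want to split the Place column on the comma separator, preferably with a solution from the tidyverse package. Because the values of Place have different lengths, I want to split from right to left, so that the country (Syria) ends up as the value in the last column of the data frame.
Oh, and bonus points for RegEx code: how can I remove the Arabic characters?
Thanks in advance.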
Edit: found my answer. Removing the Arabic characters (thanks to @g5w):
example$Place <- gsub("[\u0600-\u06FF]", "", example$Place)
Splitting the column the tidy way:
airstrikes_okt_clean <- separate(example,
Place,
into = c("detail",
"detail2",
"City_or_village",
"District",
"Country"),
sep = ",",
fill = "left")
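One thing to watch with `sep = ","` is that each piece keeps the space that followed the comma (" Syria" rather than "Syria"). A minimal sketch of one way around this, assuming the `example` tibble from the question: strip the Arabic characters first, then split on a comma plus any whitespace after it. The name `cleaned` is hypothetical, and the Arabic text is written with `\u` escapes only to keep the snippet ASCII-safe.

```r
library(tibble)
library(tidyr)

example <- tibble(
  Datum = c("October 1st 2017", "October 2st 2017", "October 3rd 2017"),
  Place = c(
    "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
    "Abu Kamal, Deir Ezzor Governorate, Syria",
    "\u0634\u0627\u0631\u0639 al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria"
  )
)

# Remove characters in the Arabic Unicode block, then trim the
# whitespace that the removal leaves at the start of the string.
example$Place <- trimws(gsub("[\u0600-\u06FF]", "", example$Place))

# Split on a comma followed by optional whitespace; fill = "left"
# pads short rows with NA on the left so the country stays last.
cleaned <- separate(
  example,
  Place,
  into = c("detail", "detail2", "City_or_village", "District", "Country"),
  sep = ",\\s*",
  fill = "left"
)
```

With this separator the `Country` column holds "Syria" with no leading space, and rows with fewer pieces have `NA` in the leftmost columns.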
Answer 0 (score: 1)
Just split the strings on the commas and reverse each result, so that the country comes first.
lapply(strsplit(example$Place, ","), rev)
[[1]]
[1] " Syria" " Deir Ezzor Governorate"
[3] " 20km south east of Deir Ezzor" "Tabiyyah Jazeera village"
[[2]]
[1] " Syria" " Deir Ezzor Governorate"
[3] "Abu Kamal"
[[3]]
[1] " Syria" " Raqqah governorate"
[3] " north of Raqqah city centre" " al-Tawassiya area"
[5] "شارع القطار al Qitar [train] street"
To remove the Arabic characters before splitting, try
gsub("[\u0600-\u06FF]", "", example$Place)
[1] "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria"
[2] "Abu Kamal, Deir Ezzor Governorate, Syria"
[3] " al Qitar [train] street, al-Tawassiya area, north of Raqqah city centre, Raqqah governorate, Syria"
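The reversed list has elements of different lengths and its pieces still carry leading spaces. As a rough sketch in base R (not part of the original answer), the pieces can be trimmed and padded with `NA` so that they fit into a matrix whose first column is always the country. The names `place`, `parts`, `padded` and `right_to_left` are hypothetical.

```r
place <- c(
  "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
  "Abu Kamal, Deir Ezzor Governorate, Syria"
)

# Split on commas, trim each piece, and reverse so the country comes first.
parts <- lapply(strsplit(place, ","), function(x) rev(trimws(x)))

# Pad every vector with NA up to the length of the longest one.
width <- max(lengths(parts))
padded <- lapply(parts, function(x) c(x, rep(NA_character_, width - length(x))))

# One row per input string; column 1 is the country, column 2 the governorate.
right_to_left <- do.call(rbind, padded)
```

Reading the columns from left to right then goes from the most general part of the address (country) to the most specific.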
Answer 1 (score: 0)
Here is a one-liner.
sapply(strsplit(example$Place, ","), function(x) trimws(x[length(x)]))
It returns the string after the last comma, whether that is Syria or anything else.
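The same result can probably be had without splitting at all. A small sketch, assuming a plain character vector (the name `place` and the third test string are made up for illustration): a greedy `.*` in `sub()` consumes everything up to and including the last comma and any whitespace after it.

```r
place <- c(
  "Tabiyyah Jazeera village, 20km south east of Deir Ezzor, Deir Ezzor Governorate, Syria",
  "Abu Kamal, Deir Ezzor Governorate, Syria",
  "NoCommaHere"
)

# ".*," is greedy, so it matches through the LAST comma;
# "\\s*" also drops the whitespace that follows it.
last_piece <- sub(".*,\\s*", "", place)
```

A string that contains no comma is returned unchanged, which matches what the `strsplit` one-liner does in that case.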