s <- "height(female), weight, BMI, and BMI."
In the string above, the word BMI appears twice. I would like the string to be:
"height (female), weight, and BMI."
I tried the following to split the string into unique parts:
> unique(strsplit(s, " ")[[1]])
[1] "height" "(female)," "weight," "BMI," "and" "BMI."
But since "BMI," and "BMI." are not the same string, using unique does not get rid of one of them.
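The point above can be checked directly: once trailing commas and periods are stripped, the two BMI tokens compare equal. This is a minimal word-level sketch, assuming words are separated by single spaces and that only "," or "." can trail a word. It calls duplicated(..., fromLast = TRUE) on the stripped tokens to drop the earlier copies while keeping the original tokens, punctuation included. It does not handle multi-word phrases, and it would also drop a repeated "and".

```r
s <- "height(female), weight, BMI, and BMI."

# Split on spaces, as in the question
tokens <- strsplit(s, " ")[[1]]

# Compare tokens without trailing commas/periods, so "BMI," and "BMI." match
bare <- sub("[,.]+$", "", tokens)

# Keep the last occurrence of each stripped token, but emit the original token
kept <- tokens[!duplicated(bare, fromLast = TRUE)]
paste(kept, collapse = " ")
# "height(female), weight, and BMI."
```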
Edit: How can I remove repeated phrases (e.g. "body mass index" instead of "BMI")?
s <- "height (female), weight, weight, body mass index, body mass index."
s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
> stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")
[1] "height (female), weight, body mass index, body mass index."
Answer 0 (score: 1)
It may help to first replace the unwanted duplicates with a regex like this:
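(?<=, |^)\b([()\w\s]+),\s(.*?)((?: and)?(?=\1))
Explanation:
(?<=, |^)\b
- Front boundary: the start of the string, or right after ", ". (\b alone should also work, but it is not properly anchored.)
([()\w\s]+),
- Block element: word characters, whitespace and parentheses, captured in group 1 and followed by a comma.
\s(.*?)
- Everything in between, captured lazily in group 2.
((?: and)?(?=\1))
- Repeated element: an optional " and", then a lookahead requiring the group 1 text to appear next. Replacing the whole match with \2 therefore drops the first copy and keeps the later one.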
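To make the explanation above concrete, the same pattern can be assembled from its commented parts. This sketch uses base R's sub() with perl = TRUE instead of stringr; that substitution is an assumption (stringr uses the ICU engine, which should treat this pattern the same way), and the variable names are only illustrative.

```r
# Each piece of the pattern, as described in the explanation
front    <- "(?<=, |^)\\b"          # start of string, or right after ", "
block    <- "([()\\w\\s]+),"        # group 1: the element, then its comma
middle   <- "\\s(.*?)"              # group 2: everything in between (lazy)
repeated <- "((?: and)?(?=\\1))"    # optional " and", then group 1 must follow

pattern <- paste0(front, block, middle, repeated)

s <- "height(female), weight, BMI, and BMI."
# The match is "BMI, and "; replacing it with group 2 ("and ") drops the first "BMI, "
result <- sub(pattern, "\\2", s, perl = TRUE)
result
# "height(female), weight, and BMI."
```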
Code example:
#install.packages("stringr")
library(stringr)
s <- "height(female), weight, BMI, and BMI."
s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
s
Output:
[1] "height(female), weight, and BMI."
To separate the part in parentheses, use another replacement:
stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")
Output:
[1] "height (female), weight, and BMI."
Test it on several strings and tidy things up:
s <- c("height(female), weight, BMI, and BMI."
,"height(female), weight, whatever it is, and whatever it is."
,"height(female), weight, age, height(female), and BMI."
,"weight, weight.")
s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")
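stringr::str_replace() (like base sub()) replaces only the first match, which is why a single call removes only one duplicate, as the "weight" result in the question's edit shows. A possible variant, assuming it is acceptable to replace every match, is to run the first pattern with gsub(perl = TRUE) (or stringr::str_replace_all()). That also covers the multi-word phrases from the edit. The helper dedupe_phrases is hypothetical and uses base R only. Note from the third test string that this keeps the last copy of a repeated element, so element order can change.

```r
# Hypothetical helper: the answer's two replacements, with the first one
# applied to every match (gsub) instead of only the first (sub/str_replace)
dedupe_phrases <- function(x) {
  # Pass 1: drop earlier copies of a comma-separated element that repeats later
  x <- gsub("(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2", x,
            perl = TRUE)
  # Pass 2: insert a space between a word and a directly following "(...)"
  sub("(\\w+)(\\(.*?\\))", "\\1 \\2", x, perl = TRUE)
}

# The string from the question's edit: two different repeated phrases
s_edit <- "height (female), weight, weight, body mass index, body mass index."
dedupe_phrases(s_edit)
# "height (female), weight, body mass index."

# The answer's test vector
tests <- c("height(female), weight, BMI, and BMI.",
           "height(female), weight, whatever it is, and whatever it is.",
           "height(female), weight, age, height(female), and BMI.",
           "weight, weight.")
dedupe_phrases(tests)
# "height (female), weight, and BMI."
# "height (female), weight, and whatever it is."
# "weight, age, height (female), and BMI."
# "weight."
```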
Answer 1 (score: 1)
You can try this regex:
(\b\w+\b)[^\w\r\n]+(?=.*\1)
and replace each match with an empty string.
Input:
height(female), weight, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, and BMI.
height(female), weight, BMI, age, and BMI.
Output:
height(female), weight, and BMI.
height(female), weight, age, and BMI.
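Explanation:
(\b\w+\b)
- Matches one or more word characters surrounded by word boundaries and captures them in group 1.
[^\w\r\n]+
- Matches one or more characters that are neither word characters nor newlines, so this matches, for example, ",", "." or a space.
(?=.*\1)
- Positive lookahead that checks that the text captured in group 1 occurs again later in the string. Only then is the replacement made. Note: this keeps the last occurrence of a repeated word.
Alternatively, if the repeated words also contain spaces, you can use (\b[^,]+)[, ]+(?=.*\1).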
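The answer's Ruby code is not reproduced here. As a sketch, assuming base R's gsub() with perl = TRUE is an acceptable stand-in for the Ruby regex engine, both patterns translate directly once backslashes are doubled inside R strings. The helper names are hypothetical. Because (?=.*\1) looks for the captured text anywhere later in the string rather than as a whole word, a short word can also be removed when it appears inside a later word.

```r
# Hypothetical helper for pattern 1: a repeated single word plus the
# separators after it (",", ".", spaces, parentheses, ...)
drop_repeated_words <- function(x) {
  gsub("(\\b\\w+\\b)[^\\w\\r\\n]+(?=.*\\1)", "", x, perl = TRUE)
}

# Hypothetical helper for pattern 2: repeated text that may contain spaces
# (anything up to a comma), plus the following commas/spaces
drop_repeated_phrases <- function(x) {
  gsub("(\\b[^,]+)[, ]+(?=.*\\1)", "", x, perl = TRUE)
}

inputs <- c(
  "height(female), weight, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, BMI, and BMI.",
  "height(female), weight, BMI, age, and BMI."
)
drop_repeated_words(inputs)
# "height(female), weight, and BMI."
# "height(female), weight, age, and BMI."

drop_repeated_phrases("height (female), weight, weight, body mass index, body mass index.")
# "height (female), weight, body mass index."
```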
Answer 2 (score: 0)
library(stringr)
s <- "height(female), weight, BMI, and BMI, and more even more BMI."
pieces <- unlist(str_split(s, "\\b"))
non_word <- !grepl("\\w", pieces)
# if you want to keep just the last instance of a duplicated word
non_duped <- !duplicated(pieces, fromLast = TRUE)
paste0(pieces[non_word | non_duped], collapse = "")
#> [1] "height(female), weight, , , and even more BMI."
# if you want to keep just the first instance of a duplicated word
non_duped <- !duplicated(pieces, fromLast = FALSE)
paste0(pieces[non_word | non_duped], collapse = "")
#> [1] "height(female), weight, BMI, and , more even ."