I have a string variable, response:
where where where is it
I'm going there
where where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
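As you can see, the word "where" is often repeated. I want to replace the strings "where where" and "where where where" (or even "where where where where") with a single "where".
However, I do not want to replace "everywhere where" with "where".
I know I could do this manually, but I would like to keep the code to as few lines as possible.
This is what I have tried so far: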
gen temp = regexr(response, " (where)+ where ", " where ")
replace temp = regexr(response, "^(where)+ where ", "where ")
These are the results after running the code above:
where where is it
I'm going there
where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
Instead, I would like the final data to look like this:
where is it
I'm going there
where did you say
sometimes it is where you think
i think its where you go
its everywhere where you are
i am planning on going where i want to
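I have been using "(where)+" to try to capture "where where" and "where where where", but it does not seem to work. I also split the code into two commands, one starting with "^(where)" and the other with " (where)", to avoid capturing the "where" inside "everywhere", but the code does not seem to catch a repeated "where" that occurs in the middle of a sentence.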
Answer 0 (score: 1)
A quick fix using Stata's string functions is the following:
clear
input str50 string1
"where where where is it"
"I'm going there"
"where where did you say"
"sometimes it is where you think"
"i think its where where you go"
"its everywhere where you are"
"i am planning on going where where where i want to"
end
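* tag1 = 1 unless the row contains "everywhere where"; such rows are left untouched
generate tag1 = !strmatch(string1, "*everywhere where*")
* tag2 = number of times "where" occurs in the row (characters removed / 5)
generate tag2 = ( length(string1) - length(subinstr(string1, "where", "", .)) ) / 5
* in tagged rows, blank out the first tag2-1 occurrences, then collapse repeated blanks
generate string2 = cond(tag1 == 1, stritrim(subinstr(string1, "where", "", tag2-1)), string1)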
list string2, separator(0)
     +----------------------------------------+
     |                                string2 |
     |----------------------------------------|
  1. |                            where is it |
  2. |                        I'm going there |
  3. |                      where did you say |
  4. |        sometimes it is where you think |
  5. |               i think its where you go |
  6. |           its everywhere where you are |
  7. |  i am planning on going where i want to |
     +----------------------------------------+
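The approach above removes the first tag2-1 occurrences of "where" in a row, so it relies on every "where" in that row belonging to one repeated run. A regular-expression alternative is sketched below, assuming Stata 14 or later, where the Unicode function ustrregexra() is available. The variable name string3 is chosen here for illustration only. The \b word-boundary anchors are what keep the "where" inside "everywhere" from starting a match, and the group ( where\b)+ requires at least one repeat separated by a single space.

```stata
clear
input str50 string1
"where where where is it"
"I'm going there"
"where where did you say"
"sometimes it is where you think"
"i think its where where you go"
"its everywhere where you are"
"i am planning on going where where where i want to"
end

* Replace a whole-word "where" followed by one or more " where" repeats
* with a single "where"; ustrregexra() replaces every match in the string.
generate string3 = ustrregexra(string1, "\bwhere( where\b)+", "where")
list string3, separator(0)
```

On the seven example rows this should give the same values as string2, but in a single command and without the special case for "everywhere where". It only collapses repeats separated by exactly one space; a pattern such as "\bwhere(\s+where\b)+" would be one way to tolerate extra whitespace.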