Find words in sentences and create an indicator variable

Time: 2018-12-07 11:29:33

Tags: stata

I have a variable containing various sentences:

Cats are good pets, for they are clean and are not noisy.
Abstraction is often one floor above you.
She wrote a long letter to Charlie, but he didn't read it.
Where do random thoughts come from?
Mary plays the piano.
I want more detailed information.
I'd rather be a bird than a fish.
When I was little I had a car door slammed shut on my hand. I still remember it quite vividly.
Malls are great places to shop; John can find everything he needs under one roof.
My Mum tries to be cool by saying that she likes all the same things that I do.

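How can I create a variable with name == 1 if a name is found?

I would also like name == 2 if any word in the sentence matches a word I choose (for example, letter).

I tried the following:

gen name = regexm(sentence, "letter* & (Charlie | Mary | John)*")

However, this does not work. I only get name == 0 for all observations.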

2 Answers:

Answer 0: (score: 1)

Consider the sentences from the example:

clear

input strL sentence
"Cats are good pets, for they are clean and are not noisy."
"Abstraction is often one floor above you."
"She wrote a long letter to Charlie, but he didn't read it."
"Where do random thoughts come from?"
"Mary plays the piano."
"I want more detailed information."
"I'd rather be a bird than a fish."
"When I was little I had a car door slammed shut on my hand. I still remember it quite vividly."
"Malls are great places to shop; John can find everything he needs under one roof."
"My Mum tries to be cool by saying that she likes all the same things that I do."           
end

By combining the strmatch() and ustrregexm() functions:

generate name = strmatch(sentence, "*letter*") + ustrregexm(sentence, "(Charlie|Mary|John)")

you get the desired output:

list name, separator(0)

     +------+
     | name |
     |------|
  1. |    0 |
  2. |    0 |
  3. |    2 |
  4. |    0 |
  5. |    1 |
  6. |    0 |
  7. |    0 |
  8. |    0 |
  9. |    1 |
 10. |    0 |
     +------+
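
As for why the original attempt returned 0 everywhere: in regexm(), the & and the spaces are literal characters, and * applies only to the preceding character or group, so that pattern can match only if the sentence contains the literal text " & ". The sketch below is a hedged variant of the approach above, not part of the original answer. It keeps the two tests as separate indicators and uses \b word boundaries so that "letter" does not also match "letters". The names has_letter, has_name and combined are illustrative, the fourth sentence is made up for the demonstration, and a Stata version with the Unicode regex functions (14 or later) is assumed.

```stata
* Illustrative data: three sentences from the question plus one invented
* sentence (the last) to show the effect of the word boundary
clear
input strL sentence
"She wrote a long letter to Charlie, but he didn't read it."
"Mary plays the piano."
"I want more detailed information."
"He kept her letters in a drawer."
end

* \b marks a word boundary, so "letters" is not counted as "letter"
generate byte has_letter = ustrregexm(sentence, "\bletter\b")

* 1 if any of the listed names occurs as a whole word
generate byte has_name = ustrregexm(sentence, "\b(Charlie|Mary|John)\b")

* Same sum as in the answer: 0, 1 or 2
generate byte combined = has_letter + has_name

list has_letter has_name combined, separator(0)
```

Keeping the two indicators separate may be useful because a sum of 1 does not say whether the sentence contained a name or the chosen word.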

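Answer 1: (score: 1)

Regular expressions are great, but the Catch-22 is that you have to work quite hard to learn the language. If and when you become fluent, the benefits are evident.

I will leave it to other answers to give smart regular expression solutions. The point here is to emphasize that other string functions can be used too. Here I exploit the fact that strpos() returns a positive result, which is treated as true, if it finds one string inside another. Also, Stata can parse strings into words, so finding a string only if it occurs as a word (for example) is not too difficult from first principles.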
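
As a small illustration of that point (a sketch using one sentence from the question, not part of the original answer), strpos() returns the starting position of the match, or 0 when there is no match, and the | operator treats any nonzero value as true:

```stata
* Position of the first match, counted from 1
display strpos("Mary plays the piano.", "Mary")
display strpos("Mary plays the piano.", "piano")

* No match gives 0
display strpos("Mary plays the piano.", "John")

* Nonzero counts as true, so combining with | yields 1 or 0
display (strpos("Mary plays the piano.", "piano") | strpos("Mary plays the piano.", "John"))
```

Note that strpos() is case sensitive, so "mary" would not be found in this sentence.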

clear 
input strL whatever 
"Cats are good pets, for they are clean and are not noisy."
"Abstraction is often one floor above you."
"She wrote a long letter to Charlie, but he didn't read it."
"Where do random thoughts come from?"
"Mary plays the piano."
"I want more detailed information."
"I'd rather be a bird than a fish."
"When I was little I had a car door slammed shut on my hand. I still remember it quite vividly."
"Malls are great places to shop; John can find everything he needs under one roof."
"My Mum tries to be cool by saying that she likes all the same things that I do."
end 

gen wanted1 = strpos(whatever, "Charlie") | strpos(whatever, "Mary") | strpos(whatever, "John") 

* cat or cats as a word 
gen wanted2 = 0 
gen wordcount = wordcount(whatever) 
su wordcount, meanonly 
local J = r(max) 
quietly foreach w in cat cats { 
    forval j = 1/`J' { 
        replace wanted2 = 1 if word(lower(whatever), `j') == "`w'" 
    }
} 

gen what = substr(whatever, 1, 40) 
list wanted? what, sep(0) 

     +--------------------------------------------------------------+
     | wanted1   wanted2                                       what |
     |--------------------------------------------------------------|
  1. |       0         1   Cats are good pets, for they are clean a |
  2. |       0         0   Abstraction is often one floor above you |
  3. |       1         0   She wrote a long letter to Charlie, but  |
  4. |       0         0        Where do random thoughts come from? |
  5. |       1         0                      Mary plays the piano. |
  6. |       0         0          I want more detailed information. |
  7. |       0         0          I'd rather be a bird than a fish. |
  8. |       0         0   When I was little I had a car door slamm |
  9. |       1         0   Malls are great places to shop; John can |
 10. |       0         0   My Mum tries to be cool by saying that s |
     +--------------------------------------------------------------+
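
One caveat with the word-by-word loop: word() splits on spaces only, so a token such as "Charlie," keeps its comma and would not equal "charlie". A possible workaround, sketched here on the assumption that Stata 14 or later is available for ustrregexra(), is to replace punctuation with spaces, pad the string with a space on each side, and then search for the space-delimited word. The names clean and wanted3 are illustrative and not part of the answer.

```stata
* Three sentences taken from the question
clear
input strL whatever
"She wrote a long letter to Charlie, but he didn't read it."
"Mary plays the piano."
"I want more detailed information."
end

* Lowercase, turn every character that is not a-z or a space into a space,
* and pad with a space at both ends so every word is space-delimited
generate strL clean = " " + ustrregexra(lower(whatever), "[^a-z ]", " ") + " "

* Flag the sentence if any chosen word appears as a whole word
generate byte wanted3 = 0
foreach w in charlie mary john {
    quietly replace wanted3 = 1 if strpos(clean, " `w' ") > 0
}

list wanted3 whatever, sep(0)
```

This avoids the loop over word positions, at the cost of treating apostrophes as separators (so "didn't" becomes "didn" and "t").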