Get the date from each sentence in a variable

Time: 2018-04-22 13:17:15

Tags: stata

I have the following text:

This sentence contains a certain date which is 06-08-2003.  
On this date, 29-12-1945 my grandmother was born.  
12-04-1997 was an important year for celebrations.  

I want to get the date into a variable, but the substr() function does not seem to work?

1 answer:

Answer 0: (score: 2)

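You do not show us your code, so we cannot tell you what went wrong with substr(). That said, the substr() function will work as expected if you know the position of the desired item in the string.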

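In this case, the dates appear at a different position within each string. One way to get the desired output is to use the strpos() function to find where the first hyphen is. You can then use that as a reference point to calculate the starting position of the date in each string: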

clear
set obs 3

* Example data: one sentence per observation
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

generate new_string = ""

* For each observation, find the first hyphen; the date starts two
* characters before it and is 10 characters long (dd-mm-yyyy)
forvalues i = 1/3 {
    local pos = strpos(string[`i'], "-") - 2
    replace new_string = substr(string, `pos', 10) in `i'
}


list string new_string

   +-------------------------------------------------------------------------+
   |                                                     string   new_string |
   |-------------------------------------------------------------------------|
1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
2. |          On this date, 29-12-1945 my grandmother was born.   29-12-1945 |
3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
   +-------------------------------------------------------------------------+
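
Since strpos() and substr() both operate on whole variables, the loop is arguably optional. A minimal sketch of a vectorised version follows. Here new_string2 is a hypothetical variable name, and the if qualifier is an added guard (not part of the loop above) that leaves the result empty when a string has no hyphen or has one too close to the start:

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

* Position of the first hyphen in every observation at once
generate hyphen = strpos(string, "-")

* The date is assumed to start two characters before that hyphen
generate new_string2 = substr(string, hyphen - 2, 10) if hyphen > 2

list string new_string2
```

On these three sentences this should give the same dates as the loop, without hard-coding the number of observations.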

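This approach assumes that the dates in the strings are consistent. That is, they all have the same format, contain no errors, and contain the first hyphen of the string. In practice, however, this is often not the case.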
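
To illustrate how fragile the position-based approach can be, here is a small sketch using a made-up sentence (not from the question) in which a hyphen appears before the date. Because strpos() returns the position of the first hyphen only, the extracted text comes from "well-known" rather than from the date:

```stata
clear
input str60 string
"A well-known date is 06-08-2003."
end

* The first hyphen belongs to "well-known", not to the date
generate new_string = substr(string, strpos(string, "-") - 2, 10)

* new_string holds a fragment of the sentence instead of 06-08-2003
list string new_string
```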

A better way of getting the desired output is to use regular expressions (Stata's regexm() and regexs() functions):

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")

The above regular expression not only finds the date in each string, but also uses some logical conditions to check whether the date is valid. For example:

replace string = "On this date, 29-131945 my grandmother was born." in 2

drop new_string

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")


list string new_string

   +-------------------------------------------------------------------------+
   |                                                     string   new_string |
   |-------------------------------------------------------------------------|
1. | This sentence contains a certain date which is 06-08-2003.   06-08-2003 |
2. |           On this date, 29-131945 my grandmother was born.              |
3. |         12-04-1997 was an important year for celebrations.   12-04-1997 |
   +-------------------------------------------------------------------------+

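As you can see, if the date in the second string is 29-13-1945 or 29-131945, the corresponding observation is left empty. This approach will therefore generally prevent you from getting nonsensical results, while also letting you identify problematic cases.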
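
Since a failed match leaves the new variable empty, the problematic observations can be counted and listed directly. A minimal sketch, reusing the example data with the broken second sentence:

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"On this date, 29-131945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")

* Observations where the pattern did not match have an empty new_string
count if new_string == ""
list string if new_string == ""
```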

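Note, however, that even this approach is not bulletproof. You will have to introduce additional flexibility by changing the regular expression if you want to handle more complex cases.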
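
One gap the pattern does not close on its own is calendar validity: it accepts text such as 31-02-2003, which looks like a date but is not one. A possible complement, sketched here under the assumption that the dates are in day-month-year order, is to pass the extracted text to date() and flag anything it cannot convert. The second sentence and the variable names date_num and bad_date are made up for illustration:

```stata
clear
input str60 string
"This sentence contains a certain date which is 06-08-2003."
"This one contains an impossible date, 31-02-2003."
end

generate new_string = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if ///
regexm(string,"(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])")

* Convert to a Stata daily date; date() returns missing for impossible dates
generate date_num = date(new_string, "DMY")
format date_num %td

* Flag text that matched the pattern but is not a real calendar date
generate byte bad_date = missing(date_num) & new_string != ""

list new_string date_num bad_date
```

Here the second observation should be flagged, because the pattern matches 31-02-2003 but date() cannot convert it.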