Extracting two dates from each sentence into separate variables

Asked: 2018-04-22 17:34:47

Tags: stata

I have the following text:

Two important events took place on 19/11/1923 and 30/02/1934 respectively. 

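I want to extract the two dates, but I want them saved in separate variables.

I have tried the regex solution described in my previous question, but it does not work as expected in this case.

Is it possible to save both dates?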

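1 answer:

Answer 0 (score: 2)

Whenever you ask a question, it is important to provide the code you have tried and a reproducible example. For tips on how to ask a good question, please read this page.

Consider your current and previous examples together: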

clear

input str80 string
"This sentence contains a certain date which is 06-08-2003."
"Two important events took place on 19-11-1923 and 30-02-1934 respectively."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

list string

   +----------------------------------------------------------------------------+
   |                                                                     string |
   |----------------------------------------------------------------------------|
1. |                 This sentence contains a certain date which is 06-08-2003. |
2. | Two important events took place on 19-11-1923 and 30-02-1934 respectively. |
3. |                          On this date, 29-12-1945 my grandmother was born. |
4. |                         12-04-1997 was an important year for celebrations. |
   +----------------------------------------------------------------------------+

Yes, it is possible to extract both dates by combining assert with the regular-expression functions inside a loop:

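* Work on a copy so the original text is left untouched
clonevar temp_string = string
generate date1 = ""
generate date2 = ""

* Day, separator, month, separator, four-digit year starting with 19 or 20
local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"

* 4 is the number of observations in this example
forvalues i = 1 / 4 {
    local dates
    local j = 0
    while `j' == 0 {
        * _rc is 0 only while observation i still contains a date
        capture assert regexm(temp_string[`i'],"`reg_ex'")

        if _rc == 0 {
            * Collect the match, then overwrite it so the next pass finds the next date
            local dates = "`dates' " + regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4)
            replace temp_string = regexr(temp_string[`i'], "`reg_ex'", "null") in `i'
        }

        else {
            * No dates left: store what was collected and leave the while loop
            local dates_n : word count `dates' 

            if `dates_n' == 1 {
                replace date1 = trim("`dates'") in `i'
            }

            else {
                tokenize `dates'
                replace date1 = "`1'" in `i'
                replace date2 = "`2'" in `i'
            }

            local j = 1
        }
    }
}

drop temp_string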

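What this code does is check whether each string contains more than one date. It matches the first date, replaces it with a placeholder, and repeats until no match is left. If a string contains only one date, that date is saved in the variable date1. If it contains two, the second date is saved in the separate variable date2. In this case: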

list date1 date2

   +-------------------------+
   |      date1        date2 |
   |-------------------------|
1. | 06-08-2003              |
2. | 19-11-1923   30-02-1934 |
3. | 29-12-1945              |
4. | 12-04-1997              |
   +-------------------------+
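Note that the pattern only checks the shape of the text, so 30-02-1934 is extracted even though February has no 30th day. If the extracted strings are later needed as Stata dates, a minimal sketch along the following lines may help. It assumes the date1 and date2 values listed above (re-entered here so the snippet runs on its own); the names date1_num, date2_num and date2_invalid are made up for illustration.

```stata
* Sketch only: re-enter the extracted strings so the snippet is self-contained
clear
input str10 date1 str10 date2
"06-08-2003" ""
"19-11-1923" "30-02-1934"
"29-12-1945" ""
"12-04-1997" ""
end

* date() returns missing when the text is not a real calendar date
generate date1_num = date(date1, "DMY")
generate date2_num = date(date2, "DMY")
format date1_num date2_num %td

* Flag text that matched the pattern but could not be converted
generate byte date2_invalid = missing(date2_num) & date2 != ""

list date1 date1_num date2 date2_num date2_invalid
```

Keeping the original strings next to the converted values makes it easy to see which matches were date-shaped text rather than valid dates.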

You can easily adapt this example to extract more dates.
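One possible way to do that, sketched here rather than taken from the answer, is to drop the per-observation loop and let each pass create one new variable for all rows at once. The variable names date1, date2, ... and the locals k and n_left are assumptions for illustration, and the snippet re-enters the example data so that it runs on its own.

```stata
* Sketch only: same example data as above
clear
input str80 string
"This sentence contains a certain date which is 06-08-2003."
"Two important events took place on 19-11-1923 and 30-02-1934 respectively."
"On this date, 29-12-1945 my grandmother was born."
"12-04-1997 was an important year for celebrations."
end

clonevar temp_string = string
local reg_ex "(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)([0-9][0-9])"

* k counts how many date variables have been created so far
local k = 0
count if regexm(temp_string, "`reg_ex'")
local n_left = r(N)

while `n_left' > 0 {
    local ++k
    * Take the first remaining match in every row that still has one
    generate date`k' = regexs(1) + "-" + regexs(2) + "-" + regexs(3) + regexs(4) if regexm(temp_string, "`reg_ex'")
    * Overwrite that match so the next pass finds the following date
    replace temp_string = regexr(temp_string, "`reg_ex'", "null")
    count if regexm(temp_string, "`reg_ex'")
    local n_left = r(N)
}

drop temp_string
list date*
```

With this layout the number of date variables is driven by the data: the loop stops as soon as no row contains another match, so rows with fewer dates simply end up with empty strings in the later variables.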