How can I use dateparser to extract the actual date text from a string?

Posted: 2019-02-11 11:32:30

Tags: python python-3.x string datetime tuples

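Question

When I use dateparser to search a string for dates, I get back tuples that contain both the matched date string and a datetime.datetime object. I only want the string, and because the text can contain several dates, I want each of them separately.

Any ideas on how to isolate the text from the results and drop the datetime.datetime objects?

Reason:

I want to store the matched text in a variable and then parse the words that come before each date.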

from dateparser.search import search_dates
para = search_dates("Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM", settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})
for x in para[0]:
    print (x)
    print(type(x))

What I'm looking for is just the string '1/03/19 at 6:00 AM and'.

Output:

1/03/19 at 6:00 AM and
<class 'str'>
2019-03-01 06:00:00
<class 'datetime.datetime'>

Attempts

I tried the following:

First:

from dateparser.search import search_dates
para = search_dates("Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM", settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})
for x in para[0]:
    date_time = x[0]
    date_string =  x[1]
    print(date_time)

Output:

TypeError: 'datetime.datetime' object is not subscriptable

And this:

from dateparser.search import search_dates
para = search_dates("Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM", settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})
for x in para[0]:
    print (x(0))

Output:

TypeError: 'str' object is not callable

Finally:

from dateparser.search import search_dates
para = search_dates("Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM", settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})
for x in para:
    date_string =  x[0]
    print(date_string)
    print(type(date_string))

Output:

1/03/19 at 6:00 AM and
<class 'str'>
17/05/19 at 5:00 PM
<class 'str'>

1 Answer:

Answer 0 (score: 0)

As you've noted, each tuple contains two elements: the matched string and the datetime object. For example:

('1/03/19 at 6:00 AM and', datetime.datetime(2019, 3, 1, 6, 0))
  • You can isolate just the string by indexing into the tuple.

For example:

from dateparser.search import search_dates
para = search_dates("Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM", settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})
for x in para:
    date_string =  x[0]
    print(date_string)
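Instead of indexing with x[0], you can also unpack each tuple directly in the for statement. In the sketch below, para is a hard-coded stand-in for the search_dates result so it runs without dateparser installed; the second datetime value is an assumption, since the question only printed the first one.

```python
from datetime import datetime

# Hypothetical stand-in for the list search_dates() returned in the question.
# The second datetime (17 May 2019, 5:00 PM) is assumed, not taken from output.
para = [
    ("1/03/19 at 6:00 AM and", datetime(2019, 3, 1, 6, 0)),
    ("17/05/19 at 5:00 PM", datetime(2019, 5, 17, 17, 0)),
]

date_strings = []
for date_string, _ in para:
    # Each tuple unpacks into the matched text and its parsed datetime;
    # the datetime is bound to `_` and ignored.
    date_strings.append(date_string)
    print(date_string)
```

As I understand the dateparser docs, passing add_detected_language=True to search_dates adds a third element (the detected language) to each tuple; writing `for date_string, *_ in para:` copes with either shape.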

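You may also want to remove the trailing "and" from the text. Be careful with str.strip('and'): it treats its argument as a set of characters and removes any 'a', 'n' or 'd' from both ends, not the word "and" (here it happens to work but leaves a trailing space). To drop only a trailing " and", use str.removesuffix (Python 3.9+), i.e.

date_string = x[0].removesuffix(' and')

Output

1/03/19 at 6:00 AM
17/05/19 at 5:00 PM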
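The check below illustrates why str.strip('and') is fragile: it strips characters, not a word. It then shows a regex-based alternative that also works on Python versions older than 3.9. The helper name drop_trailing_and is hypothetical, and this is a sketch, not part of dateparser.

```python
import re

text = "1/03/19 at 6:00 AM and"

# str.strip('and') removes any of 'a', 'n', 'd' from both ends, not the word
# "and": here it leaves a trailing space behind...
print(repr(text.strip("and")))          # '1/03/19 at 6:00 AM '

# ...and on other inputs it eats into the date text itself.
print(repr("on the 2nd".strip("and")))  # 'on the 2'


def drop_trailing_and(s):
    """Hypothetical helper: remove a final standalone 'and' and the whitespace before it."""
    return re.sub(r"\s+and\s*$", "", s)


print(drop_trailing_and(text))                   # 1/03/19 at 6:00 AM
print(drop_trailing_and("17/05/19 at 5:00 PM"))  # 17/05/19 at 5:00 PM
```

Requiring whitespace before "and" keeps words that merely end in those letters (such as "2nd") intact.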

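If you only want the strings and want to discard the datetimes entirely, build the para variable with a list comprehension. In the example below, para holds a plain list of strings instead of tuples; the datetime objects are discarded completely.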

from dateparser.search import search_dates

text = "Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM"
para = [d[0] for d in search_dates(text, settings={'STRICT_PARSING': True, 'DATE_ORDER': 'DMY'})]
print(para)
# Output is just a 1D list of strings
# ['1/03/19 at 6:00 AM and', '17/05/19 at 5:00 PM']
print(para[0].removesuffix(' and'))  # str.removesuffix needs Python 3.9+
# Output is the first string in the list with the trailing ' and' removed
# 1/03/19 at 6:00 AM
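Since the stated goal is to parse the words before each date, here is a minimal sketch of one way to do it. It assumes that each matched string appears verbatim in the input text (true for the example above) and that search_dates returns None when it finds no dates, which is how its documentation describes the return value. The function name words_before_dates is hypothetical, and results is a hard-coded stand-in so the sketch runs without dateparser. In real code you would pass search_dates(text, settings=...) as the second argument.

```python
from datetime import datetime


def words_before_dates(text, results):
    """Hypothetical helper: pair each matched date string with the words before it.

    `results` is assumed to have the shape search_dates() returns: a list of
    tuples whose first element is the matched text, or None if nothing matched.
    """
    pairs = []
    start = 0
    for date_string, *_ in results or []:
        # Search after the previous match so repeated text maps to the right spot.
        idx = text.find(date_string, start)
        if idx == -1:
            continue  # matched text not found verbatim in `text`; skip it
        preceding = text[start:idx].split()
        pairs.append((preceding, date_string))
        start = idx + len(date_string)
    return pairs


text = "Competition opens 1/03/19 at 6:00 AM and closes 17/05/19 at 5:00 PM"
# Hypothetical stand-in for search_dates(text, settings=...) from the question.
results = [
    ("1/03/19 at 6:00 AM and", datetime(2019, 3, 1, 6, 0)),
    ("17/05/19 at 5:00 PM", datetime(2019, 5, 17, 17, 0)),
]

for preceding, date_string in words_before_dates(text, results):
    # The last preceding word ("opens", "closes") labels the date.
    print(preceding[-1] if preceding else None, "->", date_string)
```

Only the words between the previous match and the current one are collected, so "closes" is attributed to the second date rather than the first.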