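I have several sentences, each stored in one row of a dataframe. I am extracting dates from these sentences, and I came across the "datefinder" package.
When I pass a single sentence as "string_with_dates", it correctly extracts and returns all the dates.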
import datefinder
string_with_dates = ''' They have released Proposals for period October 1, 2018 ’ September 30, 2019. Manufacturers are encouraged to submit proposals for stores located basis throughout the fiscal year ending September 30, 2018, pending availability of funds., '''
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    match = str(match)
    print(match)
Output:
2018-10-01 00:00:00
2019-09-30 00:00:00
2018-09-30 00:00:00
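However, when I take multiple sentences from a dataframe and loop over them with a "for" loop, it becomes a mess. It does not correctly show multiple dates (if there are any) in the dataframe cell. description_df is the name of my dataframe. Column 9 holds the sentences, and column 13 is where I want to store the extracted dates.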
import datefinder
for i in range(len(description_df)):
    string_with_dates = description_df.iloc[i,9]
    matches = datefinder.find_dates(string_with_dates)
    for match in matches:
        match = str(match)
        print(match)
        description_df.iloc[i,13] = match
Output of the extracted date column of the dataframe is:
2019-09-30 00:00:00
2019-05-07 00:00:00
""
0310-08-07 00:00:00
2019-08-07 00:00:00
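
One likely cause is that the assignment inside the inner loop overwrites the cell on every iteration, so only the last date found for each row survives. A minimal sketch of one way around that is to collect every match first and store them as a single joined string. The helper name extract_all_dates, its find_dates parameter, and the ", " separator are assumptions made for illustration; they are not part of datefinder.

```python
from datetime import datetime
from typing import Callable, Iterable, Optional


def extract_all_dates(
    text,
    find_dates: Optional[Callable[[str], Iterable[datetime]]] = None,
    sep: str = ", ",
) -> str:
    """Return every date found in `text` as one joined string ("" if none)."""
    # Empty or NaN cells are not strings, so there is nothing to parse.
    if not isinstance(text, str):
        return ""
    if find_dates is None:
        # Imported lazily: datefinder is only needed when no extractor is passed in.
        import datefinder
        find_dates = datefinder.find_dates
    # Consume the whole iterable of datetimes instead of keeping only the last one.
    return sep.join(str(match) for match in find_dates(text))
```

In the loop from the question, the inner `for match in matches:` block could then be replaced by a single assignment such as `description_df.iloc[i, 13] = extract_all_dates(description_df.iloc[i, 9])`, so that a row with three dates ends up with all three in its cell, separated by commas.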