Suppose I have a DataFrame that looks like this:
import pandas as pd
df2 = pd.DataFrame({'A': ['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'], 'file_name': ['a.txt', 'a.txt', 'b.txt', 'a.txt', 'd.txt', 'e.txt']})
>>> df2
                                         A  file_name
0  Apple, 10/01/2016, 31/10/18, david/kate      a.txt
1                                   orange      a.txt
2                                     pear      b.txt
3                                    Apple      a.txt
4                               10/01/2016      d.txt
5                               02/20/2017      e.txt
All I want is to extract the dates from column A of this DataFrame, so the output would look like this:
                      A  file_name
0  10/01/2016, 31/10/18      a.txt
1     Nothing to return      a.txt
2     Nothing to return      b.txt
3     Nothing to return      a.txt
4            10/01/2016      d.txt
5            02/20/2017      e.txt
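Does anyone have any suggestions? I'm not sure where to start.
Edit #1:
I edited the original DataFrame and the expected output to better reflect what I need.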
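Answer 0 (score: 2)
This does not exactly match your expected output, but the resulting structure may be better and can easily be converted into what you need.
Essentially, this is a job for a regular expression. The pattern should find anything of the form digits/digits/digits.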
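As a minimal sketch of that idea (not necessarily the answerer's exact code), the pattern below assumes a date is simply digits, a slash, digits, a slash, digits. The names DATE_PATTERN, find_dates, rows and dates_per_row are made up for illustration, and plain strings stand in for column A.

```python
import re

# Assumed pattern: one or more digits, a slash, digits, a slash, digits.
# It does not check that the match is a valid calendar date.
DATE_PATTERN = re.compile(r"\d+/\d+/\d+")


def find_dates(text):
    """Return every digits/digits/digits substring in text, in order."""
    return DATE_PATTERN.findall(text)


# The values of column A from the question, as plain strings.
rows = [
    "Apple, 10/01/2016, 31/10/18, david/kate",
    "orange",
    "pear",
    "Apple",
    "10/01/2016",
    "02/20/2017",
]

# One list of matches per row; rows without a date give an empty list.
dates_per_row = [find_dates(row) for row in rows]
```

With pandas, the same pattern can be applied to the whole column with df2['A'].str.findall(r'\d+/\d+/\d+'), which also gives one list per row. A list per row is probably the "better structure" meant here, since joining it into a string or replacing empty lists with a placeholder is a one-step conversion.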
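Answer 1 (score: 1)
Use str.extractall to pull out every date in each row and join the matches per row.
Then add reindex(df2.index).fillna('Nothing to return') to restore the rows that had no match (see the update).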
df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join)
Out[459]:
0    10/01/2016,31/10/18
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
Update
df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join).reindex(df2.index).fillna('Nothing to return')
Out[463]:
0    10/01/2016,31/10/18
1      Nothing to return
2      Nothing to return
3      Nothing to return
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
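To make the chain easier to follow, here is the same approach split into steps and written back into df2. This is only a sketch: the intermediate names matches and joined are made up, the pattern uses a single capture group instead of the nested ones above, and the matches are joined with ', ' (an assumption, chosen to match the spacing in the question's expected output).

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": [
            "Apple, 10/01/2016, 31/10/18, david/kate",
            "orange",
            "pear",
            "Apple",
            "10/01/2016",
            "02/20/2017",
        ],
        "file_name": ["a.txt", "a.txt", "b.txt", "a.txt", "d.txt", "e.txt"],
    }
)

# extractall gives one row per match, indexed by (original row, match number).
# Column 0 holds the text captured by the single group.
matches = df2["A"].str.extractall(r"((?:\d+[/-])?\d+[/-]\d+)")[0]

# Collapse all matches that belong to the same original row into one string.
joined = matches.groupby(level=0).apply(", ".join)

# Rows with no match are missing from `joined`; reindex brings them back as
# NaN, and fillna replaces those with the placeholder text.
df2["A"] = joined.reindex(df2.index).fillna("Nothing to return")
```

Because the result is assigned only to column A, file_name is left untouched.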
Answer 2 (score: 1)
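import re

def my_func(row):
    # Collect every comma-separated piece that starts with digits/digits/digits.
    dates = []
    for part in row.split(","):
        match = re.match(r'\d+/\d+/\d+', part.strip())
        if match:
            dates.append(match.group(0))
    # Join with ', ' so the result matches the spacing in the expected output.
    return ', '.join(dates) if dates else "Nothing to return"

df2.A = df2.A.apply(my_func)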
Output:
                      A  file_name
0  10/01/2016, 31/10/18      a.txt
1     Nothing to return      a.txt
2     Nothing to return      b.txt
3     Nothing to return      a.txt
4            10/01/2016      d.txt
5            02/20/2017      e.txt