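How to make all non-date values null in Pandas

Time: 2015-09-05 03:15:02

Tags: python pandas

I have an Excel doc where users put dates and strings in the same column. I want to make every string object null and keep all the dates. How do I do this in pandas? Thanks.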

1 answer:

Answer 0: (score: 4)

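As @Jeff mentions, a simple way to convert dates in a DataFrame is pandas.DataFrame.convert_objects, which also handles numeric and time data. (convert_objects was later deprecated and then removed in pandas 1.0, so this example needs an older release.) Here is an example of using it: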

# contents of Sheet1 of test.xlsx
x  y             date1      z     date2      date3
1  fum        6/1/2016      7  9/1/2015    string3
2  fo         6/2/2016  alpha   string0  10/1/2016
3  fi         6/3/2016      9  9/3/2015  10/2/2016
4  fee        6/4/2016     10   string1    string4
5  dumbledum  6/5/2016   beta   string2  10/3/2015
6  dumbledee  6/6/2016     12  9/4/2015    string5

import pandas as pd
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df1 = df.convert_objects(convert_dates='coerce')
# 'coerce' required for conversion to NaT on error
df1
Out[7]: 
   x          y      date1      z      date2      date3
0  1        fum 2016-06-01      7 2015-09-01        NaT
1  2         fo 2016-06-02  alpha        NaT 2016-10-01
2  3         fi 2016-06-03      9 2015-09-03 2016-10-02
3  4        fee 2016-06-04     10        NaT        NaT
4  5  dumbledum 2016-06-05   beta        NaT 2015-10-03
5  6  dumbledee 2016-06-06     12 2015-09-04        NaT

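Individual columns in a DataFrame can be converted using pandas.to_datetime, as @Jeff points out, and also using pandas.Series.map, though neither works in place, so the result has to be assigned back to the column. For example, using pandas.to_datetime: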

import pandas as pd
xl2 = pd.ExcelFile('test.xlsx')
df2 = xl2.parse("Sheet1")
for col in ['date1', 'date2', 'date3']:
    # newer pandas versions spell this errors='coerce' rather than coerce=True
    df2[col] = pd.to_datetime(df2[col], coerce=True, infer_datetime_format=True)
df2
Out[8]: 
   x          y      date1      z      date2      date3
0  1        fum 2016-06-01      7 2015-09-01        NaT
1  2         fo 2016-06-02  alpha        NaT 2016-10-01
2  3         fi 2016-06-03      9 2015-09-03 2016-10-02
3  4        fee 2016-06-04     10        NaT        NaT
4  5  dumbledum 2016-06-05   beta        NaT 2015-10-03
5  6  dumbledee 2016-06-06     12 2015-09-04        NaT
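
For readers on a newer pandas, where the coerce keyword is no longer accepted, the same per-column idea can be sketched with errors='coerce'. This is only an illustration under some assumptions: the small in-memory DataFrame is a hypothetical stand-in for the date columns of test.xlsx, the values are plain strings rather than Excel date cells, and the explicit month/day/year format is my guess at how the sheet is laid out.

```python
import pandas as pd

# Hypothetical stand-in for two of the date columns of Sheet1 (strings only).
df = pd.DataFrame({
    "date2": ["9/1/2015", "string0", "9/3/2015", "string1", "string2", "9/4/2015"],
    "date3": ["string3", "10/1/2016", "10/2/2016", "string4", "10/3/2015", "string5"],
})

for col in ["date2", "date3"]:
    # errors="coerce" turns anything that does not match the format into NaT
    # instead of raising; the result is assigned back because nothing is in place.
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y", errors="coerce")

print(df)
```

The strings come back as NaT and the parseable values become timestamps, which matches the shape of the output shown above.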

Using pandas.Series.map:

import pandas as pd
import datetime
xl3 = pd.ExcelFile('test.xlsx')
df3 = xl3.parse("Sheet1")
for col in ['date1', 'date2', 'date3']:
    df3[col] = df3[col].map(lambda x: x if isinstance(x, datetime.datetime) else None)
df3
Out[9]: 
   x          y      date1      z      date2      date3
0  1        fum 2016-06-01      7 2015-09-01        NaT
1  2         fo 2016-06-02  alpha        NaT 2016-10-01
2  3         fi 2016-06-03      9 2015-09-03 2016-10-02
3  4        fee 2016-06-04     10        NaT        NaT
4  5  dumbledum 2016-06-05   beta        NaT 2015-10-03
5  6  dumbledee 2016-06-06     12 2015-09-04        NaT
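
The map approach can also be tried without a spreadsheet. The sketch below assumes a column that already holds real datetime objects mixed with strings, roughly what a sheet with mixed cells might produce; the Series contents and the keep_dates helper are hypothetical names for illustration. It returns pd.NaT rather than None and wraps the result in pd.to_datetime so the dtype is datetime regardless of how map infers it.

```python
import datetime
import pandas as pd

# Hypothetical column mixing datetime objects with strings.
s = pd.Series(
    [datetime.datetime(2015, 9, 1), "string0",
     datetime.datetime(2015, 9, 3), "string1"],
    dtype=object,
)

def keep_dates(value):
    # pd.Timestamp subclasses datetime.datetime, so both pass this check;
    # everything else is replaced by NaT.
    return value if isinstance(value, datetime.datetime) else pd.NaT

# map returns a new Series; to_datetime normalizes it to a datetime dtype.
cleaned = pd.to_datetime(s.map(keep_dates))
print(cleaned)
```

Unlike to_datetime with coercion, this never tries to parse a string, so a cell containing text such as "9/1/2015" would also become NaT.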

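An up-front way to convert the dates in an Excel document is to do it while the worksheet is being parsed. This can be done with the converters option of pandas.ExcelFile.parse: pass a dict that maps each column to a function derived from pandas.to_datetime, with coerce=True enabled so that errors are forced to NaT. For example: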

def converter(x):
    # called once per cell; coerce=True turns unparseable values into NaT
    return pd.to_datetime(x, coerce=True, infer_datetime_format=True)
    # the following also works for this example
    # return pd.to_datetime(x,format='%d/%m/%Y',coerce=True)

converters={'date1': converter,'date2': converter, 'date3': converter}
xl4 = pd.ExcelFile('test.xlsx')
df4 = xl4.parse("Sheet1",converters=converters)
df4
Out[10]: 
   x          y      date1      z      date2      date3
0  1        fum 2016-06-01      7 2015-09-01        NaT
1  2         fo 2016-06-02  alpha        NaT 2016-10-01
2  3         fi 2016-06-03      9 2015-09-03 2016-10-02
3  4        fee 2016-06-04     10        NaT        NaT
4  5  dumbledum 2016-06-05   beta        NaT 2015-10-03
5  6  dumbledee 2016-06-06     12 2015-09-04        NaT
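
The converter idea can be checked without an Excel file. In the sketch below, pd.read_csv (which also accepts a converters dict) stands in for ExcelFile.parse, the CSV text is made-up sample data, and errors='coerce' with an explicit month/day/year format replaces coerce=True for newer pandas; all of these are assumptions for illustration, not part of the original answer.

```python
import io
import pandas as pd

def converter(x):
    # Called once per cell; values that do not match the format become NaT.
    return pd.to_datetime(x, format="%m/%d/%Y", errors="coerce")

# Hypothetical sample data standing in for one sheet of the workbook.
csv_text = "x,date2\n1,9/1/2015\n2,string0\n3,9/3/2015\n"

# read_csv is used here only because it needs no file; the converters dict
# has the same column-name-to-function shape as in the parse call above.
df = pd.read_csv(io.StringIO(csv_text), converters={"date2": converter})
print(df)
```

Converting at load time keeps the cleanup next to the read call, at the cost of running the function cell by cell.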