I have some date entries in Excel, as shown below.
Case Number  Status    Date/Time Opened    Date/Time Resolved  Date/Time Closed
1            Closed    2/1/2017 7:15 AM    2/1/2017 10:44 AM   2/21/2017 11:50 AM
2            Assigned  2/2/2017 2:09 PM
3            Resolved  2/8/2017 10:32 AM   9/11/2017 8:49 PM
4            Closed    8/27/2018 6:00 AM   10/15/2018 9:10 AM  10/15/2018 9:10 AM
5            Resolved  12/26/2018 3:25 PM  2/11/2019 9:08 AM
Initially, I converted them from the pattern above to the $year-$mm-$dd format.
Case Number Status Date/Time Opened Date/Time Resolved Date/Time Closed
1 Closed 2017-02-01 2017-02-01 2017-02-21
2 Assigned 2017-02-02 NaN NaN
3 Resolved 2017-02-08 2017-09-11 NaN
4 Closed 2018-08-27 2018-10-15 2018-10-15
5 Resolved 2018-12-26 2019-02-11 NaN
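One way to do that first conversion is sketched below. It assumes every non-blank cell follows the m/d/yyyy h:mm AM pattern shown above; the explicit format string and errors='coerce' are my own choices, not taken from the original code.

```python
import pandas as pd

# Sample values copied from the question; '' stands for an empty Excel cell.
raw = pd.Series(['2/1/2017 7:15 AM', '', '9/11/2017 8:49 PM'])

# Assumed format: month/day/year hour:minute AM/PM.
# errors='coerce' turns anything that does not parse (e.g. the blank cell) into NaT.
parsed = pd.to_datetime(raw, format='%m/%d/%Y %I:%M %p', errors='coerce')

# Drop the time of day but keep a datetime64 column.
dates = parsed.dt.normalize()
print(dates)
```

.dt.normalize() keeps the column as datetime64 (time set to midnight), so the .dt accessor still works afterwards; .dt.date, as used in the code further down, produces an object column of Python date objects instead.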
Using these converted dates, I tried to extract the month and year in $mon $year format.
I am using the following code to extract the month and year:
df['Month Opened'] = pd.to_datetime(df["Date/Time Opened"]).map(lambda x: calendar.month_abbr[x.month] + " " + str(x.year))
After applying this to "Date/Time Opened", I can see that it works, as shown below.
Case Number Status Date/Time Opened Date/Time Resolved Date/Time Closed Month Opened
1 Closed 2017-02-01 2017-02-01 2017-02-21 Feb 2017
2 Assigned 2017-02-02 NaN NaN Feb 2017
3 Resolved 2017-02-08 2017-09-11 NaN Feb 2017
4 Closed 2018-08-27 2018-10-15 2018-10-15 Aug 2018
5 Resolved 2018-12-26 2019-02-11 NaN Dec 2018
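For a single timestamp, the calendar.month_abbr lookup above produces the same label as a '%b %Y' strftime pattern. A small check, assuming the default C/English locale (both calendar.month_abbr and %b are locale-dependent):

```python
import calendar
import pandas as pd

ts = pd.Timestamp('2017-02-01')

# calendar.month_abbr is indexed 1..12; index 0 is an empty string.
via_calendar = calendar.month_abbr[ts.month] + " " + str(ts.year)

# %b is the abbreviated month name, %Y the four-digit year.
via_strftime = ts.strftime('%b %Y')

print(via_calendar, via_strftime)
```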
Here is my full code: http://tpcg.io/X5S8Pe
import pandas as pd
import calendar
CaseDetails = {
'Case Number': [1, 2, 3, 4, 5],
'Status': ['Closed', 'Assigned', 'Resolved', 'Closed', 'Resolved'],
'Date/Time Opened': ['2/1/2017 7:15 AM', '2/2/2017 2:09 PM', '2/8/2017 10:32 AM', '8/27/2018 6:00 AM', '12/26/2018 3:25 PM'],
'Date/Time Resolved': ['2/1/2017 10:44 AM', '', '9/11/2017 8:49 PM', '10/15/2018 9:10 AM', '2/11/2019 9:08 AM'],
'Date/Time Closed': ['2/21/2017 11:50 AM', '', '', '10/15/2018 9:10 AM', '']
}
df = pd.DataFrame(CaseDetails,columns= ['Case Number', 'Status', 'Date/Time Opened', 'Date/Time Resolved', 'Date/Time Closed'])
df['Date/Time Opened'] = pd.to_datetime(df['Date/Time Opened']).dt.date
df['Date/Time Resolved'] = pd.to_datetime(df['Date/Time Resolved']).dt.date
df['Date/Time Closed'] = pd.to_datetime(df['Date/Time Closed']).dt.date
print (df)
df['Month Opened'] = pd.to_datetime(df["Date/Time Opened"]).map(lambda x: calendar.month_abbr[x.month] + " " + str(x.year))
df['Month Closed'] = pd.to_datetime(df["Date/Time Closed"]).map(lambda x: calendar.month_abbr[x.month] + " " + str(x.year))
print (df)
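As expected, my code converts the entries under "Date/Time Opened" into the desired format. When I try to convert the other two date columns, I get the following error: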
Traceback (most recent call last):
File "main.py", line 21, in <module>
df['Month Closed'] = pd.to_datetime(df["Date/Time Closed"]).map(lambda x: calendar.month_abbr[x.month] + " " + str(x.year))
File "/usr/lib64/python2.7/site-packages/pandas/core/series.py", line 2158, in map
new_values = map_f(values, arg)
File "pandas/_libs/src/inference.pyx", line 1569, in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)
File "main.py", line 21, in <lambda>
df['Month Closed'] = pd.to_datetime(df["Date/Time Closed"]).map(lambda x: calendar.month_abbr[x.month] + " " + str(x.year))
File "/usr/lib64/python2.7/calendar.py", line 56, in __getitem__
funcs = self._months[i]
TypeError: list indices must be integers, not float
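The error comes from the blank cells. An empty string passed through pd.to_datetime becomes NaT, and NaT reports its date fields (such as .month) as a float NaN rather than an integer, which calendar.month_abbr cannot use as an index. A minimal reproduction (the exact TypeError message differs between Python 2 and 3):

```python
import calendar
import pandas as pd

# What an empty "Date/Time Closed" cell turns into.
missing = pd.to_datetime('')
print(missing)

# NaT exposes its month as a float NaN, not an int.
month = missing.month
print(month)

# calendar.month_abbr only accepts integer indices, so the NaN fails.
try:
    calendar.month_abbr[month]
    failed = False
except TypeError:
    failed = True
print(failed)
```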
Is there a way to convert columns that contain null values?
Answer (score: 0)
You can use Series.dt.strftime here; it works fine with missing values:
df['Date/Time Opened'] = pd.to_datetime(df['Date/Time Opened']).dt.strftime('%b %Y')
df['Date/Time Resolved'] = pd.to_datetime(df['Date/Time Resolved']).dt.strftime('%b %Y')
df['Date/Time Closed'] = pd.to_datetime(df['Date/Time Closed']).dt.strftime('%b %Y')
An alternative is to use apply with a list of columns:
cols = ['Date/Time Opened','Date/Time Resolved','Date/Time Closed']
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x).dt.strftime('%b %Y'))
print (df)
Case Number Status Date/Time Opened Date/Time Resolved Date/Time Closed
0 1 Closed Feb 2017 Feb 2017 Feb 2017
1 2 Assigned Feb 2017 NaT NaT
2 3 Resolved Feb 2017 Sep 2017 NaT
3 4 Closed Aug 2018 Oct 2018 Oct 2018
4 5 Resolved Dec 2018 Feb 2019 NaT
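A self-contained version of the apply approach, runnable on the question's sample data, is sketched below. Depending on the pandas version, strftime may return NaN or the string 'NaT' for missing dates; the hypothetical helper to_month_year masks them explicitly so they always come out as NaN. The explicit date format is an assumption based on the sample values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Case Number': [1, 2, 3, 4, 5],
    'Status': ['Closed', 'Assigned', 'Resolved', 'Closed', 'Resolved'],
    'Date/Time Opened': ['2/1/2017 7:15 AM', '2/2/2017 2:09 PM', '2/8/2017 10:32 AM',
                         '8/27/2018 6:00 AM', '12/26/2018 3:25 PM'],
    'Date/Time Resolved': ['2/1/2017 10:44 AM', '', '9/11/2017 8:49 PM',
                           '10/15/2018 9:10 AM', '2/11/2019 9:08 AM'],
    'Date/Time Closed': ['2/21/2017 11:50 AM', '', '', '10/15/2018 9:10 AM', ''],
})

def to_month_year(col):
    # Hypothetical helper: parse "m/d/yyyy h:mm AM" strings; blanks become NaT.
    parsed = pd.to_datetime(col, format='%m/%d/%Y %I:%M %p', errors='coerce')
    # Keep the label only where a date was parsed; everything else becomes NaN.
    return parsed.dt.strftime('%b %Y').where(parsed.notna())

cols = ['Date/Time Opened', 'Date/Time Resolved', 'Date/Time Closed']
df[cols] = df[cols].apply(to_month_year)
print(df)
```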
In your solution, you can use the trick that np.nan != np.nan (NaT is likewise not equal to itself), so add an if-else statement to your function:
import numpy as np
# x == x is False only for missing values (NaN/NaT), which map to np.nan
f = lambda x: calendar.month_abbr[x.month] + " " + str(x.year) if x == x else np.nan
df['Date/Time Opened'] = pd.to_datetime(df['Date/Time Opened']).map(f)
df['Date/Time Resolved'] = pd.to_datetime(df['Date/Time Resolved']).map(f)
df['Date/Time Closed'] = pd.to_datetime(df['Date/Time Closed']).map(f)
print (df)
Case Number Status Date/Time Opened Date/Time Resolved Date/Time Closed
0 1 Closed Feb 2017 Feb 2017 Feb 2017
1 2 Assigned Feb 2017 NaN NaN
2 3 Resolved Feb 2017 Sep 2017 NaN
3 4 Closed Aug 2018 Oct 2018 Oct 2018
4 5 Resolved Dec 2018 Feb 2019 NaN
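The same guard can be written with pd.isna instead of the x == x comparison, which states the intent more explicitly. This is a swapped-in variant, not the answer's code; month_label is a hypothetical name, and the date format is assumed from the sample data.

```python
import calendar
import numpy as np
import pandas as pd

# NaT, like NaN, is not equal to itself; the x == x check relies on this.
print(pd.NaT == pd.NaT)

def month_label(x):
    # Hypothetical helper: same logic as f, with an explicit missing-value test.
    if pd.isna(x):
        return np.nan
    return calendar.month_abbr[x.month] + " " + str(x.year)

s = pd.to_datetime(pd.Series(['10/15/2018 9:10 AM', '']),
                   format='%m/%d/%Y %I:%M %p', errors='coerce')
labels = s.map(month_label)
print(labels)
```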
Or alternatively:
f = lambda x: calendar.month_abbr[x.month] + " " + str(x.year) if x == x else np.nan
cols = ['Date/Time Opened','Date/Time Resolved','Date/Time Closed']
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x).map(f))
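The question's code wrote the labels into new columns ("Month Opened", "Month Closed") rather than overwriting the dates. A sketch that keeps the original columns and derives the new names from the old ones, reusing the guarded f; the trimmed sample data and the explicit format are assumptions for illustration.

```python
import calendar
import numpy as np
import pandas as pd

# Two columns from the question's sample data.
df = pd.DataFrame({
    'Date/Time Opened': ['2/1/2017 7:15 AM', '2/2/2017 2:09 PM'],
    'Date/Time Closed': ['2/21/2017 11:50 AM', ''],
})

# Guarded formatter: x == x is False only for NaT/NaN.
f = lambda x: calendar.month_abbr[x.month] + " " + str(x.year) if x == x else np.nan

for col in ['Date/Time Opened', 'Date/Time Closed']:
    # 'Date/Time Opened' -> 'Month Opened', mirroring the question's naming.
    new_col = 'Month ' + col.split()[-1]
    parsed = pd.to_datetime(df[col], format='%m/%d/%Y %I:%M %p', errors='coerce')
    df[new_col] = parsed.map(f)

print(df)
```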