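I have a DataFrame with several columns, and I want to compute the time difference between two columns that contain times. I first converted both columns to datetime objects with pd.to_datetime, but when I subtract one column from the other and assign the result to a new column, the result is all NaN values.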
ops_data_clean_1.loc['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc['Flight launched-time'] - ops_data_clean_1.loc['Package committed-time']
ops_data_clean_1.head()
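Answer 0 (score: 1)
I think your trouble is with the way you are using .loc.
.loc['Package committed-time'] essentially says "select the ROW labeled 'Package committed-time'", and there is no such row.
But you want to select the column with that name. Access the column with a plain ops_data_clean_1['Package committed-time'], or with ops_data_clean_1.loc[:, 'Package committed-time'].
For more information on .loc, see the pandas documentation for DataFrame.loc.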
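To make the row-versus-column distinction concrete, here is a minimal sketch on a small made-up frame (the name df and its values are illustrative, not the asker's data). It assumes the frame has the default integer index, so no row carries a column name as its label.

```python
import pandas as pd

# Hypothetical toy frame with a default integer index (0, 1)
df = pd.DataFrame({
    'Package committed-time': pd.to_datetime(['2018-01-01 00:00:30', '2018-01-01 00:49:00']),
    'Flight launched-time': pd.to_datetime(['2018-01-01 01:00:30', '2018-01-01 02:49:00']),
})

# A single label passed to .loc is looked up in the ROW index, so this read fails
try:
    df.loc['Flight launched-time']
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

# Both of these select the COLUMN and return the same Series
by_brackets = df['Flight launched-time']
by_loc = df.loc[:, 'Flight launched-time']

print(row_lookup_failed)
print(by_brackets.equals(by_loc))
```

Reading a missing row label raises KeyError, but assigning to one, as in the question, does not fail: .loc can add a new row with that label instead, which is likely why no error pointed at the mistake.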
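Answer 1 (score: 1)
I think your problem is that you are using loc when accessing a single column of the DataFrame. Simply removing loc from your code fixes it.
See the following toy example: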
import pandas as pd

# Build a toy frame with the two timestamp columns stored as strings
ops_data_clean_1 = pd.DataFrame()
ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']
# Convert each column to datetime by assigning to the column, not to a row
ops_data_clean_1['Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1['Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
# Subtracting two datetime columns gives a timedelta column
ops_data_clean_1['time_to_launch'] = ops_data_clean_1['Flight launched-time'] - ops_data_clean_1['Package committed-time']
ops_data_clean_1.head()
# Output
  Package committed-time Flight launched-time time_to_launch
0    2018-01-01 00:00:30  2018-01-01 01:00:30       01:00:00
1    2018-01-01 00:49:00  2018-01-01 02:49:00       02:00:00
2    2018-03-01 00:00:45  2018-03-01 00:54:45       00:54:00
If you do want to use loc, you must pass : as the row indexer to select all rows of the DataFrame, e.g. ops_data_clean_1.loc[:, 'Flight launched-time'].
The code then becomes:
import pandas as pd

ops_data_clean_1 = pd.DataFrame()
ops_data_clean_1['Package committed-time'] = ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']
ops_data_clean_1['Flight launched-time'] = ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']
# .loc[:, col] selects all rows of the named column
# Note: on pandas 2.0 and later, assigning through .loc[:, col] may write in place and keep
# the column's existing object dtype; ops_data_clean_1[col] = ... replaces the column instead
ops_data_clean_1.loc[:, 'Package committed-time'] = pd.to_datetime(ops_data_clean_1['Package committed-time'])
ops_data_clean_1.loc[:, 'Flight launched-time'] = pd.to_datetime(ops_data_clean_1['Flight launched-time'])
ops_data_clean_1['time_to_launch'] = ops_data_clean_1.loc[:, 'Flight launched-time'] - ops_data_clean_1.loc[:, 'Package committed-time']
ops_data_clean_1.head()
# Output
  Package committed-time Flight launched-time time_to_launch
0    2018-01-01 00:00:30  2018-01-01 01:00:30       01:00:00
1    2018-01-01 00:49:00  2018-01-01 02:49:00       02:00:00
2    2018-03-01 00:00:45  2018-03-01 00:54:45       00:54:00
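As a small follow-up sketch (not part of the original answer): once the subtraction works, time_to_launch is a timedelta column, so it can be turned into plain numbers if those are easier to work with. The frame name df and the column name launch_minutes are made up for illustration; the timestamps are the ones from the toy example.

```python
import pandas as pd

# Same toy timestamps as the example above, already parsed as datetimes
df = pd.DataFrame({
    'Package committed-time': pd.to_datetime(
        ['2018-01-01 00:00:30', '2018-01-01 00:49:00', '2018-03-01 00:00:45']),
    'Flight launched-time': pd.to_datetime(
        ['2018-01-01 01:00:30', '2018-01-01 02:49:00', '2018-03-01 00:54:45']),
})
df['time_to_launch'] = df['Flight launched-time'] - df['Package committed-time']

# .dt.total_seconds() converts each timedelta to a float number of seconds
df['launch_minutes'] = df['time_to_launch'].dt.total_seconds() / 60
print(df['launch_minutes'].tolist())  # [60.0, 120.0, 54.0]
```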