I have a dataframe in Python with the following fields and types:
sent datetime64[ns]
tz_offset int64
I want to go through my dataframe and add the offset (in seconds) to the sent field.
How can I do this? I tried the following:
files['sent'] = files['sent'].apply(lambda x: x + np.timedelta64(files['tz_offset']), 's')
But it failed with an error.
Edit: using Pandas
Answer 0 (score: 1)
The vectorized approach is to convert the offset column to a timedelta type and add it: df['sent'] + df['tz_offset'].astype("timedelta64[s]")
In [346]: df
Out[346]:
sent tz_offset
0 2011-01-01 00:00:00 2
1 2011-01-01 00:01:00 0
2 2011-01-01 00:02:00 4
3 2011-01-01 00:03:00 0
4 2011-01-01 00:04:00 4
5 2011-01-01 00:05:00 4
6 2011-01-01 00:06:00 4
7 2011-01-01 00:07:00 1
8 2011-01-01 00:08:00 4
9 2011-01-01 00:09:00 4
In [347]: df['sent'] + df['tz_offset'].astype("timedelta64[s]")
Out[347]:
0 2011-01-01 00:00:02
1 2011-01-01 00:01:00
2 2011-01-01 00:02:04
3 2011-01-01 00:03:00
4 2011-01-01 00:04:04
5 2011-01-01 00:05:04
6 2011-01-01 00:06:04
7 2011-01-01 00:07:01
8 2011-01-01 00:08:04
9 2011-01-01 00:09:04
dtype: datetime64[ns]
In [348]: df.dtypes
Out[348]:
sent datetime64[ns]
tz_offset int32
dtype: object
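As a self-contained illustration of the vectorized approach, here is a minimal sketch. It assumes a small sample frame modeled on the first rows above, and it swaps in pd.to_timedelta(..., unit="s") for the astype call, since that states the unit explicitly. The column name sent_local is hypothetical and not part of the original question.

```python
import pandas as pd

# Hypothetical sample data modeled on the first three rows shown above.
df = pd.DataFrame({
    "sent": pd.to_datetime([
        "2011-01-01 00:00:00",
        "2011-01-01 00:01:00",
        "2011-01-01 00:02:00",
    ]),
    "tz_offset": [2, 0, 4],
})

# Interpret the integer column as seconds, then add it element-wise.
# "sent_local" is an illustrative column name, not one from the question.
df["sent_local"] = df["sent"] + pd.to_timedelta(df["tz_offset"], unit="s")

print(df)
```

Assigning the result to df["sent"] instead would overwrite the original column, which is what the question asks for.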
Or, using np.timedelta64() inside apply, row by row:
In [349]: df.apply(lambda x: x['sent'] + np.timedelta64(x['tz_offset'], 's'), axis=1)
Out[349]:
0 2011-01-01 00:00:02
1 2011-01-01 00:01:00
2 2011-01-01 00:02:04
3 2011-01-01 00:03:00
4 2011-01-01 00:04:04
5 2011-01-01 00:05:04
6 2011-01-01 00:06:04
7 2011-01-01 00:07:01
8 2011-01-01 00:08:04
9 2011-01-01 00:09:04
dtype: datetime64[ns]
Or, using pd.offsets.timedelta(seconds=), a.k.a. pd.Timedelta(seconds=):
In [350]: df.apply(lambda x: x['sent'] + pd.Timedelta(seconds=x['tz_offset']), axis=1)
Out[350]:
0 2011-01-01 00:00:02
1 2011-01-01 00:01:00
2 2011-01-01 00:02:04
3 2011-01-01 00:03:00
4 2011-01-01 00:04:04
5 2011-01-01 00:05:04
6 2011-01-01 00:06:04
7 2011-01-01 00:07:01
8 2011-01-01 00:08:04
9 2011-01-01 00:09:04
dtype: datetime64[ns]
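To check that the row-wise variants agree with the vectorized one, the following sketch runs all three on a small frame. The sample data and the variable names (vectorized, via_numpy, via_pandas) are assumptions made for illustration. The int(...) casts are a defensive choice so that a plain Python integer is passed to both constructors.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one timestamp per minute with an offset in seconds.
df = pd.DataFrame({
    "sent": pd.date_range("2011-01-01", periods=4, freq="min"),
    "tz_offset": [2, 0, 4, 1],
})

# Vectorized: convert the whole column to timedeltas once, then add.
vectorized = df["sent"] + pd.to_timedelta(df["tz_offset"], unit="s")

# Row-wise with NumPy: build one timedelta64 per row.
via_numpy = df.apply(
    lambda row: row["sent"] + np.timedelta64(int(row["tz_offset"]), "s"),
    axis=1,
)

# Row-wise with pandas: build one Timedelta per row.
via_pandas = df.apply(
    lambda row: row["sent"] + pd.Timedelta(seconds=int(row["tz_offset"])),
    axis=1,
)

# All three should contain the same timestamps.
print((vectorized == via_numpy).all(), (vectorized == via_pandas).all())
```

The row-wise versions call a Python function once per row, so the vectorized form is usually the better choice on large frames.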