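I am working with a dataset of timestamps and need to compute the consecutive differences between observations (timestamps). The timestamps are of type datetime64[ns], and dfnew is a pandas DataFrame.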
import numpy as np
import pandas as pd

dfnew['timestamp'] = dfnew['timestamp'].astype('datetime64[ns]')
dfnew['dates'] = dfnew['timestamp'].dt.date  # calendar date of each timestamp
uniqueDates = list(set(dfnew['dates']))  # unique values of date in a list
# making a numpy array of timestamps for a particular date
x = dfnew.loc[dfnew['dates'] == uniqueDates[0], 'timestamp'].to_numpy()
y = np.ediff1d(x)  # calculating consecutive differences of timestamps
print(max(y))
49573580000000 nanoseconds
print(min(y))
-86391523000000 nanoseconds
print(y[1:20])
[ 92210000000 388030000000 0 211607000000 249337000000
19283000000 91407000000 120180000000 240050000000 30406000000
0 480337000000 13000000 491424000000 0
80980000000 388103000000 88850000000 120333000000]
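np.ediff1d on a datetime64[ns] array returns a timedelta64[ns] array, which is why every value above carries a "nanoseconds" unit. The sketch below reproduces this with four timestamps taken from the sample that follows, deliberately placed out of order (an assumption made for illustration). A negative consecutive difference, like the min above, can only occur when a later row holds an earlier timestamp, so sorting first is one option if chronological differences are what you want.

```python
import numpy as np

# Four timestamps from the sample data, deliberately out of chronological order
x = np.array([
    "2013-12-19T11:34:23.037",
    "2013-12-19T11:34:23.050",
    "2013-12-19T09:03:21.223",
    "2013-12-19T11:34:23.067",
], dtype="datetime64[ns]")

y = np.ediff1d(x)          # consecutive differences: x[1]-x[0], x[2]-x[1], ...
print(y.dtype)             # timedelta64[ns]
print((y < np.timedelta64(0, "ns")).any())  # True: the 09:03 row follows a later one

# Sorting before differencing guarantees non-negative differences
y_sorted = np.ediff1d(np.sort(x))
print((y_sorted >= np.timedelta64(0, "ns")).all())  # True
```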
dfnew['timestamp'][0:20]
0 2013-12-19 09:03:21.223000
1 2013-12-19 11:34:23.037000
2 2013-12-19 11:34:23.050000
3 2013-12-19 11:34:23.067000
4 2013-12-19 11:34:23.067000
5 2013-12-19 11:34:23.067000
6 2013-12-19 11:34:23.067000
7 2013-12-19 11:34:23.067000
8 2013-12-19 11:34:23.067000
9 2013-12-19 11:34:23.080000
10 2013-12-19 11:34:23.080000
11 2013-12-19 11:34:23.080000
12 2013-12-19 11:34:23.080000
13 2013-12-19 11:34:23.080000
14 2013-12-19 11:34:23.080000
15 2013-12-19 11:34:23.097000
16 2013-12-19 11:34:23.097000
17 2013-12-19 11:34:23.097000
18 2013-12-19 11:34:23.097000
19 2013-12-19 11:34:23.097000
Name: timestamp
Is there any way to get the output in hours instead of nanoseconds? I can convert it with plain division, but I am looking for an alternative.
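Also, when I save the differences to a txt file, the unit "nanoseconds" is written as well. How can I drop the unit when saving to the txt file? I only want to save the number.
Any help is appreciated.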
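A minimal NumPy sketch of both points, assuming y is the timedelta64[ns] array returned by np.ediff1d above (a few of its printed values are reused here). Dividing by a one-hour np.timedelta64 yields plain float64 hours, and np.savetxt writes only numbers. The filename diffs_hours.txt is hypothetical.

```python
import numpy as np

# Stand-in for the question's y: a few of the printed values, as timedelta64[ns]
y = np.array([92210000000, 388030000000, 0, 211607000000], dtype="timedelta64[ns]")

# timedelta64 / timedelta64 gives a float64 array, here measured in hours
hours = y / np.timedelta64(1, "h")

# Casting to an hour unit also works, but keeps only whole hours
# and the result is still a timedelta64 array
whole_hours = y.astype("timedelta64[h]")

# Floats are written as bare numbers, one per line, with no unit suffix
# ("diffs_hours.txt" is a hypothetical output filename)
np.savetxt("diffs_hours.txt", hours, fmt="%.6f")
```

Because hours is an ordinary float array, nothing unit-related ends up in the file.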
Answer (score: 2)
Try Series.diff():
import pandas as pd
import io
txt = """0 2013-12-19 09:03:21.223000
1 2013-12-19 11:34:23.037000
2 2013-12-19 11:34:23.050000
3 2013-12-19 11:34:23.067000
4 2013-12-19 11:34:23.067000
5 2013-12-19 11:34:23.067000
6 2013-12-19 11:34:23.067000
7 2013-12-19 11:34:23.067000
8 2013-12-19 11:34:23.067000
9 2013-12-19 11:34:23.080000
10 2013-12-19 11:34:23.080000
11 2013-12-19 11:34:23.080000
12 2013-12-19 11:34:23.080000
13 2013-12-19 11:34:23.080000
14 2013-12-19 11:34:23.080000
15 2013-12-19 11:34:23.097000
16 2013-12-19 11:34:23.097000
17 2013-12-19 11:34:23.097000
18 2013-12-19 11:34:23.097000
19 2013-12-19 11:34:23.097000
"""
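# read the index, date and time columns, then combine date and time into one datetime Series
df = pd.read_csv(io.StringIO(txt), sep=r"\s+", header=None, names=["idx", "date", "time"], index_col="idx")
s = pd.to_datetime(df["date"] + " " + df["time"])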
s.diff()
Result:
0 NaT
1 02:31:01.814000
2 00:00:00.013000
3 00:00:00.017000
4 00:00:00
5 00:00:00
6 00:00:00
7 00:00:00
8 00:00:00
9 00:00:00.013000
10 00:00:00
11 00:00:00
12 00:00:00
13 00:00:00
14 00:00:00
15 00:00:00.017000
16 00:00:00
17 00:00:00
18 00:00:00
19 00:00:00
Name: 1_2, dtype: timedelta64[ns]
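The diff() result is still a timedelta64[ns] Series. A minimal sketch for getting float hours from it and writing bare numbers to a text file, assuming a few of the sample timestamps and a hypothetical filename diffs_hours.txt:

```python
import pandas as pd

# A few timestamps from the sample data, as a datetime64[ns] Series
s = pd.Series(pd.to_datetime([
    "2013-12-19 09:03:21.223",
    "2013-12-19 11:34:23.037",
    "2013-12-19 11:34:23.050",
]))

d = s.diff()                            # timedelta64[ns]; first entry is NaT
hours = d.dt.total_seconds() / 3600     # float hours; NaT becomes NaN
hours_alt = d / pd.Timedelta(hours=1)   # same values via timedelta division

# Write plain floats only: no header, no index, no unit
# ("diffs_hours.txt" is a hypothetical output filename)
hours.dropna().to_csv("diffs_hours.txt", header=False, index=False)
```

Both conversions produce float64 values, so the saved file contains only numbers.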