I have prepared the following code snippet:
import pandas as pd
time_series = pd.date_range('2018-01-01', periods=100, freq='ms')
df = pd.Series(range(len(time_series)), index=time_series)
print(df)
df = df.drop(df.between_time("00:00:00.003", "00:00:00.098").index)
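The filtered time range should be independent of the date and consider only the time of day. How should I remove the unwanted data in the "drop" section shown in the figure? The loop should keep going until the end of a DataFrame containing about 100 million rows.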
Answer 0 (score: 0)
You can try:
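import pandas as pd
i = pd.date_range('2018-01-01', periods=100, freq='ms')
df = pd.DataFrame({'A': range(100)}, index=i)
# Convert both bounds to datetime.time objects, then drop every row whose time of day lies between them (inclusive)
df.drop(df.between_time(*pd.to_datetime(['00:00:00.003', '00:00:00.098']).time).index, inplace=True)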
Result:
                          A
2018-01-01 00:00:00.000   0
2018-01-01 00:00:00.001   1
2018-01-01 00:00:00.002   2
2018-01-01 00:00:00.099  99
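For a frame of roughly 100 million rows, one possible alternative to collecting labels and calling drop is a boolean mask computed from the index. The following is only a sketch under assumptions, not part of the answer above: the names time_of_day, start, end, in_window and kept are made up for illustration, and it has not been benchmarked at that size. It measures each row's offset from midnight, so the date plays no role in the comparison.

```python
import pandas as pd

idx = pd.date_range('2018-01-01', periods=100, freq='ms')
s = pd.Series(range(len(idx)), index=idx)

# Time elapsed since midnight for every row; the date part cancels out.
time_of_day = idx - idx.normalize()

# Bounds of the window to discard, expressed as offsets from midnight.
start = pd.Timedelta(milliseconds=3)
end = pd.Timedelta(milliseconds=98)

# True for rows inside the window (both ends inclusive, like between_time).
in_window = (time_of_day >= start) & (time_of_day <= end)

# Keep everything outside the window; no explicit loop is needed.
kept = s[~in_window]
print(kept)
```

On this sample the rows with values 0, 1, 2 and 99 remain, matching the result shown above.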
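Answer 1 (score: 0)
Based on your series, this code keeps one row every 3 minutes. It works because each value equals the row's offset in milliseconds from the start, and 3 minutes is 180000 ms.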
import pandas as pd
time_series = pd.date_range('2018-01-01', periods=100000000, freq='ms')
df = pd.Series(range(len(time_series)), index=time_series)
df2=df.to_frame()
df2.columns = ['every_3rd_minute']
df2 = df2[df2.every_3rd_minute % 180000 == 0]
print(df2)
which produces
                     every_3rd_minute
2018-01-01 00:00:00                 0
2018-01-01 00:03:00            180000
2018-01-01 00:06:00            360000
2018-01-01 00:09:00            540000
2018-01-01 00:12:00            720000
...                               ...
2018-01-02 03:33:00          99180000
2018-01-02 03:36:00          99360000
2018-01-02 03:39:00          99540000
2018-01-02 03:42:00          99720000
2018-01-02 03:45:00          99900000
[556 rows x 1 columns]
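The modulo filter above depends on the column holding the millisecond offset of each row. If the column held other data, a similar selection could be made from the index itself. A minimal sketch, assuming a shorter 1,000,000-row series so that it runs quickly; on_boundary and selected are illustrative names, not something from the answer:

```python
import pandas as pd

idx = pd.date_range('2018-01-01', periods=1_000_000, freq='ms')
s = pd.Series(range(len(idx)), index=idx)

# A timestamp sits exactly on a 3-minute boundary when flooring it
# to a 3-minute resolution leaves it unchanged.
on_boundary = idx == idx.floor('3min')

# Select those rows regardless of what the values are.
selected = s[on_boundary]
print(selected)
```

With one million millisecond rows (16 minutes 40 seconds of data) this keeps six rows, at minutes 0, 3, 6, 9, 12 and 15.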