Question

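I want to read data from a pandas DataFrame by iterating over its rows, starting from a specific row number. I know about df.iterrows(), but it doesn't let me specify where to start iterating.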

In my specific case, I have a CSV file that might look like this:

Date, Temperature
21/08/2017 17:00:00,5.53
21/08/2017 18:00:00,5.58
21/08/2017 19:00:00,4.80
21/08/2017 20:00:00,4.59
21/08/2017 21:00:00,3.72
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
22/08/2017 02:00:00,2.75
22/08/2017 03:00:00,2.79
22/08/2017 04:00:00,2.76
22/08/2017 05:00:00,2.76
22/08/2017 06:00:00,3.06
22/08/2017 07:00:00,3.88

I want to iterate over every row starting from a specific point in time (say, midnight on 22 August), so I tried this:

import pandas

df = pandas.read_csv('file.csv')
start_date = '22/08/2017 00:00:00'

# since it's sorted, I figured I could use binary search
result = pandas.Series(df['Date']).searchsorted(start_date)

result[0] actually gives me the correct number.

I figured I could just increment that number and access each row through df.iloc[[x]], but doing it that way feels dirty.

for x in range(result[0], len(df)):
    row = df.iloc[[x]]

All the answers I have found so far only show how to iterate over the whole table.

Answer 1

Convert Date to datetime, then set Date as the index:

# dayfirst=True because the dates are written as dd/mm/yyyy
df.Date = pd.to_datetime(df.Date, dayfirst=True)

df = df.set_index('Date')

Then:

# An ISO-style date string avoids any day/month ambiguity in the slice label
for date, row in df.loc['2017-08-22 00:00:00':].iterrows():
    print(date.strftime('%c'), row.squeeze())

Tue Aug 22 00:00:00 2017 3.07
Tue Aug 22 01:00:00 2017 2.8
Tue Aug 22 02:00:00 2017 2.75
Tue Aug 22 03:00:00 2017 2.79
Tue Aug 22 04:00:00 2017 2.76
Tue Aug 22 05:00:00 2017 2.76
Tue Aug 22 06:00:00 2017 3.06
Tue Aug 22 07:00:00 2017 3.88
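
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes a small inline sample (a subset of the rows from the question, with no space after the comma in the header) instead of file.csv, and the names CSV_TEXT, start and selected are only illustrative.

```python
import io

import pandas as pd

# Inline sample so the sketch runs without file.csv (subset of the question's rows).
CSV_TEXT = """Date,Temperature
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
22/08/2017 02:00:00,2.75
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# An explicit format removes any day/month ambiguity while parsing.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")
df = df.set_index("Date")

# Label-based slice on the sorted DatetimeIndex; the start label is inclusive.
start = pd.Timestamp("2017-08-22 00:00:00")
selected = [(date, row["Temperature"]) for date, row in df.loc[start:].iterrows()]

for date, temperature in selected:
    print(date, temperature)
```

Using a pd.Timestamp as the slice label, rather than a string, sidesteps the question of how a dd/mm/yyyy string would be interpreted.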

Answer 2

Just filter your DataFrame before calling iterrows():

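# dayfirst=True because the dates are written as dd/mm/yyyy
df['Date'] = pandas.to_datetime(df['Date'], dayfirst=True)
for idx, row in df[df['Date'] >= '2017-08-22'].iterrows():
    #
    # Whatever you want to do in the loop goes here
    #
    pass  # placeholder so the loop body is valid Python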

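Note that you don't need to convert the filter value '2017-08-22' to a datetime object, because pandas parses date strings itself when comparing them against a datetime column (the same convenience that is behind partial string indexing).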
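
As a variation that stays closer to the row-number idea in the question, the start position can be found with a binary search on the parsed datetime column and then used in a positional slice, which avoids indexing row by row. This is only a sketch under assumptions: it uses an inline sample instead of file.csv, it swaps iterrows() for itertuples() (not something either answer uses), and the names CSV_TEXT, start_pos and temps are illustrative.

```python
import io

import pandas as pd

# Inline sample so the sketch runs without file.csv (subset of the question's rows).
CSV_TEXT = """Date,Temperature
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
22/08/2017 02:00:00,2.75
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")

# Binary search on the sorted datetime column: the first position whose
# value is >= the start timestamp (searchsorted defaults to side="left").
start_pos = int(df["Date"].searchsorted(pd.Timestamp("2017-08-22 00:00:00")))

# Positional slice from that row onward, iterated as lightweight namedtuples.
temps = []
for row in df.iloc[start_pos:].itertuples(index=False):
    temps.append(row.Temperature)
    print(row.Date, row.Temperature)
```

The binary search only gives a meaningful position if the Date column is sorted, as it is in the question's file; the boolean filter above does not depend on ordering.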
