Question

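I am trying to read an Excel sheet with pandas.read_excel. Its skiprows parameter lets you skip rows by providing row numbers. But how can we skip rows based on a pattern match? I have different Excel sheets in which the number of rows I need to skip varies, so providing a row count does not work for my use case. Is there a way to provide a pattern, for example skipping all rows before the row that contains a specific string (say "Test")? If this is not possible with pandas read_excel, is there another workaround for reading the Excel file into a dataframe this way? Any suggestions would be greatly appreciated. Thanks.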

Answer 1

My suggestion is to read the whole Excel sheet into a dataframe and then drop the rows you don't need. A simple example:

import pandas as pd

# Read the first sheet of the Excel workbook
df = pd.read_excel('workbook.xlsx')

# Find the label of the first row where column 0 equals 'Test'.
# Note: idxmax() returns the first label when nothing matches, so no rows are dropped in that case.
row_label = (df.iloc[:, 0] == 'Test').idxmax()

# Drop all rows above the row with 'Test'
df = df.loc[row_label:, :]
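
Since the question asks about pattern matching rather than exact equality, here is a minimal sketch of the same idea using a regular expression. The helper name drop_rows_before_match and the sample data are hypothetical, made up for illustration; the small DataFrame stands in for the result of pd.read_excel('workbook.xlsx', header=None), which is assumed so that the junk rows at the top are not consumed as column names.

```python
import pandas as pd


def drop_rows_before_match(raw: pd.DataFrame, pattern: str, column: int = 0) -> pd.DataFrame:
    """Return `raw` starting at the first row whose `column` cell matches `pattern`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Cast to str so numeric cells and missing values do not break the regex search.
    matches = raw.iloc[:, column].astype(str).str.contains(pattern, regex=True)
    if not matches.any():
        # Fail loudly instead of silently keeping every row.
        raise ValueError(f"no row matches pattern {pattern!r}")
    # argmax on a boolean array gives the position of the first True.
    first = int(matches.to_numpy().argmax())
    return raw.iloc[first:].reset_index(drop=True)


# Stand-in for: raw = pd.read_excel('workbook.xlsx', header=None)
raw = pd.DataFrame(
    [
        ["report", None],
        ["generated 2020", None],
        ["Test", "value"],
        ["a", 1],
        ["b", 2],
    ]
)

trimmed = drop_rows_before_match(raw, r"^Test$")
print(trimmed)
```

If the matched row is meant to be the header, one option is to assign trimmed.iloc[0] to trimmed.columns and then keep trimmed.iloc[1:]. Raising on a missing match is a design choice of this sketch; returning the frame unchanged would mirror the idxmax() behavior in the example above.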
