Question

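I have some messy Excel spreadsheets and want to import a small number of cells from each into a dataframe. The number of rows and columns that contain information I am not interested in (".") or blank cells ("~") varies from spreadsheet to spreadsheet.

Here is an example of a spreadsheet (input):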

~   ~   ~   ~   
.   .   .   .   .
~   .   .   .   
.   .   .   .   .
~   ~   ~       
~   Name    ID  Description Notes
12  a   AA  aA  None
3   b   BB  bB  sentence one
44  c   CC  cC  None
9   d   DD  dD  Int

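Is my logic correct that I need to import the full spreadsheet into a dataframe first, then search that dataframe for the unique string (Name) and use its row as the header row of a second dataframe?

This is how I import the sheet, but I am stuck on parsing out the second dataframe: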

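import pandas
file = 'example.xlsx'
xl = pandas.ExcelFile(file)
# header=None keeps every row as data; dtype=str reads all cells as strings
# (dtype is an argument of parse, not of ExcelFile)
df = xl.parse("Sheet1", header=None, dtype=str)
header_start_row = None
for index, row in df.iterrows():
    # stop at the first row that has a cell containing 'Name'
    if any('Name' in str(cell) for cell in row):
        header_start_row = index
        break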

Desired pandas dataframe (output):

Name    ID  Description Notes
a   AA  aA  None
b   BB  bB  sentence one
c   CC  cC  None
d   DD  dD  Int

Answer 1

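Use the pandas.read_excel function to import the Excel sheet. It has an optional parameter, skiprows, that lets you specify which rows at the top of the sheet should be ignored.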

import pandas as pd

file = 'example.xlsx'
skiprows_amount = 5
df = pd.read_excel(file, skiprows=range(skiprows_amount), dtype=str)

Note: a limitation of this solution is that you must know in advance how many rows to skip.
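If the number of junk rows is not known ahead of time, one possible workaround is to read the sheet with no header at all and look for the header row afterwards. The sketch below is not part of the answer above; extract_table and its marker argument are hypothetical names, and it assumes the cell "Name" appears exactly once as a header and that the table runs to the right edge and bottom of the sheet. The small in-memory frame stands in for a sheet read with header=None.

```python
import pandas as pd


def extract_table(raw, marker="Name"):
    """Return the table whose header row contains `marker` (hypothetical helper).

    `raw` is a sheet read with header=None, so every row is still data.
    """
    # Boolean Series: True for rows where some cell equals the marker exactly.
    is_header = raw.eq(marker).any(axis=1)
    if not is_header.any():
        raise ValueError(f"no row contains {marker!r}")
    header_pos = int(is_header.to_numpy().argmax())  # position of first match

    header = raw.iloc[header_pos]
    # Position of the marker inside the header row; columns to its left are dropped.
    first_col = int(header.eq(marker).to_numpy().argmax())

    table = raw.iloc[header_pos + 1:, first_col:].copy()
    table.columns = header.iloc[first_col:].tolist()
    return table.reset_index(drop=True)


# Stand-in for a messy sheet: junk rows on top, a junk column on the left.
raw = pd.DataFrame([
    ["~", "~", "~", "~", None],
    [".", ".", ".", ".", "."],
    ["~", "Name", "ID", "Description", "Notes"],
    ["12", "a", "AA", "aA", "None"],
    ["3", "b", "BB", "bB", "sentence one"],
])
df = extract_table(raw)
```

With a real workbook, raw would instead come from pd.read_excel(file, sheet_name="Sheet1", header=None, dtype=str), so the same function can be applied to each spreadsheet regardless of how many rows precede the header.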

Importing an Excel sheet into a dataframe
