Question

I have a Python script that pulls data from CSV files for processing. The data I am working with looks like this:

Time.     Col1.     Col2.     Col3.     Important
 0.1       .02       .03       .02      .02
 0.2.      .03       .03       .02      .03

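The script extracts the "Time" and "Important" columns for processing. When all of the columns are present, the script runs correctly using: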

# specify the row headers seven lines down the file
names = ['Time (s)', 'Col1', 'Col2', 'Col3', 'Important']
df = pd.read_csv(fle,
                 delim_whitespace=True,
                 error_bad_lines=False,
                 encoding='utf-8',
                 names=names,
                 skiprows=7)

My problem is that when the files are generated, the user can deselect some columns, which means a file may contain fewer columns:

Time.     Col1.    Col3.     Important
 0.1       .02       .03       .02
 0.2.      .03       .03       .03

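However, because my names list contains every header that could appear in a file, the script does not recognize the column headers and will not process these files. (It does not raise an error; the resulting plot is simply blank, as if there were no data.)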

The script loops over a folder of 50 or so files, each containing anywhere from 3 to 9 columns, but two of them are always "Time" and "Important".

Is there a way to tell pandas that not all of the column headers in the names list will necessarily be present?

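I tried shortening names to just "Time" and "Important", but that produces an error later in the code, where I have to find the index of a specific value in the "Time" column.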

Traceback (most recent call last):
  File "...inter-through-filesv2.py", line 55, in <module>
    loop_start = df[df['Time (s)'] == start].index.item()
  File     "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/base.py", line 719, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

Answer 1

Is there a way to tell pandas that not all of the column headers in the names list will necessarily be present?

You can simply omit the names parameter:

df = pd.read_csv(fle, delim_whitespace=True, error_bad_lines=False,
                 encoding='utf-8', skiprows=7)
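As a minimal sketch of what omitting names does, the snippet below reads an in-memory stand-in for one of the files. The sample text and the absence of preamble lines are assumptions made for illustration. sep=r"\s+" is used as the whitespace separator in place of delim_whitespace=True, which newer pandas releases deprecate.

```python
import io

import pandas as pd

# Hypothetical stand-in for one file: header line first, no preamble to skip.
sample = """Time Col1 Col3 Important
0.1 .02 .03 .02
0.2 .03 .03 .03
"""

# Without names=, pandas takes the column labels from the first row it reads,
# so the frame has however many columns the file actually contains.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(list(df.columns))
print(df.shape)
```

With the real files, skiprows would still be needed to get past the lines above the data.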

Then assign the column names in a follow-up step:

n = len(df.columns)
df.columns = ['Time (s)'] + [f'Col{i}' for i in range(1, n-1)] + ['Important']
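Putting the two steps together, one possible sketch is shown below. It assumes that "Time" is always the first column and "Important" always the last. The helper name read_variable_columns, the sample text, and the start value are hypothetical. np.isclose is swapped in for the == comparison from the question on the assumption that the time values are floats, and the lookup is guarded because .item() raises the ValueError from the traceback whenever the filter matches zero rows or more than one row.

```python
import io

import numpy as np
import pandas as pd


def read_variable_columns(source, skiprows=0):
    """Hypothetical helper: read a whitespace-delimited file, name columns by position."""
    df = pd.read_csv(source, sep=r"\s+", skiprows=skiprows)
    n = len(df.columns)
    # str.format is used instead of an f-string so the line also runs on
    # Python 3.4, the interpreter version shown in the traceback.
    middle = ['Col{}'.format(i) for i in range(1, n - 1)]
    df.columns = ['Time (s)'] + middle + ['Important']
    return df


# Hypothetical stand-in for a file in which the user dropped one column.
sample = """Time Col1 Col3 Important
0.1 .02 .03 .02
0.2 .03 .03 .03
"""

df = read_variable_columns(io.StringIO(sample))

# Only the two columns that are always present are needed downstream.
subset = df[['Time (s)', 'Important']]

# Guarded lookup: take the row label only when exactly one row matches.
start = 0.2  # hypothetical value
matches = df.index[np.isclose(df['Time (s)'], start)]
loop_start = matches[0] if len(matches) == 1 else None
```

Note that the middle names are positional placeholders: in this sample the file's Col3 ends up labelled Col2. That is harmless as long as only the first and last columns are used.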
