Question

Using pandas, I want to do the following while iterating over DataFrames:

    for body_part, columns in zip(self.body_parts, usecols_gen()):
        body_part_df = self.read_csv(usecols=columns)
        if self.normalize:
            body_part_df[r'x(\.\d)?'] = body_part_df[r'x(\.\d)?'].apply(lambda x: x/x_max)
        print(body_part_df)
        result[body_part] = body_part_df

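I am using a regular expression because the column names I am referring to have been mangled: x, x.1, x.2, ..., x.n

This raises a KeyError and I do not understand why. Please help. Thanks in advance.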

Answer 1

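You cannot select DataFrame columns with a regular expression: `body_part_df[...]` treats the string as a literal column label, and no column has that name, hence the KeyError. What you can do is iterate over the columns and apply the function to the ones that match, i.e.: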

    import re

    # ...

    for body_part, columns in zip(self.body_parts, usecols_gen()):
        body_part_df = self.read_csv(usecols=columns)
        if self.normalize:
            for column in body_part_df:
                if re.fullmatch(r"x(\.\d+)?", column):  # whole name must match; use re.search() for partial matches
                    body_part_df[column] = body_part_df[column].apply(lambda x: x/x_max)
        print(body_part_df)
        result[body_part] = body_part_df
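
As an alternative to the explicit loop, the matching columns can be picked out in one step from the column index and divided as a group. The following is only a minimal sketch: the helper name `normalize_x_columns` and the sample data are made up for illustration, and `x_max` is assumed to be a plain number, as in the question.

```python
import pandas as pd


def normalize_x_columns(df: pd.DataFrame, x_max: float) -> pd.DataFrame:
    """Return a copy of df with columns named x, x.1, x.2, ... divided by x_max.

    Hypothetical helper, not part of the original code.
    """
    # Boolean mask over the column labels; fullmatch requires the whole
    # name to match, so labels such as "xy" are left alone.
    mask = df.columns.astype(str).str.fullmatch(r"x(\.\d+)?")
    x_columns = df.columns[mask]

    out = df.copy()
    # Division is vectorized, so no apply/lambda is needed.
    out[x_columns] = out[x_columns] / x_max
    return out


# Made-up sample data with the mangled column names from the question.
df = pd.DataFrame({"x": [2.0, 4.0], "y": [1.0, 1.0], "x.1": [6.0, 8.0]})
normalized = normalize_x_columns(df, x_max=8.0)
print(normalized)
```

`DataFrame.filter(regex=...)` is another way to select the columns, but it matches with `re.search`, so the pattern would need `^` and `$` anchors to avoid partial matches.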
