Question

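I'm getting strange behaviour from code I have run many times before on smaller datasets. I parse a VCF file with the Pandas dataframe read_table. The VCF file has a header, followed by 9 columns plus one column for each individual. Until now I iterated over each row of the dataframe with for row in genomes_df.itertuples(): and accessed the column "SVLEN" as row.SVLEN. When I checked type(row), it was a Pandas object. Today I ran the script on a larger file in the same VCF format (350 columns instead of 10). It failed with AttributeError: 'tuple' object has no attribute 'SVLEN', because type(row) is now a tuple!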

What is going on here? The column names differ between the two files. I checked that none of the names contain spaces, because another post said that spaces can cause this kind of behaviour.

Answer 1

This is mentioned in the itertuples documentation:

With a large number of columns (>255), regular tuples are returned.
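The following is a minimal check of which behaviour your own pandas/Python combination shows. It uses a made-up 350-column frame. The column names and values are hypothetical, and only the width matters. Newer pandas releases running on Python 3.7+ may return a namedtuple here even past 255 columns, so no particular output is claimed:

```python
import pandas as pd

# Hypothetical stand-in for the VCF table: 349 filler columns plus "SVLEN",
# 350 columns in total. Only the column count matters; names and values are made up.
columns = [f"col{i}" for i in range(349)] + ["SVLEN"]
df = pd.DataFrame([[0] * 349 + [100], [0] * 349 + [-50]], columns=columns)

row = next(df.itertuples())  # index=True by default, so the row has 351 fields
print(type(row).__name__)     # "Pandas" if a namedtuple came back, "tuple" if a plain tuple did
print(hasattr(row, "SVLEN"))  # False whenever a plain tuple came back
```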

You can see this in the source code here:

        # Python 3 supports at most 255 arguments to constructor, and
        # things get slow with this many fields in Python 2
        if name is not None and len(self.columns) + index < 256:
            # `rename` is unsupported in Python 2.6
            try:
                itertuple = collections.namedtuple(name,
                                                   fields + list(self.columns),
                                                   rename=True)
                return map(itertuple._make, zip(*arrays))
            except Exception:
                pass
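One workaround that does not depend on the pandas version is to look the column up by position instead of by attribute. Positional indexing works on both namedtuples and plain tuples. The sketch below assumes unique column names. iter_column is a hypothetical helper and is not part of pandas:

```python
import pandas as pd


def iter_column(df: pd.DataFrame, column: str):
    """Yield one column's value from each row produced by df.itertuples().

    Hypothetical helper (not part of pandas). It indexes rows by position,
    which works whether itertuples() yields namedtuples or plain tuples.
    Assumes unique column names, so get_loc() returns a single integer.
    """
    # itertuples() puts the index in slot 0 by default (index=True),
    # so every column is shifted one position to the right.
    pos = df.columns.get_loc(column) + 1
    for row in df.itertuples():
        yield row[pos]


# Usage with a made-up 350-column frame (names and values are hypothetical).
columns = [f"col{i}" for i in range(349)] + ["SVLEN"]
df = pd.DataFrame([[0] * 349 + [100], [0] * 349 + [-50]], columns=columns)
svlens = list(iter_column(df, "SVLEN"))
```

If you don't need attribute access, itertuples(name=None) always yields plain tuples, so the row type no longer depends on the column count. If you only need one column, df["SVLEN"] avoids iterating over rows at all.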

Note: the 255-argument limit on CPython calls / namedtuples was fixed in Python 3.7, so this may change in future pandas versions (when running on Python 3.7+).
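On Python 3.7+ you can check directly that the underlying limit is gone. The example calls collections.namedtuple with a made-up list of 351 field names. That is the width itertuples() would need for 350 columns plus the index:

```python
import collections

# 351 field names: "Index" plus 350 column names, mirroring itertuples() on a
# 350-column frame. The names are made up for illustration.
fields = ["Index"] + [f"col{i}" for i in range(349)] + ["SVLEN"]

# On Python 3.7+ this succeeds; on Python 3.6 and earlier, namedtuple() fails
# once the generated constructor would need more than 255 arguments.
Row = collections.namedtuple("Pandas", fields, rename=True)
row = Row._make(range(len(fields)))
print(row.SVLEN)  # 350
```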

Pandas itertuple returning inconsistent types, Pandas or tuple
