Question

I know in advance which columns of an Excel file I do not need, and I would like to skip them when reading the file to improve performance.

I could not find anything about this in the documentation. Is there a workaround?

Answer 1

You can build the list of column positions to keep (this example assumes the sheet has 10 columns):

In [7]: cols2skip = [2,5,8]

In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

Then pass that list to usecols:

df = pd.read_excel(filename, usecols=cols)
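
If the unwanted columns are known by name rather than by position, the same idea can be sketched with a small helper. This is only an illustration: the function name positions_to_keep and the header values are hypothetical, and it assumes the header row is already known.

```python
def positions_to_keep(header, names_to_skip):
    """Return the 0-based positions of the columns in `header` that are not skipped."""
    skip = set(names_to_skip)  # set membership tests are cheap for long lists
    return [i for i, name in enumerate(header) if name not in skip]


# Hypothetical header row and unwanted columns, for illustration only.
header = ["id", "name", "notes", "price", "qty", "comment"]
cols = positions_to_keep(header, ["notes", "comment"])
print(cols)  # [0, 1, 3, 4]
```

The resulting list can be passed as usecols=cols, exactly like the cols list above.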

Answer 2

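If your pandas version allows it (check first whether you can pass a function to usecols), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x)

This should skip every column that has no header name (pandas labels those 'Unnamed: N'). To skip named columns instead, replace the 'Unnamed' test with a membership check against a list of unwanted names, for example usecols=lambda x: x not in cols_to_skip.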
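
A minimal sketch of a name-based filter, assuming a pandas version whose usecols accepts a callable. The helper name make_usecols_filter and the column names are hypothetical. The filter is plain Python, so it can be checked without opening any file.

```python
def make_usecols_filter(names_to_skip):
    """Build a predicate that returns True for every column name that should be read."""
    skip = set(names_to_skip)

    def keep(column_name):
        # Keep the column unless its name is in the skip set.
        return column_name not in skip

    return keep


# Hypothetical column names, for illustration only.
keep = make_usecols_filter(["col_a", "col_b"])
print([name for name in ["col_a", "col_c", "col_b", "col_d"] if keep(name)])
# ['col_c', 'col_d']
```

The predicate would then be passed as pd.read_excel('large_excel_file.xlsx', usecols=keep). pandas evaluates it against each column name and parses only the columns for which it returns True.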

Skip specific set of columns when reading excel frame - pandas

2 answers: