Question

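I have an Excel file with 5,000 rows, each with 17,000 columns. Can I split this file using Python / pandas? Right now, when I try to read the Excel file, it raises a MemoryError. If I could read the file somehow, I could reduce the number of columns with: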

myFile = myFile.drop(columns=list(myFile.filter(regex=r'(x|y)')))

Can someone help me with how to do this?

Answer 1

Take a look at the usecols parameter of read_excel.
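
As a rough sketch of that suggestion: usecols can be a callable that is evaluated against each column name, so the regex from the question can be applied while reading instead of afterwards. The helper names keep_column and read_reduced are made up for illustration. Note also that the Excel engine may still parse every cell before the filter takes effect, so this is not guaranteed to avoid the MemoryError.

```python
import re

import pandas as pd

# Same pattern as in the question: any column whose name contains "x" or "y".
UNWANTED = re.compile(r"(x|y)")


def keep_column(name) -> bool:
    """Return True for columns that should be loaded."""
    return UNWANTED.search(str(name)) is None


def read_reduced(path: str) -> pd.DataFrame:
    """Read an Excel file, skipping the unwanted columns."""
    # usecols accepts a callable; a column is parsed only if it returns True.
    return pd.read_excel(path, usecols=keep_column)
```

Calling read_reduced with the path of the workbook would then return only the columns whose names do not match the pattern.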

Answer 2

In pandas, you need to set the low_memory parameter, and you should specify data types for the CSV columns. For example:

low_memory=False

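import pandas as pd

# error_bad_lines=False was deprecated in pandas 1.3 and later removed;
# on_bad_lines="skip" is its replacement.
df = pd.read_csv("YOURFILENAME.csv", delimiter='|', on_bad_lines="skip",
                 index_col=False, low_memory=False,
                 dtype=str)  # every column as text (originally dtype='unicode'), or instead:
                 # dtype={"user_id": int, "username": "string"}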

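Best practice is to specify the data type of each column explicitly, just in case, since there are so many columns in your case. You can simply use try/except on a column (the second one, say) and iterate over the values: keep it as a string if it holds strings, and give integer columns the smallest type that fits (int8 if the values fit in int8, int64 likewise).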
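
One way the try/except idea might look is sketched below. It uses pd.to_numeric with downcast="integer" in place of a hand-written loop over the values, which is a substitution rather than what the answer literally describes. The function name downcast_columns and the sample data are assumptions for illustration.

```python
import pandas as pd


def downcast_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns stored in the smallest integer dtype that fits."""
    out = df.copy()
    for name in out.columns:
        try:
            # Raises ValueError if any value in the column cannot be parsed as a number.
            # Columns of whole numbers shrink to int8/int16/...; fractional values stay float.
            out[name] = pd.to_numeric(out[name], downcast="integer")
        except (ValueError, TypeError):
            # Non-numeric column: keep it as text.
            out[name] = out[name].astype(str)
    return out


# Hypothetical sample data, read as text the way dtype=str would deliver it.
raw = pd.DataFrame({"user_id": ["1", "2", "3"], "username": ["ann", "bob", "cy"]})
typed = downcast_columns(raw)
print(typed.dtypes)
```

The resulting dtypes can then be passed back as an explicit dtype mapping on the next read, so the inference only has to run once.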

Edit: with read_excel, specify the dtype as unicode as well.
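
Putting the two answers together, a minimal sketch of reading the workbook in column slices with read_excel could look like the following. The helpers column_slices and read_slice are hypothetical names. Each call re-opens and re-parses the workbook, so this is slow, and whether it lowers peak memory depends on the Excel engine.

```python
import pandas as pd


def column_slices(n_columns: int, width: int) -> list[list[int]]:
    """Split column positions 0..n_columns-1 into consecutive groups of at most `width`."""
    return [
        list(range(start, min(start + width, n_columns)))
        for start in range(0, n_columns, width)
    ]


def read_slice(path: str, columns: list[int]) -> pd.DataFrame:
    """Read only the given zero-based column positions, keeping every cell as text."""
    return pd.read_excel(path, usecols=columns, dtype=str)
```

For the 17,000 columns in the question, column_slices(17000, 1000) gives 17 groups, each of which could be read with read_slice and written out to its own file.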

Is it possible to split an Excel file into multiple slices by column?
