Question

I need help with the following problem.

I have multiple CSV files, as shown below

1.csv

2.csv

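I want to reshape the Length column every 10 rows and place the results from each file side by side in a single output. For example, this is the output I want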

[[12 23 44 34 11]    [[  52.1   32.2   44.6   99.1  122.3]
 [39 79 45 56 15]]    [  43.2   79.4   45.5   56.3   15.4]]
[[35 23 66 33 12]    [[ 35.7  23.7  66.7  33.8  12.9]
 [34 21 43 44 55]]    [ 34.8  21.6  43.7  44.2  55.8]]

I tried the following script, but it gives me a TypeError.

myscript.py

import pandas as pd
import glob

df = [pd.read_csv(filename) for filename in glob.glob("Users/Ling/workspace/testing/*.csv")]

start = 0
for i in range(0, len(df.index)):
    if (i + 1)%10 == 0:
        result = df['Length'].iloc[start:i+1].reshape(2,5)
        start = i + 1
        print result

Error

TypeError: object of type 'builtin_function_or_method' has no len()

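I don't understand the error. Should I add another start = 0 after the for loop so that the program reads each file, or is there some other way to solve this?

Thanks for your help.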

[UPDATE]

Following @cmaher's suggestion, I modified myscript.py like this

import pandas as pd
import glob

df = [pd.read_csv(filename) for filename in glob.glob("Users/Ling/workspace/testing/*.csv")]

df = pd.concat(df) 
start = 0
for i in range(0, len(df.index)):
    if (i + 1)%10 == 0:
        result = df['Length'].iloc[start:i+1].reshape(2,5)
        start = i + 1
        print result

The output looks like this

[[  52.1   32.2   44.6   99.1  122.3]
 [  43.2   79.4   45.5   56.3   15.4]]
[[ 35.7  23.7  66.7  33.8  12.9]
 [ 34.8  21.6  43.7  44.2  55.8]]
[[ 12.  23.  44.  34.  11.]
 [ 39.  79.  45.  56.  15.]]
[[ 35.  23.  66.  33.  12.]
 [ 34.  21.  43.  44.  55.]]

This is not what I expected. I want the blocks placed side by side, as in the desired output I gave above.

Answer 1

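As written, df is a list of DataFrames, not a DataFrame, so df.index refers to the list method list.index(), and calling len() on a method raises the TypeError. Before the for loop, just add df = pd.concat(df) (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html), a pandas function built specifically for concatenating a sequence of pandas objects.

Edit: here is the code with the added step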
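
To make the cause of the error concrete, here is a minimal sketch that reproduces it without any files. The two small in-memory frames and their values are made up for illustration; only the Length column name comes from the question.

```python
import pandas as pd

# Two small in-memory frames stand in for the CSV files (hypothetical data).
frames = [
    pd.DataFrame({"Length": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"Length": [4.0, 5.0]}),
]

# A plain list has an .index *method*, so taking len() of it fails.
try:
    len(frames.index)
except TypeError as exc:
    error_message = str(exc)

# After concatenation, .index is a pandas Index, which does have a length.
combined = pd.concat(frames, ignore_index=True)
row_count = len(combined.index)

print(error_message)
print(row_count)
```

Passing ignore_index=True is optional here; it just renumbers the rows 0..n-1 instead of repeating each file's own row labels.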

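import glob
import pandas as pd

# Read every CSV into its own DataFrame, then stack them into a single one.
df = [pd.read_csv(filename) for filename in glob.glob("Users/Ling/workspace/testing/*.csv")]

df = pd.concat(df)

start = 0
for i in range(0, len(df.index)):
    if (i + 1) % 10 == 0:
        # Reshape the underlying NumPy array; Series.reshape is not available in current pandas.
        result = df['Length'].iloc[start:i+1].values.reshape(2, 5)
        start = i + 1
        print(result)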
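
Concatenating stacks the files one after another, which is why the updated script prints one file's blocks and then the other's. For the side-by-side layout asked for in the update, one possible approach (a sketch, not something the answer above proposed) is to reshape each file on its own and then pair up the k-th block of every file. The helper name reshape_blocks is hypothetical, and the in-memory frames reuse the numbers from the desired output in place of 1.csv and 2.csv.

```python
import pandas as pd

def reshape_blocks(frame, column="Length", rows=2, cols=5):
    """Split one column into consecutive (rows x cols) blocks.

    Any incomplete trailing block is dropped.
    """
    values = frame[column].to_numpy()
    block_size = rows * cols
    n_blocks = len(values) // block_size
    return values[: n_blocks * block_size].reshape(n_blocks, rows, cols)

# Hypothetical stand-ins for 1.csv and 2.csv, using the question's numbers.
first = pd.DataFrame({"Length": [12, 23, 44, 34, 11, 39, 79, 45, 56, 15,
                                 35, 23, 66, 33, 12, 34, 21, 43, 44, 55]})
second = pd.DataFrame({"Length": [52.1, 32.2, 44.6, 99.1, 122.3,
                                  43.2, 79.4, 45.5, 56.3, 15.4,
                                  35.7, 23.7, 66.7, 33.8, 12.9,
                                  34.8, 21.6, 43.7, 44.2, 55.8]})

blocks_per_file = [reshape_blocks(frame) for frame in (first, second)]

# zip(*...) pairs the k-th block of every file; each block is rendered
# as text and the matching text lines are joined with spaces.
for group in zip(*blocks_per_file):
    rendered = [str(block).splitlines() for block in group]
    for line_parts in zip(*rendered):
        print("    ".join(line_parts))
```

With real files, the frames would come from pd.read_csv over sorted(glob.glob(...)); sorting matters because glob does not guarantee any particular file order, and the order decides which file ends up on the left.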

Loading and reshaping multiple CSV files with pandas