Question

I need help reshaping data in a CSV file that has more than 10,000 rows. For example, I have this CSV file:

Ale Brick
1   ww
2   ee
3   qq
3   xx
5   dd
3   gg
7   hh
8   tt
9   yy
0   uu
1   ii
2   oo
3   pp
4   mm
1   ww
7   zz
1   cc
3   rr
6   tt
9   ll

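What I would like to get is the tables below, which contain only the data from the "Brick" column. The column is reshaped into blocks of 2 rows by 5 columns.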

[['ww' 'ee' 'qq' 'xx' 'dd']
 ['gg' 'hh' 'tt' 'yy' 'uu']]

[['ii' 'oo' 'pp' 'mm' 'ww']
 ['zz' 'cc' 'rr' 'tt' 'll']]

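I know how to reshape the data from rows 0 to 9 only, but I don't know how to carry on with the next block starting at row 10. Here is my script: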

import pandas as pd

df = pd.read_csv("test.csv")

for i in range(0, len(df)):
    slct = df.head(10)  # always the first 10 rows, whatever i is
    result = slct['Brick'].values.reshape(2,5)

print(result)

This script only prints the following result:

[['ww' 'ee' 'qq' 'xx' 'dd']
 ['gg' 'hh' 'tt' 'yy' 'uu']]

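I would like it to print the data for rows 0 to 9, rows 10 to 19, rows 20 to 29, and so on.

I have read the pandas tutorial, but I did not find any example similar to what I want.

Thanks for your help.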

Answer 1

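You need to use the modulo operator to reshape your column in "batches". You're on the right track. You just need another variable, alongside the loop index, to go with the modulo operation.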

import pandas as pd

df = pd.DataFrame({'brick': ['xx','yy','xa','bd','ev','bb','oo','pp','qq','bn','nv','bn','rr','qw','bn','cd','fd','bv','nm','ty']})

start = 0  # set start to 0 for slicing
for i in range(len(df.index)):
    if (i + 1) % 10 == 0:  # the modulo operation
        result = df['brick'].iloc[start:i+1].values.reshape(2,5)  # Series has no reshape in current pandas
        print(result)
        start = i + 1  # set start to next index

Output:

[['xx' 'yy' 'xa' 'bd' 'ev']
 ['bb' 'oo' 'pp' 'qq' 'bn']]
[['nv' 'bn' 'rr' 'qw' 'bn']
 ['cd' 'fd' 'bv' 'nm' 'ty']]
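
The same batches can also be produced without tracking `start` by hand, by stepping through the column ten rows at a time. The following is only a sketch: the helper name `iter_blocks` and its `rows`/`cols` parameters are made up for illustration, and it assumes that an incomplete trailing block should simply be skipped.

```python
import pandas as pd


def iter_blocks(series, rows=2, cols=5):
    """Yield consecutive (rows x cols) arrays from a Series (hypothetical helper)."""
    size = rows * cols
    values = series.to_numpy()
    # Stop before any incomplete trailing block (an assumption, not a requirement)
    for start in range(0, len(values) - size + 1, size):
        yield values[start:start + size].reshape(rows, cols)


df = pd.DataFrame({'brick': ['xx', 'yy', 'xa', 'bd', 'ev', 'bb', 'oo', 'pp', 'qq', 'bn',
                             'nv', 'bn', 'rr', 'qw', 'bn', 'cd', 'fd', 'bv', 'nm', 'ty']})

blocks = list(iter_blocks(df['brick']))
for block in blocks:
    print(block)
```

With the 20-row sample frame this should give the same two 2x5 blocks as the loop above.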

Answer 2

You can group the rows in blocks of 10 and then reshape the values of each group:

df.groupby(np.repeat(np.arange(len(df) / 10), 10))['Brick'].apply(lambda x: x.values.reshape(2,5))

0.0    [[ww, ee, qq, xx, dd], [gg, hh, tt, yy, uu]]
1.0    [[ii, oo, pp, mm, ww], [zz, cc, rr, tt, ll]]
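
For anyone who wants to run the grouping idea end to end, here is a self-contained sketch. It rebuilds the question's "Brick" column inline instead of reading test.csv (an assumption made so the snippet runs on its own), and it builds the group key with integer division, `np.arange(len(df)) // 10`, so the keys come out as integers 0, 1, ... rather than floats. The names `group_ids` and `blocks` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's "Brick" column
bricks = ['ww', 'ee', 'qq', 'xx', 'dd', 'gg', 'hh', 'tt', 'yy', 'uu',
          'ii', 'oo', 'pp', 'mm', 'ww', 'zz', 'cc', 'rr', 'tt', 'll']
df = pd.DataFrame({'Brick': bricks})

# Rows 0-9 get key 0, rows 10-19 get key 1, and so on
group_ids = np.arange(len(df)) // 10

# Assumes every group holds exactly 10 rows; otherwise reshape(2, 5) raises
blocks = {int(key): grp.to_numpy().reshape(2, 5)
          for key, grp in df['Brick'].groupby(group_ids)}

for key, block in blocks.items():
    print(key)
    print(block)
```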

Answer 3

import pandas as pd

df = pd.read_csv("test.csv")

data = df['Brick']

# Number of complete 10-row blocks; a trailing partial block is skipped
k = len(data) // 10

for x in range(k):
    # Rows 10*x through 10*x + 9
    temp = data[10*x:10*(x+1)]
    print(temp.values.reshape(2, 5))
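
If all the blocks are wanted at once rather than printed one by one, the slicing loop can be collapsed into a single NumPy reshape to three dimensions. This is a sketch under assumptions: the sample values below are the question's column plus three invented extra rows ('aa', 'bb', 'cc') to show the trimming, and rows that do not fill a complete block of 10 are dropped.

```python
import pandas as pd

# Question's sample column plus three made-up trailing rows
bricks = ['ww', 'ee', 'qq', 'xx', 'dd', 'gg', 'hh', 'tt', 'yy', 'uu',
          'ii', 'oo', 'pp', 'mm', 'ww', 'zz', 'cc', 'rr', 'tt', 'll',
          'aa', 'bb', 'cc']
values = pd.Series(bricks).to_numpy()

# Keep only as many rows as fill complete blocks of 10
usable = len(values) - len(values) % 10

# Shape (number_of_blocks, 2, 5); -1 lets NumPy infer the block count
blocks = values[:usable].reshape(-1, 2, 5)

for block in blocks:
    print(block)
```

Indexing `blocks[n]` then gives the n-th 2x5 table directly.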

How do I reshape data every n rows using pandas?
