Pandas DataFrame: create a new column after every x rows

Time: 2018-10-31 15:58:01

Tags: python pandas csv

I am trying to create a new DataFrame based on some data in a CSV file.

My data has the following format:

1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279
1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279

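The first number is the index and the second is the value. The index restarts at 1 at the beginning of each run, and I want a separate column for each run. For example: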

Index:       Run 1:             Run 2:
1,           81.99525117808678, 81.99525117808678
2,           78.79210736916842, 78.79210736916842
3,           69.33703048261454, 69.33703048261454
4,           53.12612416937101, 53.12612416937101
5,           48.8442549498639, 48.8442549498639
6,           48.8442549498639, 48.8442549498639
7,           38.96011640562207, 38.96011640562207
8,           33.66251691693962, 33.66251691693962
9,           29.202159649144907, 29.202159649144907
10,          27.77726568480279, 27.77726568480279

So far I have the following:

df = pd.read_csv(path, header=None, names=['Generation', 'Fitness'], index_col=0)

This produces a single column with both runs stacked on top of each other:

0   
1   81.995251
2   78.792107
3   69.337030
4   53.126124
5   48.844255
6   48.844255
7   38.960116
8   33.662517
9   29.202160
10  27.777266
1   81.995251
2   78.792107
3   69.337030
4   53.126124
5   48.844255
6   48.844255
7   38.960116
8   33.662517
9   29.202160
10  27.777266

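1 answer:

Answer 0 (score: 2)

You can create a reader that iterates over the file in chunks of 10 rows (see the docs for details), then concatenate the chunks side by side: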

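import pandas as pd

# chunksize=10 makes read_csv return an iterator of 10-row DataFrames
reader = pd.read_csv('data.csv', sep=',', chunksize=10,
                     index_col=0, header=None, names=['Generation', 'Fitness'])

# axis=1 aligns the chunks on their shared Generation index
my_df = pd.concat(reader, axis=1)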

>>> my_df
              Fitness    Fitness
Generation                      
1           81.995251  81.995251
2           78.792107  78.792107
3           69.337030  69.337030
4           53.126124  53.126124
5           48.844255  48.844255
6           48.844255  48.844255
7           38.960116  38.960116
8           33.662517  33.662517
9           29.202160  29.202160
10          27.777266  27.777266
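
To make the chunking step easier to try without a file on disk, here is a minimal sketch that feeds the same kind of data to read_csv from memory. The names csv_text, rows_per_run and wide, and the shortened sample values, are made up for illustration. It assumes every run has exactly the same number of rows; if that does not hold, the chunks will not line up with the runs.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file: two runs of three rows each.
csv_text = """1, 81.9
2, 78.7
3, 69.3
1, 81.9
2, 78.7
3, 69.3
"""

rows_per_run = 3  # assumption: every run has exactly this many rows

reader = pd.read_csv(io.StringIO(csv_text), header=None,
                     names=['Generation', 'Fitness'], index_col=0,
                     chunksize=rows_per_run)

# Each chunk is a one-column DataFrame indexed by Generation; axis=1 aligns
# the chunks on that index and places them side by side.
wide = pd.concat(list(reader), axis=1)
print(wide)
```

Because all chunks share the column name Fitness, the result has duplicate column labels until they are renamed.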

If you need column names, you can rename the columns with a list comprehension:

# Python 3.6 or above
my_df.columns = [f'Run {i}' for i, _ in enumerate(my_df.columns, 1)]
# Or:
my_df.columns = ['Run {}'.format(i) for i, _ in enumerate(my_df.columns, 1)]
# Or (number the columns of my_df, then prefix them):
my_df.columns = range(1, len(my_df.columns) + 1)
my_df = my_df.add_prefix('Run ')


>>> my_df
                Run 1      Run 2
Generation                      
1           81.995251  81.995251
2           78.792107  78.792107
3           69.337030  69.337030
4           53.126124  53.126124
5           48.844255  48.844255
6           48.844255  48.844255
7           38.960116  38.960116
8           33.662517  33.662517
9           29.202160  29.202160
10          27.777266  27.777266
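
If the number of rows per run is not known in advance, one possible alternative (not part of the answer above) is to read the whole file, number each repeat of a Generation value with groupby(...).cumcount(), and pivot on that number. This is only a sketch: csv_text, run and wide are illustrative names, the sample values are invented so the two runs can be told apart, and it assumes each Generation value appears at most once per run.

```python
import io
import pandas as pd

# Hypothetical sample data: two runs of three rows, with different values per run.
csv_text = """1, 81.9
2, 78.7
3, 69.3
1, 80.0
2, 70.0
3, 60.0
"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['Generation', 'Fitness'])

# cumcount() numbers the occurrences of each Generation value in file order:
# 0 for its first appearance, 1 for its second, and so on. Adding 1 gives
# the run number.
run = df.groupby('Generation').cumcount() + 1

# Pivot so that each run number becomes its own column, then label the columns.
wide = (df.assign(Run=run)
          .pivot(index='Generation', columns='Run', values='Fitness')
          .add_prefix('Run '))
print(wide)
```

Unlike the chunked read, this loads the entire file into memory first, so the chunksize approach may still be preferable for very large files with a fixed run length.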