Merging repeated columns in the same DataFrame

Time: 2019-08-14 19:25:52

Tags: python pandas csv dataframe reshape

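I have data from several csv files that I am trying to merge. I have put everything into a single DataFrame. How can I merge the data into the corresponding A, B, C columns and include a label for each group of rows?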

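import numpy as np
import pandas as pd

# `data` is assumed to be the list of DataFrames read from the csv files,
# and `name_join` the list of names; neither is defined in this snippet.
base_data = []
for data_base in data:
    base_data.append(data_base['A'])
    base_data.append(data_base[' B'])
    base_data.append(data_base[' C'])
#    np.append(base_data, np.nan)
df_name = pd.DataFrame(name_join)
df_data = pd.DataFrame(base_data)
trp = np.transpose(df_data)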

Actual:

A           B       C       A       B       C       A       B       C
0.7283  0.743   0.01    0.7283  0.7512  0.02    0.7283  0.7456  0.02
0.5165  0.488   0.03    0.5165  0.4756  0.04    0.5165  0.4707  0.05
0.5087  0.4781  0.03    0.5087  0.4611  0.05    0.5087  0.4467  0.06
0.4598  0.4834  0.02    0.4598  0.4938  0.03    0.4598  0.4793  0.02
0.4883  0.5235  0.04    0.4883  0.5173  0.03    0.4883  0.5278  0.04
0.5993  0.6229  0.02    0.5993  0.6223  0.02    0.5993  0.6258  0.03
0.5351  0.5983  0.06    0.5351  0.6029  0.07    0.5351  0.613   0.08
0.6105  0.6314  0.02    0.6105  0.6434  0.03    0.6105  0.6361  0.03
0.5946  0.6495  0.05    0.5946  0.6452  0.05    0.5946  0.6463  0.05
0.7335  0.7506  0.02    0.7335  0.7559  0.02    0.7335  0.7497  0.02

Expected:

    A       B       C
Cow 0.7283  0.743   0.01
    0.5165  0.488   0.03
    0.5087  0.4781  0.03
    0.4598  0.4834  0.02
    0.4883  0.5235  0.04
    0.5993  0.6229  0.02
    0.5351  0.5983  0.06
    0.6105  0.6314  0.02
    0.5946  0.6495  0.05
    0.7335  0.7506  0.02
Cat 0.7283  0.7512  0.02
    0.5165  0.4756  0.04
    0.5087  0.4611  0.05
    0.4598  0.4938  0.03
    0.4883  0.5173  0.03
    0.5993  0.6223  0.02
    0.5351  0.6029  0.07
    0.6105  0.6434  0.03
    0.5946  0.6452  0.05
    0.7335  0.7559  0.02
Dog 0.7283  0.7456  0.02
    0.5165  0.4707  0.05
    0.5087  0.4467  0.06
    0.4598  0.4793  0.02
    0.4883  0.5278  0.04
    0.5993  0.6258  0.03
    0.5351  0.613   0.08
    0.6105  0.6361  0.03
    0.5946  0.6463  0.05
    0.7335  0.7497  0.02

1 answer:

Answer 0 (score: 0)

Here is a solution based on Nycbros's comment.

import pandas as pd

# Dummy data
data_double = pd.DataFrame(data=[{'x': x, 'y': 2 * x} for x in range(5)])
data_triple = pd.DataFrame(data=[{'x': x, 'y': 3 * x} for x in range(5)])

print(data_double)

Output:

   x  y
0  0  0
1  1  2
2  2  4
3  3  6
4  4  8

print(data_triple)

Output:

   x   y
0  0   0
1  1   3
2  2   6
3  3   9
4  4  12

# You will need a list of keys, one per DataFrame, in the same order as your data
data = [data_double, data_triple]
keys = ['Double', 'Triple']

# Concatenate the DataFrames in the list and pass the keys to use as the outer index level
combo = pd.concat(data, keys=keys)
print(combo)

Output:

          x   y
Double 0  0   0
       1  1   2
       2  2   4
       3  3   6
       4  4   8
Triple 0  0   0
       1  1   3
       2  2   6
       3  3   9
       4  4  12
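
Applied to the data in the question, the same pd.concat call might look roughly like the sketch below. The csv contents here are in-memory stand-ins built from the first two rows of the question's table, the header layout "A, B, C" is an assumption based on the ' B' and ' C' column names in the question's code, and the names csv_files, frames and combined are made up for illustration. Real code would pass file paths to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the csv files (assumed header layout "A, B, C").
csv_files = {
    "Cow": io.StringIO("A, B, C\n0.7283,0.743,0.01\n0.5165,0.488,0.03\n"),
    "Cat": io.StringIO("A, B, C\n0.7283,0.7512,0.02\n0.5165,0.4756,0.04\n"),
    "Dog": io.StringIO("A, B, C\n0.7283,0.7456,0.02\n0.5165,0.4707,0.05\n"),
}

frames = {}
for name, source in csv_files.items():
    frame = pd.read_csv(source)
    # Headers such as ' B' carry a leading space; strip it so columns line up.
    frame.columns = frame.columns.str.strip()
    frames[name] = frame[["A", "B", "C"]]

# One key per frame, in the same order, becomes the outer index level.
combined = pd.concat(list(frames.values()), keys=list(frames.keys()))
print(combined)
```

Stripping the header names first avoids having to write ' B' and ' C' everywhere afterwards; passing skipinitialspace=True to pd.read_csv is another way to get the same effect.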

# If you don't want the original row indexes, you can drop that index level
combo = combo.reset_index(level=1, drop=True)
print(combo)

Output:

        x   y
Double  0   0
Double  1   2
Double  2   4
Double  3   6
Double  4   8
Triple  0   0
Triple  1   3
Triple  2   6
Triple  3   9
Triple  4  12
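
If the wide frame from the question (columns A, B, C repeated once per file) has already been built, it can probably be reshaped without re-reading the files. The sketch below is only one possible approach, not part of the original answer: it assumes every file contributes exactly three columns in the same order, and that the blocks appear in the order Cow, Cat, Dog. The names wide, names, width, blocks and long_form are made up for illustration.

```python
import pandas as pd

# Small wide frame with repeated column labels, like the "Actual" table
# (first two rows only).
wide = pd.DataFrame(
    [
        [0.7283, 0.743, 0.01, 0.7283, 0.7512, 0.02, 0.7283, 0.7456, 0.02],
        [0.5165, 0.488, 0.03, 0.5165, 0.4756, 0.04, 0.5165, 0.4707, 0.05],
    ],
    columns=["A", "B", "C"] * 3,
)

names = ["Cow", "Cat", "Dog"]  # assumed order of the column blocks
width = 3                      # assumed number of columns per block

# Slice each block of columns by position (labels are ambiguous here),
# then stack the blocks vertically under their names.
blocks = [wide.iloc[:, i * width:(i + 1) * width] for i in range(len(names))]
long_form = pd.concat(blocks, keys=names)
print(long_form)
```

Positional slicing with iloc is used because selecting wide['A'] would return all three columns labelled A at once.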