Question

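I have a large DataFrame in which a single column holds all the values. I need to split the data into more columns. After a lot of trial and error I gave up, so I am asking for your help.

The head of the DataFrame looks like this (each row is a single Series object, not separate values):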

                                                        column1
    ---------------------------------------------------------------------
    MultiIndex1  | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00
                 | 1.00   2.00   3.00   4.00   5.00   6.00   7.00

My desired output should look like this:

                 column1|column2|column3|column4|column5|column6|column7
    ---------------------------------------------------------------------
    MultiIndex1  | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00
                 | 1.00 |  2.00 |  3.00 |  4.00 |  5.00 |  6.00 |  7.00

I tried:

    df.columns = ['col1', 'col2', 'col3', 'col4', 'col5', ...]
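Assigning to df.columns only relabels the columns that already exist, so it cannot split one column into seven; pandas raises a ValueError when the number of names does not match the number of columns. A small sketch with a hypothetical one-column frame (the data is made up to mirror the example above):

```python
import pandas as pd

# Hypothetical one-column frame: all seven values live in one string
df = pd.DataFrame({'column1': ['1.00   2.00   3.00   4.00   5.00   6.00   7.00']})

try:
    # Seven names for a frame that has only one column
    df.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
    error = None
except ValueError as exc:
    error = str(exc)

print(error)             # a length-mismatch message
print(list(df.columns))  # the failed assignment leaves ['column1'] in place
```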

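I have also tried turning it into a Series and then back into a DataFrame.

I tried applying the .str.split function.

I tried a lot of slicing and merging, but without success.

Any help would be greatly appreciated. Thanks!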

Here are the first few rows of my dataset as an example:

The date and AALR3 form the row MultiIndex.

    2019-01-02; AALR3; 0000000020; 000000000013.300000; 000000000000000100; 10:00:04.961; 1; 2019-01-02; 000086597137782; 000000000310091; 2; 2019-01-02; 000086597142909; 000000000310092; 1; 0; 00000072; 00000174
    2019-01-02; AALR3; 0000000010; 000000000013.310000; 000000000000003000; 10:00:04.961; 1; 2019-01-02; 000086597135827; 000000000310088; 2; 2019-01-02; 000086597142909; 000000000310089; 1; 0; 00000120; 00000174
    2019-01-02; AALR3; 0000000050; 000000000013.390000; 000000000000000200; 10:11:40.214; 1; 2019-01-02; 000086597182855; 000000000400273; 1; 2019-01-02; 000086597151579; 000000000400274; 2; 0; 00000058; 00000008
    2019-01-02; AALR3; 0000000040; 000000000013.380000; 000000000000000100; 10:11:40.214; 1; 2019-01-02; 000086597182855; 000000000400271; 1; 2019-01-02; 000086597151578; 000000000400272; 2; 0; 00000058; 00000174
    2019-01-02; AALR3; 0000000030; 000000000013.380000; 000000000000000100; 10:11:40.214; 1; 2019-01-02; 000086597182855; 000000000400269; 1; 2019-01-02; 000086597151189; 000000000400270; 2; 0; 00000058; 00000308

I am reading it with:

    import pandas as pd
    df = pd.read_csv('//path_to_file', sep=';')

I would like to name the columns like this:

    df.columns = ['Session Date','Instrument Symbol','Trade Number','Trade Price','Traded Quantity',
          'Trade Time','Trade Indicator','Buy Order Date','Sequential Buy Order Number',
          'Secondary Order ID - Buy Order','Aggressor Buy Order Indicator','Sell Order Date',
          'Sequential Sell Order Number','Secondary Order ID - Sell Order','Aggressor Sell Order Indicator',
          'Cross Trade Indicator','Buy Member','Sell Member']
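A minimal sketch of reading the file straight into the 18 named columns, assuming the file has no header row and that the space after each ';' is really part of the file (both are assumptions; drop header=None or skipinitialspace if they do not hold). header, names and skipinitialspace are standard pd.read_csv parameters; io.StringIO holding two of the sample rows stands in for '//path_to_file'.

```python
import io

import pandas as pd

cols = ['Session Date', 'Instrument Symbol', 'Trade Number', 'Trade Price', 'Traded Quantity',
        'Trade Time', 'Trade Indicator', 'Buy Order Date', 'Sequential Buy Order Number',
        'Secondary Order ID - Buy Order', 'Aggressor Buy Order Indicator', 'Sell Order Date',
        'Sequential Sell Order Number', 'Secondary Order ID - Sell Order',
        'Aggressor Sell Order Indicator', 'Cross Trade Indicator', 'Buy Member', 'Sell Member']

# Two rows copied from the question (adjacent string literals are concatenated)
rows = [
    '2019-01-02; AALR3; 0000000020; 000000000013.300000; 000000000000000100; 10:00:04.961; 1; '
    '2019-01-02; 000086597137782; 000000000310091; 2; 2019-01-02; 000086597142909; '
    '000000000310092; 1; 0; 00000072; 00000174',
    '2019-01-02; AALR3; 0000000010; 000000000013.310000; 000000000000003000; 10:00:04.961; 1; '
    '2019-01-02; 000086597135827; 000000000310088; 2; 2019-01-02; 000086597142909; '
    '000000000310089; 1; 0; 00000120; 00000174',
]
sample = '\n'.join(rows)

df = pd.read_csv(
    io.StringIO(sample),     # replace with the real file path
    sep=';',
    header=None,             # assumption: the first line is data, not column names
    names=cols,              # one name per field, 18 in total
    skipinitialspace=True,   # strip the space that follows each ';'
)
print(df.head())
```

Fields with leading zeros, such as Buy Member, are parsed as integers and lose those zeros; passing dtype=str (or a per-column dtype dict) keeps them as text if that matters.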

Update:

The solution works, thank you very much.

It is almost the way I want it. Is there a way to make the duplicated index values a MultiIndex as well? I managed to do it for the dates, but not for the symbol. Thanks
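DataFrame.set_index accepts a list of column names and turns them into the levels of a MultiIndex, which may be what the update asks for. A sketch on a small hypothetical frame, assuming the columns carry the names listed above (the values are taken from the sample rows):

```python
import pandas as pd

df = pd.DataFrame({
    'Session Date': ['2019-01-02', '2019-01-02', '2019-01-02'],
    'Instrument Symbol': ['AALR3', 'AALR3', 'AALR3'],
    'Trade Number': [20, 10, 50],
    'Trade Price': [13.30, 13.31, 13.39],
})

# Both columns become index levels and are removed from the regular columns
indexed = df.set_index(['Session Date', 'Instrument Symbol'])
print(indexed.index.nlevels)  # 2
print(indexed)
```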

Answer 1

Try this:

    # split on runs of whitespace; expand=True returns a DataFrame with df's index
    your_df = df['column1'].str.split(expand=True)
    your_df.columns = ['col1','col2','col3','col4','col5','col6','col7']
    print(your_df)
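A self-contained check of this approach on a hypothetical frame shaped like the question's example. Calling str.split with no pattern splits on runs of whitespace, while a literal ' ' keeps an empty string for every extra space; that would explain odd results if the values are separated by several spaces (an assumption about the original data):

```python
import pandas as pd

# Hypothetical frame: a 2-level index and one column holding
# seven numbers separated by three spaces each
idx = pd.MultiIndex.from_tuples([('MultiIndex1', i) for i in range(3)])
row = '1.00   2.00   3.00   4.00   5.00   6.00   7.00'
df = pd.DataFrame({'column1': [row] * 3}, index=idx)

# No pattern: split on any run of whitespace, one column per value
parts = df['column1'].str.split(expand=True)
parts.columns = [f'column{i}' for i in range(1, 8)]
parts = parts.astype(float)  # the pieces are strings until converted
print(parts)

# For comparison: a literal single space leaves empty strings between
# repeated spaces, giving 19 columns instead of 7
bad = df['column1'].str.split(' ', expand=True)
print(bad.shape)  # (3, 19)
```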

Answer 2

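What you are seeing is a MultiIndex DataFrame, while you are looking for a single-index DataFrame. Try:

    # move the MultiIndex levels back into ordinary columns
    df = df.reset_index()
    # the number of names must equal the number of columns after reset_index
    df.columns = ['col1','col2','col3','col4','col5','col6','col7']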
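A sketch of what reset_index does here, on a hypothetical frame with a (date, symbol) MultiIndex: the index levels become ordinary columns, so the list assigned to df.columns has to include names for them as well.

```python
import pandas as pd

# Hypothetical frame: two-level row index (date, symbol) and two data columns
df = pd.DataFrame(
    {'Trade Number': [20, 10], 'Trade Price': [13.30, 13.31]},
    index=pd.MultiIndex.from_tuples(
        [('2019-01-02', 'AALR3'), ('2019-01-02', 'AALR3')],
        names=['Session Date', 'Instrument Symbol'],
    ),
)

flat = df.reset_index()
# 2 former index levels + 2 data columns = 4 names needed
flat.columns = ['col1', 'col2', 'col3', 'col4']
print(flat)
```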

Splitting a Pandas DataFrame column into separate columns