Clean way to split a pandas DataFrame() into multiple columns

Time: 2018-02-22 16:45:38

Tags: python pandas

Apologies if this already exists somewhere; I couldn't find the right keywords.

I have a very simple pd.DataFrame() that looks like this:

articles = pd.DataFrame(
                [(0, "Once upon.."),
                 (1, "It happened.."),
                 (2, "The story.."),
                 (3, "So many.."),
                 (4, "How long.."),
                 (5, "It's been..")],
            columns=["article_id", "article"])

>>> articles

   article_id        article
0           0    Once upon..
1           1  It happened..
2           2    The story..
3           3      So many..
4           4     How long..
5           5    It's been..

I simply want to split it into 3 groups of columns (the order doesn't matter, but let's say the order is kept), like this:

   article1_id     article1  article2_id       article2  article3_id     article3
0            0  Once upon..            1  It happened..            2  The story..
1            3    So many..            4     How long..            5  It's been..

For now I have something ugly like this (it works):

tmp1 = articles.loc[::3].reset_index(); del tmp1['index']; tmp1.columns = ['article1_id', 'article1']
tmp2 = articles.loc[1::3].reset_index(); del tmp2['index']; tmp2.columns = ['article2_id', 'article2']
tmp3 = articles.loc[2::3].reset_index(); del tmp3['index']; tmp3.columns = ['article3_id', 'article3']
pd.concat([tmp1, tmp2, tmp3], axis=1, ignore_index=False).head()

But I'm sure pandas offers something cleaner...

1 answer:

Answer 0: (score: 3)

I think what we're looking for is array.reshape()
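
To see why reshape fits here, a minimal sketch on a bare NumPy array may help. The array `pairs` and its placeholder values are made up for illustration; the only assumption is NumPy's default row-major (C) ordering, under which consecutive input rows end up side by side in the output.

```python
import numpy as np

# Six (id, text) pairs stored row by row, like the rows of the DataFrame.
pairs = np.array(
    [[0, "a"], [1, "b"], [2, "c"], [3, "d"], [4, "e"], [5, "f"]],
    dtype=object,
)

# reshape reads the values in row-major order, so each output row is
# three consecutive input rows laid side by side.
wide = pairs.reshape(2, 6)
print(wide.tolist())
```

The same idea applied to a DataFrame: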

import pandas as pd

df = pd.DataFrame(
                [(0, "Once upon.."),
                 (1, "It happened.."),
                 (2, "The story.."),
                 (3, "So many.."),
                 (4, "How long.."),
                 (5, "It's been.."),
                 (6, "It's been.."),
                 (7, "It's been..")],
            columns=["article_id", "article"])

# New cols (let them define the length of reshape)
cols = ['article1_id','article1','article2_id','article2','article3_id','article3']

# If the size of the dataframe is not divisible by len(cols), pad with empty rows.
# Can be removed if you are certain of the length.
while df.size % len(cols) != 0:
    df.loc[len(df)] = ('','')

df = pd.DataFrame(df.values.reshape(df.size//len(cols),len(cols)), columns=cols)

print(df)

Returns:

  article1_id     article1 article2_id       article2 article3_id     article3
0           0  Once upon..           1  It happened..           2  The story..
1           3    So many..           4     How long..           5  It's been..
2           6  It's been..           7    It's been..                           

With .to_csv():

,article1_id,article1,article2_id,article2,article3_id,article3
0,0,Once upon..,1,It happened..,2,The story..
1,3,So many..,4,How long..,5,It's been..
2,6,It's been..,7,It's been..,,
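
If this is needed in more than one place, the approach above could be wrapped in a small function. The following is only a sketch under a few assumptions: `widen` is a hypothetical helper name (not a pandas API), the input frame is left unmodified, and missing cells are padded with None rather than empty strings, which is a deliberate change from the code above.

```python
import numpy as np
import pandas as pd


def widen(df, cols):
    """Lay consecutive rows of `df` side by side under the names in `cols`.

    Hypothetical helper: `cols` must hold a multiple of df's column count.
    """
    width = df.shape[1]
    if len(cols) % width != 0:
        raise ValueError("len(cols) must be a multiple of the number of columns")
    per_row = len(cols) // width          # input rows packed into one output row
    n_rows = -(-len(df) // per_row)       # ceiling division
    n_missing = n_rows * per_row - len(df)

    values = df.to_numpy(dtype=object)
    # Pad with None (an assumption; the answer above pads with '').
    padding = np.full((n_missing, width), None, dtype=object)
    padded = np.vstack([values, padding])
    return pd.DataFrame(padded.reshape(n_rows, len(cols)), columns=cols)


articles = pd.DataFrame(
    [(0, "Once upon.."), (1, "It happened.."), (2, "The story.."),
     (3, "So many.."), (4, "How long.."), (5, "It's been.."),
     (6, "It's been.."), (7, "It's been..")],
    columns=["article_id", "article"],
)
cols = ["article1_id", "article1", "article2_id", "article2",
        "article3_id", "article3"]
wide = widen(articles, cols)
print(wide)
```

Because the padding is done on a copy of the values, `articles` keeps its original 8 rows, and the padded cells are written as empty fields by .to_csv() just like the empty strings above.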