Question

I am new to Python development. I have the following dataframe:

Document_ID  OFFSET  PredictedFeature  word
0            0       2000              Mark
0            8       2000              Bob
0            16      2200              AL
0            23      2200              NS
0            30      2200              GK
1            0       2100              sandy
1            5       2100              Rohan
1            7       2100              DV

Here, Document_ID is effectively the key.

What I want to do is produce a file that contains a result like this:

mark 2000, Bob 2000, AL 2200, NS 2200, GK 2200, sandy 2100, 2100 Rohan, 2100 DV

I tried using groupby:

df = df.groupby('Document_ID').agg(lambda x: ','.join(x))
for name in df.index:
    print name
    print df.loc[name]

I am also trying to save the result to a file in text or CSV format.

Can anyone help me with this?

Answer 1

Use DataFrame.stack:

new_df=df[['word','PredictedFeature']].stack().to_frame().T
new_df.columns=new_df.columns.droplevel(0)
print(new_df)

   word PredictedFeature word PredictedFeature word PredictedFeature word  \
0  Mark             2000  Bob             2000   AL             2200   NS   

  PredictedFeature word PredictedFeature   word PredictedFeature   word  \
0             2200   GK             2200  sandy             2100  Rohan   

  PredictedFeature word PredictedFeature  
0             2100   DV             2100
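If the goal is the single comma-separated line shown in the question, string concatenation may be a more direct route than reshaping. The following is a minimal sketch, not the answerer's method: it rebuilds the sample dataframe from the question, and the output file name `output.txt` and the temp-directory location are assumptions chosen for illustration.

```python
import os
import tempfile

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
    "word": ["Mark", "Bob", "AL", "NS", "GK", "sandy", "Rohan", "DV"],
})

# Build one "word feature" string per row; the integer column must be
# converted to str before it can be concatenated or joined.
pairs = df["word"] + " " + df["PredictedFeature"].astype(str)

# One line for the whole dataframe, in the original row order.
line = ", ".join(pairs)

# Alternatively, one line per Document_ID.
per_doc = df.assign(pair=pairs).groupby("Document_ID")["pair"].agg(", ".join)

# Write the single line to a plain text file (hypothetical path).
out_path = os.path.join(tempfile.gettempdir(), "output.txt")
with open(out_path, "w") as f:
    f.write(line + "\n")
```

The `astype(str)` step is also why the groupby attempt in the question fails: `','.join(x)` cannot join a column of integers.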

However, if you want to keep the rest of the information, it is better to use pivot_table:

new_df=df.pivot_table(columns=['word','PredictedFeature'],index='Document_ID',values='OFFSET',fill_value=0)
print(new_df)

word               AL  Bob   DV   GK Mark   NS Rohan sandy
PredictedFeature 2200 2000 2100 2200 2000 2200  2100  2100
Document_ID                                               
0                  16    8    0   30    0   23     0     0
1                   0    0    7    0    0    0     5     0

To save it, use DataFrame.to_csv:

new_df.to_csv('mycsv.csv')

If the index is a MultiIndex, you need:

new_df.to_csv('mycsv.csv',index_label=['word','PredictedFeature'])

To read it back, use pd.read_csv:

new_read_csv=pd.read_csv('mycsv.csv')
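Because the pivot_table result has two column levels, a plain `pd.read_csv` call will likely treat the second header row as data. The sketch below shows one way to round-trip it by passing `header=[0, 1]` and `index_col=0`. It assumes the sample data from the question and uses an in-memory buffer instead of `mycsv.csv` so it can run anywhere.

```python
import io

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
    "word": ["Mark", "Bob", "AL", "NS", "GK", "sandy", "Rohan", "DV"],
})

new_df = df.pivot_table(
    columns=["word", "PredictedFeature"],
    index="Document_ID",
    values="OFFSET",
    fill_value=0,
)

# Write to an in-memory buffer; a file path such as 'mycsv.csv' works the same way.
buffer = io.StringIO()
new_df.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds the two column levels; index_col=0 restores Document_ID.
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)
print(restored)
```

Note that the labels in the restored column levels are typically read back as strings, so `2200` may come back as `'2200'`.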
