Save each row of a pandas DataFrame to a txt file

Posted: 2018-07-11 13:48:22

Tags: python pandas numpy hdf5

I open a dataset from an HDF5 file like this:

import pandas as pd
import numpy as np

data1 = pd.read_hdf('sport.hdf5', usecols=['category','title','images','link','date','desc'])

It gives me the following output:

category                                              title  images  \
0      raket  Kevin/Marcus Langsung Fokus ke Kejuaraan Dunia...     NaN   
1         f1         Vettel Menangi GP Inggris yang Penuh Drama     NaN   
2     others  Semangat 'Semakin di Depan' Warnai Kejuaraan M...     NaN   
5  sepakbola             Roberto Martinez Mengejar Status Elite     NaN   
6  sepakbola  Nyaris Separuh Gol Piala Dunia 2018 Lahir dari...     NaN   

                                                link  \
0  https://sport.detik.com/raket/d-4104834/kevinm...   
1  https://sport.detik.com/f1/d-4104788/vettel-me...   
2  https://sport.detik.com/sport-lain/d-4105193/s...   
5  https://sport.detik.com/sepakbola/berita/d-410...   
6  https://sport.detik.com/sepakbola/berita/d-410...   

                             date  \
0   Senin 09 Juli 2018, 00:31 WIB   
1  Minggu 08 Juli 2018, 22:35 WIB   
2   Senin 09 Juli 2018, 11:15 WIB   
5   Senin 09 Juli 2018, 12:35 WIB   
6   Senin 09 Juli 2018, 12:51 WIB   

                                                desc  
0   - Setelah  , Kevin Sanjaya/Marcus Gideon suda...  
1   - Driver Ferrari   keluar sebagai pemenang Gr...  
2   - Kejuaraan Dunia Motocross Grand Prix (MXGP)...  
5             -   bisa jadi mulai kerap diperbinc...  
6             - Berakhirnya perempatfinal Piala D... 

Now I need to save each row's desc to its own text file named after that row's title. I'm using the code below:

np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')

However, I get this error:

Traceback (most recent call last):
  File "index.py", line 23, in <module>
    np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')
  File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1187, in savetxt
    if fname.endswith('.gz'):
  File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'endswith'
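The error comes from the file name argument. np.savetxt expects fname to be a single path string (or an open file); the traceback shows it calling fname.endswith('.gz'). But data1['title'] + '.txt' appends the suffix element-wise and returns a whole Series of file names. A minimal sketch on a small stand-in DataFrame (the titles "Title A" and "Title B" are hypothetical) shows the type of the would-be file name:

```python
import pandas as pd

# Hypothetical stand-in for data1, using the same column names
data1 = pd.DataFrame(
    {"title": ["Title A", "Title B"], "desc": ["desc a", "desc b"]},
    index=[1, 5],
)

fname = data1["title"] + ".txt"
# String concatenation on a Series is element-wise, so the result is
# another Series (one file name per row), not a single str
print(type(fname).__name__)    # Series
print(fname.tolist())          # ['Title A.txt', 'Title B.txt']
print(isinstance(fname, str))  # False
```

So np.savetxt cannot write one file per row from a single call; each file needs its own call with a scalar file name.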

Any solutions or ideas?

1 answer:

Answer 0 (score: 0)

After a few hours of work, here is how I solved the problem:

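First, iterate over the rows of the data1 DataFrame with iterrows(), which returns the rows one at a time. Unpack each item into two loop variables, the index and the row.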
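DataFrame.iterrows() yields one (index, row) tuple per row: index is the row's label and row is a Series keyed by column name. A minimal sketch on a stand-in frame (hypothetical values) whose index skips labels, like the 0, 1, 2, 5, 6 index in the question, shows that idx is the original label rather than a 0-based position:

```python
import pandas as pd

# Hypothetical stand-in with a non-contiguous index, like data1
data1 = pd.DataFrame(
    {"title": ["A", "B", "C"], "desc": ["desc a", "desc b", "desc c"]},
    index=[0, 2, 5],
)

seen = []
for idx, row in data1.iterrows():
    # idx is the index label; row["title"] looks up a column in this row
    seen.append((idx, row["title"], row["desc"]))
    print(idx, row["title"], row["desc"])
```

This prints the labels 0, 2 and 5, each followed by that row's title and desc.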

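To create one file per row, build the file name from the output directory followed by row['title'], so that each row gets its own file name.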
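One caveat with using row['title'] as a file name: the first title in the sample output, "Kevin/Marcus Langsung ...", contains a "/", which the operating system treats as a directory separator. The helper title_to_path below is hypothetical. It is a sketch that assumes replacing path-unsafe characters with "_" is acceptable. The exact set of replaced characters is also an assumption (a conservative set that is invalid in Windows file names), not something the question specifies.

```python
import os
import re

def title_to_path(title, out_dir="result"):
    """Hypothetical helper: turn a title into a .txt path inside out_dir."""
    # Replace path separators and other characters that are invalid in
    # Windows file names (an assumed, conservative set) with "_"
    safe = re.sub(r'[\\/:*?"<>|]', "_", str(title)).strip()
    return os.path.join(out_dir, safe + ".txt")

print(title_to_path("Kevin/Marcus Langsung Fokus"))
```

On Linux or macOS this prints result/Kevin_Marcus Langsung Fokus.txt, so the file lands directly in result/ instead of in a Kevin/ subdirectory.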

However, the result/ directory does not exist yet, so use os.makedirs to create it.
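os.makedirs(path, exist_ok=True) creates the directory and any missing parents. If the directory already exists, it does nothing. Because of this, calling it on every loop iteration is harmless. Without exist_ok=True, a second call raises FileExistsError. A small sketch follows; the temporary base directory is only there to keep the demo self-contained:

```python
import os
import tempfile

base = tempfile.mkdtemp()
out_dir = os.path.join(base, "result")

# First call creates the directory
os.makedirs(out_dir, exist_ok=True)
# Second call with exist_ok=True is a no-op
os.makedirs(out_dir, exist_ok=True)

# Without exist_ok, an existing directory raises FileExistsError
try:
    os.makedirs(out_dir)
    raised = False
except FileExistsError:
    raised = True

print(os.path.isdir(out_dir), raised)  # True True
```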

Finally, write row['desc'] into the txt file.
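In text mode, f.write() accepts only a str. If a desc value is missing, pandas stores it as NaN, which is a float, and f.write(row['desc']) raises TypeError. A minimal sketch, assuming an empty file is acceptable for a missing description, fills missing values with "" before writing. The stand-in data and the temporary directory are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in with one missing description
data1 = pd.DataFrame({"title": ["A", "B"], "desc": ["- some text", np.nan]})
out_dir = tempfile.mkdtemp()

# Writing NaN directly fails because it is a float, not a str
try:
    with open(os.path.join(out_dir, "demo.txt"), "w", encoding="utf-8") as f:
        f.write(data1.loc[1, "desc"])
    write_failed = False
except TypeError:
    write_failed = True

# Replace missing descriptions with "" so every value written is a str
for idx, row in data1.fillna({"desc": ""}).iterrows():
    path = os.path.join(out_dir, row["title"] + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(row["desc"])
```

Wrapping the value in str() also avoids the error, but it writes the literal text "nan" to the file.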

Here it is:

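import os

out_dir = "result"
# Create the output directory once, before the loop
os.makedirs(out_dir, exist_ok=True)

for idx, row in data1.iterrows():
    # Some titles (e.g. "Kevin/Marcus ...") contain "/", which would be
    # read as a subdirectory, so replace it in the file name
    safe_title = str(row['title']).replace("/", "_")
    filename = os.path.join(out_dir, safe_title + ".txt")
    # The with block closes the file automatically, so no f.close() is needed
    with open(filename, "w", encoding="utf-8") as f:
        f.write(str(row['desc']))  # str() avoids a TypeError if desc is NaN

    print(idx)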
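The original np.savetxt attempt can also work if it is called once per row, so that fname is always a single string. np.savetxt expects array-like data, so the single description is wrapped in a one-element list and written with fmt='%s'. encoding="utf-8" is passed because np.savetxt defaults to latin1 for file names (the encoding parameter needs NumPy 1.14 or newer). The stand-in frame and temporary directory are assumptions for the demo. The plain open()/write() loop above remains the simpler option.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for data1
data1 = pd.DataFrame({"title": ["A", "B"], "desc": ["first desc", "second desc"]})
out_dir = tempfile.mkdtemp()

for idx, row in data1.iterrows():
    # One scalar file name per call, unlike data1['title'] + '.txt'
    fname = os.path.join(out_dir, str(row["title"]) + ".txt")
    # Wrap the scalar in a list so savetxt sees a 1-D array with one element
    np.savetxt(fname, [str(row["desc"])], fmt="%s", encoding="utf-8")
```

Each file then contains its description followed by a newline.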