I open a dataset from an HDF5 file like this:
import pandas as pd
import numpy as np
data1 = pd.read_hdf('sport.hdf5', usecols=['category','title','images','link','date','desc'])
It gives me the following output:
category title images \
0 raket Kevin/Marcus Langsung Fokus ke Kejuaraan Dunia... NaN
1 f1 Vettel Menangi GP Inggris yang Penuh Drama NaN
2 others Semangat 'Semakin di Depan' Warnai Kejuaraan M... NaN
5 sepakbola Roberto Martinez Mengejar Status Elite NaN
6 sepakbola Nyaris Separuh Gol Piala Dunia 2018 Lahir dari... NaN
link \
0 https://sport.detik.com/raket/d-4104834/kevinm...
1 https://sport.detik.com/f1/d-4104788/vettel-me...
2 https://sport.detik.com/sport-lain/d-4105193/s...
5 https://sport.detik.com/sepakbola/berita/d-410...
6 https://sport.detik.com/sepakbola/berita/d-410...
date \
0 Senin 09 Juli 2018, 00:31 WIB
1 Minggu 08 Juli 2018, 22:35 WIB
2 Senin 09 Juli 2018, 11:15 WIB
5 Senin 09 Juli 2018, 12:35 WIB
6 Senin 09 Juli 2018, 12:51 WIB
desc
0 - Setelah , Kevin Sanjaya/Marcus Gideon suda...
1 - Driver Ferrari keluar sebagai pemenang Gr...
2 - Kejuaraan Dunia Motocross Grand Prix (MXGP)...
5 - bisa jadi mulai kerap diperbinc...
6 - Berakhirnya perempatfinal Piala D...
Now I need to save every row to its own file: the file is named after title and contains desc. I am using the code below:
np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')
However, this is the result:
Traceback (most recent call last):
File "index.py", line 23, in <module>
np.savetxt(data1['title']+'.txt', data1['desc'], fmt='%s')
File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1187, in savetxt
if fname.endswith('.gz'):
File "/home/adminsvr/tf-py3/lib/python3.5/site-packages/pandas/core/generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'endswith'
Any solutions or ideas?
Answer 0 (score: 0)
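After working on it for a few hours, here is how I solved the problem. The error occurs because np.savetxt expects a single file name, while data1['title']+'.txt' is a whole Series, so the call to fname.endswith fails. The files have to be written one row at a time: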
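First, iterate over the rows of the data1 DataFrame. Use iterrows(), which yields each row together with its index, and unpack both in the loop (idx and row).
To create one file per row, build the file name from the target directory followed by row['title'], so the name changes with each row.
The directory result/ does not exist yet, so use os.makedirs to create it.
Finally, write row['desc'] to the .txt file.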
Here it is:
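import os

for idx, row in data1.iterrows():
    # Build the file name from the row's title.
    filename = "result/" + str(row['title']) + ".txt"
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    # The with block closes the file, so no explicit f.close() is needed.
    with open(filename, "w+") as f:
        f.write(str(row['desc']))
    print(idx)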
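One thing to watch for: some titles in the sample output contain a "/" (for example "Kevin/Marcus ..."), which the loop above treats as a directory separator. The following is only a sketch of one way to handle that, not part of the original solution. The helper names safe_filename and save_descriptions, the replacement character "_" and the 100-character limit are all my own assumptions. It also skips rows whose desc is missing.

```python
import math
import re
from pathlib import Path


def safe_filename(title, max_len=100):
    """Hypothetical helper: make a title usable as a single file name."""
    # Replace path separators and other characters that are awkward in file names.
    name = re.sub(r'[\\/:*?"<>|]+', "_", str(title)).strip()
    # Truncate long titles; fall back to a placeholder if nothing is left.
    return name[:max_len] or "untitled"


def save_descriptions(pairs, out_dir="result"):
    """Write each (title, desc) pair to <out_dir>/<title>.txt; return the paths written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for title, desc in pairs:
        # Skip missing descriptions (None or NaN).
        if desc is None or (isinstance(desc, float) and math.isnan(desc)):
            continue
        target = out_path / (safe_filename(title) + ".txt")
        target.write_text(str(desc), encoding="utf-8")
        written.append(target)
    return written


# Usage with the DataFrame from the question (assumes data1 is already loaded):
# save_descriptions(zip(data1["title"], data1["desc"]))
```

Note that two rows with the same title would write to the same file, and the later one would overwrite the earlier one. If that matters, add the row index to the file name.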