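I am trying to use Python to reproduce the contents of the "Tidy Data" document available here.
However, the datasets are provided on GitHub as .tex files, and I cannot seem to open them with pandas.
As far as my searching has gone, pandas appears to be able to export to LaTeX but not import from it...
1) Am I correct? 2) If so, how would you suggest I open these files?
Thank you for your time!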
Answer 0 (score: 2)
Taking this as an example:
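import io

import pandas as pd

# Keep only the table rows (lines containing '&') and strip the backslashes.
with open('test.tex') as input_file:
    text = ""
    for line in input_file:
        if '&' in line:
            text += line.replace('\\', '') + '\n'

# io.StringIO stands in for pandas.compat.StringIO, which recent pandas versions no longer provide.
data = io.StringIO(text)
df = pd.read_csv(data, sep="&")
data.close()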
This returns:
   year          artist                    track  time  date.entered  wk1  wk2  wk3
0  2000           2 Pac           Baby Don't Cry  4:22    2000-02-26   87   82   72
1  2000         2Ge+her  The Hardest Part Of ...  3:15    2000-09-02   91   87   92
2  2000    3 Doors Down               Kryptonite  3:53    2000-04-08   81   70   68
3  2000      98verb|^|0  Give Me Just One Nig...  3:24    2000-08-19   51   39   34
4  2000         A*Teens            Dancing Queen  3:44    2000-07-08   97   97   96
5  2000         Aaliyah            I Don't Wanna  4:15    2000-01-29   84   62   51
6  2000         Aaliyah                Try Again  4:03    2000-03-18   59   53   38
7  2000  Adams, Yolanda            Open My Heart  5:30    2000-08-26   76   76   74
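Cells in a LaTeX table are usually padded around each `&`, so column names and string values parsed with a plain `sep="&"` tend to carry stray whitespace. The following is a minimal sketch of one way to clean that up. The three-line sample is made up and stands in for the rows extracted from the real file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the rows extracted from the .tex file
# (backslashes already stripped, fields separated by '&').
text = (
    "year & artist & track & time \n"
    "2000 & 2 Pac & Baby Don't Cry & 4:22 \n"
    "2000 & Aaliyah & Try Again & 4:03 \n"
)

# A regex separator swallows the padding around each '&';
# regex separators require the slower Python parsing engine.
df = pd.read_csv(io.StringIO(text), sep=r"\s*&\s*", engine="python")

# Whitespace at the very start or end of a line may survive the split,
# so strip the column names and every string column explicitly.
df.columns = df.columns.str.strip()
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.strip()
```

After this step the column labels can be used directly, for example `df["artist"]`, without having to account for surrounding spaces.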
You can also write a script to convert the file:
with open('test.tex') as input_file:
    with open('test.csv', 'w') as output_file:
        for line in input_file:
            if '&' in line:
                output_file.write(line.replace('\\', '') + '\n')
Then another script that uses pandas:
import pandas as pd
pd.read_csv('test.csv', sep="&")
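If a conventional comma-separated file is preferable to one delimited by `&`, the conversion step could instead go through the standard `csv` module, which quotes fields such as `Adams, Yolanda` that contain commas. This is only a sketch: the function name `tex_rows_to_csv` is made up for illustration, and it assumes every table row sits on a single line ending in a LaTeX row break, with no escaped `\&` inside a cell.

```python
import csv
import io


def tex_rows_to_csv(tex_lines, out_file):
    """Write the '&'-separated rows found in tex_lines to out_file as CSV.

    Hypothetical helper: assumes one table row per line, terminated by a
    LaTeX row break, and no escaped ampersands inside cells.
    """
    writer = csv.writer(out_file)
    for line in tex_lines:
        if "&" not in line:
            continue  # skip environment, rule and other non-row lines
        row = line.strip()
        if row.endswith("\\\\"):
            row = row[:-2]  # drop the trailing row break
        writer.writerow([cell.strip() for cell in row.split("&")])


# Small made-up sample in place of the real file.
sample = [
    "\\begin{tabular}{rll}\n",
    "year & artist & track \\\\\n",
    "\\hline\n",
    "2000 & Adams, Yolanda & Open My Heart \\\\\n",
    "\\end{tabular}\n",
]
buffer = io.StringIO()
tex_rows_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

With a real file, `tex_lines` would be the open `test.tex` handle and `out_file` a file opened with `newline=''`; the result can then be read with `pd.read_csv('test.csv')` without a `sep` argument.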
Answer 1 (score: -1)
1) As far as I know, you can open any standard type of file with Python.
2) You could try:
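# Open for reading; mode 'w' would truncate the file.
with open('test.tex', 'r') as text_file:
    # Do something with text_file here, for example read its contents
    contents = text_file.read()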