Question

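I'm trying to reproduce, in Python, the contents of the "Tidy Data" files available here.

However, the datasets are provided on GitHub as .tex files, and I can't seem to open them with pandas.

From what I've found so far, pandas seems to be able to export to LaTeX but not import from it...

1) Am I right? 2) If so, how would you suggest I open these files?

Thanks for your time!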

Answer 1

Taking this as an example:

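import pandas as pd
from io import StringIO  # pandas.compat.StringIO has been removed from newer pandas

with open('test.tex') as input_file:
    text = ""
    for line in input_file:
        # keep only table rows: their cells are separated by '&'
        if '&' in line:
            # strip every backslash (including the row terminator) and trailing
            # whitespace; the line's own newline is replaced by exactly one '\n'
            text += line.replace('\\', '').rstrip() + '\n'

data = StringIO(text)
df = pd.read_csv(data, sep="&")
data.close()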

Returns:

    year    artist          track                   time    date.entered    wk1 wk2 wk3
0   2000    2 Pac           Baby Don't Cry          4:22    2000-02-26      87  82  72
1   2000    2Ge+her         The Hardest Part Of ... 3:15    2000-09-02      91  87  92
2   2000    3 Doors Down    Kryptonite              3:53    2000-04-08      81  70  68
3   2000    98verb|^|0      Give Me Just One Nig... 3:24    2000-08-19      51  39  34
4   2000    A*Teens         Dancing Queen           3:44    2000-07-08      97  97  96
5   2000    Aaliyah         I Don't Wanna           4:15    2000-01-29      84  62  51
6   2000    Aaliyah         Try Again               4:03    2000-03-18      59  53  38
7   2000    Adams, Yolanda  Open My Heart           5:30    2000-08-26      76  76  74
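Row 3 above comes out as `98verb|^|0`. That suggests (an assumption, not checked against the file) that the source writes this artist as `98\verb|^|0`, and deleting every backslash leaves the `verb|...|` wrapper behind. A minimal sketch that unwraps `\verb|...|` first, cuts each row at its `\\` terminator, and trims the spaces around each `&` could look like this; the helper name `tex_rows_to_df` and the sample table are hypothetical, and it assumes one row per line, a header as the first `&` row, and no escaped `\&` in cells.

```python
import re
from io import StringIO

import pandas as pd

# \verb|...| with '|' as the delimiter (assumed to be how the .tex escapes '^')
VERB_RE = re.compile(r'\\verb\|([^|]*)\|')


def tex_rows_to_df(tex: str) -> pd.DataFrame:
    """Hypothetical helper: parse the '&'-separated rows of a LaTeX table.

    Assumes one row per line, the first such row is the header,
    and no escaped '\\&' inside cells.
    """
    rows = []
    for line in tex.splitlines():
        if '&' not in line:
            continue  # skip \begin{tabular}, \hline, \end{tabular}, ...
        line = VERB_RE.sub(r'\1', line)  # 98\verb|^|0 -> 98^0
        line = line.split(r'\\')[0]  # drop the row terminator and anything after it
        rows.append(line.replace('\\', '').strip())  # drop leftover backslashes, as in the answer
    # a regex separator also strips the spaces around each '&' (python engine only)
    return pd.read_csv(StringIO('\n'.join(rows)), sep=r'\s*&\s*', engine='python')


sample = r"""
\begin{tabular}{lll}
year & artist & time \\
\hline
2000 & 98\verb|^|0 & 3:24 \\
2000 & Adams, Yolanda & 5:30 \\
\end{tabular}
"""
print(tex_rows_to_df(sample))
```

Unwrapping `\verb` before removing backslashes matters: once the backslash is gone, `verb|^|` can no longer be told apart from ordinary text.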

You could also write a script to convert the file:

with open('test.tex') as input_file:
    with open('test.csv', 'w') as output_file:
        for line in input_file:
            # copy only table rows, without backslashes or doubled newlines
            if '&' in line:
                output_file.write(line.replace('\\', '').rstrip() + '\n')

and then read it with pandas in another script:

import pandas as pd
pd.read_csv('test.csv', sep="&")
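If the goal is a file that other tools read as ordinary CSV, the conversion could instead go through the standard-library `csv` module, which quotes cells that contain commas (such as `Adams, Yolanda`). The result then loads with plain `pd.read_csv('test.csv')` and its default comma separator. This is a sketch under the same assumptions as the scripts above (one table row per line, cells separated by unescaped `&`); the function name `tex_to_csv` is hypothetical.

```python
import csv


def tex_to_csv(tex_path: str, csv_path: str) -> int:
    """Hypothetical helper: copy the '&'-separated rows of a .tex table into a CSV file.

    Returns the number of rows written, header included.
    """
    count = 0
    with open(tex_path) as input_file, open(csv_path, 'w', newline='') as output_file:
        writer = csv.writer(output_file)
        for line in input_file:
            if '&' not in line:
                continue  # skip \begin{tabular}, \hline, \end{tabular}, ...
            line = line.split(r'\\')[0]  # drop the row terminator and what follows
            writer.writerow([cell.strip() for cell in line.split('&')])
            count += 1
    return count
```

For example, `tex_to_csv('test.tex', 'test.csv')` returns the number of rows written, and reading the file back needs no `sep` argument.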

Answer 2

1) As far as I know, you can open any standard type of file with Python.

2) You could try:

# open for reading ('r' is the default); mode 'w' would truncate test.tex
with open('test.tex') as text_file:
    contents = text_file.read()
    # do something with contents here
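To get from the raw text to a table, one option is to pull out the body of the `tabular` environment with a regular expression and split it into rows and cells. A minimal sketch, assuming a single, non-nested `tabular` with a simple column spec, `\\` row terminators, and no escaped `\&` in cells; the function name `tabular_cells` is hypothetical.

```python
import re
from typing import List

# body of the first \begin{tabular}{...} ... \end{tabular}; assumes a simple,
# non-nested column spec such as {lll} or {|l|r|}
TABULAR_RE = re.compile(r'\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}', re.DOTALL)


def tabular_cells(tex: str) -> List[List[str]]:
    """Hypothetical helper: split the first tabular environment into rows of cells."""
    match = TABULAR_RE.search(tex)
    if match is None:
        return []
    body = match.group(1).replace(r'\hline', '')  # horizontal rules are not data
    rows = []
    for raw_row in body.split(r'\\'):  # a double backslash ends each LaTeX table row
        if raw_row.strip():
            rows.append([cell.strip() for cell in raw_row.split('&')])
    return rows
```

Given the file's text (for example from `open('test.tex').read()`), `rows = tabular_cells(text)` followed by `pd.DataFrame(rows[1:], columns=rows[0])` would build a DataFrame, treating the first row as the header (again an assumption about the file). All cells stay strings, so numeric columns would need converting, e.g. with `pd.to_numeric`.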

Open a LaTeX file with Pandas?
