Question

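I am trying to open a .txt file in Python.

Before marking this as a duplicate, please look at the code and the file below.

I have used this snippet before to read similar files, but it does not work for this particular batch of files.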

location="sample/sample2/"
filename=location+"Detector_-3000um.txt"
skip=25 #Skip the first 25 lines

The code to open it is:

import numpy as np

f=open(filename)
num_lines = sum(1 for line in f)  # count every line, including the skipped header
print("Skipping the first "+str(skip)+" lines")
data=np.zeros((num_lines-skip+1,num_lines-skip+1))
f.close()
f=open(filename)
i=0
for _ in range(skip):  #skip unwanted rows
    next(f)
for line in f:
    data[i,:]=line.split()
    i+=1
f.close()

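It is a 501x501 dataset; the first row and second column hold the row and column numbers.

The dataset is attached here.

I also tried pandas, pd.read_csv(filename, skip), but it gives this error: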

CParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 501

Answer 1

I don't think there is anything wrong with your code; the problem is the file's encoding.
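One quick way to check whether a file might be in an unexpected encoding is to look at its first few bytes for a byte-order mark (BOM). The sketch below is only an illustration: the helper names `guess_encoding_from_bom` and `sniff_file` are made up here, and a file without a BOM can still be in a non-UTF-8 encoding, so a `None` result proves nothing.

```python
import codecs

# Byte-order marks, longest first: the UTF-32 LE mark begins with the
# UTF-16 LE mark, so UTF-32 has to be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head):
    """Return a codec name if the bytes `head` start with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of `path` and look for a BOM (hypothetical helper)."""
    with open(path, "rb") as f:
        return guess_encoding_from_bom(f.read(4))
```

For example, `sniff_file(filename)` could be called on the file from the question before trying to parse it.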

I converted your file's encoding to 'utf-8', and after that both your code and read_csv() from pandas work fine.

pd.read_csv(myfile, skiprows=24, header=0, index_col=0,sep='\t')
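If you would rather not convert the file on disk, another option (not what I did above) is to decode it while reading. This is a minimal sketch using only the standard library; `read_rows` is a made-up helper name, and the `encoding` argument is something you have to supply yourself, since I am not stating here what the original encoding of the file was.

```python
import io
from itertools import islice


def read_rows(path, skip, encoding):
    """Decode `path` with `encoding`, drop the first `skip` lines, and
    return the remaining lines as lists of whitespace-separated fields."""
    with io.open(path, encoding=encoding) as f:
        # islice(f, skip, None) yields every line after the first `skip` ones.
        return [line.split() for line in islice(f, skip, None)]
```

The fields come back as strings, so they still need converting (for example with `float`) before going into a numeric array.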

There are many ways to convert the encoding, for example with Notepad++ (Windows), which is what I used, or see here: How to convert a file to utf-8 in Python?
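If you want to do the conversion in Python, something along these lines should work. It is a sketch, not the exact method from the linked question: `convert_to_utf8` is a made-up name, and `src_encoding` must be the file's real encoding, which you need to find out first.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file as UTF-8, copying it line by line.

    newline="" turns off newline translation, so line endings are
    written exactly as they were read.
    """
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src, \
         io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

A call might look like `convert_to_utf8("Detector_-3000um.txt", "Detector_-3000um_utf8.txt", "utf-16")`, where `"utf-16"` is only a placeholder guess for the source encoding.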

Opening a .txt file in Python after skipping rows - encoding issue
