Question

I received a large batch of SAS files, and the file paths in all of them need to be changed.

The code I wrote for this task is as follows:

import glob
import os
import sys 

os.chdir(r"C:\path\subdir")
glob.glob('*.sas')
import os
fileLIST=[]
for dirname, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        fileLIST.append(os.path.join(dirname, filename))
print fileLIST

import re

for fileITEM in set(fileLIST):
    dataFN=r"//path/subdir/{0}".format(fileITEM)
    dataFH=open(dataFN, 'r+')

    for row in dataFH:
        print row
        if re.findall('\.\.\.', str(row)) != []:
            dataSTR=re.sub('\.\.\.', "//newpath/newsubdir", row)
            print >> dataFH, dataSTR.encode('utf-8')
        else:
            print >> dataFH, row.encode('utf-8')
    dataFH.close()

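My problem is twofold. First, my code does not seem to recognize the three consecutive periods, even when they are escaped with backslashes. Second, I get the error "UnicodeDecodeError: 'ascii' codec can't decode byte ...".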

Is it possible that SAS program files (.sas) are not UTF-8? If so, is the fix as simple as knowing which file encoding they use?

The full traceback is below:

Traceback (most recent call last):
  File "stringsubnew.py", line 26, in <module>
    print >> dataFH, row.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 671: ordinal not in range(128)

Thanks in advance.

Answer 1

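The problem is in the reading, not the writing. You have to know which encoding the source file you are reading uses, and decode it accordingly. In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly as ASCII, and that implicit step is what raises the UnicodeDecodeError here.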

Suppose the source file contains data encoded in iso-8859-1.

You can do this while reading by using str.decode():

my_row = row.decode('iso-8859-1')
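
As a rough sketch of the same idea in Python 3 (the question's code is Python 2), where bytes and text are separate types: the sample line below is made up for illustration, and iso-8859-1 is only the assumption stated above, not a known fact about these SAS files.

```python
# Hypothetical sample: one line of a SAS program as raw bytes.
# 0xE9 is "é" in iso-8859-1 and is not valid ASCII.
raw_row = b"%include '.../caf\xe9.sas';\n"

# Decode explicitly with the assumed source encoding rather than
# relying on an implicit ASCII decode.
my_row = raw_row.decode('iso-8859-1')

# Work on text, then encode once when the bytes are needed again.
new_row = my_row.replace('...', '//newpath/newsubdir')
encoded = new_row.encode('utf-8')
```

Once the row has been decoded, the replacement and the final encode no longer touch the ASCII codec at all.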

Or you can open the file with codecs, which will handle the decoding for you.

import codecs

dataFH = codecs.open(dataFN, 'r+', 'iso-8859-1')
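
A hedged sketch of how that could look for a whole file, using io.open (available in Python 2.6+ and Python 3, where it is the same as the built-in open). The helper name rewrite_paths, the demo file, and the choice to write back in the same encoding are all assumptions made for illustration. It reads the file fully and then writes it in a second step, instead of reading and writing through one 'r+' handle.

```python
import io
import os
import re
import tempfile


def rewrite_paths(path, new_prefix, encoding='iso-8859-1'):
    """Hypothetical helper: replace every literal '...' in a file."""
    # newline='' keeps the original line endings untouched.
    with io.open(path, 'r', encoding=encoding, newline='') as fh:
        text = fh.read()
    # r'\.\.\.' matches exactly three literal periods.
    text = re.sub(r'\.\.\.', new_prefix, text)
    # Write back with the same (assumed) encoding in a separate step.
    with io.open(path, 'w', encoding=encoding, newline='') as fh:
        fh.write(text)


# Demo on a throwaway file with one non-ASCII byte in it.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.sas')
with open(demo_path, 'wb') as fh:
    fh.write(b"%include '.../caf\xe9.sas';\n")

rewrite_paths(demo_path, '//newpath/newsubdir')

with open(demo_path, 'rb') as fh:
    result = fh.read()
```

If the new prefix ever contains backslashes, text.replace('...', new_prefix) may be the safer choice, since re.sub interprets backslash escapes in the replacement string.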

A good talk on this subject can be found at http://nedbatchelder.com/text/unipain.html
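
If the encoding of the files is not known, one possible starting point is to try a few candidates and see which ones decode without error. This is only a sketch: the helper name first_decodable and the candidate list are assumptions, and a successful decode shows only that the bytes are valid in that encoding, not that it is the right one. iso-8859-1 is left out of the candidates because it accepts every byte and would never fail.

```python
def first_decodable(data, candidates=('utf-8', 'cp1252')):
    """Hypothetical helper: first candidate that decodes data, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Made-up sample containing the byte 0x83 from the traceback.
sample = b"path = '...' \x83\n"
guess = first_decodable(sample)
```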

UnicodeDecode problem - writing to SAS program files
