Question

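I am trying to split the following data field into 3 fields (pre, match and suf) and write them to a comma-separated txt file. I am reading all of this from a csv file, and the data is utf-8.

My problem is that I cannot get past the error "TypeError: coercing to Unicode: need string or buffer, list found". Since I have already tried to set the encoding, I don't know where the mistake is.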

Sample data:

 A-1 طس
 TX 35-L
 Av Rib

Splitting on (\d+(-?[NSEW])?) should give me:

Column1 | Column2 | Column3
A       |1        |طس
TX      |35       |-L
Av Rib  |         |

My current code is:

## Iterate over csv file to create matches and splits 
## string according to regex pattern..

    reader = csv.reader(csvfile)

    with codecs.open(r'file.txt', 'w', 'utf-8') as outfile1:
        for row in reader:
           unicode_row = [x.decode('utf-8') for x in row]
           item = unicode_row[1]
           parsed = re.compile("\d+(-?[NSEW])?", re.UNICODE).split(unicode(item, 'utf-8'))
           outfile1.write(parsed + "\n")

Answer 1

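Your error occurs because parsed is a list, not a string: re.split() returns a list, while the file object's write() method expects a single string.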
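To make the point concrete, here is a minimal sketch of what the split returns for one of the sample values. It is written for Python 3 (an assumption; the question's code is Python 2), and the comma-joined output format is only a guess at what the asker wants.

```python
import re

# Same pattern as in the question, written as a raw string.
pattern = re.compile(r"\d+(-?[NSEW])?", re.UNICODE)

parsed = pattern.split(u"TX 35-L")
# parsed is a list: the text before the match, the optional group
# (None here, because "-L" is not one of N, S, E, W), and the text after.
print(parsed)

# Build a single string before writing; None entries become "".
line = u",".join(part or u"" for part in parsed) + u"\n"
print(repr(line))
```

Passing `line` rather than `parsed` to `write()` avoids the TypeError, because `write()` then receives one string.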

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
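One possible way to get the pre, match and suf columns is sketched below, again assuming Python 3. The names `split_field` and `convert` are hypothetical helpers, not part of the original code. The pattern differs from the question's in two ways: an outer capturing group keeps the matched number in the result, and the inner group is non-capturing so it does not add an extra list element.

```python
import csv
import io
import re

# Outer group: keep the matched text in the split result.
# Inner group is non-capturing so the result has at most three parts.
PATTERN = re.compile(r"(\d+(?:-?[NSEW])?)", re.UNICODE)


def split_field(item):
    """Return (pre, match, suf); match and suf are empty when nothing matches."""
    parts = PATTERN.split(item, maxsplit=1)
    if len(parts) == 1:
        return parts[0], u"", u""
    pre, match, suf = parts
    return pre, match, suf


def convert(csv_path, out_path, column=1):
    """Split one column of a utf-8 csv file and write comma-separated lines."""
    with io.open(csv_path, "r", encoding="utf-8", newline="") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            fields = [part.strip() for part in split_field(row[column])]
            # write() receives a single string, not a list.
            dst.write(u",".join(fields) + u"\n")
```

With this pattern, `split_field(u"TX 35-L")` yields `("TX ", "35", "-L")`, and a value without digits such as "Av Rib" comes back with empty match and suf. For "A-1 طس" the hyphen stays in the pre part, so stripping it would need an extra step if the table in the question is the exact target.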
