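I have a CSV file that uses þ as the quote character and the paragraph symbol as the delimiter in place of a comma.
Subclassing csv.Dialect does not work, and pandas does not interpret the þ-quoted values as strings.
Any ideas?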
# This works when the delimiters are more standard (; ")
# But really trying to make it work with the ASCII chars commented out below
# (Python 2 syntax)
import csv

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'  # ASCII: 020
    quotechar = '"'  # ASCII: 254

f = open('./data/Test_Quote_SemiColon.dat')
reader = csv.reader(f, dialect=my_dialect, quoting=csv.QUOTE_ALL)  # same as quoting=1
for line in reader:
    print line
Here is the (quote and semicolon) data:
"BEGID";"ENDID";"Name";"To";"From";"CC";"BCC"
"ABC_001";"ABC_004";"Smith, John";"Doe, John";"Roe, Jane";"";""
"ABC_005";"ABC_007";"Smith, John";"Doe, John";"";"";""
"ABC_008";"ABC_012";"Doe, John";"Doe, John";"Smith, John";"";""
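For the dialect approach described above, here is a minimal Python 3 sketch rather than a confirmed fix for the original Python 2 code. It assumes that "ASCII: 020" means decimal 20 (the DC4 control character) and that 254 means þ (U+00FE). The class name `ThornDialect` and the sample row are hypothetical. In Python 3 the csv module works on decoded text, so both characters are single-character strings and can be set directly on a `csv.Dialect` subclass.

```python
import csv
import io


class ThornDialect(csv.Dialect):
    """Hypothetical dialect: DC4 as delimiter, thorn as quote character."""
    delimiter = chr(20)        # assumption: "ASCII: 020" read as decimal 20
    quotechar = chr(254)       # 'þ' (U+00FE)
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_ALL


# Write one row and read it back to check the dialect round-trips.
buf = io.StringIO(newline="")
csv.writer(buf, dialect=ThornDialect).writerow(["ABC_001", "Smith, John", ""])
written = buf.getvalue()

buf.seek(0)
rows = list(csv.reader(buf, dialect=ThornDialect))
print(rows)
```

The comma inside "Smith, John" survives because the comma is no longer the delimiter, and the empty field is written as a pair of quote characters.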
Answer 0 (score: 0)
I found that both the literal character and chr(254) solved this. Does this look right?
>>> import csv
>>> import StringIO
>>> txt = '''þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith, JohnþþDoe, JohnþRoe, Janeþþþþþ þABC_005þþaBC_007þþSmith, JohnþþDoe, Johnþþþþþþ þABC_008þþaBC_012þþDoe, JohnþþDoe, JohnþSmith, Johnþþþþþ'''
>>> reader = csv.reader(StringIO.StringIO(txt), delimiter=',', quotechar=chr(254))
>>> for line in reader:
...     for entry in line:
...         print unicode(entry, 'utf8')
...
þBEGIDþþENDIDþþNameþþToþþFromþþCCþþBCCþ þABC_001þþaBC_004þþSmith
JohnþþDoe
JohnþRoe
Janeþþþþþ þABC_005þþaBC_007þþSmith
JohnþþDoe
Johnþþþþþþ þABC_008þþaBC_012þþDoe
JohnþþDoe
JohnþSmith
Johnþþþþþ
txt echoes as follows:
>>> txt
'\xc3\xbeBEGID\xc3\xbe\xc3\xbeENDID\xc3\xbe\xc3\xbeName\xc3\xbe\xc3\xbeTo\xc3\xbe\xc3\xbeFrom\xc3\xbe\xc3\xbeCC\xc3\xbe\xc3\xbeBCC\xc3\xbe \xc3\xbeABC_001\xc3\xbe\xc3\xbeaBC_004\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeRoe, Jane\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_005\xc3\xbe\xc3\xbeaBC_007\xc3\xbe\xc3\xbeSmith, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe \xc3\xbeABC_008\xc3\xbe\xc3\xbeaBC_012\xc3\xbe\xc3\xbeDoe, John\xc3\xbe\xc3\xbeDoe, John\xc3\xbeSmith, John\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe\xc3\xbe'
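The echo shows each þ stored as the two UTF-8 bytes \xc3\xbe, whereas chr(254) in Python 2 is the single byte \xfe. That appears to be why the quote characters are still present in the printed output. One possible way around this, sketched here for Python 3 rather than the Python 2 used above, is to decode the bytes first so that þ is a single character before the csv module sees it. The helper name `parse_rows`, the sample data, and the choice of decimal 20 as the delimiter are assumptions for illustration.

```python
import csv
import io

DELIMITER = chr(20)    # assumption: "ASCII: 020" read as decimal 20 (DC4)
QUOTECHAR = chr(254)   # 'þ' (U+00FE), one character once the text is decoded


def parse_rows(raw_bytes, encoding="utf-8"):
    """Decode the bytes, then parse with single-character delimiter and quote."""
    text = raw_bytes.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=DELIMITER,
        quotechar=QUOTECHAR,
    )
    return list(reader)


# Hypothetical sample modelled on the first rows of the question's data.
sample = (
    "þBEGIDþ\x14þENDIDþ\x14þNameþ\n"
    "þABC_001þ\x14þABC_004þ\x14þSmith, Johnþ\n"
).encode("utf-8")

rows = parse_rows(sample)
for row in rows:
    print(row)
```

For a file on disk, the same idea would be `open(path, encoding=..., newline="")` passed to `csv.reader`. If the file stores þ as the single byte 0xFE, an encoding such as "latin-1" would be needed instead of "utf-8".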