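I have a text file that contains accented characters such as 'č', 'š' and 'ž'. When I read the file with a Python program and put its contents into a Python list, the accented characters are lost, and Python replaces them with other characters. For example, 'č' becomes '_'. Does anyone know how to keep the accented characters when reading them from a file in a Python program? My code: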
import sqlite3 #to work with relational DB
conn = sqlite3.connect('contacts.sqlite') #connect to db
cur = conn.cursor() #db connection handle
cur.execute("DROP TABLE IF EXISTS contacts")
cur.execute("CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)")
fname = "acos_ibm_notes_contacts - test.csv"
fh = open(fname) #file handle
print " "
print "Reading", fname
print " "
#--------------------------------------------------
#First build a Python list with new contacts data: name, surname and email address
lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain contacts data: name, surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
        new_contact.append({'Name':name}) #add first name to new list of contacts
    if lst[5] != ',': #if there is a last name in the contact data
        surname = lst[5] #assign 5th element of the list to surname
        if surname[0] == ',': #If first character in surname is ','
            surname = surname[1:] #delete it
        if surname[-1] == ',': #If last character in surname is ','
            surname = surname[:-1] #delete it
        if ',' in surname: #if surname and mail are merged in same list element
            sur_mail = surname.split(',') #split them at the ','
            surname = sur_mail[0]
            mail = sur_mail[1]
    new_contact.append({'Surname':surname}) #add last name to new list of contacts
    new_contact.append({'Mail':mail}) #add mail address to new list of contacts
    new_contact_list.append(new_contact)
    count = count + 1
fh.close()
#--------------------------------------------------
# Second: populate the DB with data from the new_contact_list
row = cur.fetchone()
id = 1
for i in range(count):
    entry = new_contact_list[i] #every row in the list has data about 1 contact - put it into variable
    name_dict = entry[0] #First element is a dictionary with name data
    surname_dict = entry[1] #Second element is a dictionary with surname data
    mail_dict = entry[2] #Third element is a dictionary with mail data
    name = name_dict['Name']
    surname = surname_dict['Surname']
    mail = mail_dict['Mail']
    cur.execute("INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
    id = id + 1
conn.commit() # Commit outstanding changes to disk
import io
fh = io.open("notes_contacts.csv", encoding="utf_16_le") #file handle
lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain the contact name, surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    print "Line from file:\n", line # print it for debugging purposes
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        print "Name in variable:", name # print it for debugging purposes
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
        new_contact.append({'Name':name}) #add first name to new list of contacts
    if lst[5] != ',': #if there is a last name in the contact data
        surname = lst[5] #assign 5th element of the list to surname
        print "Surname in variable:", surname # print it for debugging purposes
        if surname[0] == ',': #If first character in surname is ','
            surname = surname[1:] #delete it
        if surname[-1] == ',': #If last character in surname is ','
            surname = surname[:-1] #delete it
        if ',' in surname: #if surname and mail are merged in same list element
            sur_mail = surname.split(',') #split them at the ','
            surname = sur_mail[0]
            mail = sur_mail[1]
    new_contact.append({'Surname':surname}) #add last name to new list of contacts
    new_contact.append({'Mail':mail}) #add mail address to new list of contacts
    new_contact_list.append(new_contact)
    print "New contact within the list:", new_contact # print it for debugging purposes
fh.close()
Aco,"",Vidovič,aco.vidovic@si.ibm.com,+38613208872,"",+38640456872,"","","","","","","","",""
Answer 0 (score: 0)
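In Python 2.7, the built-in open() returns raw byte strings (str), not decoded text, so 'č' arrives as whatever bytes happen to be in the file. Instead, open the file in text mode with an explicit encoding so the text is decoded as it is read, the way Python 3 does it. You don't strictly have to decode the text while reading the file, but doing it at that point saves you from having to worry about encodings later in your code.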
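A minimal Python 3 sketch of what goes wrong, assuming (as this answer does) that the file is UTF-16-LE: the bytes on disk only turn back into 'č' when they are decoded with the codec they were written in. Decoding them with any other codec, Latin-1 here purely as an example, silently produces the wrong characters.

```python
# Python 3: str is decoded text, bytes is what a file really contains.
original = "Vidovič"

# What is stored on disk if the file is UTF-16-LE (an assumption taken from the answer).
raw = original.encode("utf_16_le")

# Decoding with the codec the file was written in restores the text.
correct = raw.decode("utf_16_le")

# Decoding with a different codec does not fail, but yields the wrong characters.
wrong = raw.decode("latin_1")

print(correct == original)  # True
print(ascii(wrong))         # stray NUL and control characters instead of 'č'
```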
Add at the top:
import io
Change the open() line to:
fh = io.open(fname, encoding='utf_16_le')
Note: you always need to pass encoding, because Python cannot reliably guess a file's encoding on its own.
Now every read() (and every line you iterate over) gives you a Unicode string.
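The following Python 3 sketch writes a throwaway UTF-16-LE file (the row is a hypothetical one modelled on the sample line in the question) and reads it back with io.open, showing that each line comes back as decoded text with 'č' intact.

```python
import io
import os
import tempfile

# Hypothetical row modelled on the sample line from the question.
sample = u'Aco,"",Vidovič,aco.vidovic@si.ibm.com\n'

# Create a temporary UTF-16-LE file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with io.open(path, "w", encoding="utf_16_le") as out:
    out.write(sample)

# Read it back: io.open decodes every line into a text string.
with io.open(path, encoding="utf_16_le") as fh:
    lines = [line for line in fh]

os.remove(path)

print(type(lines[0]).__name__)  # str
print(u"Vidovič" in lines[0])   # True
```

If the real file begins with a byte-order mark, the utf_16_le codec leaves it in the text as U+FEFF at the start of the first line, while the plain utf_16 codec detects and strips it.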
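The sqlite3 module accepts TEXT values either as unicode or as UTF-8 encoded str. Since you have already decoded the text to Unicode, you don't need to do anything else.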
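To make sure SQLite doesn't try to encode the body of the SQL statement back to an ASCII string, make the SQL statement a Unicode string as well by putting a u in front of the string literal.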
E.g.:
cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
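To check the SQLite side in isolation, here is a Python 3 sketch that uses an in-memory database and a hypothetical row based on the question's sample. The u prefixes mirror the answer but have no effect in Python 3, where every string literal is already Unicode.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(u"CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)")

# Already-decoded text, as io.open would return it (hypothetical values).
row = (1, u"Aco", u"Vidovič", u"aco.vidovic@si.ibm.com")

# The ? placeholders let sqlite3 bind the text values; no manual encoding is needed.
cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", row)
conn.commit()

stored = cur.execute(u"SELECT surname FROM contacts WHERE id = ?", (1,)).fetchone()[0]
conn.close()

print(stored == u"Vidovič")  # True
```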
Python 3 avoids some of these quirks. There, all you need to do to make it work is:
fh = open(fname, encoding='utf_16_le')
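You can check directly why Python 3 needs no import: the built-in open is the same function as io.open. This sketch also prints the fallback encoding that current Python 3 versions use when encoding is omitted, namely the locale's preferred encoding. That value varies between systems, which is why passing encoding explicitly matters.

```python
import io
import locale

# In Python 3 the built-in open() *is* io.open(), so decoding on read
# is available without importing io.
same_function = open is io.open

# Without an explicit encoding, text mode falls back to the locale's
# preferred encoding, which differs between systems.
default_guess = locale.getpreferredencoding(False)

print(same_function)  # True
print(default_guess)  # e.g. 'UTF-8' or 'cp1252', depending on the system
```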
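Since your data looks like standard Excel-dialect CSV, you can use the csv module to split it. DictReader lets you pass the column names, which makes parsing the fields very easy. Unfortunately, Python 2.7's csv module isn't Unicode-safe, so you'll need the Python 3 backport: https://github.com/ryanhiebert/backports.csv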
Your code can then be simplified to:
from backports import csv
import io
csv_fh = io.open('contacts.csv', encoding='utf_16_le')
field_names = [u'first_name', u'middle_name', u'surname', u'email',
               u'phone_office', u'fax', u'phone_mobile', u'inside_leg_measurement']
csv_reader = csv.DictReader(csv_fh, fieldnames=field_names)
for row in csv_reader:
    if not row['first_name']: continue
    print u"First Name: {first_name}, " \
          u"Surname: {surname}, " \
          u"Email: {email}".format(first_name=row['first_name'],
                                   surname=row['surname'],
                                   email=row['email'])
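For comparison, here is a Python 3 sketch of the same idea using the standard-library csv module. In Python 3 that module works on text, so the backport isn't needed. The sketch reuses the answer's column names and feeds the sample line from the question through io.StringIO instead of a real file. With a real file you would use open(fname, encoding='utf_16_le', newline=''), since the csv documentation recommends newline='' for files passed to csv readers. It then stores the rows in an in-memory table shaped like the question's contacts table.

```python
import csv
import io
import sqlite3

# Stand-in for the real file: the sample line copied from the question.
sample = io.StringIO(
    u'Aco,"",Vidovič,aco.vidovic@si.ibm.com,+38613208872,"",+38640456872,"","","","","","","","",""\n'
)

# Column names from the answer (the last one is a placeholder); extra columns
# beyond these end up under the key None and are ignored here.
field_names = ['first_name', 'middle_name', 'surname', 'email',
               'phone_office', 'fax', 'phone_mobile', 'inside_leg_measurement']

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)')

reader = csv.DictReader(sample, fieldnames=field_names)
next_id = 1
for row in reader:
    if not row['first_name']:
        continue  # skip rows without a first name, as the question's code does
    cur.execute('INSERT INTO contacts VALUES (?, ?, ?, ?)',
                (next_id, row['first_name'], row['surname'], row['email']))
    next_id += 1
conn.commit()

rows = cur.execute('SELECT id, name, surname, email FROM contacts').fetchall()
conn.close()

print(len(rows))  # 1
```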
Answer 1 (score: -3)
Try putting # coding=utf-8 on the first line of your program.