ID,age,salary,suburb,language
P1,eighty two,60196.0,Toorak,English
P2,49,-16945514.0,St. Kilda,Chinese
P3,54,49775.0,Neverland,Italian
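I have the dictionary above. In the "age" column, some ages are written out in words. I want to replace them with None.
Similarly, salaries in the salary column that are negative or greater than the maximum salary need to be replaced with None, and invalid suburb names also need to be changed to None.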
Answer 0 (score: 0)
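Splitting each row and then working on the individual fields is straightforward. There are many small errors to catch (for example, a salary that is not a number), but here is a simple example of this kind of processing.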
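ok_suburbs = ['Toorak', 'St. Kilda', 'Redfern']

# Read the lines of the data file into <people>
with open("people_data.txt", "rt") as f:
    people = f.readlines()
del people[0]  # remove the header

for row in people:
    try:
        # strip() drops the trailing newline before splitting
        person_id, age, salary, suburb, language = row.strip().split(",")
    except ValueError:
        # Too few or too many fields
        print("Invalid data: " + row)
        continue
    try:
        age = str(int(age))
    except ValueError:
        age = None  # age written in words, e.g. "eighty two"
    try:
        salary = float(salary)
    except ValueError:
        salary = None  # salary is not a number
    if salary is not None and salary < 0:
        salary = None
    if suburb not in ok_suburbs:
        suburb = None
    # TODO - rebuild the row from parts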
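You should also handle edge cases, for example: malformed numbers, extra whitespace around fields, mixed case in SuBUrB NamEs, too few fields, too many fields, and so on.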
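A minimal sketch of how some of these edge cases might be handled. The helper name `clean_row` and the limit `MAX_SALARY` are assumptions made for illustration (the question does not give the maximum salary). It strips whitespace, matches suburb names case-insensitively, rejects rows with the wrong number of fields, and rebuilds the row with None in place of invalid values:

```python
OK_SUBURBS = ['Toorak', 'St. Kilda', 'Redfern']
MAX_SALARY = 1000000.0  # assumed limit, not given in the question

# Map lower-cased names back to their canonical spelling
_SUBURB_LOOKUP = {name.lower(): name for name in OK_SUBURBS}


def clean_row(row):
    """Return the cleaned row as a CSV string, or None if the field count is wrong."""
    fields = [field.strip() for field in row.split(",")]
    if len(fields) != 5:
        return None  # too few or too many fields
    person_id, age, salary, suburb, language = fields

    try:
        age = int(age)
        if age < 0:
            age = None
    except ValueError:
        age = None  # e.g. "eighty two"

    try:
        salary = float(salary)
        if salary < 0 or salary > MAX_SALARY:
            salary = None
    except ValueError:
        salary = None  # not a number

    # Unknown suburbs are not in the lookup, so .get() returns None
    suburb = _SUBURB_LOOKUP.get(suburb.lower())

    parts = (person_id, age, salary, suburb, language)
    return ",".join(str(value) for value in parts)
```

For instance, `clean_row("P2, 49 ,-16945514.0, st. kilda ,Chinese")` gives `"P2,49,None,St. Kilda,Chinese"`.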
Answer 1 (score: 0)
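It is not clear to me how this data is stored, since each row has 5 entries and a dictionary normally consists of key-value pairs. I will assume that the ID is used as the key and that the other four entries are stored as members of an object, which serves as the value. I will call this dictionary dict. If you expect ages to be whole numbers of years, and the maximum salary is stored in max_salary, then the following should work: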
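for ID in dict.keys():
    age, salary = dict[ID].age, dict[ID].salary
    # Anything that is not a non-negative whole number is not a valid age
    if not isinstance(age, int) or age < 0:
        dict[ID].age = None
    # Non-numeric, negative or too-large salaries are invalid
    if not isinstance(salary, (int, float)) or salary < 0 or salary > max_salary:
        dict[ID].salary = None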
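To see the check in action, here is a small self-contained sketch on the three sample rows. Several things are assumptions rather than part of the answer: the value of `max_salary`, the use of `types.SimpleNamespace` as a stand-in for the object holding the four members, and the name `people` (chosen instead of `dict` so the built-in is not shadowed). The suburb check is an addition, since the question asks for it, and it reuses the `ok_suburbs` list from the other answer:

```python
from types import SimpleNamespace

max_salary = 1000000.0  # assumed limit, not given in the question
ok_suburbs = ['Toorak', 'St. Kilda', 'Redfern']

# Sample rows from the question, keyed by ID
people = {
    'P1': SimpleNamespace(age='eighty two', salary=60196.0,
                          suburb='Toorak', language='English'),
    'P2': SimpleNamespace(age=49, salary=-16945514.0,
                          suburb='St. Kilda', language='Chinese'),
    'P3': SimpleNamespace(age=54, salary=49775.0,
                          suburb='Neverland', language='Italian'),
}

for person in people.values():
    if not isinstance(person.age, int) or person.age < 0:
        person.age = None
    if person.salary < 0 or person.salary > max_salary:
        person.salary = None
    if person.suburb not in ok_suburbs:
        person.suburb = None

for ID, person in people.items():
    print(ID, person.age, person.salary, person.suburb, person.language)
# P1 None 60196.0 Toorak English
# P2 49 None St. Kilda Chinese
# P3 54 49775.0 None Italian
```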
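If you are starting from a list of lines in a file, you can open the file and read it into such a dictionary like this (the first part is borrowed from another answer):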
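class PersonData(object):
    def __init__(self, age, salary, suburb, language):
        self.age = age
        self.salary = salary
        self.suburb = suburb
        self.language = language

def to_number(text, kind):
    # Return kind(text), or the original text if it cannot be converted
    try:
        return kind(text)
    except ValueError:
        return text

file = open("people_data.txt", "r+")  # "r+" opens for reading and writing
file.readline()  # skip the header row
dict = {}
for row in file.readlines():
    try:
        ID, age, salary, suburb, language = row.strip().split(",")
    except ValueError:
        # Too few or too many fields
        print("Invalid data: " + row)
        continue
    dict[ID] = PersonData(to_number(age, int), to_number(salary, float),
                          suburb, language)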
Then, after the checks, the file can be overwritten with the new data:
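file.seek(0)  # go to the beginning of the file
file.write("ID,age,salary,suburb,language\n")  # header row
for ID in dict.keys():
    age, salary, suburb, language = dict[ID].age, dict[ID].salary, \
        dict[ID].suburb, dict[ID].language
    # strip() guards against a trailing newline left on the last field
    file.write(str(ID) + ',' + str(age) + ',' + str(salary) + ','
               + str(suburb) + ',' + str(language).strip() + '\n')
file.truncate()  # drop leftover bytes if the new content is shorter
file.close()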
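As an alternative to splitting and joining each line by hand, Python's standard `csv` module can do both. The sketch below is not part of the original answer. The function name `clean_file` and its parameters are assumptions, and it swaps the in-place `seek` and write for reading the whole file and then reopening it for writing. Note that `csv` writes None as an empty field rather than the text "None":

```python
import csv


def clean_file(path, ok_suburbs, max_salary):
    """Rewrite the CSV at <path>, blanking invalid ages, salaries and suburbs."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    for row in rows:
        try:
            row["age"] = int(row["age"])
        except (TypeError, ValueError):
            # TypeError covers a missing field, which DictReader reads as None
            row["age"] = None
        try:
            salary = float(row["salary"])
            row["salary"] = salary if 0 <= salary <= max_salary else None
        except (TypeError, ValueError):
            row["salary"] = None
        if row["suburb"] not in ok_suburbs:
            row["suburb"] = None

    fieldnames = ["ID", "age", "salary", "suburb", "language"]
    with open(path, "w", newline="") as f:
        # extrasaction="ignore" drops surplus fields from over-long rows
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)  # None is written as an empty field
```

A call might look like `clean_file("people_data.txt", ['Toorak', 'St. Kilda', 'Redfern'], 1000000.0)`, where the salary limit is again an assumed value.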