Question

ID,age,salary,suburb,language
P1,eighty two,60196.0,Toorak,English
P2,49,-16945514.0,St. Kilda,Chinese
P3,54,49775.0,Neverland,Italian

I have the dictionary above. In the "age" column, some ages are written out in words. I want to replace them with None.

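Similarly, salaries in the second column that are negative or greater than the maximum salary need to be replaced with None, and invalid suburb names also need to be changed to None.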


Answer 1

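Splitting each row and then working on the individual fields is quite simple. There are lots of small errors to catch (for example, a salary that isn't a number), but below is a simple example of this kind of processing.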

ok_suburbs = ['Toorak', 'St. Kilda', 'Redfern']

# Read list of data into <people>
with open("people_data.txt", "rt") as f:
    people = f.readlines()
del people[0]  # remove the header

for row in people:
    try:
        # strip() drops the trailing newline so it does not end up in <language>
        person_id, age, salary, suburb, language = row.strip().split(",")
    except ValueError:  # too few or too many fields
        print("Invalid data: " + row)
        continue

    try:
        age = str(int(age))
    except ValueError:  # age written in words, e.g. "eighty two"
        age = None
    try:
        salary = float(salary)
        if salary < 0:
            salary = None
    except ValueError:  # salary is not a number
        salary = None
    if suburb not in ok_suburbs:
        suburb = None
    # TODO - rebuild the row from parts

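You should handle edge conditions, for example: bad numbers, extra whitespace around fields, mixed case in suburb names (SuBUrB NamE), too few fields, too many fields, and so on.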
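A few of these edge conditions can be sketched as follows. This is only an illustration, not a complete validator: the helper name clean_row and the lookup table are hypothetical, and writing None out as the text "None" when rebuilding the row is an assumption about the desired output.

```python
OK_SUBURBS = ["Toorak", "St. Kilda", "Redfern"]
# Map lower-cased names back to their canonical spelling (hypothetical helper table).
SUBURB_LOOKUP = {name.lower(): name for name in OK_SUBURBS}


def clean_row(row):
    """Return the cleaned row as text, or None if the field count is wrong."""
    # Strip surrounding whitespace (including the trailing newline) from each field.
    fields = [field.strip() for field in row.split(",")]
    if len(fields) != 5:
        return None  # too few or too many fields
    person_id, age, salary, suburb, language = fields

    try:
        age = int(age)
        if age < 0:
            age = None
    except ValueError:
        age = None  # e.g. "eighty two"

    try:
        salary = float(salary)
        if salary < 0:
            salary = None
    except ValueError:
        salary = None  # not a number

    # Case-insensitive match; unknown suburbs become None.
    suburb = SUBURB_LOOKUP.get(suburb.lower())

    # Rebuild the row from its parts (assumption: None is written as "None").
    return ",".join(str(v) for v in (person_id, age, salary, suburb, language))


print(clean_row("P2, 49 ,-16945514.0,st. kilda,Chinese\n"))
```

Returning None for a malformed row lets the caller decide whether to skip it or report it.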

Answer 2

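It's not clear to me how this data is stored, since each row has 5 entries and a dictionary normally consists of key-value pairs. I'll assume the ID is used as the key and the other four entries are stored as members of an object, which serves as the value. I'll call this dictionary dict. If you expect ages to be whole years stored as integers, and the maximum salary is stored in max_salary, then the following should work: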

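for ID in dict.keys():
  age, salary = dict[ID].age, dict[ID].salary
  # Anything that is not a non-negative integer (e.g. "eighty two") becomes None
  if not isinstance(age, int) or age < 0:
    dict[ID].age = None
  # Check the type first so a non-numeric salary does not raise a TypeError
  if not isinstance(salary, (int, float)) or salary < 0 or salary > max_salary:
    dict[ID].salary = None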
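To see the loop in action, here is a small self-contained sketch on the three sample rows. It makes several assumptions that are not in the question: max_salary is set to an arbitrary 200000, SimpleNamespace stands in for the value objects, the numeric fields are already numbers, and the dictionary is named people rather than dict so that the built-in type is not shadowed.

```python
from types import SimpleNamespace

max_salary = 200000.0  # assumed upper bound, not given in the question

# Stand-in records; only age and salary are needed for this check.
people = {
    "P1": SimpleNamespace(age="eighty two", salary=60196.0),
    "P2": SimpleNamespace(age=49, salary=-16945514.0),
    "P3": SimpleNamespace(age=54, salary=49775.0),
}

for person in people.values():
    # Ages written in words are strings, so the isinstance test rejects them.
    if not isinstance(person.age, int) or person.age < 0:
        person.age = None
    if person.salary < 0 or person.salary > max_salary:
        person.salary = None

summary = {key: (p.age, p.salary) for key, p in people.items()}
print(summary)
# {'P1': (None, 60196.0), 'P2': (49, None), 'P3': (54, 49775.0)}
```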

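If you start from a list of lines in a file, you can open the file and read it into such a dictionary like this (the first part is borrowed from the other answer):

class PersonData(object):
  def __init__(self, age, salary, suburb, language):
    self.age = age
    self.salary = salary
    self.suburb = suburb
    self.language = language

# "r+" opens the file for both reading and writing ("rwt" is not a valid mode)
file = open("people_data.txt", "r+")
dict = {}
header = file.readline()  # keep the header line out of the dictionary
for row in file.readlines():
  try:
    # strip() removes the trailing newline before splitting
    ID, age, salary, suburb, language = row.strip().split(",")
    # Note: every field is stored as a string here; convert age with int()
    # and salary with float() before running the validity checks
    dict[ID] = PersonData(age, salary, suburb, language)
  except ValueError:  # wrong number of fields
    print("Invalid data: " + row)

Then, after the checks, the file can be overwritten with the new data:

file.seek(0)  # go to file beginning
file.write(header)  # put the header line back
for ID in dict.keys():
  age, salary, suburb, language = dict[ID].age, dict[ID].salary, \
    dict[ID].suburb, dict[ID].language
  file.write(str(ID)+','+str(age)+','+str(salary)+',' \
            +str(suburb)+','+str(language)+'\n')
file.truncate()  # drop leftover text if the new content is shorter
file.close()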
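Overwriting a file in place can lose data if something fails halfway through. As an alternative sketch (not the method used in this answer), the standard-library csv module can read the rows by column name and write the cleaned rows to a separate output. The names OK_SUBURBS, MAX_SALARY and clean are hypothetical, the maximum salary is an arbitrary assumption, and in-memory text streams stand in for real files so the example runs on its own.

```python
import csv
import io

OK_SUBURBS = {"Toorak", "St. Kilda", "Redfern"}  # assumed list of valid suburbs
MAX_SALARY = 200000.0  # assumed upper bound, not given in the question


def clean(record):
    """Replace invalid age, salary, and suburb values with None."""
    try:
        age = int(record["age"])
        record["age"] = age if age >= 0 else None
    except (ValueError, TypeError):  # words instead of digits, or a missing field
        record["age"] = None
    try:
        salary = float(record["salary"])
        record["salary"] = salary if 0 <= salary <= MAX_SALARY else None
    except (ValueError, TypeError):
        record["salary"] = None
    if record["suburb"] not in OK_SUBURBS:
        record["suburb"] = None
    return record


source = io.StringIO(
    "ID,age,salary,suburb,language\n"
    "P1,eighty two,60196.0,Toorak,English\n"
    "P2,49,-16945514.0,St. Kilda,Chinese\n"
    "P3,54,49775.0,Neverland,Italian\n"
)
target = io.StringIO()

reader = csv.DictReader(source)  # the first line supplies the column names
writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for record in reader:
    writer.writerow(clean(record))

cleaned_text = target.getvalue()
print(cleaned_text)
```

Note that the csv writer emits None as an empty field rather than the text "None", which may or may not be what you want in the output file.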

How to clean up data by removing invalid values from a dictionary
