Question

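I'm trying to figure out how to identify duplicate elements in a list of lines, where each line is split into four elements. I then need to keep the line with the original element and delete every line that contains a duplicate of it.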

For example:

123, jon, doe, $50
123, bob, smith, $25
456, jane, jones, $60

The desired output should be:

warning! duplicate: 123

And then the list should read like this:

123, jon, doe, $50
456, jane, jones, $60

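The list is long. So far I've tried loops, but I only seem to be able to print out the 0th element. I don't know how to identify and remove the lines in the list that contain a duplicate element.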

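My guess is that the code should go before the last line, so that the remaining lines are appended after the original list has been cleared of duplicates. I'd appreciate it if someone could help. This is my first question, and I've tried my best to follow all the stated policies. I'm using Python 3. Thanks.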

class BankAccount:

    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = float(decimal_val)

    def __str__(self):
        return (self.account_num+", "+ self.last_name+", "+self.first_name+", "+str(self.decimal_val))

    def __eq__(self, other):
        if self.account_num == other.account_num:
            print("Warning! Account number already exists:"+self.account_num)



from BankAccount import *
total, count, average = 0, 0, 0
customer_money = [] # for a different part that is working

with open("accounts.csv", "r") as file: #original file
    contents = file.readlines() 
    customers = []
    for i in range(1,len(contents)):
        line = contents[i].split(",") #splits each line into four elements
        customers.append(BankAccount(line[0], line[1], line[2], line[3]))
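One thing worth checking in the code above: if accounts.csv really looks like the sample (a space after each comma and a `$` on the amount), `float(decimal_val)` raises `ValueError` for a value such as `" $50"`, and the name fields keep their leading space. A minimal sketch of cleaning one line before building the object, assuming that exact four-field format; `parse_line` is a hypothetical helper, not part of the original code:

```python
def parse_line(raw):
    """Split one CSV line into (account_num, first_name, last_name, amount).

    Assumes the four-field format from the sample: "123, jon, doe, $50".
    """
    # Strip the trailing newline, then the space that follows each comma
    fields = [field.strip() for field in raw.strip().split(",")]
    account_num, first_name, last_name, amount = fields
    # Drop a leading "$" so float() can parse the amount (assumed format)
    return account_num, first_name, last_name, float(amount.lstrip("$"))


print(parse_line("123, jon, doe, $50\n"))
```

The returned tuple can be unpacked straight into `BankAccount(*parse_line(contents[i]))`.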

Answer 1

I'd use a dictionary here, since you want the ID numbers to be unique and want to catch it when one is repeated.

Also, you don't need to build a list with readlines() or iterate over it with range; you can loop over the file object directly, like this:

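customers = {}
with open("accounts.csv", "r") as file:  # original file
    next(file, None)  # skip the header row (the question's loop also starts at index 1)
    for i in file:
        # strip the newline, then the space after each comma
        line = [field.strip() for field in i.strip().split(",")]
        if line[0] not in customers:
            customers[line[0]] = BankAccount(line[0], line[1], line[2], line[3])
        else:
            print("Duplicate!", line)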

If you only need the BankAccount objects, use customers.values() (wrap it in list() if you need an actual list).
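To try the dictionary approach without a real file, here is a self-contained sketch. It feeds the question's sample lines through `io.StringIO`, which stands in for accounts.csv, and uses a stripped-down stand-in for BankAccount that keeps the amount as a string. The warning text mirrors the question's desired output. The stand-in class and the in-memory data are assumptions for illustration only:

```python
import io


class BankAccount:
    # Simplified stand-in for the question's class (amount kept as a string)
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = decimal_val


# Sample data from the question; io.StringIO behaves like an open text file
file = io.StringIO("123, jon, doe, $50\n123, bob, smith, $25\n456, jane, jones, $60\n")

customers = {}
for raw in file:
    line = [field.strip() for field in raw.split(",")]
    if line[0] not in customers:
        # First time this account number appears: keep the line
        customers[line[0]] = BankAccount(*line)
    else:
        print("warning! duplicate:", line[0])

# Dicts preserve insertion order (Python 3.7+), so the file order is kept
for account in customers.values():
    print(", ".join([account.account_num, account.first_name,
                     account.last_name, account.decimal_val]))
```

This prints the warning for 123 followed by the jon and jane lines, matching the output the question asks for.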

Answer 2

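Your code is almost there. First, your __eq__ method needs to be slightly different. Don't try to print anything there; just report whether the two objects should be considered duplicates. It looks like this: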

def __eq__(self, other):
    # Two accounts are duplicates if their account numbers match
    return self.account_num == other.account_num
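That __eq__ has two side effects worth knowing about. In Python 3, defining __eq__ without __hash__ sets __hash__ to None, so BankAccount objects can no longer go into a set or serve as dict keys. Also, comparing an account against something that is not a BankAccount raises AttributeError. A sketch that handles both, assuming the account number alone identifies an account:

```python
class BankAccount:
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = float(decimal_val)

    def __eq__(self, other):
        # Let Python fall back to its default handling for unrelated types
        if not isinstance(other, BankAccount):
            return NotImplemented
        # Two accounts are duplicates if their account numbers match
        return self.account_num == other.account_num

    def __hash__(self):
        # Must agree with __eq__: equal accounts need equal hashes
        return hash(self.account_num)


a = BankAccount("123", "jon", "doe", 50)
b = BankAccount("123", "bob", "smith", 25)
print(a == b, len({a, b}), a == "123")
```

Returning `NotImplemented` (rather than `False`) lets the other operand try its own comparison first. If both sides decline, `==` falls back to an identity check.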

Then you can use the in operator to filter out duplicates. Here's an example:

one = BankAccount(123, 'John', 'Doe', 39.5)
customers = [one]
two = BankAccount(123, 'Fred', 'Smith', 96.2)
assert two in customers  # This is true: same account number as `one`

The last step is to add a check to your for loop before you append a new customer to the list:

customers = []
for i in range(1, len(contents)):  # start at 1 to skip the header row, as in the question
    line = contents[i].split(",")  # splits each line into four elements
    account = BankAccount(line[0], line[1], line[2], line[3])
    if account in customers:  # uses __eq__ to compare account numbers
        print("Duplicate account: {}".format(account.account_num))
    else:
        customers.append(account)
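Put together, the pieces of this answer can be run end to end. The sketch below makes three assumptions: `contents` holds a made-up header row followed by the question's sample lines (the loop starts at index 1, as in the question), BankAccount is trimmed down but uses the corrected __eq__, and each field is stripped so the space after each comma doesn't end up in the data:

```python
class BankAccount:
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = decimal_val  # kept as a string in this sketch

    def __eq__(self, other):
        return self.account_num == other.account_num


# Hypothetical file contents: a header row plus the question's sample lines
contents = [
    "account, first, last, amount\n",
    "123, jon, doe, $50\n",
    "123, bob, smith, $25\n",
    "456, jane, jones, $60\n",
]

customers = []
for i in range(1, len(contents)):
    line = [field.strip() for field in contents[i].split(",")]
    account = BankAccount(line[0], line[1], line[2], line[3])
    if account in customers:  # `in` compares against each stored account via __eq__
        print("Duplicate account: {}".format(account.account_num))
    else:
        customers.append(account)

print([c.first_name for c in customers])
```

Only the bob line is reported, and the list keeps jon and jane in their original order.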

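Note that there are many other ways to achieve your goal, some of which may be more efficient, but I wanted to show you a solution that stays very close to what you already have.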
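Here is one example of a more efficient variant. `account in customers` compares the new account against every stored one, so for n unique lines the loop does about n²/2 comparisons. Keeping a set of the account numbers already seen makes each check a constant-time lookup on average. The minimal sketch below works on the raw lines rather than BankAccount objects to stay short. The function name `dedupe_lines` is hypothetical:

```python
def dedupe_lines(lines):
    """Keep the first line for each account number and report later duplicates."""
    seen = set()  # account numbers already kept
    kept = []
    for raw in lines:
        # The account number is everything before the first comma
        account_num = raw.split(",", 1)[0].strip()
        if account_num in seen:
            print("warning! duplicate:", account_num)
        else:
            seen.add(account_num)
            kept.append(raw.rstrip("\n"))
    return kept


sample = ["123, jon, doe, $50\n", "123, bob, smith, $25\n", "456, jane, jones, $60\n"]
for row in dedupe_lines(sample):
    print(row)
```

The same `seen` set can be used inside the BankAccount loop, checking `line[0] in seen` before creating and appending the object.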

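One more thing: your __str__ method also breaks in some cases. It concatenates self.account_num with strings, which raises TypeError when the account number is an integer, such as the 123 in the example above. Change self.account_num to str(self.account_num). Once that's done, you can change the "Duplicate account" message above to print("Duplicate account: {}".format(account)) to get more information.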
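Another way to make __str__ work for both string and integer account numbers is a format string. The sketch below uses an f-string (Python 3.6+) instead of `+` concatenation, because an f-string formats each value itself. The field order follows the question's original __str__:

```python
class BankAccount:
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = float(decimal_val)

    def __str__(self):
        # f-strings format each value, so int account numbers work too
        return f"{self.account_num}, {self.last_name}, {self.first_name}, {self.decimal_val}"


account = BankAccount(123, "Fred", "Smith", 96.2)
print("Duplicate account: {}".format(account))
```

Because `format()` calls `__str__` on the account, the message includes the account number, the name, and the balance.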

Remove lines from a list that contain a duplicate 0th element
