Question

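I'm new to Python and am trying to import two CSV files with csv.reader, then compare them to see whether elements of one exist in the other and, if so, delete the entire row.

I found other questions about similar problems which suggest that a list comprehension is the way to go, but when I run a loop to check whether machine exists in appList, the result I get is empty brackets, like so: [].

My code so far is: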

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

for app in appList:
     machine = [app for app in machine if app not in machine]
     print(machine)

applist.csv looks like this (it is a list of the applications on a standard macOS build):

Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]

machine.csv looks like this:

"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"

[Updated to add]

My current code:

#!/usr/local/bin/python3

import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")

    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))

    # Get list of machine csv
    machines = os.listdir(machine_dir)

    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)

        new_machine = [app for app in app_list if app not in machine_list]

        print(new_machine)



if __name__ == '__main__': main()

I'm currently testing it on a single machine CSV file, and the result returned is not what is left after subtracting machine_list from app_list.

Answer 1

You are using a traditional loop and then a list comprehension inside it, which I don't think is what you need.

In your list comprehension you loop over the values in machine and then append a value to the list if it is not in machine, so your logic is a little off. What you actually need is to loop over the values of machine in the list comprehension and check whether they appear in the list appList:

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

new_machine = [app for app in machine if app not in appList]

Edit

If you inspect the files once they are opened, you will see that they are nested lists. One solution might be to flatten the lists and then use the same list comprehension:

import csv

appList = csv.reader(open('applist.csv'))
appList = list(appList)

machine = csv.reader(open('machine.csv'))
machine = list(machine)

# Flatten both appList and machine
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]

new_machine = [app for app in flat_machine if app not in flat_appList]

Note: be careful. In the sample CSV files, applist.csv contains, for example, Adobe Creative Cloud for Enterprise, which is not the same string as the "Adobe Creative Cloud for Enterprise (Mac)" entry in your machine.csv.
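The name mismatch between the two sample files could be handled by parsing machine.csv with delimiter=";" and stripping the trailing " (Mac)" before comparing. The following is only a minimal sketch under assumptions, not a definitive fix: the helpers normalize and filter_rows are hypothetical names, the data is a shortened copy of the samples in the question embedded as strings so the block runs on its own, the "Some Other App (Mac)" row is made up for illustration, and the " (Mac)" suffix is assumed to be the only difference between the two naming styles.

```python
import csv
import io

# Shortened copies of the sample data from the question, embedded as
# strings so the sketch runs without the real files.
APPLIST_CSV = """Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
"""

# The last row is invented for illustration: it has no match in the app list.
MACHINE_CSV = '''"Application name";"Metric";"Last used"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00"
"Some Other App (Mac)";"Installations";""
'''

SUFFIX = " (Mac)"  # assumption: the only difference between the two name styles


def normalize(name):
    """Strip surrounding whitespace and a trailing ' (Mac)' (hypothetical helper)."""
    name = name.strip()
    if name.endswith(SUFFIX):
        name = name[:-len(SUFFIX)]
    return name


def filter_rows(applist_file, machine_file):
    """Return the header plus the machine rows whose application is NOT in the app list."""
    # One application name per line; blank lines are skipped.
    known_apps = {normalize(row[0]) for row in csv.reader(applist_file) if row}

    rows = list(csv.reader(machine_file, delimiter=";"))
    header, data = rows[0], rows[1:]
    kept = [row for row in data if row and normalize(row[0]) not in known_apps]
    return [header] + kept


result = filter_rows(io.StringIO(APPLIST_CSV), io.StringIO(MACHINE_CSV))
for row in result:
    print(row)
```

With the real files, the two io.StringIO objects would be replaced by file objects opened with encoding="ISO-8859-1" and newline="". Storing the known names in a set keeps each membership test cheap, which matters more as the lists grow.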

Answer 2

Alternatively, you could use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming you want to keep the rows that are not duplicated in either file.

import pandas

app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")

# Combine both dataframes into one
dataframe = pandas.concat([app, machine], ignore_index=True)

# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
dataframe.reset_index(inplace=True)

# Now 'dataframe' should be the machine list without any lines from applist

If the files are relatively small, using loops performs about the same as using pandas, but if the files are large, pandas is much faster.
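With the sample files from the question, the two tables have different columns, so removing whole-row duplicates may not match anything. A column-based comparison is one possible alternative. The sketch below is an illustration under assumptions rather than the method of this answer: the data is a shortened copy of the samples embedded as strings, the "Some Other App (Mac)" row is made up, the app list is assumed to have no header row, and the column name given to it is chosen here only for convenience.

```python
import io

import pandas

# Shortened copies of the sample data, embedded so the sketch is self-contained.
APPLIST_CSV = """Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
"""

# The last row is invented for illustration: it has no match in the app list.
MACHINE_CSV = '''"Application name";"Metric";"Last used"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00"
"Some Other App (Mac)";"Installations";""
'''

# Assumption: the app list has no header, so a column name is supplied here.
app = pandas.read_csv(io.StringIO(APPLIST_CSV), header=None,
                      names=["Application name"])
machine = pandas.read_csv(io.StringIO(MACHINE_CSV), sep=";")

# Remove the literal " (Mac)" wherever it occurs so the names are comparable.
names = machine["Application name"].str.replace(" (Mac)", "", regex=False)

# Keep only the machine rows whose cleaned-up name is not in the app list.
remaining = machine[~names.isin(app["Application name"])].reset_index(drop=True)
print(remaining)
```

For the real files, io.StringIO(...) would be replaced by the file paths together with encoding="ISO-8859-1", as in the code above.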

Search a CSV for list elements and delete the row if they exist
