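I'm new to Python and am trying to use csv.reader to import two csv files, then compare them to see whether elements of one exist in the other and, if so, delete the entire row.
I found other questions about similar problems suggesting that a list comprehension is the way to go, but when I run a loop that checks whether the entries of machine exist in appList, the result I get is empty brackets, i.e. [].
My code so far is: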
import csv
appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)
for app in appList:
    machine = [app for app in machine if app not in machine]
print(machine)
applist.csv looks like this (it is the list of applications in our standard macOS build):
Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]
machine.csv looks like this:
"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"
[Updated to add]
My current code:
#!/usr/local/bin/python3
import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")
    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))
    # Get list of machine csv
    machines = os.listdir(machine_dir)
    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)
        new_machine = [app for app in app_list if app not in machine_list]
        print(new_machine)

if __name__ == '__main__': main()
I'm currently testing it on a single machine csv file, and the result returned is not what is left after removing the app_list entries from machine_list.
Answer 0 (score: 2)
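You are using a traditional loop and then a list comprehension inside it, which I don't think is what you need.
In the list comprehension you loop over the values in machine and append a value to the new list only if it is not in machine. Every value is in machine, so the result is always empty; the logic is a bit off. What you actually need is to loop over the values of machine in the list comprehension and check whether they appear in appList:
import csv
appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)
# machine.csv is semicolon-separated, so the delimiter has to be given
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"), delimiter=";")
machine = list(machine)
# Keep only the machine rows that do not appear in appList
new_machine = [row for row in machine if row not in appList]
Edit
When you open the files and inspect them, they are nested lists (one inner list per row). One solution could be to flatten the lists and then use the same list comprehension:
import csv
appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"), delimiter=";")
machine = list(machine)
# Flatten both appList and machine into flat lists of cell values
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]
# Note: the result is a flat list of cells, not a list of whole rows
new_machine = [app for app in flat_machine if app not in flat_appList]
Note: be careful. In your example csv files, applist.csv contains e.g. Adobe Creative Cloud for Enterprise, whereas your machine.csv contains Adobe Creative Cloud for Enterprise (Mac), so the two names will not compare equal.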
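Because the application names differ only by a trailing " (Mac)" in the sample data, one way to keep whole rows while still matching names is to normalise the first column before comparing. The following is only a sketch under assumptions: the helper names normalize and filter_machine_rows are made up for illustration, the " (Mac)" suffix is assumed to be the only difference between the two files, machine.csv is assumed to start with a header row, and the "Some Other App (Mac)" row is invented test data. The sample text is read through io.StringIO so the snippet runs without any files.

```python
import csv
import io

# Sample data based on the question; real code would open the files instead.
APPLIST_CSV = """Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
"""

# The last row ("Some Other App") is invented purely for illustration.
MACHINE_CSV = '''"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Some Other App (Mac)";"Installations";"";"1";"Not covered";""
'''

SUFFIX = " (Mac)"  # assumption: the only difference between the two name columns


def normalize(name):
    """Strip surrounding whitespace and the assumed ' (Mac)' suffix."""
    name = name.strip()
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


def filter_machine_rows(applist_file, machine_file):
    """Return the header plus the machine rows whose application is not in the app list."""
    standard_apps = {normalize(row[0]) for row in csv.reader(applist_file) if row}
    machine_rows = list(csv.reader(machine_file, delimiter=";"))
    header, data = machine_rows[0], machine_rows[1:]  # assumes a header row exists
    kept = [row for row in data if row and normalize(row[0]) not in standard_apps]
    return [header] + kept


result = filter_machine_rows(io.StringIO(APPLIST_CSV), io.StringIO(MACHINE_CSV))
for row in result:
    print(row)
```

With real files, the two io.StringIO objects would be replaced by open(..., encoding="ISO-8859-1", newline="") handles. Each row stays intact, so the remaining columns (Metric, Last used, and so on) are still available afterwards.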
Answer 1 (score: 0)
Alternatively, you could use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming you want to keep the rows that are not duplicated across the two files.
import pandas
app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")
# Combine both dataframes into one
# (DataFrame.append was removed in pandas 2.0, so pandas.concat is used)
dataframe = pandas.concat([app, machine], ignore_index=True)
# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
dataframe.reset_index(inplace=True)
# Now 'dataframe' should be the machine list without any lines from applist
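If the files are relatively small, using a loop performs about the same as using pandas, but if the files are large, pandas will generally be much faster.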
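For small files where the plain csv module is enough, the filtered rows can also be written back out in the same quoted, semicolon-separated style as machine.csv. This is a minimal sketch, not part of either answer: the function name rows_to_csv_text and the rows in remaining are hypothetical, and the output goes to an in-memory buffer so the snippet runs without touching any files.

```python
import csv
import io


def rows_to_csv_text(rows, delimiter=";"):
    """Serialise rows with every field quoted, mirroring the machine.csv layout."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,   # quote every field, as in the sample file
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical filtered result: a header row plus one remaining application.
remaining = [
    ["Application name", "Metric", "Remark"],
    ["Example App (Mac)", "Installations", ""],
]
text = rows_to_csv_text(remaining)
print(text)
```

To write to disk instead, the same csv.writer call can be given a file opened with newline="" in place of the StringIO buffer.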