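I work at a company that asked me to write some code to compare CSV files and check whether each company appears in both.
The problem with the tool they currently use is that it only does exact matching: if a company is listed as Apple in one CSV and as Apple, Inc in the other, it treats them as different companies. So I wrote the script below, which after a lot of trial and error and many revisions does get the job done. But I feel like there should already be a package that does this for you.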
import csv

# Read both files (one company name per line), stripping whitespace and skipping blank lines.
with open("test.csv", "r") as newData:
    preclin = [line.strip() for line in newData if line.strip()]
with open("CMPList.csv", "r") as dataBase:
    cmp = [line.strip() for line in dataBase if line.strip()]

# Every new company starts out as "not in the database" until a match is confirmed.
notInBoth = list(preclin)

for a in preclin:
    for b in cmp:
        print(a, b)
        # Only prompt for pairs that share a first letter and whose leading characters overlap.
        if a[0] == b[0] and a[:4] in b[:4]:
            userinput = input("Are these the same company: [" + a + "] and [" + b + "] [y for yes, n for no]\n")
            if userinput == "y":
                print("--------------------------------------------------------------\n")
                print("[" + a + "] has been confirmed as the same company as [" + b + "]\n")
                print("--------------------------------------------------------------\n")
                notInBoth.remove(a)
                # Stop comparing this company once it is confirmed, so it is not removed twice.
                break
            if userinput == "n":
                print("----------------------------------------------------------\n")
                print("These companies do not match. Continuing matching process.\n")
                print("----------------------------------------------------------\n")

print("All comparisons complete, creating new CSV of companies not in our database.\n")
# newline="" lets the csv module control line endings (avoids blank rows on Windows).
with open("NotInDatabase.csv", "w", newline="") as csvFile:
    writer = csv.writer(csvFile)
    for item in notInBoth:
        writer.writerow([item])
print("CSV creation complete. Exiting...")
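
For reference, here is a rough sketch of the kind of automatic matching I have in mind, using only the standard library's `difflib` instead of the manual y/n prompts. The `SUFFIXES` set, the 0.85 cutoff, and the helper names `normalize` and `best_match` are my own assumptions for illustration and are not taken from any existing package; the suffix list in particular would need tuning for real data.

```python
import difflib
import re

# Hypothetical list of legal suffixes to ignore when comparing names; extend as needed.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co", "company"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

def best_match(name, candidates, cutoff=0.85):
    """Return the candidate most similar to `name` after normalization, or None."""
    # Map each normalized form back to the original spelling.
    normalized = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(name), list(normalized), n=1, cutoff=cutoff)
    return normalized[hits[0]] if hits else None

database = ["Apple, Inc", "Microsoft Corporation"]
print(best_match("Apple", database))    # Apple, Inc
print(best_match("Initech", database))  # None
```

With something like this, only the names for which `best_match` returns `None` would need to be written to NotInDatabase.csv, and borderline scores could still be sent to a manual prompt.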