I have a script that runs once a day and outputs a CSV file, one line per access point.
Example:
Today's CSV:
Access Point,MacAddress,Status,Site,Date
AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-21-2019
AP01 - 0099,5c5b352e44b1,Disconnected,Store 0099,08-21-2019
AP07 - 1961,5c5b350eeae9,Disconnected,Store 1961,08-21-2019
AP05 - 3165,5c5b352e1f04,Disconnected,Store 3165,08-21-2019
AP02 - 1161,5c5b352e4484,Disconnected,Store 1161,08-21-2019
AP05 - 0249,5c5b352e40c9,Disconnected,Store 0249,08-21-2019
AP06 - 1057,5c5b352e1ed7,Disconnected,Store 1057,08-21-2019
AP01 - 2700,5c5b353e444d,Disconnected,Store 2700,08-21-2019
AP02 - 2700,5c5b352ea519,Disconnected,Store 2700,08-21-2019
AP02 - 2722,5c5b352eb446,Disconnected,Store 2722,08-21-2019
Yesterday's CSV:
Access Point,MacAddress,Status,Site,Date
AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-20-2019
AP01 - 0099,5c5b352e44b1,Disconnected,Store 0099,08-20-2019
AP07 - 1961,5c5b350eeae9,Disconnected,Store 1961,08-20-2019
AP05 - 3165,5c5b352e1f04,Disconnected,Store 3165,08-20-2019
AP02 - 1161,5c5b352e4484,Disconnected,Store 1161,08-20-2019
AP05 - 0249,5c5b352e40c9,Disconnected,Store 0249,08-20-2019
AP06 - 1057,5c5b352e1ed7,Disconnected,Store 1057,08-20-2019
AP01 - 2700,5c5b353e444d,Disconnected,Store 2700,08-20-2019
AP02 - 2700,5c5b352ea519,Disconnected,Store 2700,08-20-2019
AP06 - 0415,5c5b352ebdce,Disconnected,Store 0415,08-20-2019
AP03 - 2542,5c5b353e3e94,Disconnected,Store 2542,08-20-2019
AP03 - 0788,5c5b353e1216,Disconnected,Store 0788,08-20-2019
AP04 - 0788,5c5b353e11e9,Disconnected,Store 0788,08-20-2019
AP05 - 0788,5c5b353e122a,Disconnected,Store 0788,08-20-2019
AP06 - 0788,5c5b353e1220,Disconnected,Store 0788,08-20-2019
AP01 - 1366,5c5b353e136a,Disconnected,Store 1366,08-20-2019
AP05 - 0671,5c5b352eb7ed,Disconnected,Store 0671,08-20-2019
I am trying to write a script that compares the file generated today against yesterday's and returns only the duplicates to a new CSV file. (If possible, compare only the MacAddress field, so that the date in the last column doesn't get in the way.)
I have found dozens of articles and questions similar to this, but most of them do the opposite (remove duplicates), and for some reason I couldn't get them to work.
Can someone point me in the right direction?
Desired output (something like):
Access Point,MacAddress,Status,Site,Date
AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-21-2019
AP01 - 0099,5c5b352e44b1,Disconnected,Store 0099,08-21-2019
AP07 - 1961,5c5b350eeae9,Disconnected,Store 1961,08-21-2019
AP05 - 3165,5c5b352e1f04,Disconnected,Store 3165,08-21-2019
AP06 - 1057,5c5b352e1ed7,Disconnected,Store 1057,08-21-2019
AP01 - 2700,5c5b353e444d,Disconnected,Store 2700,08-21-2019
AP02 - 2700,5c5b352ea519,Disconnected,Store 2700,08-21-2019
I have tried many variations to get this working, but at the moment I only have a bare-bones script for the task, since I'm not sure what the best way to start is.
Current:
import pandas as pd
import csv
from datetime import date, timedelta
# Setting Dates
today = date.today()
yesterday = today - timedelta(days = 1)
# Setting files with Dates
currentFile = "ap-inventory_" + today.strftime('%m-%d-%Y') + ".csv"
yesterdayFile = "ap-inventory_" + yesterday.strftime('%m-%d-%Y') + ".csv"
This is the furthest I have gotten, but I could never get it to compare the results correctly:
import csv
from datetime import date, timedelta
# Setting Dates
today = date.today()
yesterday = today - timedelta(days = 1)
# Setting files with Dates
currentFile = "ap-inventory_" + today.strftime('%m-%d-%Y') + ".csv"
yesterdayFile = "ap-inventory_" + yesterday.strftime('%m-%d-%Y') + ".csv"
with open('master.csv', 'rt') as master:
    master_indices = dict((r[1], i) for i, r in enumerate(csv.reader(master)))

with open(currentFile, 'rt') as hosts:
    with open(yesterdayFile, 'wt') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)
        writer.writerow(next(reader, []) + ['RESULTS'])
        for row in reader:
            index = master_indices.get(row[3])
            if index is not None:
                message = 'FOUND in master list (row {})'.format(index)
            else:
                message = 'NOT FOUND in master list'
            writer.writerow(row + [message])
Answer 0 (score: 2)
Without pandas, you could use something like this:
import time
with open("yesterday.csv") as f1, open("today.csv") as f2, open("output.csv", "w+") as out:
    yesterday = []
    for line in list(f1)[1:]:
        yesterday.append(",".join(line.split(",")[:-1]))
    today = []
    for line in list(f2)[1:]:
        today.append(",".join(line.split(",")[:-1]))
    date_today = time.strftime('%m-%d-%Y')
    common = [f"{x},{date_today}" for x in list(set(today) & set(yesterday))]
    header = "Access Point,MacAddress,Status,Site,Date"
    out.write(f"{header}\n")
    for o in common:
        out.write(f"{o}\n")
The output (something like) is:
Access Point,MacAddress,Status,Site,Date
AP05 - 3165,5c5b352e1f04,Disconnected,Store 3165,08-21-2019
AP07 - 1961,5c5b350eeae9,Disconnected,Store 1961,08-21-2019
AP02 - 1161,5c5b352e4484,Disconnected,Store 1161,08-21-2019
AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-21-2019
AP02 - 2700,5c5b352ea519,Disconnected,Store 2700,08-21-2019
AP05 - 0249,5c5b352e40c9,Disconnected,Store 0249,08-21-2019
AP06 - 1057,5c5b352e1ed7,Disconnected,Store 1057,08-21-2019
AP01 - 0099,5c5b352e44b1,Disconnected,Store 0099,08-21-2019
AP01 - 2700,5c5b353e444d,Disconnected,Store 2700,08-21-2019
These are the common items (ignoring the date) between the yesterday.csv and today.csv files.
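The date-stripping plus set-intersection step at the heart of this answer can be demonstrated in isolation (the rows below are made-up samples, not the real files):

```python
# Hypothetical rows: the trailing date column is stripped before comparison,
# so the same AP seen on different days still matches.
yesterday = ["AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-20-2019"]
today = ["AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-21-2019",
         "AP09 - 0001,aabbccddeeff,Disconnected,Store 0001,08-21-2019"]

# Drop the last comma-separated field (the date) from each line.
strip_date = lambda line: ",".join(line.split(",")[:-1])
common = set(map(strip_date, today)) & set(map(strip_date, yesterday))
print(common)  # {'AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695'}
```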
Explanation

common = [f"{x},{date_today}" for x in list(set(today) & set(yesterday))]

f"{var}" is known as an f-string.
list(set(today) & set(yesterday)) gives the common elements between the two lists.
[x for x in list] is known as a list comprehension.

Answer 1 (score: 1)
I think I found a solution using pandas.
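A minimal sketch of one such pandas approach (the MacAddress column name comes from the question; the sample data is inlined via StringIO here instead of reading the real daily files):

```python
import io
import pandas as pd

# Inline stand-ins for the two daily files (assumed sample data).
today_csv = io.StringIO(
    "Access Point,MacAddress,Status,Site,Date\n"
    "AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-21-2019\n"
    "AP02 - 2722,5c5b352eb446,Disconnected,Store 2722,08-21-2019\n"
)
yesterday_csv = io.StringIO(
    "Access Point,MacAddress,Status,Site,Date\n"
    "AP03 - 1695,5c5b352e3c9b,Disconnected,Store 1695,08-20-2019\n"
)

df_today = pd.read_csv(today_csv)
df_yesterday = pd.read_csv(yesterday_csv)

# Keep only today's rows whose MacAddress also appeared yesterday.
dupes = df_today[df_today["MacAddress"].isin(df_yesterday["MacAddress"])]
print(dupes["MacAddress"].tolist())  # ['5c5b352e3c9b']
```

Because the filter keeps rows from df_today, the Date column in the result is automatically today's date, with no extra handling.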
Answer 2 (score: 0)
The way to do it with pandas is:
import pandas as pd

df1 = pd.read_csv("your_file_from_yesterday.csv")
df2 = pd.read_csv("your_file_from_today.csv")
df_combined = pd.concat([df1, df2], axis=0)
duplicates = df_combined["your_column_of_interest"].duplicated(keep="first")
# keep="last" if you want the addresses from yesterday that were duplicated.

duplicates is now a boolean structure that can be used to index the dataframe:

df_combined[duplicates]

You can save this as a new file. See the documentation of duplicated.
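To see concretely which rows the boolean mask selects, here is a self-contained run on tiny made-up frames (with keep="first", the mask flags the later, i.e. today's, copy of each duplicated address):

```python
import pandas as pd

# Assumed sample frames standing in for yesterday's and today's files.
df1 = pd.DataFrame({"MacAddress": ["aaa", "bbb"], "Date": ["08-20", "08-20"]})
df2 = pd.DataFrame({"MacAddress": ["aaa", "ccc"], "Date": ["08-21", "08-21"]})

df_combined = pd.concat([df1, df2], axis=0)
# keep="first" marks every occurrence after the first as a duplicate,
# so indexing with the mask returns today's copy of each repeated address.
duplicates = df_combined["MacAddress"].duplicated(keep="first")
result = df_combined[duplicates]
print(result["Date"].tolist())  # ['08-21']
```

The result can then be written out with result.to_csv("output.csv", index=False).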