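I have 2 CSV files with the same number of columns and the same format, each row containing the details of one server. Each file refers to a different day.
I want to compare the Size (GB) column (column D) of each server (row) in the Day2 CSV file with the Size (GB) column (column D) of the same server in the Day1 CSV file. I then want to write the result either to column E of the Day2 CSV file or to a separate, third CSV file, so I can track the difference/growth from day to day.
I'm trying to implement this in Python.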
Here is an example:
day1.csv
Server  Site       Platform  Size(GB)
a       Primary    Windows   100
b       Secondary  Unix      200
c       Primary    Oracle    500
day2.csv
Server  Site       Platform  Size(GB)
a       Primary    Windows   150
b       Secondary  Unix      100
c       Primary    Oracle    500
Expected result (output.csv):
EDIT 1:
Here is the code I have developed so far:
Answer 0 (score: 1)
This is not a very general solution, but I tried to follow your approach as closely as possible:
import csv

# Open the input files and the output file
# (newline='' is what the csv module expects on Python 3)
file1 = open('day1.csv', 'r', newline='')
file2 = open('day2.csv', 'r', newline='')
outputFile = open('day3.csv', 'w', newline='')
csvWriter = csv.writer(outputFile, delimiter=',')

# Write the output file header
csvWriter.writerow(["Server", "Site", "Platform", "Size", "Growth"])

# Process input files
csvReader1 = csv.reader(file1, delimiter=',')
csvReader2 = csv.reader(file2, delimiter=',')

# Skip headers
next(csvReader1)
next(csvReader2)

# Process data: rows are paired by position, so both files must list
# the servers in the same order
for rowF2 in csvReader2:
    # Get the corresponding line in F1
    rowF1 = next(csvReader1)
    # Uncomment for debug
    # print(rowF1)
    # print(rowF2)
    # Construct the output line from F2 values
    colA = rowF2[0]
    colB = rowF2[1]
    colC = rowF2[2]
    colD = rowF2[3]
    # Compute the growth: day2 size minus day1 size
    colE = str(int(rowF2[3]) - int(rowF1[3]))
    # Write one row per server, matching the 5-column header
    csvWriter.writerow([colA, colB, colC, colD, colE])

file1.close()
file2.close()
outputFile.close()
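The loop above pairs rows by position, so it only gives correct results if both files list the servers in the same order. A minimal Python 3 sketch that instead matches rows by the Server column, assuming the header shown in the question; the names growth_rows, write_growth, SIZE_COL and FIELDS are hypothetical, and leaving the growth empty for servers that are new on day2 is an arbitrary choice:

```python
import csv

SIZE_COL = "Size(GB)"  # size column header, as in the question's sample files
# Assumed output layout: the day2 columns plus a growth column
FIELDS = ["Server", "Site", "Platform", SIZE_COL, "Growth(GB)"]


def growth_rows(day1_lines, day2_lines):
    """Yield each day2 row (as a dict) with an added 'Growth(GB)' value.

    Servers are matched by the 'Server' column, not by row position.
    A server that is missing from day1 gets an empty growth value.
    """
    # Map each server name to its day1 size
    day1_sizes = {row["Server"]: int(row[SIZE_COL]) for row in csv.DictReader(day1_lines)}
    for row in csv.DictReader(day2_lines):
        new_size = int(row[SIZE_COL])
        old_size = day1_sizes.get(row["Server"])
        row["Growth(GB)"] = "" if old_size is None else new_size - old_size
        yield row


def write_growth(day1_path, day2_path, out_path):
    """Write the day2 rows plus the growth column to a separate CSV file."""
    with open(day1_path, newline="") as f1, open(day2_path, newline="") as f2, \
            open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        # The generator is consumed here, while all three files are still open
        writer.writerows(growth_rows(f1, f2))
```

Calling write_growth('day1.csv', 'day2.csv', 'output.csv') would produce the separate output file; because only day1 is held in memory, day2 is streamed row by row.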
From my point of view, the main points are:
the CSV library (csv reader and writer)
Answer 1 (score: 0)
You can do this with Python's csv library and an OrderedDict, which preserves the original file order:
from collections import OrderedDict
import csv

with open('day1.csv', 'rb') as f_day1, open('day2.csv', 'rb') as f_day2:
    csv_day1 = csv.reader(f_day1)
    csv_day2 = csv.reader(f_day2)
    # Reuse the day1 header and add the new column
    header = next(csv_day1) + ['Growth(GB)']
    next(csv_day2)

    # server -> [site, platform, size], in file order
    day1 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day1)
    day2 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day2)

with open('output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for server, data in day1.items():
        # Append the growth, then replace the day1 size with the day2 size
        data.append(day2[server][2] - data[2])
        data[2] = day2[server][2]
        csv_output.writerow([server] + data)
This gives you an output CSV file like the following:
Server,Site,Platform,Size(GB),Growth(GB)
a,Primary,Windows,150,50
b,Secondary,Unix,100,-100
c,Primary,Oracle,500,0
Note: files opened with a with statement are closed automatically.
Tested on Python 2.7.12.
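The code above targets Python 2. On Python 3, csv.reader rejects files opened in 'rb' mode because it needs text rather than bytes. A minimal Python 3.7+ adaptation of the same approach, assuming the same file layout; the function name compute_growth and its parameters are hypothetical. On 3.7+ a plain dict keeps insertion order, so OrderedDict is no longer needed:

```python
import csv


def compute_growth(day1_path, day2_path, output_path):
    """Write each server's day2 size plus a Growth(GB) column (day2 minus day1)."""
    # Text mode with newline='' is what the csv module expects on Python 3
    with open(day1_path, newline='') as f_day1, open(day2_path, newline='') as f_day2:
        csv_day1 = csv.reader(f_day1)
        csv_day2 = csv.reader(f_day2)
        header = next(csv_day1) + ['Growth(GB)']
        next(csv_day2)  # skip the day2 header
        # server -> [site, platform, size]; plain dicts keep file order on 3.7+
        day1 = {row[0]: [row[1], row[2], int(row[3])] for row in csv_day1}
        day2 = {row[0]: [row[1], row[2], int(row[3])] for row in csv_day2}

    with open(output_path, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(header)
        for server, (site, platform, size1) in day1.items():
            size2 = day2[server][2]
            csv_output.writerow([server, site, platform, size2, size2 - size1])
```

Like the original, it follows the day1 order and raises KeyError if a day1 server is missing from day2.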
Answer 2 (score: 0)
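# Assumes `difference` is a pandas DataFrame in which matching cells are NaN
# and mismatching cells keep their values, and `output_file` is the output path
# (neither is defined in this snippet)

# Show True/False for each column that contains NaN (matched data)
print(difference.isnull().any())
# Count of NaN (matched data) in each column
print(difference.isnull().sum())
# Count of mismatched data in each column
print(difference.count())
# Keep only the records that differ between the 2 CSV files
# (drop rows in which every cell is NaN)
df = difference.dropna(axis=0, how='all')
# Save the differing records to output_file
df.to_csv(output_file)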
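The snippet above does not show how difference is built. One possible construction, assuming pandas is installed, both files share the question's header, and servers are matched by the Server column; build_difference, add_growth and the inline sample strings are hypothetical. DataFrame.where keeps a cell where the condition is True and puts NaN elsewhere, so matching cells become NaN, which is the convention the snippet's comments describe:

```python
import io

import pandas as pd

# Hypothetical in-memory copies of the question's day1.csv and day2.csv
DAY1_CSV = "Server,Site,Platform,Size(GB)\na,Primary,Windows,100\nb,Secondary,Unix,200\nc,Primary,Oracle,500\n"
DAY2_CSV = "Server,Site,Platform,Size(GB)\na,Primary,Windows,150\nb,Secondary,Unix,100\nc,Primary,Oracle,500\n"


def build_difference(day1, day2):
    """Return day2's values where they differ from day1, and NaN where they match."""
    # Index both frames by server name and align day1 to day2's rows, so rows
    # are compared by key; a server that is new on day2 keeps all its values
    day2 = day2.set_index("Server")
    day1 = day1.set_index("Server").reindex(day2.index)
    return day2.where(day2 != day1)


def add_growth(day1, day2):
    """Return day2 with a 'Growth(GB)' column (day2 size minus day1 size)."""
    out = day2.set_index("Server")
    # Series subtraction aligns on the Server index, not on row position
    out["Growth(GB)"] = out["Size(GB)"] - day1.set_index("Server")["Size(GB)"]
    return out.reset_index()


if __name__ == "__main__":
    day1 = pd.read_csv(io.StringIO(DAY1_CSV))
    day2 = pd.read_csv(io.StringIO(DAY2_CSV))
    difference = build_difference(day1, day2)
    print(difference.dropna(axis=0, how="all"))
    print(add_growth(day1, day2))
```

With real files, pd.read_csv('day1.csv') and pd.read_csv('day2.csv') would replace the io.StringIO sample data.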