我对python很新。我正在尝试将2个csv文件合并为一个,选择特定的行和列。
csv1:
Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
CSV2:
Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
以下是组合这两个文件的代码:
import csv
itertools as IT
filenames = ['csv1.csv', 'csv2.csv']
handles = [open(filename, 'rb') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]
with open('combined.csv', 'wb') as h:
writer = csv.writer(h, delimiter=',', lineterminator='\n', )
for rows in IT.izip_longest(*readers, fillvalue=['']*3):
combined_row = []
for row in rows:
row = row[:3] # select the columns you want
if len(row) == 3:
combined_row.extend(row)
else:
combined.extend(['']*3)
writer.writerow(combined_row)
for f in handles:
f.close()
这结合并输出了这个:
Host, Time Up, Time Down,Host,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),server1.test.com:1717,application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),,server_hit_rate,100.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%),,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,server2.test.com:1717,application_availability_check,100.000% (100.000%)
,,,,server_hit_rate,100.000% (100.000%)
,,,,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,Average,100.000% (100.000%),0.000% (0.000%)
但在这里我只想提取以下内容 - 来自csv1& csv2:
Host, Time Up, Time Down,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
有没有办法实现这个目标?
答案 0 :(得分:0)
import pandas as pd
df = pd.read_csv('csv1.csv',skipfooter=1)
df2 = pd.read_csv('csv2.csv',skipfooter=1)
combined = pd.merge(df[['Host','Service','Time OK']],df2[['Host','Time Up','Time Down']], on='Host')
combined['Time OK'] = combined['Time OK'].apply(lambda x: x.split('(')[0])
combined['Time Up'] = combined['Time Up'].apply(lambda x: x.split('(')[0])
combined['Time Down'] = combined['Time Down'].apply(lambda x: x.split('(')[0])
combined.to_csv('combined.csv',index=False)
你应该能够用Pandas轻松解决这个问题,你有这个选择吗?
输出:
,Host, Time Up, Time Down,Service, Time OK
0,server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)
1,server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)