通过csv文件读取和解析[python3.6]

时间:2017-10-19 12:07:45

标签: python python-3.x pandas csv

我对python很新。我正在尝试将2个csv文件合并为一个,选择特定的行和列。

csv1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000

CSV2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000

以下是组合这两个文件的代码:

import csv
itertools as IT

filenames = ['csv1.csv', 'csv2.csv']
handles = [open(filename, 'rb') for filename in filenames]    
readers = [csv.reader(f, delimiter=',') for f in handles]

with  open('combined.csv', 'wb') as h:
    writer = csv.writer(h, delimiter=',', lineterminator='\n', )
    for rows in IT.izip_longest(*readers, fillvalue=['']*3):
        combined_row = []
        for row in rows:
            row = row[:3] # select the columns you want
            if len(row) == 3:
                combined_row.extend(row)
            else:
                combined.extend(['']*3)
        writer.writerow(combined_row)

for f in handles:
    f.close()

这结合并输出了这个:

Host, Time Up, Time Down,Host,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000%      (0.000%),server1.test.com:1717,application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000%   (0.000%),,server_hit_rate,100.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%),,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,server2.test.com:1717,application_availability_check,100.000%   (100.000%)
,,,,server_hit_rate,100.000% (100.000%)
,,,,max_hit_rate,100.000% (100.000%)
,,,,application_log_check,100.000% (100.000%)
,,,,application_sessions_check,100.000% (100.000%)
,,,Average,100.000% (100.000%),0.000% (0.000%)

但在这里我只想提取以下内容 - 来自csv1& csv2:

Host, Time Up, Time Down,Service, Time OK
server1.test.com:1717,100.000% (100.000%),0.000%   (0.000%),application_availability_check,100.000% (100.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),application_availability_check,100.000% (100.000%)

有没有办法实现这个目标?

1 个答案:

答案 0 :(得分:0)

import pandas as pd

df = pd.read_csv('csv1.csv',skipfooter=1)
df2 = pd.read_csv('csv2.csv',skipfooter=1)


combined = pd.merge(df[['Host','Service','Time OK']],df2[['Host','Time Up','Time Down']], on='Host')

combined['Time OK'] = combined['Time OK'].apply(lambda x: x.split('(')[0])
combined['Time Up'] = combined['Time Up'].apply(lambda x: x.split('(')[0])
combined['Time Down'] = combined['Time Down'].apply(lambda x: x.split('(')[0])


combined.to_csv('combined.csv',index=False)

你应该能够用Pandas轻松解决这个问题,你有这个选择吗?

输出:

,Host, Time Up, Time Down,Service, Time OK
0,server1.test.com:1717,100.000% (100.000%),0.000%  (0.000%),application_availability_check,100.000% (100.000%)
1,server2.test.com:1717,100.000% (100.000%),0.000%  (0.000%),application_availability_check,100.000% (100.000%)