I have a scenario where I extract row values from CSV files.
(CSV)test1:
Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
(CSV)test2:
Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Here is my code:
import pandas as pd

df = pd.read_csv('test1.csv', skipfooter=1, engine='python')
df2 = pd.read_csv('test2.csv', skipfooter=1, engine='python')
combined = pd.merge(df[['Host', ' Time Up']], df2[['Host', ' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
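Here I am trying to get the value of "server_hit_rate", i.e. 99%, which is on the third row of the data. But with the code above I only get the data from the first row, i.e.: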
Host Time Up Time OK
0 server1.test.com:1717 100.000% 100.000%
1 server2.test.com:1717 100.000% 100.000%
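The reason only the first row of each host matches can be seen by checking how pandas reads the continuation rows. A minimal sketch, assuming a trimmed inline copy of test2.csv (the TEST2 string is illustrative, not the real file):

```python
import io

import pandas as pd

# Trimmed, hypothetical copy of test2.csv: two services per host, no Average row.
TEST2 = """Host,Service, Time OK
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
"""

df2 = pd.read_csv(io.StringIO(TEST2))
# Continuation rows have an empty first field, which pandas reads as NaN,
# so a merge on Host can only match the first row of each host block.
print(df2['Host'].isna().tolist())
```

The server_hit_rate rows carry no Host value, so they never take part in the merge unless Host is filled in first.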
The desired output should be:
Host Time Up Time OK
0 server1.test.com:1717 100.000% 99.000%
1 server2.test.com:1717 100.000% 99.000%
Any suggestions on how to achieve this would be helpful.
EDIT1:
import datetime

import pandas as pd

def t1():
    today = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    print("date :", today)
    df = pd.read_csv('t1.csv', skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
    temp = df2.ffill()[df2['Service'] == 'server_hit_rate']
    combined = pd.merge(df[['Host', ' Time Up']], temp[['Host', ' Time OK']], on='Host')
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv', index=False)

t1()
O/P:
Wed Nov 15 10:07:01 2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []
Answer 0 (score: 3)
It would be fairly simple if you forward-fill Host, select the rows whose Service is server_hit_rate, and then merge the data:
temp = df2.ffill()[df2['Service']=='server_hit_rate']
# Host Service Time OK ...
#1 server1.test.com:1717 server_hit_rate 99.000% (100.000%) ...
#6 server2.test.com:1717 server_hit_rate 99.000% (100.000%) ...
combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
Output:
print(combined)
Host Time Up Time OK
0 server1.test.com:1717 100.000% 99.000%
1 server2.test.com:1717 100.000% 99.000%
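Rather than carrying the leading spaces in the column names (' Time Up', ' Time OK') through every reference, strip the spaces from the column names up front.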
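The whole approach of this answer, including stripping the column names, can be tried without the real files. A minimal sketch, assuming trimmed inline copies of test1.csv and test2.csv (TEST1 and TEST2 are illustrative); unlike the answer, it forward-fills only the Host column so that no other blank cells get filled by accident:

```python
import io

import pandas as pd

# Trimmed, hypothetical copies of test1.csv and test2.csv.
TEST1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""
TEST2 = """Host,Service, Time OK
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

# skipfooter drops the Average row; it requires the python engine.
df1 = pd.read_csv(io.StringIO(TEST1), skipfooter=1, engine='python')
df2 = pd.read_csv(io.StringIO(TEST2), skipfooter=1, engine='python')
for df in (df1, df2):
    # ' Time Up' -> 'Time Up', ' Time OK' -> 'Time OK'
    df.columns = df.columns.str.strip()

# Fill the blank Host cells from the row above, then keep only server_hit_rate rows.
df2['Host'] = df2['Host'].ffill()
temp = df2[df2['Service'] == 'server_hit_rate']

combined = pd.merge(df1[['Host', 'Time Up']], temp[['Host', 'Time OK']], on='Host')
for col in ('Time Up', 'Time OK'):
    # Keep the text before the parenthesised value and drop the trailing blank.
    combined[col] = combined[col].str.split('(').str[0].str.strip()
print(combined)
```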
Answer 1 (score: 0)
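The DictReader class in the csv module is handy for this kind of thing: it turns the column headers into dictionary keys, and you can then query each row like any other dictionary.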
The output is not in exactly the format you want, but it should give you a starting point.
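The DictReader idea could look roughly like this. It is a sketch, not the answerer's original code (which is missing here), assuming a trimmed inline copy of test2.csv (TEST2 is illustrative); with a real file you would pass open('test2.csv', newline='') instead of the StringIO object:

```python
import csv
import io

# Trimmed, hypothetical copy of test2.csv.
TEST2 = """Host,Service, Time OK
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

hit_rate = {}
current_host = None
# skipinitialspace=True drops the blank after each comma, so ' Time OK' becomes 'Time OK'.
for row in csv.DictReader(io.StringIO(TEST2), skipinitialspace=True):
    if row['Host']:
        current_host = row['Host']  # first row of a new host block
    if row['Service'] == 'server_hit_rate':
        # Keep the part before the parenthesised value, e.g. '99.000%'.
        hit_rate[current_host] = row['Time OK'].split('(')[0].strip()
print(hit_rate)
```

Remembering the last non-empty Host plays the same role as forward-filling in pandas.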
Answer 2 (score: 0)
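I think this is a better way to get the result you want. Note that I did not keep the "%", since you indicated that you want to pick the larger column later. This way we convert to numbers and read only the columns we need, and we also get rid of the annoying spaces in the column names from the start. By setting the index, we let pandas line up the entries without calling merge.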
import pandas as pd

def parse_percentage(perc_string):
    "Parse the percentage strings of the form 99.00% (99.00%)"
    return float(perc_string.split('%')[0])

t1 = pd.read_csv('t1.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',  # This gets rid of the spaces
                 index_col='Host',
                 usecols=['Host', 'Time Up'],
                 converters={'Time Up': parse_percentage})
t2 = pd.read_csv('t2.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',
                 usecols=['Host', 'Service', 'Time OK'],
                 converters={'Time OK': parse_percentage}).ffill().set_index('Host')
# concat lines the rows up on the Host index, so no merge is needed
combined = pd.concat([t1, t2[t2.Service == 'server_hit_rate']['Time OK']], axis=1)
combined.to_csv('test.csv')
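To check the index-alignment idea without files, the same steps can be run on inline data. A sketch assuming trimmed copies of t1.csv and t2.csv (TEST1 and TEST2 are illustrative); in TEST2 the server2 block is deliberately listed first to show that concat matches rows by Host label rather than by position:

```python
import io

import pandas as pd


def parse_percentage(perc_string):
    """Parse strings of the form '99.000% (100.000%)' into 99.0."""
    return float(perc_string.split('%')[0])


# Trimmed, hypothetical copies of t1.csv and t2.csv (server2 block first in TEST2).
TEST1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""
TEST2 = """Host,Service, Time OK
server2.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
server1.test.com:1717,application_availability_check,100.000% (100.000%)
,server_hit_rate,99.000% (100.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

# The regex separator strips the blanks around commas; it needs the python engine.
opts = dict(skipfooter=1, engine='python', sep=' *, *')
t1 = pd.read_csv(io.StringIO(TEST1), usecols=['Host', 'Time Up'],
                 converters={'Time Up': parse_percentage}, **opts).set_index('Host')
t2 = pd.read_csv(io.StringIO(TEST2), usecols=['Host', 'Service', 'Time OK'],
                 converters={'Time OK': parse_percentage}, **opts)
t2['Host'] = t2['Host'].ffill()
hit = t2[t2['Service'] == 'server_hit_rate'].set_index('Host')['Time OK']

# concat(axis=1) aligns on the Host index, so the differing row order does not matter.
combined = pd.concat([t1, hit], axis=1)
print(combined)
```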
Answer 3 (score: 0)
I used Python 3.6. I think this should give you what you want.
import pandas as pd
df1 = pd.read_csv('t1.csv', skipfooter=1, engine='python')
df1.columns = [c.strip() for c in df1.columns]
df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
df2.columns = [c.strip() for c in df2.columns]
df2 = df2.ffill()
combined = pd.merge(df1[['Host', 'Time Up']], df2[['Host', 'Service', 'Time OK']], on='Host')
combined['Time Up'] = combined['Time Up'].apply(lambda x : x.split('(')[0])
combined['Time OK'] = combined['Time OK'].apply(lambda x : x.split('(')[0])
print(combined[combined.Service == 'server_hit_rate'])
Answer 4 (score: 0)
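Answering your challenge made a nice coffee break in my day. See the code below. It works with the CSV1 and CSV2 data because I created server-name and search-key variables for your search. The "#" comments help with the learning curve where you need to adapt it. No extra imports or anything else needed, just plain Pythonic code.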
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# lists: csv1 and csv2 mimic reading from file.
csv1 = ["Host, Time Up, Time Down, Time Unreachable, Time Undetermined",
        "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        "server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000"]
csv2 = ["Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined",
        "server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        "server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
        "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000"]
# assuming your data comes from a static file on disk that can be read line by line.
total_servers = 2
current_server_name = ''
result_dict = {}
# server-number template; just in case you have multiple servers as your example shows.
server_name = "server%s.test.com:"
search_key = ",server_hit_rate"
# Some folks start counting at "1"...
wanted_servers = tuple(server_name % str(n + 1) for n in range(total_servers))
# The host name only appears on the first line of each block, so remember the most
# recent one; this keeps working if someone changes the order of the server blocks.
for line in csv2:
    # print(line)  # -> to check output on screen
    if not line.startswith(','):
        current_server_name = line.split(',')[0]
    elif line.startswith(search_key) and current_server_name.startswith(wanted_servers):
        print('got your line of interest : "%s"' % line)  # -> to check output on screen
        items = line.split(',')
        value = items[2]
        result_dict[current_server_name] = value
print(result_dict)
Enjoy!