Column manipulation in CSV [python]

Time: 2017-11-10 00:13:39

Tags: python python-2.7 python-3.x pandas csv

I have a scenario where I extract row values from CSV files.

  

(CSV)test1:

Host, Time Up, Time Down, Time Unreachable, Time Undetermined
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000
  

(CSV)test2:

Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
,application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000
  

Here is my code:

df = pd.read_csv('test1.csv',skipfooter=1)
df2 = pd.read_csv('test2.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

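Here I am trying to get the value for "server_hit_rate", which is 99% and sits on line 3 of the data. With the code above, however, I only get the data from the first row of each host, i.e.: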

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%   100.000% 
1  server2.test.com:1717  100.000%   100.000%

The desired output should be:

                    Host    Time Up    Time OK
0  server1.test.com:1717  100.000%    99.000% 
1  server2.test.com:1717  100.000%    99.000% 

Any suggestions on how to achieve this would be helpful.

  

EDIT1:

import pandas as pd
import pandas
import os, shutil, glob
import sys
import datetime
import time
def t1():
    import pandas as pd
    import pandas
    today=datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Z %Y")
    print "date :", today
    df = pd.read_csv('t1.csv',skipfooter=1, engine='python')
    df2 = pd.read_csv('t2.csv',skipfooter=1, engine='python')
    temp = df2.ffill()[df2['Service']=='server_hit_rate']
    combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
    combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
    combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
    combined.to_csv('test.csv',index=False)
t1()


O/P:

Wed Nov 15 10:07:01  2017
Empty DataFrame
Columns: [Host, % Time Up, % Time OK]
Index: []

5 answers:

Answer 0 (score: 3)

It would be fairly simple if you forward-fill the host, select the rows whose Service is server_hit_rate, and then merge the data:

temp = df2.ffill()[df2['Service']=='server_hit_rate']
#                     Host          Service             Time OK ...
# 1  server1.test.com:1717  server_hit_rate  99.000% (100.000%) ...
# 6  server2.test.com:1717  server_hit_rate  99.000% (100.000%) ...

combined = pd.merge(df[['Host',' Time Up']],temp[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])

Printing the dataframe combined gives:

print(combined)

                  Host    Time Up   Time OK
0  server1.test.com:1717  100.000%   99.000% 
1  server2.test.com:1717  100.000%   99.000% 

It is better to strip the spaces in front of the column names than to type the leading space in every name.
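One way to avoid typing the leading space in every column name is to strip the headers once, right after loading. The sketch below is only an illustration, not the answerer's exact code: it assumes Python 3, uses inline stand-in data shortened from the question's files (the "Average" footer is left out), and the names CSV1, CSV2 and hit_rate are introduced here for the example.

```python
import io
import pandas as pd

# Inline stand-ins for test1.csv / test2.csv (assumption: shortened, no footer row).
CSV1 = """Host, Time Up, Time Down
server1.test.com:1717,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,100.000% (100.000%),0.000% (0.000%)
"""
CSV2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
"""

df = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

# Strip the leading space from every header once.
df.columns = df.columns.str.strip()
df2.columns = df2.columns.str.strip()

# Blank Host cells are read as NaN; fill them from the row above,
# then keep only the server_hit_rate rows.
df2['Host'] = df2['Host'].ffill()
hit_rate = df2[df2['Service'] == 'server_hit_rate']

combined = pd.merge(df[['Host', 'Time Up']], hit_rate[['Host', 'Time OK']], on='Host')
for col in ('Time Up', 'Time OK'):
    # Keep the part before "(", e.g. "99.000%".
    combined[col] = combined[col].str.split('(').str[0].str.strip()

print(combined)
```

With the headers stripped, the rest of the code can refer to 'Time Up' and 'Time OK' without the leading space.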

Answer 1 (score: 0)

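The DictReader tool in the csv library is handy for this kind of thing: it turns the column headers into dictionary keys, so you can then query each row as you would any other dictionary.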


The output is not in the format you want, but it should give you a basis to build on.
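A DictReader-based approach along these lines might look like the sketch below. It is not the answerer's code: it assumes Python 3, inline stand-in data shortened from the question's second file, and a hypothetical helper named hit_rates. Rows with an empty Host are assumed to belong to the last host seen.

```python
import csv
import io

# Inline stand-in for test2.csv (assumption: shortened from the question's data).
CSV2 = """Host,Service, Time OK, Time Warning
server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
,max_hit_rate,100.000% (100.000%),0.000% (0.000%)
server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%)
,server_hit_rate,99.000% (100.000%),0.000% (0.000%)
Average,100.000% (100.000%),0.000% (0.000%)
"""

def hit_rates(lines, service='server_hit_rate'):
    """Map each host to the first percentage of its `service` row."""
    # skipinitialspace drops the space after each comma, so the
    # header " Time OK" becomes the key "Time OK".
    reader = csv.DictReader(lines, skipinitialspace=True)
    result = {}
    current_host = None
    for row in reader:
        # Continuation rows leave Host empty; reuse the last host seen.
        if row['Host']:
            current_host = row['Host']
        if row['Service'] == service:
            # "99.000% (100.000%)" -> "99.000%"
            result[current_host] = row['Time OK'].split('(')[0].strip()
    return result

rates = hit_rates(io.StringIO(CSV2))
print(rates)
```

The "Average" footer row never matches the service name, so it is skipped without special handling.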

Answer 2 (score: 0)

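I think this is better code for getting the result you want. Note that I have not kept the "%", since you indicated that you want to pick the larger column later. This way we convert to numbers and read only the columns we need, and we also get rid of the annoying spaces in the column names from the start. By setting the index, we can let pandas line up the entries without calling merge.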

import pandas as pd

def parse_percentage(perc_string):
    "Parse the percentage strings of the form 99.00% (99.00%)"
    return float(perc_string.split('%')[0])

t1 = pd.read_csv('t1.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',  # This gets rid of the spaces
                 index_col='Host',
                 usecols=['Host', 'Time Up'],
                 converters={'Time Up': parse_percentage})

t2 = pd.read_csv('t2.csv',
                 skipfooter=1,
                 engine='python',
                 sep=' *, *',
                 usecols=['Host', 'Service', 'Time OK'],
                 converters={'Time OK': parse_percentage})
# Fill the blank Host cells from the row above, then index by Host
t2 = t2.ffill().set_index('Host')

# Aligning on the Host index replaces the explicit merge
combined = pd.concat([t1, t2[t2.Service == 'server_hit_rate']['Time OK']], axis=1)
combined.to_csv('test.csv')
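The parse_percentage converter above keeps only the number before the first "%". If the value in parentheses is needed as well, a hypothetical helper like the one below could extract both numbers. It is not part of the answer; the names parse_percentages and PERCENT_RE are made up for this sketch, and the question does not say what the second figure means.

```python
import re

# Matches a decimal number immediately followed by "%".
PERCENT_RE = re.compile(r'(\d+(?:\.\d+)?)%')

def parse_percentages(perc_string):
    """Return every percentage in a string like '99.000% (100.000%)' as floats."""
    return [float(number) for number in PERCENT_RE.findall(perc_string)]

values = parse_percentages('99.000% (100.000%)')
print(values)
```

A cell without any "%" sign, such as the trailing 0.000 in each row, yields an empty list, so callers would need to decide how to treat it.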

Answer 3 (score: 0)

I used Python 3.6. I think this should give you what you want.

import pandas as pd

# skipfooter is only supported by the python engine
df1 = pd.read_csv('t1.csv', skipfooter=1, engine='python')
df1.columns = [c.strip() for c in df1.columns]
df2 = pd.read_csv('t2.csv', skipfooter=1, engine='python')
df2.columns = [c.strip() for c in df2.columns]
df2 = df2.ffill()
combined = pd.merge(df1[['Host', 'Time Up']], df2[['Host', 'Service', 'Time OK']], on='Host')
combined['Time Up'] = combined['Time Up'].apply(lambda x : x.split('(')[0])
combined['Time OK'] = combined['Time OK'].apply(lambda x : x.split('(')[0])
print(combined[combined.Service == 'server_hit_rate'])

Answer 4 (score: 0)

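Answering your challenge made for a nice coffee break in my day. See the code below. It works for both the CSV1 and CSV2 files, because I created server-name and search-key variables for your search. To help with the learning curve, "#" comments are included where needed. No extra imports or anything else required. Just plain Pythonic writing.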

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# lists: csv1 and csv2 mimic reading from file.

csv1 =  ["Host, Time Up, Time Down, Time Unreachable, Time Undetermined",
         "server1.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         "server2.test.com:1717,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000"]

csv2 =  ["Host,Service, Time OK, Time Warning, Time Unknown, Time Critical, Time Undetermined",
         "server1.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         "server2.test.com:1717,application_availability_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",server_hit_rate,99.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",max_hit_rate,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",application_log_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         ",application_sessions_check,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000",
         "Average,100.000% (100.000%),0.000% (0.000%),0.000% (0.000%),0.000% (0.000%),0.000"]

# assuming your provided data comes from a static file on hdd and can be read by using readline().

total_servers        = 2
count_server         = 0
current_server_name  = ''
result_dict          = {}

# added implementable server-number; just in case you got multiple servers as your example shows.
server_name = "server%s.test.com:"
search_key = ",server_hit_rate"

# the while-loop ploughs/iters through the file for a reason: > someone may have changed the order of servernames randomly.

while count_server < total_servers:
    for line in csv2:
        # print(line)  # -> to check output on screen

        current_server_name = server_name % str(count_server + 1) # Some folks..start counting at "1"...

        if line.startswith((current_server_name)):
            print(current_server_name)

        if not line.startswith((search_key)):
            continue
        else:
            # print(current_server_name)
            print('got your line of interest : "%s"' % line)  # -> to check output on screen
            items = line.split(',')
            value = items[2]
            result_dict[current_server_name] = value

            count_server +=1

print(result_dict)

Enjoy!