Row-wise column operations on CSV data

Time: 2017-10-20 13:57:03

Tags: python python-2.7 python-3.x pandas csv

My CSV file contains the following data:

Host, Time Up, Time OK
server1.test.com:1717,100.00% ,100.00% 
server2.test.com:1717,100.00% ,100.00% 

I am trying to compare the column values in every row:

  • If col1 <= col2, the new col3 should contain the value of col1.
  • If col1 > col2, col3 should contain the value of col2.

Example:

Time Up (col1)    Time OK (col2)    Total (col3)
100%              100%              100%
100%              95%               95%
95%               100%              95%

I searched the internet but could not find a similar case. Is there a way to achieve this?

EDIT2: Code -

import pandas as pd
df = pd.read_csv('3.csv',skipfooter=1)
df2 = pd.read_csv('4.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
combined.to_csv('combined.csv',index=False)

df =pd.read_csv('combined.csv', skipfooter=1)
col1 = df[' Time Up']
col2 = df[' Time OK']
df['Total'] = col1.where(col1 <= col2, col2)
df.to_csv('combined.csv',index=False)

1 answer:

Answer 0: (score: 0)

Sure, just use read_csv() to read the data:

import pandas as pd
df = pd.read_csv('t.csv') # this is your original example input file

Now you have:

                    Host   Time Up   Time OK
0  server1.test.com:1717  100.00%   100.00% 
1  server2.test.com:1717  100.00%   100.00% 

The first problem is that your CSV has stray spaces in the header. Let's clean that up:

df.columns = [col.strip() for col in df.columns] # " Time Up" -> "Time Up"
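As a minimal sketch of the same cleanup, the vectorized `str.strip` on the column index does what the list comprehension does. The `io.StringIO` buffer below is only a stand-in for the file and is an assumption made for illustration:

```python
import io
import pandas as pd

# Inline copy of the sample CSV from the question (stands in for the file).
sample = (
    "Host, Time Up, Time OK\n"
    "server1.test.com:1717,100.00% ,100.00% \n"
    "server2.test.com:1717,100.00% ,100.00% \n"
)
df = pd.read_csv(io.StringIO(sample))

before = list(df.columns)            # header names as read, with leading spaces
df.columns = df.columns.str.strip()  # strip whitespace from every header name
after = list(df.columns)

print(before)
print(after)
```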

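Next, note that your values are strings such as "100.00% ", which would be compared character by character rather than numerically. Strip the percent sign and surrounding spaces, then convert to float: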

df['Time Up'] = df['Time Up'].str.strip('% ').astype(float)
df['Time OK'] = df['Time OK'].str.strip('% ').astype(float)
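To see why the conversion matters, here is a small sketch using two illustrative values (the 95% figure comes from the question's example table, not from the sample CSV). Compared as strings, "95.00% " does not count as smaller than "100.00% ", because '9' sorts after '1':

```python
import pandas as pd

# Illustrative raw values, including the trailing space seen in the CSV.
raw = pd.Series(["100.00% ", "95.00% "])

# String comparison is lexicographic: '9' > '1', so this is False.
string_says_smaller = raw.iloc[1] < raw.iloc[0]

# After stripping '%' and spaces and converting to float, 95.0 < 100.0 is True.
numeric = raw.str.strip('% ').astype(float)
number_says_smaller = numeric.iloc[1] < numeric.iloc[0]

print(string_says_smaller, number_says_smaller)
```

This is likely why the `where` call in the question gave unexpected results when it ran on the uncleaned string columns.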

Now we have clean data:

                    Host  Time Up  Time OK
0  server1.test.com:1717    100.0    100.0
1  server2.test.com:1717    100.0    100.0

Finally, we can add the new column:

col1 = df['Time Up']
col2 = df['Time OK']
df['Total'] = col1.where(col1 <= col2, col2)

This gives us:

                    Host  Time Up  Time OK  Total
0  server1.test.com:1717    100.0    100.0  100.0
1  server2.test.com:1717    100.0    100.0  100.0

Another way to get the Total column is:

df['Total'] = df[['Time Up', 'Time OK']].min(axis=1)

That is, take the minimum of each row.
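As a quick check, the sketch below runs both approaches on the three example rows from the question (already converted to floats, which is an assumption about the cleaned data) and shows that they agree:

```python
import pandas as pd

# The three example rows from the question, as cleaned float values.
df = pd.DataFrame({
    "Time Up": [100.0, 100.0, 95.0],
    "Time OK": [100.0, 95.0, 100.0],
})

col1 = df["Time Up"]
col2 = df["Time OK"]

# Keep col1 where col1 <= col2, otherwise take col2.
via_where = col1.where(col1 <= col2, col2)

# Row-wise minimum of the two columns.
via_min = df[["Time Up", "Time OK"]].min(axis=1)

df["Total"] = via_min
print(df)
```

The two can differ when a value is missing: `min(axis=1)` skips NaN by default, while the `where` version falls back to col2 whenever the comparison is False, including when either side is NaN.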

If you want to add the percent sign back:

df['Total'] = df['Total'].astype(str) + '%'
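Note that `astype(str)` renders 100.0 as "100.0", not "100.00". If the output should match the two-decimal style of the input file, one possible sketch is to format each value explicitly (the `total` series here is a hypothetical stand-in for `df['Total']`):

```python
import pandas as pd

# Hypothetical stand-in for df['Total'] after the numeric steps.
total = pd.Series([100.0, 95.0])

plain = total.astype(str) + '%'       # simple concatenation
fixed = total.map('{:.2f}%'.format)   # always two decimal places

print(plain.tolist())
print(fixed.tolist())
```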