My CSV file contains the following data:
Host, Time Up, Time OK
server1.test.com:1717,100.00% ,100.00%
server2.test.com:1717,100.00% ,100.00%
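I am trying to compare the column values in every row: if col1 <= col2, the value of col1 should be printed in a new col3; if col1 > col2, the value of col2 should be printed in col3. Example: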
Time Up(col1), Time OK(col2), Total(col3)
100% 100% 100%
100% 95% 95%
95% 100% 95%
I searched the internet and could not find a similar case. Is there a way to achieve this?
EDIT2: code -
import pandas as pd
df = pd.read_csv('3.csv',skipfooter=1)
df2 = pd.read_csv('4.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
combined.to_csv('combined.csv',index=False)
df =pd.read_csv('combined.csv', skipfooter=1)
col1 = df[' Time Up']
col2 = df[' Time OK']
df['Total'] = col1.where(col1 <= col2, col2)
df.to_csv('combined.csv',index=False)
Answer 0 (score: 0)
Sure. Just use read_csv() to read the data:
import pandas as pd
df = pd.read_csv('t.csv') # this is your original example input file
Now you have:
Host Time Up Time OK
0 server1.test.com:1717 100.00% 100.00%
1 server2.test.com:1717 100.00% 100.00%
The first problem is that your CSV has stray spaces in the header names. Let's clean that up:
df.columns = [col.strip() for col in df.columns] # " Time Up" -> "Time Up"
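As a possible alternative, read_csv() has a skipinitialspace parameter that drops the whitespace following each delimiter at read time, which covers the header names as well. The sketch below assumes the sample data from the question, inlined through io.StringIO (the variable name raw is only for illustration) so it runs without a file. Note that it does not remove the trailing space in values such as "100.00% ", so the value cleanup is still needed.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (assumption: same layout as the real file).
raw = (
    "Host, Time Up, Time OK\n"
    "server1.test.com:1717,100.00% ,100.00%\n"
    "server2.test.com:1717,100.00% ,100.00%\n"
)

# skipinitialspace=True skips spaces that come right after a delimiter,
# so the header " Time Up" is read as "Time Up".
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(df.columns))
```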
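Next, note that your data are strings like "100.00%", so comparing them would compare text rather than numbers. Strip the percent sign and convert to float: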
df['Time Up'] = df['Time Up'].str.strip('% ').astype(float)
df['Time OK'] = df['Time OK'].str.strip('% ').astype(float)
Now we have clean data:
Host Time Up Time OK
0 server1.test.com:1717 100.0 100.0
1 server2.test.com:1717 100.0 100.0
Finally, we can add the new column:
col1 = df['Time Up']
col2 = df['Time OK']
df['Total'] = col1.where(col1 <= col2, col2)
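Because both rows of the sample file are 100%, the effect of where() is easier to see on the three example rows from the question. This is a small sketch that builds those rows by hand as floats (an assumption, standing in for the cleaned CSV): where() keeps the value from col1 wherever the condition holds and takes the value from col2 everywhere else.

```python
import pandas as pd

# The three example rows from the question, already converted to floats.
df = pd.DataFrame(
    {
        "Time Up": [100.0, 100.0, 95.0],
        "Time OK": [100.0, 95.0, 100.0],
    }
)

col1 = df["Time Up"]
col2 = df["Time OK"]

# Keep col1 where col1 <= col2, otherwise fall back to col2.
df["Total"] = col1.where(col1 <= col2, col2)

print(df)
```

The Total column comes out as 100.0, 95.0, 95.0, matching the expected table in the question.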
Which gives us:
Host Time Up Time OK Total
0 server1.test.com:1717 100.0 100.0 100.0
1 server2.test.com:1717 100.0 100.0 100.0
Another way to get the Total column is:
df['Total'] = df[['Time Up', 'Time OK']].min(axis=1)
That is, take the minimum of each row.
If you want to add the percent sign back:
df['Total'] = df['Total'].astype(str) + '%'
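One thing to be aware of: astype(str) renders 100.0 as "100.0", so the result is "100.0%" rather than the "100.00%" style of the input file. If the two-decimal form matters, a format string is one option. The sketch below uses a hand-made Series (the value 99.5 is hypothetical, added only to show the padding).

```python
import pandas as pd

# Hypothetical Total values standing in for the computed column.
total = pd.Series([100.0, 95.0, 99.5], name="Total")

# Format each float with two decimals and append the percent sign.
formatted = total.map("{:.2f}%".format)

print(formatted.tolist())  # ['100.00%', '95.00%', '99.50%']
```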