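Python Pandas: I want to compare the values/strings in two columns of an Excel sheet and, based on given conditions, return a string/value in a new column. I tried the code below, but the output is longer than the actual array.
Can someone help me fix this?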
Resource = []
for x in df['Category']:
    for y in df['Service_Line']:
        if x == 'low space' and y == 'Intel':
            Resource.append('Rhythm')
        elif x == 'log space' and y == 'Intel':
            Resource.append('Blue')
        elif x == 'CPU usage' and y == 'Intel':
            Resource.append('Jazz')
        else:
            Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)
Sample data
import numpy as np
import pandas as pd

d = {'Category': {0: 'low space', 1: 'CPU usage', 2: 'log space', 3: 'low volume', 4: 'CPU usage', 5: 'low volume', 6: 'CPU usage', 7: 'log space', 8: 'log spac', 9: 'other', 10: 'other', 11: 'Low space'},
     'Service_Line': {0: 'Intel', 1: 'SQL', 2: 'Intel', 3: 'BUR', 4: 'AIX', 5: 'BUR',
                      6: 'Intel', 7: 'SQL', 8: 'AIX', 9: 'SAN', 10: 'SAN', 11: 'SQL'},
     'summary_data': {0: 'low space in server123', 1: 'Server213f3 CPU usage', 2: 'getting more data in log space', 3: 'low volume space in server', 4: 'high CPU usage by application', 5: 'low volume space in server', 6: 'high CPU usage by application', 7: 'getting more data in log space', 8: 'getting more data in log space', 9: 'space in server123', 10: 'space in server123', 11: np.nan}}
df = pd.DataFrame(d)
    Category    Service_Line  summary_data
0   low space   Intel         low space in server123
1   CPU usage   SQL           Server213f3 CPU usage
2   log space   Intel         getting more data in log space
3   low volume  BUR           low volume space in server
4   CPU usage   AIX           high CPU usage by application
5   low volume  BUR           low volume space in server
6   CPU usage   Intel         high CPU usage by application
7   log space   SQL           getting more data in log space
8   log spac    AIX           getting more data in log space
9   other       SAN           space in server123
10  other       SAN           space in server123
11  Low space   SQL           NaN
Answer 0 (score: 0)
Resource = []
for i, x in enumerate(df['Category']):
    # enumerate yields positions, so read the same row by position
    y = df['Service_Line'].iloc[i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)  # print the list itself, not the string 'Resource'
df['Resource'] = Resource
print(df)
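If I understand your question correctly, this should solve it.
The problem with your code is that it generates N * N values in Resource: for each x, the inner loop runs over all N values of y and appends one value per pair, so the list ends up far longer than the DataFrame.
You can also use df.index instead of enumerate, like this: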
Resource = []  # start from an empty list before looping again
for i in df.index:
    x = df['Category'][i]
    y = df['Service_Line'][i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)  # print the list itself, not the string 'Resource'
df['Resource'] = Resource
print(df)
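The N * N growth described above can be checked on a small frame. The sketch below is only an illustration: the three-row `demo` frame and the `pick_resource` helper are made up for this example and are not part of the question's data. It counts how many values the nested loops produce and compares that with pairing the two columns row by row via `zip`.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
demo = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space'],
    'Service_Line': ['Intel', 'SQL', 'Intel'],
})

def pick_resource(category, service_line):
    """Hypothetical helper holding the same rules as the loops above."""
    if service_line != 'Intel':
        return 'Other'
    rules = {'low space': 'Rhythm', 'log space': 'Blue', 'CPU usage': 'Jazz'}
    return rules.get(category, 'Other')

# Nested loops: every Category is combined with every Service_Line.
nested = [pick_resource(x, y)
          for x in demo['Category']
          for y in demo['Service_Line']]

# zip: each Category is paired only with the Service_Line of the same row.
paired = [pick_resource(x, y)
          for x, y in zip(demo['Category'], demo['Service_Line'])]

print(len(demo), len(nested), len(paired))
demo['Resource'] = paired
print(demo)
```

With three rows the nested version yields nine values while the `zip` version yields three, which is why only the latter can be assigned as a new column.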
Answer 1 (score: 0)
Define all the conditions in a list:
conditions = [((df.Category == 'low space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'log space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'CPU usage') & (df.Service_Line == 'Intel'))]
Then use select from numpy:
import numpy as np
df['Resource'] = np.select(conditions, ['Rhythm', 'Blue', 'Jazz'], default='Other')
    Service_Line  summary_data                    Category    Resource
0   Intel         low space in server123          low space   Rhythm
1   SQL           Server213f3 CPU usage           CPU usage   Other
2   Intel         getting more data in log space  log space   Blue
3   BUR           low volume space in server      low volume  Other
4   AIX           high CPU usage by application   CPU usage   Other
5   BUR           low volume space in server      low volume  Other
6   Intel         high CPU usage by application   CPU usage   Jazz
7   SQL           getting more data in log space  log space   Other
8   AIX           getting more data in log space  log spac    Other
9   SAN           space in server123              other       Other
10  SAN           space in server123              other       Other
11  SQL           NaN                             Low space   Other
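For reference, here is a self-contained sketch of the same `np.select` approach that can be run on its own. The five-row `demo` frame and the `is_intel` helper variable are invented for this example. `np.select` goes through the conditions in order and, for each row, takes the choice belonging to the first condition that is True; rows that match no condition get `default`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for this illustration.
demo = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'CPU usage', 'Low space'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel', 'SQL'],
})

# Shared part of every rule, computed once as a boolean Series.
is_intel = demo['Service_Line'] == 'Intel'

# One boolean Series per rule; parentheses are required around each
# comparison because & binds tighter than ==.
conditions = [
    (demo['Category'] == 'low space') & is_intel,
    (demo['Category'] == 'log space') & is_intel,
    (demo['Category'] == 'CPU usage') & is_intel,
]
choices = ['Rhythm', 'Blue', 'Jazz']

# choices[k] is used where conditions[k] is the first True condition.
demo['Resource'] = np.select(conditions, choices, default='Other')
print(demo)
```

The comparisons are exact string matches, so values such as 'Low space' or 'log spac' do not match 'low space' or 'log space'. If those are meant to match, the column would need to be cleaned first (for example with `.str.lower()`), which is an assumption about the intent rather than something the question states.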