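I have a df1 like the one below. I want to check whether the values in a certain column of df2 fall between the MAX and MIN values in df1. If a value does, I want to take the value from the Name column at that index. If a df2 value is not between those values, I want to know whether it is greater than the df1 maximum or less than the df1 minimum.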
import pandas as pd

data = {'Name': ['MN1', 'MN2', 'MN3', 'MN4', 'MN5', 'MN6', 'MN7-8', 'MN9', 'MN10', 'MN11', 'MN12', 'MN13', 'MN14', 'MN15', 'MN16', 'MN17', 'MQ18', 'MQ19'],
        'MAX': [23, 21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85],
        'MIN': [21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85, 0.01]}
df1 = pd.DataFrame(data, columns=['Name', 'MAX', 'MIN'])
What I tried:
list = []
for i in df2['AVERAGE_AGE']:
    for index, row in df1.iterrows():
        if row['MAX'] >= i and row['MIN'] < i:
            list.append(row['Name'])
    if i > df1['MAX'].max():
        list.append("Postmn")
    elif i < df1['MIN'].min():
        list.append("Premn")
df2['MNname'] = list
This takes a long time, and the length of the list does not match the length of df2.
Answer 0 (score: 0)
You can try:
(df2['AVERAGE_AGE'] < df1['MIN'].min()).value_counts()
(df2['AVERAGE_AGE'] > df1['MAX'].max()).value_counts()
This tells you how many rows satisfy each condition by giving the counts of True and False.
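The same two boolean masks can also label each row instead of only counting it. Below is a minimal sketch, assuming the overall bounds of df1 are 0.01 and 23; the `lower` and `upper` variables, the sample values, and the `range_flag` column are illustrative names, not part of the question.

```python
import numpy as np
import pandas as pd

# Assumed stand-ins for df1['MIN'].min() and df1['MAX'].max().
lower, upper = 0.01, 23.0

# Illustrative values only; the real df2 is not shown in the question.
df2 = pd.DataFrame({'AVERAGE_AGE': [0.005, 2.3, 25.0, 12.5]})

below = df2['AVERAGE_AGE'] < lower
above = df2['AVERAGE_AGE'] > upper

# Summing a boolean mask counts its True entries.
n_below = int(below.sum())
n_above = int(above.sum())

# np.select takes the label of the first condition that holds for each row;
# rows matching neither condition get the default.
df2['range_flag'] = np.select([below, above], ['Premn', 'Postmn'], default='in range')
```

Rows flagged 'in range' would still need a separate lookup to find which df1 row they belong to.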
Answer 1 (score: 0)
You can loop over the first DataFrame and use pandas.DataFrame.loc to set the name in the second one:
>>> df2 = pd.DataFrame([
...     2.299367, 20.688943, 10.245027, 1.412258, 22.541987,
...     2.588420, 5.578598, 11.703629, 12.529066, 17.769196,
... ], columns=['AVERAGE_AGE'])
>>> for index, row in df1.iterrows():
...     df2.loc[(df2.AVERAGE_AGE >= row.MIN) & (df2.AVERAGE_AGE < row.MAX), 'Name'] = row.Name
...
>>> df2
   AVERAGE_AGE   Name
0     2.299367   MN17
1    20.688943    MN2
2    10.245027    MN9
3     1.412258   MQ18
4    22.541987    MN1
5     2.588420   MN16
6     5.578598   MN13
7    11.703629  MN7-8
8    12.529066  MN7-8
9    17.769196    MN3
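If the ranges in df1 are contiguous (each row's MIN equals the next row's MAX, as in the question's data), the loop can probably be replaced by a single pandas.cut call. This is a sketch under that assumption, not a drop-in for arbitrary ranges; `asc` and `edges` are names introduced here. `right=False` keeps the same `MIN <= value < MAX` convention as the loop above, and values outside every range come back as NaN.

```python
import pandas as pd

data = {'Name': ['MN1', 'MN2', 'MN3', 'MN4', 'MN5', 'MN6', 'MN7-8', 'MN9', 'MN10',
                 'MN11', 'MN12', 'MN13', 'MN14', 'MN15', 'MN16', 'MN17', 'MQ18', 'MQ19'],
        'MAX': [23, 21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9,
                8.9, 7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85],
        'MIN': [21.7, 19.5, 17.2, 16.4, 14.2, 12.85, 11.2, 9.9, 8.9,
                7.6, 7.1, 5.3, 5, 3.55, 2.5, 1.9, 0.85, 0.01]}
df1 = pd.DataFrame(data)

df2 = pd.DataFrame({'AVERAGE_AGE': [
    2.299367, 20.688943, 10.245027, 1.412258, 22.541987,
    2.588420, 5.578598, 11.703629, 12.529066, 17.769196,
]})

# pd.cut needs increasing bin edges, so order the ranges from lowest to highest.
asc = df1.sort_values('MIN')

# Every MIN is a left edge; the largest MAX closes the last bin.
edges = asc['MIN'].tolist() + [asc['MAX'].iloc[-1]]

# One label per bin, in the same ascending order as the edges.
df2['Name'] = pd.cut(df2['AVERAGE_AGE'], bins=edges,
                     labels=asc['Name'].tolist(), right=False).astype(object)
```

This does one binned lookup over the whole column instead of one `.loc` assignment per row of df1.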
Answer 2 (score: 0)
Try this:
import numpy as np
import pandas as pd

arr = []
for i in range(df2.shape[0]):
    # Check if the value in COLUMN_1 is between the MIN and MAX values
    if (df2['COLUMN_1'][i] > df1['MIN'][i]) and (df2['COLUMN_1'][i] < df1['MAX'][i]):
        arr.append(df1['Name'][i])
    # Check if the value in COLUMN_1 is less than the MIN value
    elif df2['COLUMN_1'][i] < df1['MIN'][i]:
        arr.append(np.round(df2['COLUMN_1'][i] - df1['MIN'][i], 2))
    # Check if the value in COLUMN_1 is greater than the MAX value
    elif df2['COLUMN_1'][i] > df1['MAX'][i]:
        arr.append(np.round(df2['COLUMN_1'][i] - df1['MAX'][i], 2))
df2['Name'] = pd.Series(arr)
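Since you didn't say exactly which column of df2 you want to check, I used COLUMN_1. The conditions and values used are the ones in the code comments: a value strictly between MIN and MAX gets the Name, a value below MIN gets its difference from MIN, and a value above MAX gets its difference from MAX, each difference rounded to 2 decimals.
Hope this works!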
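If what is wanted is the distance from the overall df1 range rather than from the df1 row at the same position, Series.clip may be enough. A minimal sketch, assuming overall bounds of 0.01 and 23; `lower`, `upper`, the sample values, and the `OUT_OF_RANGE_BY` column are illustrative names, not from the question.

```python
import pandas as pd

# Assumed stand-ins for df1['MIN'].min() and df1['MAX'].max().
lower, upper = 0.01, 23.0

# Illustrative values only.
df2 = pd.DataFrame({'COLUMN_1': [-1.5, 10.0, 25.25]})

# clip pulls out-of-range values back to the nearest bound and leaves the rest alone.
clipped = df2['COLUMN_1'].clip(lower=lower, upper=upper)

# Negative: below the range; zero: inside it; positive: above it.
df2['OUT_OF_RANGE_BY'] = (df2['COLUMN_1'] - clipped).round(2)
```

The sign of the result tells you on which side of the range the value fell, and its size tells you by how much.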