I need to combine the values from two columns into a new column.
Suppose this is my pandas df:
import pandas as pd
data = {'material': ['Matl_A', 'Matl_B', 'Matl_B', 'Matl_A'],
        'strength': [10, 20, 30, 100]}
df = pd.DataFrame(data)
So my df is:
| material | strength |
|----------|----------|
| Matl_A   | 10       |
| Matl_B   | 20       |
| Matl_B   | 30       |
| Matl_A   | 100      |
I want to end up with something like this:
| material | strength | grade |
|----------|----------|-------|
| Matl_A   | 10       | 1     |
| Matl_B   | 20       | 4     |
| Matl_B   | 80       | 5     |
| Matl_A   | 100      | 2     |
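What is the best way to do this?
Edit:
I used Michael Gardner's answer below and extended it, because we have a lot of materials. I hope this revision gives a clearer picture. If I have 20 materials to grade, each with different range conditions, what would be a more elegant approach?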
import numpy as np
import pandas as pd
strength = np.random.randint(low=1, high=30, size=20)
material = ['matl_a', 'matl_b', 'matl_b', 'matl_a', 'matl_d',
            'matl_b', 'matl_d', 'matl_a', 'matl_a', 'matl_b',
            'matl_a', 'matl_b', 'matl_e', 'matl_a', 'matl_c',
            'matl_b', 'matl_c', 'matl_a', 'matl_a', 'matl_b']
data = {'material': material,
        'strength': strength}
df = pd.DataFrame(data)
def grading(df):
    # With apply(axis=1), `df` is a single row (a Series), not the whole frame
    if df['material'] == 'matl_a':
        if 0 <= df['strength'] <= 10:
            return 1
        elif 11 <= df['strength'] <= 20:
            return 2
        elif 21 <= df['strength'] <= 30:
            return 3
        elif 31 <= df['strength'] <= 40:
            return 4
        else:
            return 5
    elif df['material'] == 'matl_b':
        if 0 <= df['strength'] <= 10:
            return 6
        elif 11 <= df['strength'] <= 20:
            return 7
        elif 21 <= df['strength'] <= 30:
            return 8
        elif 31 <= df['strength'] <= 40:
            return 9
        else:
            return 10
    elif df['material'] == 'matl_c':
        if 0 <= df['strength'] <= 10:
            return 11
        elif 11 <= df['strength'] <= 20:
            return 12
        elif 21 <= df['strength'] <= 30:
            return 13
        elif 31 <= df['strength'] <= 40:
            return 14
        else:
            return 15
    else:
        # Every other material (matl_d, matl_e, ...) shares grades 16-20
        if 0 <= df['strength'] <= 10:
            return 16
        elif 11 <= df['strength'] <= 20:
            return 17
        elif 21 <= df['strength'] <= 30:
            return 18
        elif 31 <= df['strength'] <= 40:
            return 19
        else:
            return 20
df['grade'] = df.apply(grading, axis=1)
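For the many-material case in the edit, the nested if/elif chain boils down to two lookups: which block of five grades the material uses, and which strength band the value falls in. The sketch below is one possible vectorized rewrite with `pd.cut`; it is not taken from any of the answers, the names `MATERIAL_OFFSET`, `STRENGTH_BINS` and `grade_vectorized` are hypothetical, and it assumes strengths are non-negative integers (as `np.random.randint` produces), for which it should reproduce the `grading` function's output.

```python
import pandas as pd

# Hypothetical lookup tables mirroring the if/elif chain in the edit:
# each material owns a block of five grades starting after this offset.
MATERIAL_OFFSET = {'matl_a': 0, 'matl_b': 5, 'matl_c': 10}
DEFAULT_OFFSET = 15  # any other material (matl_d, matl_e, ...) -> grades 16-20

# Right-closed bins (-1, 10], (10, 20], (20, 30], (30, 40]; for non-negative
# integers these match the 0-10, 11-20, 21-30, 31-40 ranges of `grading`.
STRENGTH_BINS = [-1, 10, 20, 30, 40]


def grade_vectorized(df: pd.DataFrame) -> pd.Series:
    """Return a grade per row without calling a Python function per row."""
    # Bin index 0..3, or NaN when the strength falls outside every bin.
    bin_idx = pd.Series(
        pd.cut(df['strength'], bins=STRENGTH_BINS, labels=False),
        index=df.index,
    )
    # Out-of-range strengths take the fifth slot (the original `else` branch).
    bin_idx = bin_idx.fillna(4)
    # Unknown materials fall back to the shared default block.
    offset = df['material'].map(MATERIAL_OFFSET).fillna(DEFAULT_OFFSET)
    return (offset + bin_idx + 1).astype(int)
```

Usage is `df['grade'] = grade_vectorized(df)`. Adding a material then means adding one dictionary entry instead of another nested branch.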
Answer 0 (score: 2)
Use np.select:
import numpy as np
a = df.material.eq('Matl_A')
b = df.material.eq('Matl_B')
df['grade'] = np.select([a & df.strength.between(5, 10),
                         a & df.strength.between(11, 20),
                         b & df.strength.between(10, 50),
                         b & df.strength.between(50, 100)],
                        ['A', 'B', 'A', 'B'],
                        default='C')
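When the number of rules grows, the `np.select` conditions do not have to be written out by hand. A minimal sketch, assuming the rules are kept in a hypothetical list `RULES` of `(material, low, high, grade)` tuples with inclusive bounds and the same ranges as the answer above; it runs on the four-row frame used in Answer 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'material': ['Matl_A', 'Matl_B', 'Matl_B', 'Matl_A'],
                   'strength': [10, 20, 80, 100]})

# Hypothetical rule table: (material, low, high, grade), bounds inclusive.
RULES = [
    ('Matl_A', 5, 10, 'A'),
    ('Matl_A', 11, 20, 'B'),
    ('Matl_B', 10, 50, 'A'),
    ('Matl_B', 50, 100, 'B'),
]

# One boolean mask per rule. np.select uses the first mask that is True,
# so a Matl_B strength of exactly 50 gets 'A' from the earlier rule.
conditions = [df['material'].eq(m) & df['strength'].between(lo, hi)
              for m, lo, hi, _ in RULES]
choices = [grade for *_, grade in RULES]
df['grade'] = np.select(conditions, choices, default='C')
```

The rule list could just as well be read from a file, which keeps the grading logic itself to three lines.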
Answer 1 (score: 1)
IN:
data = {'material': ['Matl_A', 'Matl_B', 'Matl_B', 'Matl_A'],
        'strength': [10, 20, 80, 100]}
df = pd.DataFrame(data)
def grading(df):
    if df['material'] == 'Matl_A':
        if 5 <= df['strength'] <= 10:
            return 'A'
        elif 11 <= df['strength'] <= 20:
            return 'B'
        else:
            return 'C'
    elif 10 <= df['strength'] <= 50:
        return 'A'
    elif 50 <= df['strength'] <= 100:
        return 'B'
    else:
        return 'C'
df['grade'] = df.apply(grading, axis=1)
df.head()
OUT:
| material | strength | grade |
|----------|----------|-------|
| Matl_A | 10 | A |
| Matl_B | 20 | A |
| Matl_B | 80 | B |
| Matl_A | 100 | C |
Answer 2 (score: 1)
Put the grade definitions into a DataFrame:
grades = pd.DataFrame([
    ('Matl_A', 5, 'A'),
    ('Matl_A', 11, 'B'),
    ('Matl_A', 21, 'C'),
    ('Matl_B', 10, 'A'),
    ('Matl_B', 51, 'B'),
    ('Matl_B', 101, 'C'),
], columns=('material', 'strength', 'grade'))
grades = grades.sort_values(['strength'])
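Then use pd.merge_asof. It requires both frames to be sorted by the `on` key, so sort `df` by `strength` as well:
pd.merge_asof(df.sort_values('strength'), grades, on='strength', by='material')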
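The grade definitions can be loaded from an external source (a CSV file, a database, etc.).
This way you can handle a large number of materials and grade tables without the code turning into a mess.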
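As a sketch of that idea, the 20-grade scheme from the question's edit can be written as threshold rows and joined with `pd.merge_asof`. Here `io.StringIO` stands in for a real CSV file, and `GRADES_CSV`, `assign_grades` and the `'other'` fallback key are hypothetical names. The function assumes `df` has a default `RangeIndex` (it restores the original row order via `reset_index`) and integer strengths.

```python
import io

import pandas as pd

# Stand-in for an external file; in practice pd.read_csv('<your file>.csv')
# or a database query. Each row: material, lowest strength for grade, grade.
GRADES_CSV = """material,strength,grade
matl_a,0,1
matl_a,11,2
matl_a,21,3
matl_a,31,4
matl_a,41,5
matl_b,0,6
matl_b,11,7
matl_b,21,8
matl_b,31,9
matl_b,41,10
matl_c,0,11
matl_c,11,12
matl_c,21,13
matl_c,31,14
matl_c,41,15
other,0,16
other,11,17
other,21,18
other,31,19
other,41,20
"""

# merge_asof needs the right-hand `on` column sorted.
grades = pd.read_csv(io.StringIO(GRADES_CSV)).sort_values('strength')


def assign_grades(df: pd.DataFrame) -> pd.DataFrame:
    """Attach a 'grade' column by finding each row's threshold band."""
    left = df.reset_index()  # keep the original order in an 'index' column
    # Materials without their own rows use the 'other' scale, mirroring the
    # final `else` branch of the question's grading function.
    known = set(grades['material'])
    left['key'] = left['material'].where(left['material'].isin(known), 'other')
    merged = pd.merge_asof(
        left.sort_values('strength'),  # left `on` column must be sorted too
        grades.rename(columns={'material': 'key'}),
        on='strength',
        by='key',
    )
    return merged.set_index('index').sort_index().drop(columns='key')
```

With this table, a strength below 0 has no matching threshold and gets `NaN`, whereas the original `grading` function falls through to its `else` branch; adding a lower threshold row per material would cover that.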