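I have a Pandas DataFrame that contains two free-form text columns where people can describe the model or trim level of their vehicle (e.g. LE, 1LT, RS, SS). In these columns, some people enter only the model (e.g. LE), while others add extra text (e.g. 2dr Convertible SS w/ 2SS). In addition, the model levels follow a rough hierarchy, where SS < 1SS < 2SS.
I would like to extract these model or trim levels into a new column in my DataFrame, standardizing the naming along the way (e.g. 1ls = 1LS, ZL-1 = ZL1).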
import pandas as pd

# the model can be stored in either 'SubModel' or 'Trim'
data = [{'SubModel': 'SS-EDITION', 'Trim': 'SS-EDITION(MANUAL 6 SPEED) Coupe 2-Door'},
        {'SubModel': 'ZL1', 'Trim': 'ZL1 Coupe 2-Door'},
        {'SubModel': 'N/A', 'Trim': 'SS TRANSFORMER'},
        {'SubModel': '1LT RS AUTO BLUETOOTH REAR CAM', 'Trim': 'N/A'},
        {'SubModel': 'N/A', 'Trim': 'LS'},
        {'SubModel': 'Camaro SS', 'Trim': 'Camaro SS'},
        {'SubModel': 'Dusk Edition', 'Trim': 'N/A'},
        {'SubModel': 'Camaro SS W/ RS Pkg', 'Trim': 'Camaro SS W/ RS Pkg'},
        {'SubModel': '2dr Coupe SS w/2SS', 'Trim': '2dr Coupe SS w/2SS'},
        {'SubModel': '2dr Convertible LT w/1LT', 'Trim': '2dr Convertible LT w/1LT'},
        {'SubModel': 'N/A', 'Trim': '2LT'},
        {'SubModel': "LT RS 6-SPD SUNROOF REAR CAM 20'S", 'Trim': '1LT Coupe 2-Door'},
        {'SubModel': '2dr Convertible SS w/2SS', 'Trim': '2dr Convertible SS w/2SS'},
        {'SubModel': '2dr Convertible LT w/2LT', 'Trim': '2dr Convertible LT w/2LT'},
        {'SubModel': 'N/A', 'Trim': '2LT'},
        {'SubModel': 'N/A', 'Trim': 'RARE ZL1 - LOW MILES'},
        {'SubModel': "2SS AUTO LEATHER NAV HUD 20'S", 'Trim': 'SS Coupe 2-Door'},
        {'SubModel': 'SS', 'Trim': 'SS Coupe 2-Door'},
        {'SubModel': 'N/A', 'Trim': 'Car'},
        {'SubModel': 'N/A', 'Trim': '2LT'}]
# load data into dataframe
df = pd.DataFrame(data)
# create a dict of all models, including alternative spellings
models = {'LE': 'LE',
          '1LE': '1LE',
          '2LE': '2LE',
          'LT': 'LT',
          '1LT': '1LT',
          '2LT': '2LT',
          'LS': 'LS',
          '1LS': '1LS',
          '2LS': '2LS',
          'SS': 'SS',
          '1SS': '1SS',
          '2SS': '2SS',
          'ZL1': 'ZL1',
          'ZL/1': 'ZL1',
          'ZL-1': 'ZL1',
          'COPO': 'COPO',
          'copo': 'copo'}
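# look for each key in the models dict, and if found, return the value for that key for the column 'TRIM'
def trim_level(row):
    for key in models:
        # note: `row['Trim'] or row['SubModel']` evaluates to row['Trim'] whenever
        # that is a non-empty string, so 'SubModel' is only consulted if 'Trim' is empty
        if key in (row['Trim'] or row['SubModel']):
            return models[key]
    # no key matched: the function falls through and returns None

df['TRIM'] = df.apply(trim_level, axis=1)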
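As shown below, my existing solution has a problem: 2SS gets classified as SS, and 2LT gets classified as LT. I also don't know how to handle descriptions that contain two different models, such as SS w/ 2SS.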
+-----+-------+------------------------------------+-----------------------------------------+
| | TRIM | SubModel | Trim |
+-----+-------+------------------------------------+-----------------------------------------+
| 0 | SS | SS-EDITION | SS-EDITION(MANUAL 6 SPEED) Coupe 2-Door |
| 1 | ZL1 | ZL1 | ZL1 Coupe 2-Door |
| 2 | SS | N/A | SS TRANSFORMER |
| 3 | None | 1LT RS AUTO BLUETOOTH REAR CAM | N/A |
| 4 | LS | N/A | LS |
| 5 | SS | Camaro SS | Camaro SS |
| 6 | None | Dusk Edition | N/A |
| 7 | SS | Camaro SS W/ RS Pkg | Camaro SS W/ RS Pkg |
| 8 | SS | 2dr Coupe SS w/2SS | 2dr Coupe SS w/2SS |
| 9 | 1LT | 2dr Convertible LT w/1LT | 2dr Convertible LT w/1LT |
| 10 | LT | N/A | 2LT |
| 11 | 1LT | LT RS 6-SPD SUNROOF REAR CAM 20'S | 1LT Coupe 2-Door |
| 12 | SS | 2dr Convertible SS w/2SS | 2dr Convertible SS w/2SS |
| 13 | LT | 2dr Convertible LT w/2LT | 2dr Convertible LT w/2LT |
| 14 | LT | N/A | 2LT |
| 15 | LE | N/A | RARE ZL1 - LOW MILES |
| 16 | SS | 2SS AUTO LEATHER NAV HUD 20'S | SS Coupe 2-Door |
| 17 | SS | SS | SS Coupe 2-Door |
| 18 | None | N/A | Car |
| 19 | LT | N/A | 2LT |
+-----+-------+------------------------------------+-----------------------------------------+
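
One possible direction, sketched here rather than offered as a definitive fix: match trim codes only as whole tokens (so LE is not found inside MILES, and SS is not found inside 2SS), collect every match from both columns, and keep the most specific one. The names `ALIASES`, `CANONICAL`, `TRIM_RE`, `specificity` and `extract_trim` are hypothetical. The ranking rule (a leading digit makes a trim more specific, so SS < 1SS < 2SS) is an assumption generalized from the one hierarchy stated above, and ties between different families simply go to the first match.

```python
import re

# Spelling variants mapped to canonical names (taken from the models dict above).
ALIASES = {"ZL/1": "ZL1", "ZL-1": "ZL1"}
CANONICAL = ["LE", "1LE", "2LE", "LT", "1LT", "2LT", "LS", "1LS", "2LS",
             "SS", "1SS", "2SS", "ZL1", "COPO"]

# The look-arounds require a non-alphanumeric character (or the string edge) on
# both sides, so a code only matches as a whole token. IGNORECASE lets "1ls" match.
_tokens = sorted(CANONICAL + list(ALIASES), key=len, reverse=True)
TRIM_RE = re.compile(
    r"(?<![A-Za-z0-9])(" + "|".join(map(re.escape, _tokens)) + r")(?![A-Za-z0-9])",
    flags=re.IGNORECASE,
)

def specificity(trim):
    # Assumed ranking: the leading digit, so SS (0) < 1SS (1) < 2SS (2).
    return int(trim[0]) if trim[0].isdigit() else 0

def extract_trim(row, columns=("SubModel", "Trim")):
    """Return the most specific trim found in any of the given columns, or None."""
    found = []
    for col in columns:
        text = row[col]
        if isinstance(text, str):
            for match in TRIM_RE.findall(text):
                match = match.upper()
                found.append(ALIASES.get(match, match))
    # max() keeps the first of several equally specific matches.
    return max(found, key=specificity) if found else None

# Demo on plain dicts; a DataFrame row supports the same row[col] lookup.
sample = {"SubModel": "2dr Coupe SS w/2SS", "Trim": "2dr Coupe SS w/2SS"}
print(extract_trim(sample))
```

Because `extract_trim` only indexes the row by column name, it should also work row-wise on the DataFrame above with `df['TRIM'] = df.apply(extract_trim, axis=1)`. Whether a bare SS next to 2SS really means 2SS is a data question that this sketch assumes rather than proves.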