Extract keywords from a pandas DataFrame column, but not nested keywords

Time: 2017-04-06 03:53:16

Tags: python pandas

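I have a pandas DataFrame with two free-form text columns in which people describe the model or trim level of their vehicle (e.g. LE, 1LT, RS, SS). Some people enter only the model (e.g. LE), while others add extra text (e.g. 2dr Convertible SS w/2SS). The model levels also have a hierarchy, where SS < 1SS < 2SS.

I want to extract these model or trim levels and create a new column in my DataFrame (e.g. 1ls = 1LS, ZL-1 = ZL1, etc.).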

import pandas as pd

# the model can be stored in either 'SubModel' or 'Trim'
data = [{'SubModel': 'SS-EDITION', 'Trim': 'SS-EDITION(MANUAL 6 SPEED)  Coupe 2-Door'},
        {'SubModel': 'ZL1', 'Trim': 'ZL1 Coupe 2-Door'},
        {'SubModel': 'N/A', 'Trim': 'SS TRANSFORMER'},
        {'SubModel': '1LT RS AUTO BLUETOOTH REAR CAM', 'Trim': 'N/A'},
        {'SubModel': 'N/A', 'Trim': 'LS'},
        {'SubModel': 'Camaro SS', 'Trim': 'Camaro SS'},
        {'SubModel': 'Dusk Edition', 'Trim': 'N/A'},
        {'SubModel': 'Camaro SS W/ RS Pkg', 'Trim': 'Camaro SS W/ RS Pkg'},
        {'SubModel': '2dr Coupe SS w/2SS', 'Trim': '2dr Coupe SS w/2SS'},
        {'SubModel': '2dr Convertible LT w/1LT', 'Trim': '2dr Convertible LT w/1LT'},
        {'SubModel': 'N/A', 'Trim': '2LT'},
        {'SubModel': "LT RS 6-SPD SUNROOF REAR CAM 20'S", 'Trim': '1LT Coupe 2-Door'},
        {'SubModel': '2dr Convertible SS w/2SS', 'Trim': '2dr Convertible SS w/2SS'},
        {'SubModel': '2dr Convertible LT w/2LT', 'Trim': '2dr Convertible LT w/2LT'},
        {'SubModel': 'N/A', 'Trim': '2LT'},
        {'SubModel': 'N/A', 'Trim': 'RARE ZL1 - LOW MILES'},
        {'SubModel': "2SS AUTO LEATHER NAV HUD 20'S", 'Trim': 'SS Coupe 2-Door'},
        {'SubModel': 'SS', 'Trim': 'SS Coupe 2-Door'},
        {'SubModel': 'N/A', 'Trim': 'Car'},
        {'SubModel': 'N/A', 'Trim': '2LT'}]

# load data into dataframe
df = pd.DataFrame(data)

# create a dict of all models, including alternative spellings
models = {'LE' : 'LE',
          '1LE' : '1LE',
          '2LE' : '2LE',
          'LT' : 'LT',
          '1LT' : '1LT',
          '2LT' : '2LT',
          'LS' : 'LS',
          '1LS' : '1LS',
          '2LS' : '2LS',
          'SS' : 'SS',
          '1SS' : '1SS',
          '2SS' : '2SS',
          'ZL1' : 'ZL1',
          'ZL/1' : 'ZL1',
          'ZL-1' : 'ZL1',
          'COPO' : 'COPO',
          'copo' : 'copo'}

# look for each key in the models dict, and if found, return the value for that key for the column 'TRIM'
def trim_level(row):

    for key in models.keys():
        if key in (row['Trim'] or row['SubModel']):
            return models[key]


df['TRIM'] = df.apply(trim_level, axis=1)

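As shown below, my current solution has a problem: 2SS gets classified as SS, and 2LT gets classified as LT. I also don't know how to handle descriptions that contain two different models, such as SS w/2SS.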

+-----+-------+------------------------------------+-----------------------------------------+
|     | TRIM  |             SubModel               |                  Trim                   |
+-----+-------+------------------------------------+-----------------------------------------+
|  0  | SS    | SS-EDITION                         | SS-EDITION(MANUAL 6 SPEED) Coupe 2-Door |
|  1  | ZL1   | ZL1                                | ZL1 Coupe 2-Door                        |
|  2  | SS    | N/A                                | SS TRANSFORMER                          |
|  3  | None  | 1LT RS AUTO BLUETOOTH REAR CAM     | N/A                                     |
|  4  | LS    | N/A                                | LS                                      |
|  5  | SS    | Camaro SS                          | Camaro SS                               |
|  6  | None  | Dusk Edition                       | N/A                                     |
|  7  | SS    | Camaro SS W/ RS Pkg                | Camaro SS W/ RS Pkg                     |
|  8  | SS    | 2dr Coupe SS w/2SS                 | 2dr Coupe SS w/2SS                      |
|  9  | 1LT   | 2dr Convertible LT w/1LT           | 2dr Convertible LT w/1LT                |
| 10  | LT    | N/A                                | 2LT                                     |
| 11  | 1LT   | LT RS 6-SPD SUNROOF REAR CAM 20'S  | 1LT Coupe 2-Door                        |
| 12  | SS    | 2dr Convertible SS w/2SS           | 2dr Convertible SS w/2SS                |
| 13  | LT    | 2dr Convertible LT w/2LT           | 2dr Convertible LT w/2LT                |
| 14  | LT    | N/A                                | 2LT                                     |
| 15  | LE    | N/A                                | RARE ZL1 - LOW MILES                    |
| 16  | SS    | 2SS AUTO LEATHER NAV HUD 20'S      | SS Coupe 2-Door                         |
| 17  | SS    | SS                                 | SS Coupe 2-Door                         |
| 18  | None  | N/A                                | Car                                     |
| 19  | LT    | N/A                                | 2LT                                     |
+-----+-------+------------------------------------+-----------------------------------------+
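
The misclassifications in the table come from plain substring tests: 'SS' is found inside '2SS', and 'LE' is found inside 'MILES'. In addition, `row['Trim'] or row['SubModel']` evaluates to `row['Trim']` whenever that string is non-empty, so 'SubModel' is never searched. One possible approach, offered as a sketch rather than a definitive fix, is to match whole tokens with a regular expression and, when several levels appear, keep the most specific one. The names `ALIASES`, `PATTERN`, `specificity` and `extract_trim` are hypothetical, and the ranking rule (a leading digit outranks the bare level) is an assumption that generalises the stated SS < 1SS < 2SS to the other families.

```python
import re

# Alias -> canonical spelling, taken from the question's dict. Matching is
# case-insensitive, so 'copo' and '1ls' map to 'COPO' and '1LS'.
ALIASES = {
    'LE': 'LE', '1LE': '1LE', '2LE': '2LE',
    'LT': 'LT', '1LT': '1LT', '2LT': '2LT',
    'LS': 'LS', '1LS': '1LS', '2LS': '2LS',
    'SS': 'SS', '1SS': '1SS', '2SS': '2SS',
    'ZL1': 'ZL1', 'ZL/1': 'ZL1', 'ZL-1': 'ZL1',
    'COPO': 'COPO',
}

# Longest aliases first. The lookarounds require a non-alphanumeric neighbour
# on both sides, so 'SS' cannot match inside '2SS' and 'LE' cannot match
# inside 'MILES'.
PATTERN = re.compile(
    r'(?<![A-Za-z0-9])(?:'
    + '|'.join(re.escape(k) for k in sorted(ALIASES, key=len, reverse=True))
    + r')(?![A-Za-z0-9])',
    re.IGNORECASE,
)


def specificity(level):
    # Assumed ranking: a leading digit outranks the bare level (SS < 1SS < 2SS).
    return int(level[0]) if level[0].isdigit() else 0


def extract_trim(row, columns=('Trim', 'SubModel')):
    """Return the most specific trim level found in the given columns, or None."""
    found = []
    for col in columns:
        text = row[col]
        if isinstance(text, str):
            found += [ALIASES[m.upper()] for m in PATTERN.findall(text)]
    # max() returns the first of equally ranked candidates, so ties go to the
    # earliest match, with 'Trim' searched before 'SubModel'.
    return max(found, key=specificity) if found else None


# Usage with the question's DataFrame:
# df['TRIM'] = df.apply(extract_trim, axis=1)
```

With this sketch, 'SS w/2SS' yields 2SS, 'RARE ZL1 - LOW MILES' yields ZL1, and row 3 picks up 1LT from 'SubModel'. Levels from different families (for example SS and ZL1 in the same text) tie under the assumed ranking, and the first match is kept. A real ordering across families would need to be supplied.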

0 answers:

No answers