Mapping strings to 0 in pandas (Python)

Time: 2019-04-07 22:45:59

Tags: python string pandas mapping

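I am trying to study the effect of alcohol and drugs on car crashes using an open BigQuery dataset. My dataset is ready for further refinement. I want to categorize the string entries in a pandas column.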

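The DataFrame has over 11,000 entries, and each column has about 44 unique values. However, I only want to map the entries that say 'Alcohol Involvement' or 'Drugs (Illegal)' to 1. Every other entry should be mapped to 0.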

I have created a list of all the entries I do not care about and want to remove:

list_ign  = ['Backing Unsafely',
   'Turning Improperly', 'Other Vehicular',
   'Driver Inattention/Distraction', 'Following Too Closely',
   'Oversized Vehicle', 'Driver Inexperience', 'Brakes Defective',
   'View Obstructed/Limited', 'Passing or Lane Usage Improper',
   'Unsafe Lane Changing', 'Failure to Yield Right-of-Way',
   'Fatigued/Drowsy', 'Prescription Medication',
   'Failure to Keep Right', 'Pavement Slippery', 'Lost Consciousness',
   'Cell Phone (hands-free)', 'Outside Car Distraction',
   'Traffic Control Disregarded', 'Fell Asleep',
   'Passenger Distraction', 'Physical Disability', 'Illness', 'Glare',
   'Other Electronic Device', 'Obstruction/Debris', 'Unsafe Speed',
   'Aggressive Driving/Road Rage',
   'Pedestrian/Bicyclist/Other Pedestrian Error/Confusion',
   'Reaction to Other Uninvolved Vehicle', 'Steering Failure',
   'Traffic Control Device Improper/Non-Working',
   'Tire Failure/Inadequate', 'Animals Action',
   'Driverless/Runaway Vehicle']

How can I map 'Alcohol Involvement' and 'Drugs (Illegal)' to 1 and set everything in the list to 0?

2 Answers:

Answer 0 (score: 2)

Assuming your source column is named Crime:

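import numpy as np

# isin matches exact strings only, so the labels must be spelled as in the data ('Drugs (Illegal)' per the question).
df['Illegal'] = np.where(df['Crime'].isin(['Alcohol Involvement', 'Drugs (Illegal)']), 1, 0)

Or, overwriting the original column instead of adding a new one:

df['Crime'] = df['Crime'].isin(['Alcohol Involvement', 'Drugs (Illegal)']).astype(int)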
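
As a quick check of the approach above, here is a minimal sketch on a small made-up DataFrame. The column name Crime follows the assumption in this answer. The sample rows and the name `targets` are invented for illustration, and the two labels are assumed to be spelled as in the question ('Alcohol Involvement' and 'Drugs (Illegal)').

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real column has about 44 unique values.
df = pd.DataFrame({
    'Crime': ['Alcohol Involvement', 'Unsafe Speed', 'Drugs (Illegal)',
              'Glare', None],
})

# Labels to flag as 1; assumed to match the dataset's strings exactly.
targets = ['Alcohol Involvement', 'Drugs (Illegal)']

# isin returns a boolean mask (missing values give False); cast it to 0/1.
df['Illegal'] = df['Crime'].isin(targets).astype(int)

# np.where on the same mask gives an equivalent array of 0s and 1s.
via_where = np.where(df['Crime'].isin(targets), 1, 0)

print(df)
```

Both forms rely on exact string equality, so a label with different spelling, casing, or trailing whitespace would silently end up as 0. Checking `df['Crime'].unique()` first is a cheap way to confirm the spellings.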

Answer 1 (score: 0)

The approaches above work, but they did not flag all the categories I want to drop later. So I used this approach:

for word in list_ign:
    df = df.replace(str(word), 'Replace')
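
The loop above calls replace once per label. A minimal sketch of the same idea in a single call is shown below, since `DataFrame.replace` also accepts a list of values to replace. It assumes a column named Crime as in the first answer, uses a shortened stand-in for list_ign, and the sample rows and the name `kept` are invented for illustration.

```python
import pandas as pd

# Shortened stand-in for list_ign from the question (illustration only).
list_ign = ['Backing Unsafely', 'Unsafe Speed', 'Glare']

# Hypothetical sample data.
df = pd.DataFrame({
    'Crime': ['Alcohol Involvement', 'Unsafe Speed', 'Glare',
              'Drugs (Illegal)', 'Backing Unsafely'],
})

# Replace every label in list_ign with the marker in one pass.
# Like the loop, this applies to all columns of the DataFrame.
df = df.replace(list_ign, 'Replace')

# Rows carrying the marker can then be dropped.
kept = df[df['Crime'] != 'Replace']
print(kept)
```

If the goal is only to drop those rows, filtering directly with `df[~df['Crime'].isin(list_ign)]` would skip the marker step altogether.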