How do I map a variable using a nested dictionary in pandas?

Time: 2020-08-20 13:53:55

Tags: python pandas dictionary

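I searched previous posts but couldn't find anything that helps with what I'm trying to do. I'm also new to Python.

Essentially, I'm trying to find the most streamlined way to create multiple variables from a single variable.

Let's say my data looks like this: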

CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10

What I'd like to know is whether, and how, I can create the variables using a nested dictionary like this:

offense_variable = { 'Traffic': {1: 1},
'Violence': {24: 1},
'DUI': {62: 1},
'Theft': {12: 1},
'Drugs': {10: 1}
}

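and then use the map function to fill in the values of the "Traffic", "Violence", etc. variables based on the keys matched in Offense.

Thanks!

Edit: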

The goal is essentially to turn this: 

CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10

into this:

CaseNumber   Offense   Traffic   Violence   DUI   Theft   Drugs   Flag
ABC123       1         1         0          0     0       0       1
ABC123       1         1         0          0     0       0       1
ABC124       24        0         1          0     0       0       1
ABC124       62        0         0          1     0       0       1
ABC125       12        0         0          0     1       0       0
ABC126       10        0         0          0     0       1       1

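I'd also like some added functionality that incorporates other dummy flag variables. For example, suppose that Theft should also be 1 whenever the last column, "Flag", is 1, in addition to when Offense = 12.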

1 Answer:

Answer 0: (score: 0)

This is a revised answer based on the OP's comment below.

from io import StringIO
import pandas as pd

data = '''CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10
'''
# create data frame
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')

# create dict of dict
offense_variable = { 'Traffic': {1: 1}, 'Violence': {24: 1},
    'DUI': {62: 1}, 'Theft': {12: 1}, 'Drugs': {10: 1} }

# flatten the offense_variable from nested dicts to ordinary dict
ov = { num: desc
      for desc, vs in offense_variable.items()
      for num, _ in vs.items() }

# use flattened dict to convert Offense (number) to desc (string)
df['offense_desc'] = df['Offense'].map(ov)

# use `pd.get_dummies()` for one-hot encoding; dtype=int keeps the columns
# as 0/1 (pandas 2.0+ returns booleans by default)
df = pd.concat([df, pd.get_dummies(df['offense_desc'], dtype=int)], axis=1)

print(df)

  CaseNumber  Offense offense_desc  DUI  Drugs  Theft  Traffic  Violence
0     ABC123        1      Traffic    0      0      0        1         0
1     ABC123        1      Traffic    0      0      0        1         0
2     ABC124       24     Violence    0      0      0        0         1
3     ABC124       62          DUI    1      0      0        0         0
4     ABC125       12        Theft    0      0      1        0         0
5     ABC126       10        Drugs    0      1      0        0         0
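
The extra rule from the question's edit (Theft should also be 1 when Flag is 1) isn't covered above. Here is a minimal sketch of one way it might be handled, assuming the Flag column already exists in the data (its values below are copied from the desired output in the question). It maps each inner dict onto Offense directly, so the columns come out in the order the nested dict defines. The column name `Theft_or_Flag` is made up for illustration and is not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "CaseNumber": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125", "ABC126"],
    "Offense": [1, 1, 24, 62, 12, 10],
    # Assumption: Flag is already present; values taken from the question's table
    "Flag": [1, 1, 1, 1, 0, 1],
})

offense_variable = {
    "Traffic": {1: 1}, "Violence": {24: 1}, "DUI": {62: 1},
    "Theft": {12: 1}, "Drugs": {10: 1},
}

# One column per category: map Offense through the inner dict,
# then turn unmatched codes (NaN) into 0.
for name, codes in offense_variable.items():
    df[name] = df["Offense"].map(codes).fillna(0).astype(int)

# Hypothetical combined rule: 1 if the offense is a theft OR Flag is 1.
# Stored in a separate column so the plain Theft dummy is left untouched.
df["Theft_or_Flag"] = ((df["Theft"] == 1) | (df["Flag"] == 1)).astype(int)

print(df)
```

With the sample Flag values, every row of `Theft_or_Flag` ends up as 1, because the only row with Flag = 0 (ABC125) is already a theft. An inner dict with several codes, such as `{12: 1, 13: 1}`, would work the same way without any other change.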