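So I couldn't really find anything in previous posts that helps me do what I want. I'm also new to Python.
Essentially, I want to find the most streamlined way to create multiple variables from a single variable.
Let's say my data looks like this: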
CaseNumber Offense
ABC123 1
ABC123 1
ABC124 24
ABC124 62
ABC125 12
ABC126 10
I'd like to know how to do this, and whether I can use a nested dictionary like this one to create the variables:
offense_variable = {'Traffic': {1: 1},
                    'Violence': {24: 1},
                    'DUI': {62: 1},
                    'Theft': {12: 1},
                    'Drugs': {10: 1}}
and then use the map function to fill in the values of the Traffic, Violence, etc. variables based on the keys in Offense.
Thanks!
Edit:
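Taken literally, the nested dict above can be used with `Series.map` one category at a time. A minimal sketch, assuming pandas and the question's sample data typed in directly: each inner dict maps an offense code to 1, so any code not in that dict comes back as NaN and is filled with 0.

```python
import pandas as pd

# Sample data from the question, typed in directly (assumption: Offense is an int column)
df = pd.DataFrame({
    "CaseNumber": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125", "ABC126"],
    "Offense": [1, 1, 24, 62, 12, 10],
})

# Nested dict from the question: category name -> {offense code: 1}
offense_variable = {
    "Traffic": {1: 1},
    "Violence": {24: 1},
    "DUI": {62: 1},
    "Theft": {12: 1},
    "Drugs": {10: 1},
}

# For each category, map Offense through its inner dict.
# Codes missing from the inner dict become NaN, which we fill with 0
# before casting back to int so each new column holds 0/1.
for name, codes in offense_variable.items():
    df[name] = df["Offense"].map(codes).fillna(0).astype(int)

print(df)
```

Because each category is mapped independently, the same offense code could be listed under several categories, which a single one-hot encoding of one label per row would not allow.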
The goal is essentially to turn this:
CaseNumber Offense
ABC123 1
ABC123 1
ABC124 24
ABC124 62
ABC125 12
ABC126 10
into this:
CaseNumber Offense Traffic Violence DUI Theft Drugs Flag
ABC123 1 1 0 0 0 0 1
ABC123 1 1 0 0 0 0 1
ABC124 24 0 1 0 0 0 1
ABC124 62 0 0 1 0 0 1
ABC125 12 0 0 0 1 0 0
ABC126 10 0 0 0 0 1 1
with some added logic that folds in other dummy flags. For example, suppose Theft should also be 1 whenever the last column, Flag, is 1, not only when Offense = 12.
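One way to express that Theft rule is a boolean OR of two conditions, cast back to 0/1. A minimal sketch, assuming pandas and that Flag is already a 0/1 column as in the target table; the rule itself is the hypothetical one described in the question.

```python
import pandas as pd

# Rows from the question's target table, with the Flag column as given there
df = pd.DataFrame({
    "CaseNumber": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125", "ABC126"],
    "Offense": [1, 1, 24, 62, 12, 10],
    "Flag": [1, 1, 1, 1, 0, 1],
})

# Hypothetical rule from the question: Theft is 1 when Offense == 12
# OR when Flag == 1; the boolean result is cast to int (0/1).
df["Theft"] = ((df["Offense"] == 12) | (df["Flag"] == 1)).astype(int)

print(df)
```

With this particular data every row ends up with Theft = 1, because the only row where Flag is 0 is the Offense = 12 case.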
Answer 0 (score: 0)
Here is a revised answer based on the OP's comments below.
from io import StringIO
import pandas as pd
data = '''CaseNumber Offense
ABC123 1
ABC123 1
ABC124 24
ABC124 62
ABC125 12
ABC126 10
'''
# create data frame
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')
# nested dict: category name -> {offense code: 1}
offense_variable = { 'Traffic': {1: 1}, 'Violence': {24: 1},
'DUI': {62: 1}, 'Theft': {12: 1}, 'Drugs': {10: 1} }
# flatten the offense_variable from nested dicts to ordinary dict
ov = {num: desc
      for desc, vs in offense_variable.items()
      for num in vs}
# use flattened dict to convert Offense (number) to desc (string)
df['offense_desc'] = df['Offense'].map(ov)
# use `.get_dummies()` for one-hot encoding; dtype=int gives 0/1 columns
# (pandas 2.x otherwise returns True/False)
df = pd.concat([df, pd.get_dummies(df['offense_desc'], dtype=int)], axis=1)
print(df)
CaseNumber Offense offense_desc DUI Drugs Theft Traffic Violence
0 ABC123 1 Traffic 0 0 0 1 0
1 ABC123 1 Traffic 0 0 0 1 0
2 ABC124 24 Violence 0 0 0 0 1
3 ABC124 62 DUI 1 0 0 0 0
4 ABC125 12 Theft 0 0 1 0 0
5 ABC126 10 Drugs 0 1 0 0 0
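The `get_dummies` output above has its columns in alphabetical order and only includes categories that actually occur in the data. A sketch of one way to get the fixed column order of the question's target table, assuming the same flat code-to-name mapping as `ov`; the rows are the question's sample with the Drugs case (ABC126) deliberately left out, to show that a missing category still gets an all-zero column.

```python
import pandas as pd

# Question's sample data minus ABC126, so "Drugs" never occurs
df = pd.DataFrame({
    "CaseNumber": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125"],
    "Offense": [1, 1, 24, 62, 12],
})

# Flat code -> name mapping (same content as the answer's `ov` dict)
ov = {1: "Traffic", 24: "Violence", 62: "DUI", 12: "Theft", 10: "Drugs"}
# Desired column order, taken from the question's target table
categories = ["Traffic", "Violence", "DUI", "Theft", "Drugs"]

dummies = (
    pd.get_dummies(df["Offense"].map(ov), dtype=int)
    # reindex fixes the column order and adds absent categories as 0
    .reindex(columns=categories, fill_value=0)
)
df = pd.concat([df, dummies], axis=1)
print(df)
```

Listing the categories explicitly also keeps the output schema stable from one batch of data to the next, which matters if the columns feed into something downstream.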