I want to get dummy variables for each unique value. The idea is to turn the data frame into a multi-label target. How can I do this?
Data:
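ID  L2
A   Firewall
A   Security
B   Communications
C   Business
C   Switches
Desired output:
ID  Firewall  Security  Communications  Business  Switches
A          1         1               0         0         0
B          0         0               1         0         0
C          0         0               0         1         1
I have already tried one approach, but it needs an aggregation column. I also tried the answers at this link, but they sum the values instead of producing binary dummy columns. Any help is much appreciated. Thank you!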
Answer 0 (score: 4)
Use crosstab, then convert the counts to booleans:
pd.crosstab(df['ID'],df['L2']).astype(bool)
Output:
L2  Business  Communications  Firewall  Security  Switches
ID
A      False           False      True      True     False
B      False            True     False     False     False
C       True           False     False     False      True
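If you want 0/1 integers and a plain ID column, as in the desired output, the crosstab result can be cast once more and flattened. This is only a minimal sketch: the frame is rebuilt by hand from the sample data in the question, and the names counts, dummies and result are my own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# crosstab counts how often each (ID, L2) pair occurs
counts = pd.crosstab(df["ID"], df["L2"])

# bool collapses any count > 0 to True; int turns that into 0/1
dummies = counts.astype(bool).astype(int)

# Drop the "L2" label on the column axis and make ID a regular column
dummies.columns.name = None
result = dummies.reset_index()
print(result)
```

Going through bool first means a repeated (ID, L2) pair still ends up as 1 rather than as a count.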
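Answer 1 (score: 2)
Use set_index followed by get_dummies. Because each ID has several rows, the dummies then have to be collapsed per ID over index level 0. Taking the max keeps the result binary, whereas sum would count repeated values:
# groupby(level=0).max() replaces max(level=0), which was removed in pandas 2.0
s = df.set_index('ID')['L2'].str.get_dummies().groupby(level=0).max().reset_index()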
Out[175]:
  ID  Business  Communications  Firewall  Security  Switches
0  A         0               0         1         1         0
1  B         0               1         0         0         0
2  C         1               0         0         0         1
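To see why the aggregation matters, here is a small sketch comparing max and sum per ID. The duplicated "Firewall" row is hypothetical (the question's data has no repeats) and is only there to make the difference visible. The sketch uses groupby(level=0), which works on current pandas versions.

```python
import pandas as pd

# Hypothetical data: ID "A" has "Firewall" twice
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B"],
    "L2": ["Firewall", "Firewall", "Security", "Communications"],
})

# One row of dummies per original row, indexed by ID
wide = df.set_index("ID")["L2"].str.get_dummies()

# max per ID stays binary; sum per ID counts the repeats
as_max = wide.groupby(level=0).max()
as_sum = wide.groupby(level=0).sum()

print(as_max)
print(as_sum)
```

With max, A/Firewall is 1; with sum it becomes 2, which is the "sums the values" behaviour the question wants to avoid.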
Answer 2 (score: 1)
You can use pivot_table if you change aggfunc to any.
print(df.pivot_table(index='ID', columns='L2',
aggfunc=any, fill_value=False)\
.astype(int))
L2  Business  Communications  Firewall  Security  Switches
ID
A          0               0         1         1         0
B          0               1         0         0         0
C          1               0         0         0         1
You may also want a reset_index at the end to turn ID back into a regular column.
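Since the question mentions that pivot_table wants a column to aggregate, another option is to add a throwaway indicator column and aggregate that explicitly. This is just a sketch of that variant, not the code from the answer above. The column name flag is hypothetical, and the frame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

table = (
    df.assign(flag=1)  # hypothetical helper column: 1 for every existing row
      .pivot_table(index="ID", columns="L2", values="flag",
                   aggfunc="max", fill_value=0)  # max keeps repeats at 1
      .astype(int)     # make sure the dummies are integers
)

# Optional: turn ID back into a regular column
flat = table.reset_index()
print(flat)
```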
Answer 3 (score: 1)
You can try the following:
df1 = pd.read_csv("file.csv")
df2 = df1.groupby(['ID'])['L2'].apply(','.join).reset_index()
df3 = df2["L2"].str.get_dummies(",")
df = pd.concat([df2, df3], axis = 1)
print(df)
Output:
  ID                 L2  Business  Communications  Firewall  Security  Switches
0  A  Firewall,Security         0               0         1         1         0
1  B     Communications         0               1         0         0         0
2  C  Business,Switches         1               0         0         0         1
Alternative:
df = df.groupby(['ID'])['L2'].apply(','.join).str.get_dummies(",").reset_index()
print(df)
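One caveat with joining on a comma: if an L2 value itself contains a comma, get_dummies(",") will split it into separate columns. A possible workaround, sketched below, is to join on a character that does not appear in the data, here "|". The value "Routers, Switches" is hypothetical and only added to show the effect.

```python
import pandas as pd

# Sample data, with one hypothetical value that contains a comma
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Routers, Switches"],
})

# Join each ID's values with "|" (assumed not to occur in the data)
joined = df.groupby("ID")["L2"].agg("|".join)

# Split on the same separator, so the embedded comma is left alone
result = joined.str.get_dummies(sep="|").reset_index()
print(result)
```

The same one-liner as above works once the separator is changed in both places.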