Question

I want to get dummy variables for every unique value. The idea is to turn the data frame into a multi-label target. How can I do it?

Data:

           ID                      L2
           A                 Firewall
           A                 Security
           B           Communications
           C                 Business
           C                 Switches

Desired output:

           ID  Firewall  Security  Communications  Business  Switches
           A          1         1               0         0         0
           B          0         0               1         0         0
           C          0         0               0         1         1

I have tried pivot_table, but it requires a column to aggregate on. I have also tried the answers at this link, but they sum the values rather than producing binary dummy columns. Any help is much appreciated. Thank you!

Answer 1

Use crosstab, then convert to boolean:

pd.crosstab(df['ID'],df['L2']).astype(bool)

Output:

L2  Business  Communications  Firewall  Security  Switches
ID                                                        
A      False           False      True      True     False
B      False            True     False     False     False
C       True           False     False     False      True
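
If 0/1 integers are preferred over True/False, as in the desired output of the question, one more cast is enough. The following is a minimal sketch that rebuilds the sample frame from the question; the variable name `dummies` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# crosstab counts each (ID, L2) pair; bool collapses any count > 0 to True,
# and the final cast turns True/False into 1/0
dummies = pd.crosstab(df["ID"], df["L2"]).astype(bool).astype(int)
print(dummies)
```

Going through bool first means a repeated (ID, L2) pair still yields 1 rather than a count.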

Answer 2

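Let's chain set_index and get_dummies. Because each ID appears on several rows, we then need to aggregate on index level 0; taking the max (rather than the sum) keeps the columns binary.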

# groupby(level=0).max() is equivalent to .max(level=0), which was removed in pandas 2.0
s = df.set_index('ID')['L2'].str.get_dummies().groupby(level=0).max().reset_index()
Out[175]: 
  ID  Business  Communications  Firewall  Security  Switches
0  A         0               0         1         1         0
1  B         0               1         0         0         0
2  C         1               0         0         0         1
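
To see why the aggregation matters, the sketch below splits the chain into two steps and compares sum with max. It assumes the sample data from the question plus one hypothetical duplicated row (A, Firewall) that is not in the original data; the names `per_row`, `summed` and `maxed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    # The second ("A", "Firewall") row is a hypothetical duplicate added for illustration
    "ID": ["A", "A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Firewall", "Security", "Communications", "Business", "Switches"],
})

# One row per original record, one 0/1 column per distinct L2 value, indexed by ID
per_row = df.set_index("ID")["L2"].str.get_dummies()

# sum counts repeated pairs, max only records whether the pair occurred at all
summed = per_row.groupby(level=0).sum()
maxed = per_row.groupby(level=0).max()

print(summed.loc["A", "Firewall"], maxed.loc["A", "Firewall"])
print(maxed.reset_index())
```

With the duplicate present, sum reports a count for A/Firewall while max stays at 1, which is the binary indicator the question asks for.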

Answer 3

If you go with pivot_table, you can use aggfunc=any.

print(df.pivot_table(index='ID', columns='L2', 
                     aggfunc=any, fill_value=False)\
        .astype(int))
L2  Business  Communications  Firewall  Security  Switches
ID                                                        
A          0               0         1         1         0
B          0               1         0         0         0
C          1               0         0         0         1

and possibly a reset_index at the end to get ID back as a column.
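
The question mentions that pivot_table wants a column to aggregate. A possible variant, not taken from this answer, is to add a constant helper column and aggregate that. In the sketch below, `present` and `wide` are hypothetical names introduced only for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# "present" is a hypothetical helper column that gives pivot_table something to aggregate
wide = (
    df.assign(present=1)
      .pivot_table(index="ID", columns="L2", values="present",
                   aggfunc="max", fill_value=0)
      .astype(int)      # make sure the result holds integers, not floats
      .reset_index()    # turn ID back into an ordinary column
)
wide.columns.name = None  # drop the leftover "L2" label on the column axis
print(wide)
```

Using max as the aggregation keeps each cell at 1 even if an (ID, L2) pair occurs more than once.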

Answer 4

You can try this:

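import pandas as pd

df1 = pd.read_csv("file.csv")
# Join all L2 values of each ID into one comma-separated string
df2 = df1.groupby(['ID'])['L2'].apply(','.join).reset_index()
# Split on "," and build one 0/1 column per label
df3 = df2["L2"].str.get_dummies(",")
df = pd.concat([df2, df3], axis = 1)
print(df)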

Output:

  ID                 L2  Business  Communications  Firewall  Security  Switches
0  A  Firewall,Security         0               0         1         1         0
1  B     Communications         0               1         0         0         0
2  C  Business,Switches         1               0         0         0         1

Alternative:

df = df.groupby(['ID'])['L2'].apply(','.join).str.get_dummies(",").reset_index()
print(df)
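
One caveat with joining on ",": a label that itself contains a comma would be split into two columns. A possible workaround is to join with "|", the default separator of str.get_dummies. The sketch below uses made-up data; the label "Routers, Core" is hypothetical and does not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "L2": ["Firewall", "Routers, Core", "Firewall"],  # hypothetical label containing a comma
})

# Join with "|" so the comma inside a label survives; str.get_dummies splits on "|" by default
out = df.groupby("ID")["L2"].agg("|".join).str.get_dummies().reset_index()
print(out)
```

This assumes no label contains "|"; otherwise any other character absent from the data can be passed to both the join and get_dummies.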

Pandas groupby and get dummies
