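Creating binary indicator variables for column values

Time: 2017-10-25 19:49:30

Tags: python pandas

I have the following data frame. I would like to create a column for each value in the categories column (e.g. Sandwiches, Restaurants, ...), holding a 0 or 1 to indicate whether the record has that value. Is this something I can do with get_dummies, or is there another way someone can suggest?

Code: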

print df1[1:3]

Sample data:

                    address  \
4             4719 N 20Th St   
14  9616 E Independence Blvd   

                                           attributes             business_id  \
4   {u'GoodForMeal': {u'dessert': False, u'latenig...  duHFBe87uNSXImQmvBh87Q   
14  {u'Alcohol': u'full_bar', u'HasTV': True, u'No...  SDMRxmcKPNt1AHPBKqO64Q   

                                           categories      city  \
4                           [Sandwiches, Restaurants]   Phoenix   
14  [Burgers, Bars, Restaurants, Sports Bars, Nigh...  Matthews   

                                                hours  is_open   latitude  \
4                                                  {}        0  33.505928   
14  {u'Monday': u'11:00-0:00', u'Tuesday': u'11:00...        1  35.135196   

     longitude        name neighborhood postal_code  review_count  stars state  
4  -112.038847     Blimpie                    85016            10    4.5    AZ  
14  -80.714683  Applebee's                    28105            21    2.0    NC  

Update

testdummies = pd.concat(df1["categories"],pd.get_dummies(df1["categories"]))
testdummies.head()

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-1dae1382c2ba> in <module>()
      1 # 13) create dummy variables for Categories
      2 
----> 3 testdummies = pd.concat(df1["categories"],pd.get_dummies(df1["categories"]))
      4 testdummies.head()

/Users/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
   1102     else:
   1103         result = _get_dummies_1d(data, prefix, prefix_sep, dummy_na,
-> 1104                                  sparse=sparse, drop_first=drop_first)
   1105     return result
   1106 

/Users/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
   1109                     sparse=False, drop_first=False):
   1110     # Series avoids inconsistent NaN handling
-> 1111     codes, levels = _factorize_from_iterable(Series(data))
   1112 
   1113     def get_empty_Frame(data, sparse):

/Users/anaconda/lib/python2.7/site-packages/pandas/core/categorical.pyc in _factorize_from_iterable(values)
   2038         codes = values.codes
   2039     else:
-> 2040         cat = Categorical(values, ordered=True)
   2041         categories = cat.categories
   2042         codes = cat.codes

/Users/anaconda/lib/python2.7/site-packages/pandas/core/categorical.pyc in __init__(self, values, categories, ordered, name, fastpath)
    288                 codes, categories = factorize(values, sort=True)
    289             except TypeError:
--> 290                 codes, categories = factorize(values, sort=False)
    291                 if ordered:
    292                     # raise, as we don't have a sortable data structure and so

/Users/anaconda/lib/python2.7/site-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel, size_hint)
    311     table = hash_klass(size_hint or len(vals))
    312     uniques = vec_klass()
--> 313     labels = table.get_labels(vals, uniques, 0, na_sentinel, True)
    314 
    315     labels = _ensure_platform_int(labels)

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:15447)()

TypeError: unhashable type: 'list'

Update

Code:

bus_rev_cat = pd.get_dummies(bus_rev['categories'].apply(pd.Series))
bus_rev2 = pd.concat([bus_rev,bus_rev_cat],axis=1)
print(bus_rev2[1:10])

Sample Data:

                  user_id             business_id  stars_x  \
1  CxDOIDnH8gp9KXzpBHJYXw  XSiqtcVEsP6dLOL7ZA9OxA        4   
2  CxDOIDnH8gp9KXzpBHJYXw  v95ot_TNwTk1iJ5n56dR0g        3   
3  CxDOIDnH8gp9KXzpBHJYXw  uloYxyRAMesZzI99mfNInA        2   
4  CxDOIDnH8gp9KXzpBHJYXw  gtcsOodbmk4E0TulYHnlHA        4   
5  CxDOIDnH8gp9KXzpBHJYXw  lOd50CiDJeNWmN_KsvR2rg        3   
6  CxDOIDnH8gp9KXzpBHJYXw  7hUp4XxmUCGqvPFAM8IJww        3   
7  CxDOIDnH8gp9KXzpBHJYXw  Ze4VPogvcD7inc3QuvY_yg        2   
8  CxDOIDnH8gp9KXzpBHJYXw  txAKid34IUd9spo6MLF_Sw        3   
9  CxDOIDnH8gp9KXzpBHJYXw  oiknQaNH9cGC6UBWC8S_Zg        3   

                  address                                         attributes  \
1        522 Yonge Street  {u'BusinessParking': {u'garage': False, u'stre...   
2     1661 Denison Street  {u'BusinessParking': {u'garage': False, u'stre...   
3    4101 Rutherford Road  {u'BusinessParking': {u'garage': False, u'stre...   
4      815 W Bloor Street  {u'Alcohol': u'full_bar', u'HasTV': False, u'N...   
5         114 Laird Drive  {u'GoodForMeal': {u'dessert': False, u'latenig...   
6     300 Borough Dr, 215  {u'BusinessParking': {u'garage': False, u'stre...   
7  5117 Sheppard Avenue E  {u'BusinessParking': {u'garage': False, u'stre...   
8             205 Main St  {u'BusinessParking': {u'garage': False, u'stre...   
9       6347 Yonge Street  {u'GoodForMeal': {u'dessert': False, u'latenig...   

                                          categories         city  \
1                     [Restaurants, Ramen, Japanese]      Toronto   
2                    [Chinese, Seafood, Restaurants]      Markham   
3                             [Italian, Restaurants]   Woodbridge   
4  [Food, Coffee & Tea, Sandwiches, Cafes, Cockta...      Toronto   
5                [Japanese, Sushi Bars, Restaurants]    East York   
6  [Restaurants, Canadian (New), Steakhouses, Ame...  Scarborough   
7  [Canadian (New), Restaurants, Breakfast & Brunch]      Toronto   
8             [Italian, Restaurants, Canadian (New)]      Markham   
9                              [Restaurants, Korean]      Toronto   

                                               hours  is_open   latitude  \
1  {u'Monday': u'11:00-22:00', u'Tuesday': u'11:0...        1  43.663689   
2                                                 {}        0  43.834295   
3  {u'Monday': u'12:00-22:00', u'Tuesday': u'12:0...        1  43.823486   
4  {u'Monday': u'12:00-2:00', u'Tuesday': u'12:00...        1  43.662726   
5  {u'Tuesday': u'17:00-22:00', u'Friday': u'17:0...        0  43.706665   
6  {u'Monday': u'11:00-0:00', u'Tuesday': u'11:00...        1  43.776146   
7  {u'Monday': u'0:00-0:00', u'Tuesday': u'0:00-0...        1  43.793599   
8                                                 {}        1  43.868463   
9                                                 {}        0  43.796237   

         ...         6_Pizza 6_Restaurants 7_Bars 7_Canadian (New)  7_French  \
1        ...               0             0      0                0         0   
2        ...               0             0      0                0         0   
3        ...               0             0      0                0         0   
4        ...               0             0      1                0         0   
5        ...               0             0      0                0         0   
6        ...               0             0      0                0         0   
7        ...               0             0      0                0         0   
8        ...               0             0      0                0         0   
9        ...               0             0      0                0         0   

   7_Restaurants 8_Mediterranean 8_Nightlife  8_Southern  8_Specialty Food  
1              0               0           0           0                 0  
2              0               0           0           0                 0  
3              0               0           0           0                 0  
4              0               0           1           0                 0  
5              0               0           0           0                 0  
6              0               0           0           0                 0  
7              0               0           0           0                 0  
8              0               0           0           0                 0  
9              0               0           0           0                 0  

[9 rows x 149 columns]

1 answer:

Answer 0 (score: 1)

You can use get_dummies to do exactly what you want:

import pandas as pd

df = pd.DataFrame({"Categorical": ["a", "b", "c", "a"]})
df

>>>     Categorical
    0   a
    1   b
    2   c
    3   a


pd.concat([df, pd.get_dummies(df["Categorical"])], axis=1)

>>>     Categorical     a   b   c
    0   a               1   0   0
    1   b               0   1   0
    2   c               0   0   1
    3   a               1   0   0
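
Note that the example above encodes a column of plain strings. The categories column in the question holds a list in each row, which is what raises TypeError: unhashable type: 'list'. A minimal sketch for that case follows, assuming each cell is a Python list of strings and that no category name contains the "|" separator. The two-row frame is made up for illustration and only mimics the question's data.

```python
import pandas as pd

# Hypothetical miniature of the question's data: each cell is a list.
df1 = pd.DataFrame({
    "name": ["Blimpie", "Applebee's"],
    "categories": [["Sandwiches", "Restaurants"],
                   ["Burgers", "Bars", "Restaurants"]],
})

# Join each list into one delimited string, then split that string
# back out into one 0/1 indicator column per distinct category.
dummies = df1["categories"].str.join("|").str.get_dummies(sep="|")

# Attach the indicator columns to the original frame.
result = pd.concat([df1, dummies], axis=1)
print(result)
```

This gives one column per category regardless of where it sits in the list, whereas the apply(pd.Series) attempt in the question encodes each list position separately and so produces names such as 6_Pizza and 7_Bars.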