Cannot copy sequence with size 2 to array axis with dimension 1771077

Time: 2017-05-15 06:47:09

Tags: python-3.x pandas dummy-variable valueerror

I have a DataFrame named "combi" (shape: (1771077, 38)). When I try to run the following code:

dum = ['Date_ID', 'Distribution_Type','Fixed_CostFactor','Product_Description','Order_ID','Quantity','Amount','SalesMgr_Name', 'Sales_Type', 'Product_Description','Product_Category', 'Product_Group', 'Brand', Unit_of_Measure', 'Pack_Type','Cost_Price', 'Sales_Price', PlantName','City_Name','Population', 'Customer_Name', 'Customer_Since',     'Industry', 'Customer_Group','Month_Name', 'Quarter_No']

dummies = pd.get_dummies(combi[dum])

it raises this error:

ValueError                                Traceback (most recent call last)
<ipython-input-19-1227d2ff4df6> in <module>()
      4        'Industry', 'Customer_Group','Month_Name', 'Quarter_No']
      5 
----> 6 dummies = pd.get_dummies(combi[dum])
      7 dummies.columns

/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
   1206             dummy = _get_dummies_1d(data[col], prefix=pre, prefix_sep=sep,
   1207                                     dummy_na=dummy_na, sparse=sparse,
-> 1208                                     drop_first=drop_first)
   1209             with_dummies.append(dummy)
   1210         result = concat(with_dummies, axis=1)

/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
   1218                     sparse=False, drop_first=False):
   1219     # Series avoids inconsistent NaN handling
-> 1220     codes, levels = _factorize_from_iterable(Series(data))
   1221 
   1222     def get_empty_Frame(data, sparse):

/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    246             else:
    247                 data = _sanitize_array(data, index, dtype, copy,
--> 248                                        raise_cast_failure=True)
    249 
    250                 data = SingleBlockManager(data, index, fastpath=True)

/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   3027             raise Exception('Data must be 1-dimensional')
   3028         else:
-> 3029             subarr = _asarray_tuplesafe(data, dtype=dtype)
   3030 
   3031     # This is to prevent mixed-type Series getting all casted to

/usr/lib64/python2.7/site-packages/pandas/core/common.pyc in _asarray_tuplesafe(values, dtype)
    378             except ValueError:
    379                 # we have a list-of-list
--> 380                 result[:] = [tuple(x) for x in values]
    381 
    382     return result

ValueError: cannot copy sequence with size 2 to array axis with dimension 1771077

However, the following runs without any error:

dummies = pd.get_dummies(combi)

Can someone tell me what is going wrong and how I can fix it? I want to use only a subset of the original columns.

2 answers:

Answer 0 (score: 2)

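I hit the same error while trying to use get_dummies() to sparsify my categorical data. The cause was duplicate column names; once I removed the duplicates, the error no longer appeared.

I hope this helps anyone who lands here with the same problem.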
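The following is a minimal sketch of that fix, not a confirmed diagnosis of the asker's data. It assumes the trouble comes from a label that appears twice in the selection list (in the question, 'Product_Description' is listed twice in dum). The tiny combi frame and its values are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real 1771077-row frame.
combi = pd.DataFrame({
    "Brand": ["A", "B", "A"],
    "City_Name": ["X", "Y", "X"],
})

# A selection list with one label repeated, as in the question's dum.
dum = ["Brand", "City_Name", "Brand"]

selected = combi[dum]

# Index.duplicated() marks the second and later occurrences of a label.
dup_mask = selected.columns.duplicated()
duplicate_labels = selected.columns[dup_mask].tolist()
print("duplicated labels:", duplicate_labels)

# Keep only the first occurrence of each column, then encode.
deduped = selected.loc[:, ~dup_mask]
dummies = pd.get_dummies(deduped)
print(dummies.columns.tolist())
```

With a repeated label, selected["Brand"] is a two-column DataFrame rather than a Series, which is consistent with the "size 2" in the error message. Whether get_dummies actually raises on such input may depend on the pandas version, so dropping the repeats before encoding is the safer path.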

Answer 1 (score: 1)

In your dum list, you seem to be missing some single quotes, for example on Unit_of_Measure and PlantName. Can you fix that and try again?

dum=['Date_ID',
'Distribution_Type',
'Fixed_CostFactor',
'Product_Description',
'Order_ID',
'Quantity',
'Amount',
'SalesMgr_Name',
'Sales_Type',
'Product_Description',
'Product_Category',
'Product_Group',
'Brand',
'Unit_of_Measure',
'Pack_Type',
'Cost_Price',
'Sales_Price',
'PlantName',
'City_Name',
'Population',
'Customer_Name',
'Customer_Since',
'Industry',
'Customer_Group',
'Month_Name',
'Quarter_No']

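You can use the code below to identify which column is causing the problem.

# combi is the DataFrame from the question; a failure here points at the problematic column
for c in dum:
    print('column: {}, len: {}'.format(c, len(pd.get_dummies(combi[c]))))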
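A variant of this loop, offered as a sketch rather than a definitive diagnostic: it runs against the selected sub-frame and flags any label that resolves to more than one column instead of a single Series. The combi data here is invented, and the suspects list is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical stand-in data; only the column names mirror the question.
combi = pd.DataFrame({
    "Brand": ["A", "B", "A"],
    "City_Name": ["X", "Y", "X"],
    "Quantity": [1, 2, 3],
})
dum = ["Brand", "City_Name", "Brand"]

selected = combi[dum]

suspects = []
# dict.fromkeys keeps each label once, in its original order.
for c in dict.fromkeys(dum):
    picked = selected[c]
    # A unique label yields a 1-D Series; a repeated one yields a 2-D DataFrame.
    if picked.ndim != 1:
        suspects.append((c, picked.shape[1]))
        print("column: {}, resolves to {} columns".format(c, picked.shape[1]))
```

Checking ndim avoids relying on get_dummies raising an exception, which may not happen the same way across pandas versions.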