I have a DataFrame named "combi" (shape: (1771077, 38)). When I try to run the following code:
dum = ['Date_ID', 'Distribution_Type','Fixed_CostFactor','Product_Description','Order_ID','Quantity','Amount','SalesMgr_Name', 'Sales_Type', 'Product_Description','Product_Category', 'Product_Group', 'Brand', Unit_of_Measure', 'Pack_Type','Cost_Price', 'Sales_Price', PlantName','City_Name','Population', 'Customer_Name', 'Customer_Since', 'Industry', 'Customer_Group','Month_Name', 'Quarter_No']
dummies = pd.get_dummies(combi[dum])
It gives this error:
ValueError                                Traceback (most recent call last)
<ipython-input-19-1227d2ff4df6> in <module>()
4 'Industry', 'Customer_Group','Month_Name', 'Quarter_No']
5
----> 6 dummies = pd.get_dummies(combi[dum])
7 dummies.columns
/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
1206 dummy = _get_dummies_1d(data[col], prefix=pre, prefix_sep=sep,
1207 dummy_na=dummy_na, sparse=sparse,
-> 1208 drop_first=drop_first)
1209 with_dummies.append(dummy)
1210 result = concat(with_dummies, axis=1)
/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
1218 sparse=False, drop_first=False):
1219 # Series avoids inconsistent NaN handling
-> 1220 codes, levels = _factorize_from_iterable(Series(data))
1221
1222 def get_empty_Frame(data, sparse):
/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
246 else:
247 data = _sanitize_array(data, index, dtype, copy,
--> 248 raise_cast_failure=True)
249
250 data = SingleBlockManager(data, index, fastpath=True)
/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
3027 raise Exception('Data must be 1-dimensional')
3028 else:
-> 3029 subarr = _asarray_tuplesafe(data, dtype=dtype)
3030
3031 # This is to prevent mixed-type Series getting all casted to
/usr/lib64/python2.7/site-packages/pandas/core/common.pyc in _asarray_tuplesafe(values, dtype)
378 except ValueError:
379 # we have a list-of-list
--> 380 result[:] = [tuple(x) for x in values]
381
382 return result
ValueError: cannot copy sequence with size 2 to array axis with dimension 1771077
But when I run the following, it doesn't give any error:
dummies = pd.get_dummies(combi)
Can someone tell me what is going wrong and how I can fix it? I want to use only a subset of the original columns.
Answer 0 (score: 2)
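I ran into the same error when trying to dummy-encode my categorical data with get_dummies(). The problem was that I had duplicate column names; once I removed them, the error no longer occurred.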
Hope this helps anyone who lands here with the same problem.
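A minimal sketch of the situation this answer describes, assuming Python 3 and a recent pandas; the frame, its values, and the duplicated 'Brand' column are hypothetical. It shows that a label occurring twice selects a two-column DataFrame instead of a Series, which plausibly matches the "size 2" in the question's error: 'Product_Description' is listed twice in `dum`, so `combi[dum]` contains that column twice even though `combi` itself need not, which would also explain why `pd.get_dummies(combi)` succeeds. Newer pandas versions may encode such frames without raising, so the sketch checks for the duplicate directly rather than reproducing the error.

```python
import pandas as pd

# Hypothetical frame whose 'Brand' column name occurs twice,
# e.g. after a concat or merge that repeated a column
df = pd.DataFrame([['a', 'x', 'x'], ['b', 'y', 'y'], ['a', 'x', 'x']],
                  columns=['Product_Description', 'Brand', 'Brand'])

# Boolean mask marking every repeat after the first occurrence
print(df.columns.duplicated())

# The repeated label selects both columns as a DataFrame, not a single Series
print(type(df['Brand']))

# Keep only the first occurrence of each column name, then encode
deduped = df.loc[:, ~df.columns.duplicated()]
dummies = pd.get_dummies(deduped)
print(dummies.columns.tolist())
```

Dropping by `~df.columns.duplicated()` keeps the first copy of each name; if the repeated columns actually hold different data, rename them instead of dropping.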
Answer 1 (score: 1)
It looks like you are missing some single quotes in your dum list, for example around Unit_of_Measure and PlantName. Could you fix that and try again?
dum=['Date_ID',
'Distribution_Type',
'Fixed_CostFactor',
'Product_Description',
'Order_ID',
'Quantity',
'Amount',
'SalesMgr_Name',
'Sales_Type',
'Product_Description',
'Product_Category',
'Product_Group',
'Brand',
'Unit_of_Measure',
'Pack_Type',
'Cost_Price',
'Sales_Price',
'PlantName',
'City_Name',
'Population',
'Customer_Name',
'Customer_Since',
'Industry',
'Customer_Group',
'Month_Name',
'Quarter_No']
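Once the quotes are fixed, the list can still contain names that are repeated or absent from the frame. A minimal sketch for checking both, assuming Python 3.7+ (so `Counter` and `dict` keep insertion order); the small `combi` frame and the shortened `dum` list are hypothetical stand-ins for the real ones.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for `combi`; note it has no 'Unit_of_Measure' column
combi = pd.DataFrame({'Brand': ['x', 'y'], 'PlantName': ['p', 'q'],
                      'Product_Description': ['a', 'b']})
dum = ['Brand', 'PlantName', 'Product_Description',
       'Unit_of_Measure', 'Product_Description']

# Names that occur more than once in the list
repeated = [name for name, n in Counter(dum).items() if n > 1]
# Names that are not columns of the frame (typos, stray whitespace, etc.)
missing = [name for name in dum if name not in combi.columns]

print('repeated:', repeated)
print('missing:', missing)

# Drop repeats while keeping the original order
dum_unique = list(dict.fromkeys(dum))
```

Passing `dum_unique` (after removing any missing names) to `combi[...]` avoids selecting the same column twice.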
You can use the code below to determine which column is causing the problem.
import pandas as pd
for c in dum:
    # Encode each column on its own to narrow down which one triggers the error
    print('column: {}, len: {}'.format(c, len(pd.get_dummies(combi[c]))))
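The loop above prints the row count, which is the same for every column that encodes successfully. A sketch of a more direct check, assuming Python 3.7+ and a recent pandas: it flags any name in `dum` that selects more than one column from `combi[dum]`. The small `combi` frame and `dum` list here are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for `combi` and `dum` from the question
combi = pd.DataFrame({'Product_Description': ['a', 'b'], 'Brand': ['x', 'y']})
dum = ['Product_Description', 'Brand', 'Product_Description']

sub = combi[dum]
flagged = []
for c in dict.fromkeys(dum):  # each name once, in original order
    col = sub[c]
    # A label that occurs more than once selects a DataFrame, not a Series
    if isinstance(col, pd.DataFrame):
        flagged.append((c, col.shape[1]))
        print('column: {} selects {} columns'.format(c, col.shape[1]))
```

Any name reported here is one that `get_dummies` would receive as several columns at once.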