I have a DataFrame in which some columns (C1, C2, C3) are categorical (string) variables. The data and dtypes are as follows:
             C1           C2           C3           C4           C5 \
4   b'02e197c5'  b'c2ced437'  b'a2427619'  b'3f85ecae'  b'b8c51ab7'
9   b'62770d79'  b'ad984203'  b'ddd956c1'  b'f7f54f97'  b'bbaea1c0'
13  b'7ffd46c3'  b'710103fd'  b'a1407382'  b'f2463ffb'  b'664ff944'
14  b'9a8cb066'  b'7a06385f'  b'417e6103'  b'6faef306'  b'f8990a45'
45  b'6f877ce8'  b'58cc2d25'  b'9b48ba97'  b'f2463ffb'  b'd90dd51f'
Data types:
C1 object
C2 object
C3 object
Then I use DictVectorizer to one-hot encode the strings:
labelTransformer = DictVectorizer(dtype='str')
labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
But after that, I get the following error:
File "click_main.py", line 60, in <module>
df2 = labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 230, in fit_transform
return self._transform(X, fitting=True)
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 166, in _transform
for f, v in six.iteritems(x):
File "/usr/local/lib/python3.6/dist-packages/sklearn/externals/six.py", line 439, in iteritems
return iter(getattr(d, _iteritems)(**kw))
AttributeError: 'str' object has no attribute 'items'
I have tried many things, but I cannot find a solution.
Answer 0 (score: 0)
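The error occurs because DictVectorizer expects a sequence of dict-like objects (one mapping of feature names to values per row), so it fails when it is given a Series of plain strings. You can get a one-hot encoding directly from pandas with pd.get_dummies. To encode each column independently, run pd.get_dummies(df), or pd.get_dummies(df.C1) for a single column.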
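As a minimal sketch of the per-column case, the snippet below builds a small hypothetical frame and encodes it both ways. The name df, the choice of two columns, and the row contents are assumptions for illustration; the values are borrowed from the question's sample but stored as plain strings rather than bytes.

```python
import pandas as pd

# Hypothetical sample: two categorical columns, values borrowed from the
# question but stored as plain strings (an assumption for illustration).
df = pd.DataFrame(
    {
        "C1": ["02e197c5", "62770d79", "02e197c5"],
        "C2": ["c2ced437", "ad984203", "ad984203"],
    }
)

# Encode every string column at once; new columns are named "<column>_<value>".
all_dummies = pd.get_dummies(df)

# Encode a single column; new columns are named after the values themselves.
c1_dummies = pd.get_dummies(df.C1)

print(all_dummies.columns.tolist())
print(c1_dummies.columns.tolist())
```

Encoding the whole frame keeps the source column in each new column name, which helps when the same value can appear in more than one column.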
If you want an indicator for every unique value across all columns, you can use pd.get_dummies(df.stack()).unstack().swaplevel(0, 1, axis=1).
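The stack/unstack chain can be easier to follow on a tiny example. The sketch below uses a hypothetical frame in which one value ("f2463ffb") is deliberately placed in both columns, which is an assumption made to show the effect and does not reflect the question's actual data.

```python
import pandas as pd

# Hypothetical sample: the value "f2463ffb" is placed in both columns on
# purpose to show that it gets an indicator under each of them.
df = pd.DataFrame(
    {
        "C4": ["3f85ecae", "f2463ffb", "f2463ffb"],
        "C5": ["f2463ffb", "bbaea1c0", "664ff944"],
    }
)

# stack() turns the frame into one long Series indexed by (row, column).
stacked = df.stack()

# get_dummies() creates one indicator column per unique value in that Series.
# unstack() moves the original column names back into the column index, and
# swaplevel() reorders the column levels to (original column, value).
indicators = pd.get_dummies(stacked).unstack().swaplevel(0, 1, axis=1)

print(indicators.shape)
print(indicators[("C4", "f2463ffb")].tolist())
```

Unlike pd.get_dummies(df), this produces an indicator for every (column, value) pair, including pairs that never occur in the data, so the result can get wide when the columns have many distinct values.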