pandas: factorizing a string column to convert categories to numbers, but getting an array of numbers instead

Time: 2019-04-16 11:58:13

Tags: python python-3.x pandas dataframe

I want to factorize a column of a pandas DataFrame and add the result as a new column. The values in the column are strings.

For example:

 COL_1
 'TRY A TEST'
 'TRY A TEST' 
 'PLAY Q'
 'PLAY Q'

I would like to convert it to numbers, like this:

 COL_1     NEW_COL
 'TRY A TEST'   0
 'TRY A TEST'   0
 'PLAY Q'       1
 'PLAY Q'       1

However, this is what I get:

 x = 'TRY A TEST'
 my_df['NEW_COL'] = my_df['COL_1'].apply(lambda x: pd.factorize(x)[0])

 (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64), array(['TRY A TEST'], dtype=object))

It looks as if each character is being converted to a number.

I also get this error:

 TypeError: 'float' object is not iterable

There are no floats in 'COL_1'; it holds strings.

Any suggestions?

2 answers:

Answer 0: (score: 1)

An alternative, using the Categorical dtype:

my_df['NEW_COL'] = my_df['COL_1'].astype('category').cat.codes
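For comparison, here is a minimal sketch of calling `pd.factorize` once on the whole column, rather than on each cell through `apply` (where every call sees a single cell instead of the column). The sample frame is rebuilt from the question's example, and the `CAT_CODE` column name is made up for illustration. The two approaches can number the categories differently: `pd.factorize` assigns codes in order of first appearance, while `cat.codes` follows the sorted order of the categories.

```python
import pandas as pd

# Sample data rebuilt from the question's example.
my_df = pd.DataFrame({"COL_1": ["TRY A TEST", "TRY A TEST", "PLAY Q", "PLAY Q"]})

# Factorize the whole column in one call: codes follow order of first appearance.
codes, uniques = pd.factorize(my_df["COL_1"])
my_df["NEW_COL"] = codes

# Categorical codes follow the sorted category order instead.
# CAT_CODE is a hypothetical column name used only for this comparison.
my_df["CAT_CODE"] = my_df["COL_1"].astype("category").cat.codes

print(my_df)
```

If the numbering has to match the order in which values first occur, as in the question's desired output, the `pd.factorize` column is the one to keep.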

Answer 1: (score: 1)

A simple solution:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
my_df['NEW_COL'] = le.fit_transform(my_df['COL_1'].astype(str))
my_df

        COL_1  NEW_COL
0  TRY A TEST        1
1  TRY A TEST        1
2      PLAY Q        0
3      PLAY Q        0
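The `TypeError: 'float' object is not iterable` in the question may point to missing values in the column, since NaN is a float; that is an assumption, as the question does not show the full data. Under that assumption, the sketch below shows how the pandas-only approaches treat a missing entry: both `pd.factorize` and `cat.codes` mark it with -1 instead of raising. The Series `s` is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (NaN is a float).
s = pd.Series(["TRY A TEST", np.nan, "PLAY Q", "PLAY Q"])

# Whole-column factorize: the missing entry gets the sentinel code -1.
codes, uniques = pd.factorize(s)

# Categorical codes also use -1 for the missing entry.
cat_codes = s.astype("category").cat.codes

print(codes)
print(cat_codes.tolist())
```

Rows coded -1 can then be filtered out or handled separately, depending on what the downstream step expects.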

For large DataFrames or multiple columns, you can simply use a for loop.

For example:

my_df

     pets     owner   location
0     cat     Champ  San_Diego
1     dog       Ron   New_York
2     cat     Brick   New_York
3  monkey     Champ  San_Diego
4     dog  Veronica  San_Diego
5     dog       Ron   New_York

############
for column in ['pets','owner','location']:
    le = preprocessing.LabelEncoder()
    my_df[column + '_num'] = le.fit_transform(my_df[column].astype(str))
############


my_df

     pets     owner   location  pets_num  owner_num  location_num
0     cat     Champ  San_Diego         0          1             1
1     dog       Ron   New_York         1          2             0
2     cat     Brick   New_York         0          0             0
3  monkey     Champ  San_Diego         2          1             1
4     dog  Veronica  San_Diego         1          3             1
5     dog       Ron   New_York         1          2             0
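As a possible alternative to the loop, the same columns can be encoded without scikit-learn by applying categorical codes column by column. This is only a sketch: the frame is rebuilt from the example above, and the `cols` and `encoded` names are made up here. For plain string columns like these, the sorted category order gives the same numbering as the output shown above.

```python
import pandas as pd

# Sample data rebuilt from the example above.
my_df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "dog", "dog"],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "Ron"],
    "location": ["San_Diego", "New_York", "New_York",
                 "San_Diego", "San_Diego", "New_York"],
})

cols = ["pets", "owner", "location"]  # hypothetical helper name

# Encode each column via its sorted categories, then rename with a _num suffix.
encoded = my_df[cols].apply(lambda s: s.astype("category").cat.codes)
my_df = my_df.join(encoded.add_suffix("_num"))

print(my_df)
```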