I want to convert a column of bytes values to strings in a pandas DataFrame.
data['CleanedText'].head()
0 b'witti littl book make son laugh loud recit c...
1 b'grew read sendak book watch realli rosi movi...
2 b'fun way children learn month year learn poem...
3 b'great littl book read nice rhythm well good ...
4 b'book poetri month year goe month cute littl ...
Name: CleanedText, dtype: object
I am currently doing this with a regular for loop, but the conversion takes too long.
for i, j in enumerate(text_data):
    data['newtext'][i] = text_data[i].decode('utf-8')
Since NumPy is fast at computation, is it possible to use NumPy to convert the bytes to strings?
Answer 0 (score: 0)
You can use apply() with a lambda function:
data['newtext'] = data['CleanedText'].apply(lambda x: x.decode('utf-8'))
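For anyone who wants to try this in isolation, here is a minimal sketch of the apply() approach. The two sample rows are made up for illustration and are not the asker's data. Assigning the whole column in one statement also avoids the element-by-element data['newtext'][i] = ... assignment used in the loop.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
data = pd.DataFrame(
    {'CleanedText': [b'witti littl book', b'grew read sendak book']}
)

# Decode every bytes value and assign the resulting column in one statement.
data['newtext'] = data['CleanedText'].apply(lambda x: x.decode('utf-8'))

print(data['newtext'].tolist())
```

Note that apply() still calls the lambda once per row in Python, so it mainly removes the per-row assignment overhead rather than the decoding itself.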
Answer 1 (score: 0)
You can use str.decode:
>>> df.CleanedText.str.decode('utf-8')
0 witti littl book make son laugh loud recit c...
1 grew read sendak book watch realli rosi movi...
2 fun way children learn month year learn poem...
3 great littl book read nice rhythm well good ...
4 book poetri month year goe month cute littl ...
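Since the question asks about NumPy specifically, here is a rough sketch that puts str.decode next to a NumPy route using np.char.decode. The sample rows are made up, and the NumPy part assumes the column can first be copied into a fixed-width bytes array (dtype 'S'). I have not timed either variant on the asker's data, so treat this as something to benchmark rather than a guaranteed speedup.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row contains a non-ASCII character.
df = pd.DataFrame(
    {'CleanedText': [b'witti littl book', b'caf\xc3\xa9 book']}
)

# pandas route: the vectorised string accessor from the answer above.
decoded = df['CleanedText'].str.decode('utf-8')

# NumPy route (assumption: values fit a fixed-width bytes array).
# np.char.decode works on 'S' dtype arrays, so build one from the column first.
raw = np.array(df['CleanedText'].tolist(), dtype='S')
decoded_np = np.char.decode(raw, 'utf-8')

print(decoded.tolist())
print(decoded_np.tolist())
```

One caveat with the NumPy route: fixed-width 'S' arrays drop trailing null bytes and pad every element to the length of the longest one, so a column with a few very long texts can use much more memory than the original object column.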