Convert a pandas DataFrame to UTF-8

Posted: 2017-02-25 13:47:14

Tags: python-3.x pandas

How do I convert a pandas DataFrame to Unicode?

```
import csv

import pandas
from textblob import TextBlob

messages = pandas.read_csv('data/SMSSpamCollection', sep='\t',
                           quoting=csv.QUOTE_NONE, names=["label", "message"])

def split_into_tokens(message):
    message = unicode(message, 'utf8')  # convert bytes into proper unicode
    return TextBlob(message).words

messages.head().apply(split_into_tokens(messages))
```

It gives the following error:

    Traceback (most recent call last):
      File "minor.py", line 46, in <module>
        messages.head().apply(split_into_tokens(messages))
      File "minor.py", line 42, in split_into_tokens
        message = unicode(message, 'utf8')  # convert bytes into proper unicode
    TypeError: coercing to Unicode: need string or buffer, DataFrame found

2 answers:

Answer 0 (score: 6)

`df.x.str.encode('utf-8')`

will solve your problem, where `x` is the name of the column you want to convert.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.encode.html
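For reference, here is a minimal sketch of what `Series.str.encode` does, using a hypothetical in-memory `message` column instead of the question's file (the column name and sample strings are assumptions). Note the direction of the conversion: `str.encode('utf-8')` turns text (`str`) into UTF-8 `bytes`, and `str.decode('utf-8')` turns `bytes` back into text. In Python 3, the values returned by `read_csv` are already `str`, so decoding is only needed if a column actually holds `bytes`.

```python
import pandas as pd

# Hypothetical sample data (not from the question); in Python 3 these are str values.
df = pd.DataFrame({"message": ["Free entry now", "Café at 5?"]})

# Series.str.encode converts each str into UTF-8 encoded bytes.
encoded = df["message"].str.encode("utf-8")

# Series.str.decode does the reverse, converting bytes back into str.
decoded = encoded.str.decode("utf-8")

print(encoded.tolist())
print(decoded.tolist())
```

The round trip shows that encoding and decoding are inverses, so which one you need depends on whether the column currently holds text or bytes.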

Answer 1 (score: 2)

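Change `messages.head().apply(split_into_tokens(messages))` to `messages.message.head().apply(split_into_tokens)`.

When you use `apply`, pass the function itself rather than calling it; `apply` supplies the argument on each call. Your code calls `split_into_tokens(messages)` immediately, with the whole DataFrame as its argument, which is why `unicode()` complains that it found a DataFrame. Selecting the `message` column first matters too: `Series.apply` calls the function once per element, whereas `DataFrame.apply` would pass each entire column as a Series.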
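A minimal Python 3 sketch of the corrected pipeline follows. It rests on several assumptions: a small in-memory tab-separated string stands in for `data/SMSSpamCollection`, and `re.findall(r"\w+", ...)` is a hypothetical stand-in for `TextBlob(message).words` so the sketch runs without TextBlob installed (TextBlob's real tokenization may differ). Because `unicode()` does not exist in Python 3 and `read_csv` already returns `str`, the function only decodes when it is handed `bytes`.

```python
import csv
import io
import re

import pandas as pd

# Hypothetical stand-in for data/SMSSpamCollection: tab-separated label and message.
raw = "ham\tOk lar, see you soon\nspam\tFree entry to win a prize\n"

messages = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    quoting=csv.QUOTE_NONE,
    names=["label", "message"],
)


def split_into_tokens(message):
    # Python 3: read_csv already yields str, so unicode() is neither needed nor available.
    # Decode only if a raw bytes value slips through.
    if isinstance(message, bytes):
        message = message.decode("utf-8")
    # Hypothetical stand-in for TextBlob(message).words: runs of word characters.
    return re.findall(r"\w+", message)


# Pass the function object itself; Series.apply calls it once per message string.
tokens = messages["message"].head().apply(split_into_tokens)
print(tokens.tolist())
```

Swapping the regex back to `TextBlob(message).words` should leave the `apply` call unchanged, since only the body of `split_into_tokens` differs.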