I am getting this error:
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-34-8cf38df798b5> in <module>()
1 x = df[['text']]
----> 2 x['subjectivity'] = df.text.apply(lambda x: TextBlob(str(unicode(df[['text']]))).sentiment.subjectivity)
3 df.head()
/Users/keenek1/anaconda3/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
3589 else:
3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
3592
3593 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-34-8cf38df798b5> in <lambda>(x)
1 x = df[['text']]
----> 2 x['subjectivity'] = df.text.apply(lambda x: TextBlob(str(unicode(df[['text']]))).sentiment.subjectivity)
3 df.head()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 551: ordinal not in range(128)
I have tried to fix this by specifying different encodings, and even by writing the DataFrame back to CSV with UTF-8 encoding.
What should I do?
from textblob import TextBlob
import pandas as pd
path = 'Tweets.csv'
df = pd.read_csv(path, delimiter=',', header='infer')
df.to_csv('tweets_encoded.csv', encoding='utf-8')
df = pd.read_csv('tweets_encoded.csv', delimiter=',', header='infer', encoding='utf-8')
I also tried using chardet to detect the encoding:
import chardet
# Read raw bytes so chardet sees the file exactly as stored on disk
rawdata = open('Tweets.csv', 'rb').read()
chardet.detect(rawdata)
{'confidence': 0.5471323391929904,
'encoding': 'Windows-1254',
'language': 'Turkish'}
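A confidence of roughly 0.55 is low, so the Windows-1254 guess may well be wrong. As a minimal sketch of why that matters (the sample string is made up, and only the standard library is used), the same bytes decode cleanly as UTF-8 but come out garbled under cp1254:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text containing U+2026, the character named in the traceback.
text = u"Flight delayed again\u2026"

# Bytes as they would be stored in a UTF-8 encoded file.
raw = text.encode("utf-8")

# Decoding with the matching codec restores the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with a wrongly guessed codec does not raise, but garbles the ellipsis.
as_cp1254 = raw.decode("cp1254")

print(as_utf8 == text)
print(as_cp1254 == text)
```

So a wrong guess would not necessarily raise an error on read; it could silently corrupt the text instead.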
The error occurs when I run this:
x = df[['text']]
x['subjectivity'] = df.text.apply(lambda x: TextBlob(str(df[['text']])).sentiment.subjectivity)
df.head()
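One likely cause, offered as an assumption rather than a confirmed diagnosis: in Python 2, calling `str()` on a `unicode` value encodes it with the default ASCII codec, which fails on `u'\u2026'`. The sketch below reproduces that with an explicit `.encode('ascii')` so that it runs on any Python version; the sample string is made up.

```python
# -*- coding: utf-8 -*-
# Hypothetical tweet text containing the ellipsis character from the traceback.
text = u"Flight delayed again\u2026 not happy"

# Python 2's str(unicode_value) behaves like unicode_value.encode('ascii').
# Doing the encode explicitly triggers the same UnicodeEncodeError.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit UTF-8 encode handles the character and round-trips without loss.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(ascii_failed)
print(round_tripped == text)
```

If that is the cause, dropping the `str(...)` wrapper and passing each row's value (the lambda argument) to `TextBlob`, rather than the whole `df[['text']]` frame, should avoid the implicit encoding, for example `df['subjectivity'] = df['text'].apply(lambda t: TextBlob(t).sentiment.subjectivity)`. This has not been tested against the actual data.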