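I want to remove emoticons from my data, which contains only the text of tweets. Each line corresponds to one tweet. I get a "bad character range" error for ":-)":
error: bad character range :-) at position 4
What is wrong?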
#remove emoticons
import re
emoji_pattern = re.compile("["
u":)"
u":-)"
u":D"
u":("
u":-("
"]+", flags=re.UNICODE)
with open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment.csv',"r", encoding="utf-8") as oldfile1, open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment_stripped_emoticons.csv', 'w',encoding="utf-8") as newfile1:
    for line in oldfile1:
        line=emoji_pattern.sub(r'', line)
        newfile1.write(line)
newfile1.close()
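Answer 0: (score: 0)
The bad character is actually on the previous line: it is a non-ASCII character. If you want to use such characters, you need to declare a compatible encoding. Search for "Python character encoding" to see the available options.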
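As a side check on where the message comes from, the sketch below compiles a trimmed, ASCII-only version of the question's pattern. It assumes Python 3's `re` module; `pattern_text` and `message` are hypothetical names used only for illustration. Inside `[...]`, the sequence `:-)` is read as a range from `:` to `)`, and because `:` (code point 58) sorts after `)` (code point 41) the range is reversed and rejected. So the same message can appear even without any non-ASCII character, which suggests the character class itself is worth checking too.

```python
import re

# Trimmed, ASCII-only version of the question's pattern (hypothetical name).
pattern_text = "[" + ":)" + ":-)" + "]+"

try:
    re.compile(pattern_text)
    message = None
except re.error as exc:
    # Inside [...], ":-)" is parsed as a character range from ":" to ")".
    message = str(exc)

print(message)
```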
Answer 1: (score: 0)
I solved it like this:
#remove emoticons
with open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment.csv',"r", encoding="utf-8") as oldfile1, open('C:/Users/M/PycharmProjects/Bachelor_Thesis/test/data_sentiment_stripped_emoticons.csv', 'w',encoding="utf-8") as newfile1:
    for line in oldfile1:
        # str.replace treats each emoticon as a literal string, so nothing needs escaping
        line=line.replace("","").replace(':)', '').replace(':D', '').replace(":(","").replace(":-(","")
        newfile1.write(line)
# the with statement closes both files on exit, so no explicit close() is needed
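If a regex is still preferred over chained `replace` calls, one possible sketch is to join the emoticons with `|` (alternation) instead of placing them inside `[...]`. The names `emoticons`, `emoticon_pattern` and `strip_emoticons` are hypothetical, and the list is taken from the question; this is an illustration under those assumptions, not the original poster's code.

```python
import re

# Emoticons taken from the question; extend the list as needed.
emoticons = [":-)", ":)", ":D", ":-(", ":("]

# re.escape keeps "(", ")" and "-" literal; "|" matches whole emoticons
# rather than single characters from a [...] class.
emoticon_pattern = re.compile("|".join(re.escape(e) for e in emoticons))


def strip_emoticons(line):
    """Return line with every listed emoticon removed."""
    return emoticon_pattern.sub("", line)


print(strip_emoticons("good morning :-) see you :D"))
```

Inside the file loop, `line = strip_emoticons(line)` would take the place of the `replace` chain. Surrounding whitespace is left untouched, so removing an emoticon between two words leaves both spaces behind.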