Why can't I replace the newline separators?

Time: 2019-12-29 14:56:27

Tags: python-3.x replace utf-8 newline telegram

I'm using a Python Telegram client that forwards messages from the app to our API, and I want to exclude some words. In this case, some @logins and #tags should be removed:

[Image: message from which some @logins and #tags should be removed]

Here is my code:

for w in app.config['EXCLUDED_WORDS']:
    if w in data:
        data = data.replace(w, '')

Simple enough, right? But this is the result I get (lots of blank lines):

[Image: resulting message with lots of blank lines]

I tried several different newline separators, such as #YoCrypto\n, #YoCrypto\r, and #YoCrypto\r\n, but none of them worked. Here is the output of print(data.encode('utf-8')):

#TAG\n#YoCrypto\xd0\xa0laced \xd0\xb0dditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.\xef\xbb\xbf@grandcchat\n@grandcsign\n@grandcmargin

What am I doing wrong?

1 answer:

Answer 0 (score: 0)

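One possible solution is to use the re module and replace each word, together with any newline characters that follow it, with an empty string. For example: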

import re

data = b'''#TAG\n#YoCrypto\xd0\xa0laced \xd0\xb0dditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.\xef\xbb\xbf@grandcchat\n@grandcsign\n@grandcmargin'''

words_to_remove = {'#YoCrypto','@grandcchat','@grandcmargin','@grandcsign'}

# decode the data (if not decoded already)
data = data.decode('utf-8')

# replace each word plus any newline characters that follow it:
data = re.sub('|'.join(r'{}\n*'.format(re.escape(w)) for w in words_to_remove), '', data)

print(data)

Prints:

#TAG
Рlaced аdditional signal for Bitmex. I will remember to include both exchanges on the same signal for btcusd now on. My apologies for inconvenience.
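
Note that the bytes \xef\xbb\xbf in the dump decode to U+FEFF, which the substitution above leaves in place right after "inconvenience.". In the dump, #YoCrypto also has its newline before it rather than after it, which would explain why a literal '#YoCrypto\n' never matched. The following is only a sketch of a slightly more defensive variant, not a definitive implementation: remove_words is a hypothetical helper name, the sample string is a shortened form of the message above, and treating U+FEFF as unwanted is an assumption about what the API should receive.

```python
import re


def remove_words(text, words):
    """Hypothetical helper: drop each word plus the CR/LF run that follows it."""
    # Longest first, so a word that is a prefix of another cannot shadow it.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile(
        '(?:' + '|'.join(re.escape(w) for w in ordered) + r')[\r\n]*'
    )
    cleaned = pattern.sub('', text)
    # U+FEFF is not whitespace for str.strip(), so remove it explicitly
    # (assumption: the character is never wanted in the output).
    return cleaned.replace('\ufeff', '').strip()


# Shortened sample based on the dump in the question.
sample = ('#TAG\n#YoCrypto\u0420laced \u0430dditional signal for Bitmex.'
          '\ufeff@grandcchat\n@grandcsign\n@grandcmargin')
words = {'#YoCrypto', '@grandcchat', '@grandcmargin', '@grandcsign'}

print(repr(remove_words(sample, words)))
```

Matching [\r\n]* instead of \n* also covers messages that arrive with \r or \r\n line endings, and the final strip() removes any blank lines left at the very start or end of the message.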