Question

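I want to reduce the number of features in my Twitter corpus. For that reason, I intend to replace usernames with an equivalence class token. Usernames are characterized by starting with @. I tried using re.sub(), but it does not work as expected: it replaces names inside a sentence, but not at the beginning of a sentence. What is the problem?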

# Usernames (e.g. @max) are replaced with an equivalence class token

import re
with open('outfilename2.csv',"r", encoding="utf-8") as oldfile1, open('outfilename3.csv', 'w',encoding="utf-8") as newfile1:
    for line in oldfile1:
        line=re.sub(r"(\s)@\w+", r" USERNAME", line)
        newfile1.write(line)
newfile1.close()

Answer 1

Your regular expression is wrong for what you say you want to do. Use this instead:

line=re.sub(r"\B@\w+", "USERNAME", line)

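This matches @anything_anywhere wherever the @ is not at a word boundary, that is, at the start of the line or after a non-word character such as a space, and replaces it with USERNAME. Your pattern (\s)@\w+ requires a whitespace character before the @, so a username at the very beginning of a line never matches. An @ that follows a word character, as in an email address, is left alone.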
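
To make the difference concrete, here is a minimal sketch that runs both patterns over a few made-up lines. The sample strings and the helper names replace_original and replace_suggested are assumptions for illustration only; they are not from the asker's corpus.

```python
import re

# Hypothetical sample lines, not taken from the asker's data.
samples = [
    "@max hello there",            # username at the start of the line
    "hello @max and @anna",        # usernames inside the sentence
    "mail me at max@example.com",  # an @ that follows a word character
]

original = re.compile(r"(\s)@\w+")  # pattern from the question
suggested = re.compile(r"\B@\w+")   # pattern from this answer


def replace_original(line):
    # Needs a whitespace character before the @, so line-initial names are missed.
    return original.sub(r" USERNAME", line)


def replace_suggested(line):
    # \B succeeds at the start of the string and after non-word characters.
    return suggested.sub("USERNAME", line)


for s in samples:
    print(repr(replace_original(s)), "|", repr(replace_suggested(s)))
```

In the file loop from the question, only the re.sub line would need to change; the rest of the loop can stay as it is.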

Answer 2

This might help: Line = Line.split("{")[1].split("}")[0]

Replace Twitter usernames (@ ...) with "USERNAME" - how?
