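import string

def text_process(text):
    # Strips every character in string.punctuation, including the decimal point
    text = text.translate(str.maketrans('', '', string.punctuation))
    return " ".join(text.split())  # split first so words, not characters, are joined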
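Input text: "Transaction value is -RS.3456.63"
Output: "Transaction value is RS 345663"
Can anyone suggest how to remove special characters (including ".") during text preprocessing while keeping decimal numbers intact?
Required output: "Transaction value is RS 3456.63"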
Answer 0 (score: 3)
You can use a more general regular expression that replaces every special character except "." with a space.
import re

def text_process(text):
    # Replace each run of characters that are neither word characters nor "." with one space
    text = re.sub(r'[^\w.]+', ' ', text)
    return text
s = 'Transaction: value* #was - 3456.63 Rupees'
text_process(s)
You get:
'Transaction value was 3456.63 Rupees'
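One caveat with this pattern: it keeps every ".", so the dot in "RS.3456.63" from the question survives too. Below is a minimal sketch of one way around that, assuming a dot should be kept only when it sits between two digits. The function name `strip_special_keep_decimals` is made up for illustration and is not part of the original answer.

```python
import re

def strip_special_keep_decimals(text):
    # Step 1: replace runs of characters that are neither word characters nor "."
    text = re.sub(r'[^\w.]+', ' ', text)
    # Step 2: replace any "." that is not both preceded and followed by a digit
    text = re.sub(r'(?<!\d)\.|\.(?!\d)', ' ', text)
    # Step 3: collapse the leftover whitespace
    return ' '.join(text.split())

print(strip_special_keep_decimals('Transaction value is -RS.3456.63'))
# prints: Transaction value is RS 3456.63
```

Note that `\w` also matches the underscore, so underscores are left in place by this sketch.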
Edit: the following function returns only the decimal number.
def text_process(text):
    # Delete everything that is not a digit or "."
    text = re.sub(r'[^\d.]+', '', text)
    return text
s = 'Transaction: value* #was - 3456.63 Rupees'
text_process(s)
'3456.63'
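Deleting everything except digits and dots works when the text holds a single number, but it glues the pieces together otherwise: 'Paid Rs.3456.63 and 20 more' would come out as '.3456.6320'. If there can be several numbers, or stray dots, matching the numbers directly may be safer. A small sketch, where `extract_numbers` is a hypothetical helper name:

```python
import re

def extract_numbers(text):
    # \d+ matches the integer part; (?:\.\d+)? optionally matches a fractional part
    return re.findall(r'\d+(?:\.\d+)?', text)

print(extract_numbers('Paid Rs.3456.63 and 20 more'))
# prints: ['3456.63', '20']
```

This returns the numbers as strings; convert them with `float()` if numeric values are needed.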
Answer 1 (score: 1)
If I understood your question correctly, this code should work for you:
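import re
import string

text = 'Transaction value was, - 3456.63 Rupees'
# Match punctuation that is neither preceded nor followed by a digit
regex = r"(?<!\d)[" + re.escape(string.punctuation) + r"](?!\d)"
result = re.sub(regex, "", text)
# result: 'Transaction value was  3456.63 Rupees' (removing ", -" leaves two spaces)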
To solve the second problem, try this trick:
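text = 'Transaction value was, - Rs.3456.63'
regex_space = r"([0-9]+(\.[0-9]+)?)"  # an integer or decimal number
regex_punct = r'[^\w.]+'  # runs of characters other than word characters and "."
# Pad each number with spaces, then replace the remaining special characters
re.sub(regex_punct, ' ', re.sub(regex_space, r" \1 ", text).strip())
# output: 'Transaction value was Rs. 3456.63'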
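The trick above still leaves the dot after "Rs". A different angle, offered only as a sketch and not as what this answer proposes, is to pull out the tokens you want instead of deleting the characters you don't. It assumes a token is either a number with an optional fractional part or a run of letters, so a mixed token such as "abc123" would be split in two. `TOKEN` and `tokenize_and_join` are illustrative names.

```python
import re

# A token is a number with an optional fractional part, or a run of letters
# ([^\W\d_] means "a word character that is neither a digit nor an underscore")
TOKEN = re.compile(r'\d+(?:\.\d+)?|[^\W\d_]+')

def tokenize_and_join(text):
    # Everything that is not part of a token (punctuation, spaces) is dropped
    return ' '.join(TOKEN.findall(text))

print(tokenize_and_join('Transaction value was, - Rs.3456.63'))
# prints: Transaction value was Rs 3456.63
```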