Question

Input:

 " The Elephant's 4 cats. "

Expected output:

 the elephants 4 cats

Code:

 import re

 temp1 = re.sub('\W+',' ', str).strip()
 output = temp1.lower()

My output:

 the elephant s 4 cats

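I get an extra space between "elephant" and "s". The other problem is that I can't remove '_' (underscore). Where am I going wrong? Any suggestions would help.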

Answer 1

Try:

StringJoin @@ 
 Flatten[TokenizeNestedBracePairs@
    "f @ g[h[[i[[j[2], k[[1, m[[1, n[2]]]]]]]]]] // z" //. {"[", {"", \
{"[", Longest[x___], "]"}, ""}, "]"} :> {"\[LeftDoubleBracket]", {x}, 
     "\[RightDoubleBracket]"}]

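Basically, your original \W+ means "one or more non-word characters", which matches spaces, quotes, apostrophes and periods. Each match is replaced with a space, so the apostrophe in "Elephant's" turns into a space. The underscore counts as a word character, so \W never matches it, which is why it is not removed.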
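To see this concretely, the short sketch below lists every run that \W+ matches in the question's input. The variable names `text` and `runs` are only illustrative.

```python
import re

text = " The Elephant's 4 cats. "

# Every run of non-word characters that \W+ would replace with one space.
runs = re.findall(r"\W+", text)
print(runs)  # [' ', ' ', "'", ' ', ' ', '. ']

# The underscore is a word character, so \W never matches it.
print(re.search(r"\W", "_"))  # None
```

The apostrophe shows up as its own match, which is where the stray space in "elephant s" comes from.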

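You will probably get a better result by specifically matching characters that are neither word characters nor whitespace, plus the underscore, and removing them instead of replacing them with a space.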
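A minimal sketch of that idea follows. The helper name `clean` and the exact pattern are assumptions for illustration, not necessarily what the answer originally posted: it deletes anything that is not a word character or whitespace, deletes underscores as well, then collapses the remaining whitespace.

```python
import re

def clean(text):
    # Delete (rather than space-replace) every character that is neither a
    # word character nor whitespace, plus the underscore, which \w keeps.
    stripped = re.sub(r"[^\w\s]|_", "", text)
    # Collapse runs of whitespace into one space, trim the ends, lowercase.
    return re.sub(r"\s+", " ", stripped).strip().lower()

print(clean(" The Elephant's 4 cats. "))  # the elephants 4 cats
```

Because the punctuation is deleted outright, "Elephant's" becomes "elephants" with no gap, and an input such as "snake_case" comes out as "snakecase".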

String cleaning in Python
