Question

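I have a long string variable (many sentences separated by ".") that contains some important numeric information, usually with a decimal point (e.g. "6.5 lbs").

I want to use a regex to remove every period that ends a sentence, but keep the periods that appear between digits.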

From:

First sentence.  Second sentence contains a number 1.0 and more words.  One more sentence.

To:

First sentence  Second sentence contains a number 1.0 and more words  One more sentence

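I'm doing this in Stata using its Unicode regex functions, which follow this standard: http://userguide.icu-project.org/strings/regexp

What I think I am doing below is: "replace each period whose preceding character is a lowercase letter with a space".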

gen new_variable = ustrregexrf(note_text, "(?<=[a-z])\.", " ")

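I'm finding that it removes one period per row, but not all of them. Maybe all I need to do is tell it "do this for all the periods you find satisfying the condition", but since it isn't working the way I thought it would, maybe I need an explanation of what it is actually doing.

Bonus points if you can tell me how to remove the period when a number is followed by a period and a space: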

number is 1.0. Next sentence-> number is 1.0 Next sentence

Edit: occasionally there are strings such as end sentence.begin next sentence with no space, so splitting on spaces won't handle all of my cases.

Answer 1

Method 1

Perhaps

\.(?=\s|$)

is worth looking into.

Demo 1
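As a rough illustration of the lookahead idea (a period followed by whitespace or the end of the string), here is a minimal Python sketch using only the standard re module. The function name strip_sentence_periods is made up for this example, and the sketch has not been checked against Stata's ICU engine.

```python
import re

# A period that is followed by whitespace or by the end of the string.
SENTENCE_END = re.compile(r"\.(?=\s|$)")

def strip_sentence_periods(text: str) -> str:
    """Remove sentence-ending periods; a period followed by a digit is kept."""
    return SENTENCE_END.sub("", text)

sample = "First sentence.  Second sentence contains a number 1.0 and more words.  One more sentence."
print(strip_sentence_periods(sample))
print(strip_sentence_periods("number is 1.0. Next sentence"))
```

This covers the "1.0. Next sentence" case from the question, because the second period is followed by a space. It leaves "end sentence.begin next sentence" unchanged, since that period is followed by a letter rather than whitespace.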

Method 2

\d+\.\d+(*SKIP)(*FAIL)|\.

is another option, which works once the regex module is installed:

$ pip3 install regex

Demo 2

Test
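A test along these lines might look like the sketch below. It is an assumption-laden stand-in rather than the original snippet: to stay within the standard library it replaces the (*SKIP)(*FAIL) pattern with a lookaround pattern that matches a period unless it sits directly between two digits. The names NON_DECIMAL_PERIOD, strip_periods, repl and first_only are hypothetical.

```python
import re

# Assumption: stdlib stand-in for the (*SKIP)(*FAIL) idea. A period matches
# unless it has a digit immediately before it AND a digit immediately after it.
NON_DECIMAL_PERIOD = re.compile(r"(?<!\d)\.|\.(?!\d)")

def strip_periods(text: str, repl: str = "", first_only: bool = False) -> str:
    """Replace non-decimal periods; first_only=True stops after one match."""
    return NON_DECIMAL_PERIOD.sub(repl, text, count=1 if first_only else 0)

sample = "First sentence.  Second sentence contains a number 1.0 and more words.  One more sentence."

# Lookahead pattern: period followed by whitespace or end of string.
print(re.sub(r"\.(?=\s|$)", "", sample))
# Lookaround stand-in: any period that is not a decimal point.
print(strip_periods(sample))
```

Unlike the whitespace lookahead, the stand-in also handles the no-space case: strip_periods("end sentence.begin next sentence", repl=" ") turns the period into a space. Passing first_only=True reproduces the one-period-per-row symptom from the question. As far as I know, Stata's ustrregexrf() replaces only the first match while ustrregexra() replaces all matches, so switching functions may be the missing piece there. The lookaround pattern itself is untested in Stata.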

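Output

First sentence  Second sentence contains a number 1.0 and more words  One more sentence
First sentence  Second sentence contains a number 1.0 and more words  One more sentence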
