Regex does not recognize the '#' to be removed

Time: 2019-02-01 02:46:19

Tags: python regex python-3.x data-science

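How can I remove a '#' from a word in a string only when it is at the start of the word, and leave it alone when it is at the end of a word, in the middle of a word, or a lone '#'?

I am currently using this regular expression: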

import re
test = "# #DataScience"
test = re.sub(r'\b#\w\w*\b', '', test)

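to remove the '#' from words that start with '#', but it does not work at all. It returns the string unchanged.

Can anyone tell me why the '#' is not being recognized and removed? Examples:

Test: "# #DataScience"

Expected output: "# DataScience"

Test: "kjndjk#jnjkd"

Expected output: "kjndjk#jnjkd"

Test: "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"

Expected output: "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"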

4 answers:

Answer 0: (score: 1)

Try this:

import re
test = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
test = re.sub(r'(?<!\S)#(?=\S)', '', test)
print(test)

Output:

# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#
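
As a quick check of the pattern above against the three test strings from the question, here is a minimal sketch. The helper name `strip_leading_hash` and the `cases` dictionary are made up for illustration; only the regular expression comes from the answer. The lookbehind `(?<!\S)` requires that no non-whitespace character precedes the '#', and the lookahead `(?=\S)` requires that a non-whitespace character follows it, so a lone '#' and a '#' in the middle or at the end of a word are left untouched.

```python
import re

# Pattern taken from the answer; the constant and function names are hypothetical.
LEADING_HASH = re.compile(r'(?<!\S)#(?=\S)')


def strip_leading_hash(text: str) -> str:
    """Remove '#' only where it starts a whitespace-delimited token."""
    return LEADING_HASH.sub('', text)


# Inputs and expected outputs copied from the question.
cases = {
    "# #DataScience": "# DataScience",
    "kjndjk#jnjkd": "kjndjk#jnjkd",
    "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#":
        "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#",
}

for source, expected in cases.items():
    result = strip_leading_hash(source)
    print(repr(source), "->", repr(result))
    assert result == expected
```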

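Answer 1: (score: 0)

The problem with your pattern is that # is not a word character, so \b does not behave the way you expect next to it: \b only matches between a word character and a non-word character. You can use a lookbehind instead: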

import re
test = "#HereToHelp STUFF #DataScience"
print(test)
test = re.sub(r'(?:(?<= )|^)#\w+\b', '', test)
print(test)

#HereToHelp STUFF #DataScience
 STUFF 
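
To make the word-boundary point concrete, the sketch below shows where the original pattern from the question does and does not match. It then shows a variant of the lookbehind pattern that captures the word and puts it back, since the version above deletes the whole hashtag while the question asks to keep the word. The function names `find_original` and `strip_keep_word` are hypothetical, and the capture-group variant is an assumption about what the asker wants, not part of the answer.

```python
import re

# The asker's original pattern: \b before '#' needs a word character
# immediately to the left of '#', which only happens in the middle of a word.
ORIGINAL = re.compile(r'\b#\w\w*\b')


def find_original(text: str) -> list:
    """Return everything the asker's original pattern matches."""
    return ORIGINAL.findall(text)


print(find_original("# #DataScience"))  # [] because no word char precedes either '#'
print(find_original("kjndjk#jnjkd"))    # ['#jnjkd'] because 'k' precedes the '#'

# Assumed variant of the answer's pattern: capture the word and re-insert it,
# so only the '#' is dropped.
KEEP_WORD = re.compile(r'(?:(?<= )|^)#(\w+)\b')


def strip_keep_word(text: str) -> str:
    """Drop a leading '#' but keep the word that follows it."""
    return KEEP_WORD.sub(r'\1', text)


print(strip_keep_word("#HereToHelp STUFF #DataScience"))  # HereToHelp STUFF DataScience
```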

Answer 2: (score: 0)

Your \b is in the wrong place.

Your regex should be:

r'#\b\w+\b'

Also, the + quantifier means one or more occurrences, which removes the need for \w\w*.
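
To see how this pattern behaves on the question's inputs, here is a small sketch. The function name `remove_with_moved_boundary` is made up for illustration. Run as written, the pattern does match now, but it deletes the whole hashtag rather than just the '#', and it also matches a '#' in the middle of a word, so it may need a capture group and a lookbehind to reach the expected output in the question.

```python
import re

# Pattern from the answer above; the names here are hypothetical.
MOVED_BOUNDARY = re.compile(r'#\b\w+\b')


def remove_with_moved_boundary(text: str) -> str:
    """Apply the answer's pattern with an empty replacement."""
    return MOVED_BOUNDARY.sub('', text)


# The lone '#' survives, but the word after the second '#' is removed too.
print(repr(remove_with_moved_boundary("# #DataScience")))  # '# '

# A '#' in the middle of a word is matched as well.
print(repr(remove_with_moved_boundary("kjndjk#jnjkd")))    # 'kjndjk'
```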

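Answer 3: (score: 0)

I know there is already an accepted answer, but I came up with this regex, which also seems to work well, and I personally prefer it because it is easier for me to read: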
