Regex to find and replace all Twitter mentions and hashtags in Notepad++

Date: 2018-03-09 00:20:18

Tags: regex notepad++

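I am trying to find and remove all Twitter mentions. A mention starts with @ and is followed by a Twitter username, which consists of alphanumeric characters (upper- and lowercase letters and digits) and underscores.

As a next step, I would also like to remove all hashtags from the text file. A hashtag starts with the # symbol and is followed by any number of characters that are not spaces or tabs.

I will be using Notepad++ to find and remove these instances.

Here is what I have so far: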

@[a-zA-Z0-9]{1,15} works for the first case, but not when the same username appears twice. For example: (screenshot)

#[a-zA-Z0-9] works, but it matches only the # sign and the first character of the hashtag. For example:

(screenshot)

Here are a few lines from the text I am working with:

posted: Sat Feb 03 2018 11:05:14    text: I should be making a killing for mining all this stale DSH coin, minergate. #bitcoin #ripple #altcoin screen_name: carlosrr24 location: Providence, RI    verified: false followers_count: 629    friends_count: 139  lang: en    retweet_count: 0    favorite_count: 0
posted: Sat Feb 03 2018 11:05:14    text: @cryptodailyuk @ADAcoin_ @BittrexExchange @exchange @NEWS @Bitcoin @crypto @_CryptoIQ @CharlieShrem Well done! Shut the fkn tether down!  screen_name: Pascal74672564 location: Zrich, Schweiz    verified: false followers_count: 6  friends_count: 16   lang: de    retweet_count: 0    favorite_count: 0
posted: Sat Feb 03 2018 11:05:27    text: When @Bitcoin becomes number 2 market cap, can I call it an Alt coin?     screen_name: Steven_Budgen84    location: Bahrain   verified: false followers_count: 238    friends_count: 1394 lang: en    retweet_count: 0    favorite_count: 0
posted: Sat Feb 03 2018 11:05:35    text: Current price of Bitcoin is $8844.61 #Bitcoin #Bithound   screen_name: The_BitHound   location: United States verified: false followers_count: 87 friends_count: 237  lang: en    retweet_count: 0    favorite_count: 0
posted: Sat Feb 03 2018 11:05:52    text: THE MOST INNOVATIVE AND LUCRATIVE WAY TO EARN BITCOIN JOIN BITCLUB NETWORK! ! !   screen_name: toshi_mat003   location: null  verified: false followers_count: 37 friends_count: 7    lang: ja    retweet_count: 0    favorite_count: 0
posted: Sat Feb 03 2018 11:05:56    text: THE MOST INNOVATIVE AND LUCRATIVE WAY TO EARN BITCOIN JOIN BITCLUB NETWORK!!! screen_name: Bitclubnetwork3    location: Australia verified: false followers_count: 106    friends_count: 58   lang: ja    retweet_count: 0    favorite_count: 0

1 Answer:

Answer 0 (score: 2)

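For usernames

The regex you proposed, @[a-zA-Z0-9]{1,15}, misses usernames that contain underscores. In your sample text it matches only the @ADAcoin part of @ADAcoin_ and does not match @_CryptoIQ at all. Use @\w{1,15} instead.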
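To make the difference concrete, here is a minimal sketch that runs both patterns over the mentions from the sample text. It uses Python's re module rather than Notepad++'s own regex engine, on the assumption that patterns this simple behave the same way in both; the variable names are only illustrative.

```python
import re

# Mentions copied from the second sample line of the question.
text = ("@cryptodailyuk @ADAcoin_ @BittrexExchange @exchange @NEWS "
        "@Bitcoin @crypto @_CryptoIQ @CharlieShrem Well done!")

# Original pattern: letters and digits only, so underscores are not covered.
narrow = re.findall(r"@[a-zA-Z0-9]{1,15}", text)

# Suggested pattern: \w also covers the underscore.
# Note: in Python 3, \w is Unicode-aware by default; pass re.ASCII
# to restrict it to [a-zA-Z0-9_].
wide = re.findall(r"@\w{1,15}", text)

print(len(narrow), narrow)  # 8 matches; "@ADAcoin_" is cut to "@ADAcoin"
print(len(wide), wide)      # 9 matches, including "@ADAcoin_" and "@_CryptoIQ"
```

With the narrow pattern, a replace would leave a stray underscore behind for @ADAcoin_ and leave @_CryptoIQ untouched.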

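Also, when you say the search fails "if the same username appears twice", I think you are being misled by the way the search results are presented. If you look at the left side, you will see that each row highlights a different match on the same line. In the image I attached, the regex finds all nine matches on the same line (line 3), but each match is printed on a separate row.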

All the matches are on line 3
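The same effect can be reproduced outside Notepad++. The sketch below (Python, with shortened copies of three sample lines, so the line numbers differ from the screenshot) lists every match together with the line it came from. One line number repeating many times simply means many matches on that line, not a failed search.

```python
import re

# Shortened copies of three lines from the question's sample text.
sample = "\n".join([
    "text: I should be making a killing for mining all this stale DSH coin, minergate.",
    "text: @cryptodailyuk @ADAcoin_ @BittrexExchange @exchange @NEWS "
    "@Bitcoin @crypto @_CryptoIQ @CharlieShrem Well done!",
    "text: When @Bitcoin becomes number 2 market cap, can I call it an Alt coin?",
])

MENTION = re.compile(r"@\w{1,15}")

# One (line number, match) pair per hit, similar to a search-results panel.
hits = [
    (lineno, m.group())
    for lineno, line in enumerate(sample.splitlines(), start=1)
    for m in MENTION.finditer(line)
]

for lineno, mention in hits:
    print(f"Line {lineno}: {mention}")
```

Here nine of the ten rows report line 2, one per mention on that line.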

For hashtags

(#[^\s]+\s) Note the trailing whitespace: if I am not mistaken, multiple hashtags must be separated by whitespace.
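One consequence of the mandatory trailing \s is that a hashtag at the very end of the text, with nothing after it, is not matched. In the sample data every hashtag is followed by whitespace, so this may not matter there. The sketch below (Python's re, assumed to behave like Notepad++ for this pattern) shows the edge case and one possible variant, #\S+\s?, which makes the trailing whitespace optional; that variant is a suggestion of this example, not part of the answer above.

```python
import re

# Tweet text from the sample data, cut off right after the last hashtag.
text = "Current price of Bitcoin is $8844.61 #Bitcoin #Bithound"

# Pattern from the answer ([^\s] is equivalent to \S): needs whitespace after the tag.
with_trailing = re.sub(r"#[^\s]+\s", "", text)

# Variant with optional trailing whitespace: also catches a tag at the end.
optional_trailing = re.sub(r"#\S+\s?", "", text)

print(repr(with_trailing))      # 'Current price of Bitcoin is $8844.61 #Bithound'
print(repr(optional_trailing))  # 'Current price of Bitcoin is $8844.61 '
```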