How to remove @users and #hashtags from a tweet string

Time: 2013-10-06 04:12:44

Tags: ruby regex twitter

I'm trying to parse some tweets, but I can't get a regex to remove the words that start with an @ or a # sign. I've tried:

tweet.slice!("#(\S+)\s?")
tweet.slice!("@(\S+)\s?")

tweet.slice!("/(?:\s|^)(?:#(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i")
tweet.slice!("/(?:\s|^)(?:@(?!\d+(?:\s|$)))(\w+)(?=\s|$)/i")

tweet.slice!("\#/\w/*")
tweet.slice!("\@/\w/*")

None of them seem to work. Am I just doing this wrong?

2 answers:

Answer 0 (score: 0)

Use String#gsub!, passing a Regexp literal rather than a String:

>> tweet = 'Hello @ruby #world'
=> "Hello @ruby #world"
>> tweet.gsub!(/[#@]\w+/, '')
=> "Hello  "
>> tweet
=> "Hello  "
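
Some background on why the attempts in the question fail: String#slice! treats a String argument as literal text to find, and even with a real Regexp it removes only the first match. The sketch below shows that difference. The helper name strip_tags_and_mentions and the whitespace cleanup are illustrative assumptions, not something from the question. It uses the non-destructive gsub because gsub! returns nil when nothing was replaced.

```ruby
# Hypothetical helper (the name is an assumption): removes @mentions and
# #hashtags, then collapses leftover runs of spaces and trims the ends.
def strip_tags_and_mentions(tweet)
  tweet.gsub(/[#@]\w+/, '').squeeze(' ').strip
end

tweet = 'Hello @ruby #world'

# A String argument is searched for literally, so nothing is removed.
literal = tweet.dup
literal.slice!('#(\S+)\s?')        # => nil
puts literal                       # prints: Hello @ruby #world

# A Regexp argument works, but slice! only removes the first match.
first_only = tweet.dup
first_only.slice!(/[#@]\w+/)       # => "@ruby"
puts first_only.inspect            # prints: "Hello  #world"

# gsub replaces every match.
puts strip_tags_and_mentions(tweet).inspect   # prints: "Hello"
```

Whether to squeeze the leftover spaces depends on what the cleaned text is used for; drop the squeeze and strip calls to keep the original spacing.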

Answer 1 (score: 0)

You can do this with gsub! and word boundaries.

tweet.gsub!(/\B[@#]\S+\b/, '')

Regular expression:

\B         a position that is not a word boundary: between two word chars (\w) or two non-word chars (\W)
[@#]       any character of: '@', '#'
\S+        non-whitespace (anything but " ", \t, \n, \r, \f, \v) (1 or more times)
\b         the boundary between a word char (\w) and something that is not a word char
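
One practical effect of the leading \B is that an '@' directly after a word character, as in an email address, is left alone. The sketch below compares this pattern with the simpler /[#@]\w+/ from the other answer. The constant names and the sample text are assumptions made for illustration.

```ruby
# Constant names are illustrative, not from the answers.
SIMPLE_PATTERN  = /[#@]\w+/       # pattern from the other answer
BOUNDED_PATTERN = /\B[@#]\S+\b/   # pattern from this answer

text = 'mail me@example.com or ask @ruby about #regex!'

# Without \B, the "@example" inside the email address is removed too.
puts text.gsub(SIMPLE_PATTERN, '')
# prints: mail me.com or ask  about !

# With \B, the "@" must not follow a word character, so the email survives.
puts text.gsub(BOUNDED_PATTERN, '')
# prints: mail me@example.com or ask  about !
```

Because \S+ is greedy and matches punctuation as well, a token such as "@user,other" is removed as a whole by the bounded pattern; use \w+ in place of \S+ if the match should stop at the first non-word character.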