test_tweets = [
  "This president sucks!",
  "I hate this Blank House!",
  "I can't believe we're living with such a bad leadership. We were so foolish",
  "President Presidentname is a danger to society. I hate that he's so bad – it sucks."
]
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]
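Write a program that censors the following words and phrases in the tweets: "sucks", "bad", "hate", "foolish", and "danger to society". Replace each negative word or phrase with the word "CENSORED".
I'm not sure how to do this. Can someone shed some light? Here is my attempt: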
new_array = test_tweets.join(" ").split(" ")
new_array.map { |word| word == banned_phrases.to_s ? "CENSORED" : word }.flatten!
Answer 0 (score: 1)
You are converting the whole banned_phrases array to a string, which returns something like
"[\"sucks\", \"bad\", \"hate\", \"foolish\", \"danger to society\"]"
so no single word in any tweet can ever be equal to it. The main problem is the comparison itself.
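To make the failed comparison concrete, here is a minimal sketch. The variable names `as_string`, `word_equals_string`, and `word_in_list` are made up for illustration and do not appear in the question:

```ruby
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]

# Array#to_s returns the inspect form of the whole array as ONE string.
as_string = banned_phrases.to_s

# A single word is never equal to that string...
word_equals_string = ("hate" == as_string)

# ...whereas Array#include? checks each element of the array.
word_in_list = banned_phrases.include?("hate")

puts as_string
puts word_equals_string # false
puts word_in_list       # true
```

So the check should ask whether the array includes the word, not whether the word equals the stringified array.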
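You could start by iterating over each tweet, splitting it into words, and checking for each word whether the array of banned phrases includes it: if so, return "CENSORED", otherwise return the word itself. Then join the words of the resulting array back together with a space: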
test_tweets = [
  "This president sucks!",
  "I hate this Blank House!",
  "I can't believe we're living with such a bad leadership. We were so foolish",
  "President Presidentname is a danger to society. I hate that he's so bad – it sucks."
]
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]
censored_tweets = test_tweets.map do |tweet|
  # Replace a word only when it exactly equals one of the banned phrases.
  tweet.split.map { |word| banned_phrases.include?(word) ? 'CENSORED' : word }.join(' ')
end
p censored_tweets
# ["This president sucks!", "I CENSORED this Blank House!", "I can't believe we're living with such a CENSORED leadership. We were so CENSORED", "President Presidentname is a danger to society. I CENSORED that he's so CENSORED – it sucks."]
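Note that "sucks!", "sucks." and "danger to society" are left untouched in that output. A small sketch of why, using made-up variable names (`words`, `last_word_banned`, `phrase_words`): `String#split` keeps punctuation attached to the word and breaks a multi-word phrase into separate pieces, so an exact `include?` match fails.

```ruby
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]

# Splitting on whitespace leaves the "!" attached to the last word.
words = "This president sucks!".split
last_word_banned = banned_phrases.include?(words.last)

# A multi-word phrase is split apart, so no single piece equals it.
phrase_words = "a danger to society.".split

p words            # ["This", "president", "sucks!"]
p last_word_banned # false
p phrase_words     # ["a", "danger", "to", "society."]
```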
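# Alternative: run a regex substitution on each word, so a banned word followed by punctuation is caught too.
re = Regexp.union(banned_phrases)
test_tweets.map do |tweet|
  tweet.split.map { |word| word.gsub(re, 'CENSORED') }.join(' ')
end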
# ["This president CENSORED!", "I CENSORED this Blank House!", "I can't believe we're living with such a CENSORED leadership. We were so CENSORED", "President Presidentname is a danger to society. I CENSORED that he's so CENSORED – it CENSORED."]
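Neither snippet censors "danger to society", because splitting on whitespace breaks the phrase into separate words. Here is a minimal sketch of one way around that, assuming it is acceptable to match against the whole tweet instead of word by word. The `pattern` name, the `\b` word boundaries, and the case-insensitive flag are additions for illustration and are not part of the original answer:

```ruby
test_tweets = [
  "This president sucks!",
  "I hate this Blank House!",
  "I can't believe we're living with such a bad leadership. We were so foolish",
  "President Presidentname is a danger to society. I hate that he's so bad – it sucks."
]
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]

# Regexp.union escapes each phrase and joins them with "|".
# Wrapping its source in \b...\b avoids hits inside longer words such as "badge";
# the trailing "i" makes the match case-insensitive.
pattern = /\b(?:#{Regexp.union(banned_phrases).source})\b/i

# Substitute on the whole tweet so multi-word phrases can match.
censored_tweets = test_tweets.map { |tweet| tweet.gsub(pattern, "CENSORED") }

puts censored_tweets
```

With the sample tweets, the last one becomes "President Presidentname is a CENSORED. I CENSORED that he's so CENSORED – it CENSORED."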
Answer 1 (score: 0)
Another way, as a one-liner:
test_tweets.map { |tweet| tweet.gsub(Regexp.union(banned_phrases), 'CENSORED') }
It uses a regular expression to match the banned phrases and replaces every occurrence of them.
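One caveat when choosing between `gsub` and `gsub!` in a one-liner like this: `String#gsub!` modifies the string in place and returns `nil` when nothing was replaced, so mapping with it can put `nil` into the result for a tweet with no banned phrase. A small sketch, where `clean_tweet`, `in_place_result`, and `copy_result` are illustrative names:

```ruby
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]
pattern = Regexp.union(banned_phrases)

clean_tweet = "What a lovely day"

# gsub! mutates its receiver and returns nil if no substitution happened.
# dup is used here so clean_tweet itself is not modified.
in_place_result = clean_tweet.dup.gsub!(pattern, "CENSORED")

# gsub always returns a string, changed or not.
copy_result = clean_tweet.gsub(pattern, "CENSORED")

p in_place_result # nil
p copy_result     # "What a lovely day"
```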