I am using Ruby 2.4. I want to remove URLs from my string, so I tried this:
puts "str before: #{my_str}"
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}"
But only the leading "http:" was stripped. Here is the output of the lines above:
str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROUP%20RESULTS.HTM" \l "Top)
str after url sub: Top (//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROUP%20RESULTS.HTM" \l "Top)
What is the correct way to remove URLs from a string?
Edit: here is what I get from puts "#{URI::regexp}":
(?x-mi:
([a-zA-Z][\-+.a-zA-Z\d]*): (?# 1: scheme)
(?:
((?:[\-_.!~*'()a-zA-Z\d;?:@&=+$,]|%[a-fA-F\d]{2})(?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*) (?# 2: opaque)
|
(?:(?:
\/\/(?:
(?:(?:((?:[\-_.!~*'()a-zA-Z\d;:&=+$,]|%[a-fA-F\d]{2})*)@)? (?# 3: userinfo)
(?:((?:(?:[a-zA-Z0-9\-.]|%\h\h)+|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\[(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(?:(?:[a-fA-F\d]{1,4}:)*[a-fA-F\d]{1,4})?::(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))?)\]))(?::(\d*))?))? (?# 4: host, 5: port)
|
((?:[\-_.!~*'()a-zA-Z\d$,;:@&=+]|%[a-fA-F\d]{2})+) (?# 6: registry)
)
|
(?!\/\/)) (?# XXX: '\/\/' is the mark for hostport)
(\/(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*(?:\/(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*)*)? (?# 7: path)
)(?:\?((?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))? (?# 8: query)
)
(?:\#((?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))? (?# 9: fragment)
)
Answer 0 (score: 0)
It seems to work fine for a regular string:
my_str = "Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
However, your string may contain junk, non-printable characters. For example, a stray null character just before the first slash:
#                   vv - random null character
my_str = "Top (http:\0//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
# looks the same vv
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
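To check whether a string really contains such a hidden character, printing it with p (which uses inspect) instead of puts should make it visible. The sketch below uses a shortened, made-up URL (www.example.org) rather than the one from the question, and also shows which part of the string URI.regexp matches when a null character follows the scheme.

```ruby
require 'uri'

# Hypothetical sample string: a null character sits right after "http:".
my_str = "Top (http:\0//www.example.org/a%20b.HTM\" \\l \"Top)"

# p prints the inspect form, which escapes non-printable characters,
# so the null character becomes visible instead of being silently written out.
p my_str

# String#[] with a regexp returns the first match (or nil).
# The scheme and colon can match on their own, so the match ends at the null character.
matched = my_str[URI.regexp]
p matched
```

If the inspected string shows an escape sequence between "http:" and "//", the input contains a hidden character and the regexp itself is behaving as written.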
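Now, if you copy and paste the output of that null-character example from a website, the substitution works again, because the null character does not survive the copy: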
# I copied this from the output from the line below `looks the same vv`
my_str = 'Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)'
puts "str before: #{my_str}" # => str before: Top (http://www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM" \l "Top)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
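So in the end it looks like it works for us. You can therefore try stripping all non-printable characters first and see whether that works for you: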
my_str = "Top (http:\0//www.lafayettefitness.org/Results/2011%20CHASING%20THE%20RAINBEAU%205K%20AGE%20GROU;5DP%20RESULTS.HTM\" \\l \"Top)"
my_str.gsub!(/[^[:print:]]/, '') # remove non-printable characters (the /i flag is not needed here)
my_str.gsub!(/#{URI::regexp}/, '')
puts "str after url sub: #{my_str}" # => str after url sub: Top (" \l "Top)
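The two steps above can be wrapped in a small helper. This is only a sketch under a few assumptions: strip_urls is a made-up name, the URL is a shortened example, and the scheme list passed to URI.regexp is my own restriction (the original code matches any scheme). Note also that [^[:print:]] on its own removes tabs and newlines too, so this version keeps whitespace by adding \s to the negated class.

```ruby
require 'uri'

# Hypothetical helper, not part of the original answer.
# Step 1: drop characters that are neither printable nor whitespace.
# Step 2: remove everything URI.regexp matches for the given schemes.
def strip_urls(text, schemes = %w[http https])
  cleaned = text.gsub(/[^[:print:]\s]/, '')
  cleaned.gsub(URI.regexp(schemes), '')
end

dirty = "Top (http:\0//www.example.org/a%20b.HTM\" \\l \"Top)"
result = strip_urls(dirty)
p result
```

The helper returns a new string instead of mutating its argument with gsub!, which makes it easier to compare the input and output while debugging.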