wget --output-document=- http://www.tip.it/runescape/grand-exchange-centre 2>/dev/null \
| grep "The Grand Exchange updated"
outputs the following:
<h4 id="gec_update_time">The Grand Exchange updated <span><b>1</b> days, <b>12</b> hours, <b>45</b> minutes and <b>1</b> seconds ago</span></h4>
My goal is to trim it so that it outputs only:
1 days, 12 hours, 45 minutes, 1 seconds
I'm not very good at this. Any tips?
Answer 0 (score: 1)
You could write a short Ruby script:
gem install sanitize
Create a file named "cleaner.rb":
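#!/usr/bin/env ruby
# Reads one line of HTML from stdin and prints its text content.
require 'rubygems'
require 'sanitize'
# Remove all tags (default config), then trim surrounding whitespace.
# On sanitize releases older than 3.0, call Sanitize.clean instead.
puts Sanitize.fragment(gets).strip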
Then, after making the script executable with chmod +x cleaner.rb:
wget --output-document=- http://www.tip.it/runescape/grand-exchange-centre 2>/dev/null \
| grep "The Grand Exchange updated" | ./cleaner.rb
gives you: "The Grand Exchange updated 1 days, 13 hours, 0 minutes and 56 seconds ago"
Answer 1 (score: 1)
If using lynx is an option, you get the tag stripping for free:
$ lynx -dump http://www.tip.it/runescape/grand-exchange-centre | grep "The Grand Exchange updated"
The Grand Exchange updated 1 days, 19 hours, 8 minutes and 48 seconds ago
If you want, you can strip the leading text from it:
$ foo="$(lynx -dump http://www.tip.it/runescape/grand-exchange-centre | grep "The Grand Exchange updated")"
$ echo "${foo#*updated }"
1 days, 19 hours, 9 minutes and 8 seconds ago
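The same parameter-expansion idea can be pushed a little further to match the format the question asks for (no trailing "ago", and a comma instead of "and"). The following is only a sketch: the function name trim_update_line is made up for illustration, the sample line is copied from the lynx output above so that no network access is needed, and it assumes the page keeps wording its line exactly this way.

```shell
#!/bin/sh
# Hypothetical helper (not part of lynx or wget): reduce the update line
# to "D days, H hours, M minutes, S seconds" using only POSIX expansions.
trim_update_line() {
    s="$1"
    s="${s#*updated }"      # drop everything up to and including "updated "
    s="${s% ago}"           # drop the trailing " ago", if present
    case "$s" in
        # Split on the first " and " and rejoin the two halves with ", "
        *" and "*) s="${s%% and *}, ${s#* and }" ;;
    esac
    printf '%s\n' "$s"
}

# Sample input taken from the lynx output shown above
line='The Grand Exchange updated 1 days, 19 hours, 9 minutes and 8 seconds ago'
trim_update_line "$line"
```

With live data you would pass "$foo" from the command above instead of the sample line. Note that the helper overwrites a global variable named s, so rename it if that clashes with your own script.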
If you absolutely want to use wget and strip the tags yourself, you can use something like this:
$ wget --output-document=- http://www.tip.it/runescape/grand-exchange-centre 2>/dev/null | grep "The Grand Exchange updated" | sed -e 's/<[^>]*>//g' -e 's/The Grand Exchange updated //'
1 days, 19 hours, 17 minutes and 2 seconds ago
The first option (lynx) is probably the better choice.
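If you stay with wget, a few more sed expressions get you exactly the format from the question. This is a rough sketch rather than a robust HTML parser: strip_update_html is a made-up name, the sample HTML is the line quoted in the question, and the expressions assume that the whole heading sits on one line and keeps its current wording.

```shell
#!/bin/sh
# Hypothetical filter: read the matching HTML line on stdin and print
# "D days, H hours, M minutes, S seconds".
strip_update_html() {
    # 1) remove every tag, 2) remove the leading label,
    # 3) remove " ago" and anything after it, 4) turn the first " and " into ", "
    sed -e 's/<[^>]*>//g' \
        -e 's/^.*The Grand Exchange updated //' \
        -e 's/ ago.*$//' \
        -e 's/ and /, /'
}

# Sample input copied from the question, so no network access is needed
html='<h4 id="gec_update_time">The Grand Exchange updated <span><b>1</b> days, <b>12</b> hours, <b>45</b> minutes and <b>1</b> seconds ago</span></h4>'
printf '%s\n' "$html" | strip_update_html
```

For live data, pipe the wget and grep commands from above into strip_update_html in place of the printf line. The expressions use only basic regular expressions, so they should behave the same under GNU and BSD sed.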