Question

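Hey guys, I'm writing a script to pull the words/results from this site (http://grecni.com/texttwist.php). I already have the HTTP request and so on working.

The only thing I need now is to extract the words, so I'm working with HTML source that looks like this: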

<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>

<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; aas &nbsp; ass &nbsp; sea &nbsp; ace &nbsp; sec &nbsp; <p>

<b>4 letter words</b><br>cess &nbsp; secs &nbsp; seas &nbsp; ceca &nbsp; sacs &nbsp; case &nbsp; asea &nbsp; casa &nbsp; aces &nbsp; caca &nbsp; <p>

<b>5 letter words</b><br>cacas &nbsp; casas &nbsp; caeca &nbsp; cases &nbsp; <p>
<b>6 letter words</b><br>access &nbsp; <br><br>
Found 23 words in 0.22962 seconds


<form action="texttwist.php" method="post">

enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">

<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>

<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>

<a href=/>back to my page</a>

</body>

</html>

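I'm trying to pull out words like "sae", "sac", "secs", "seas", "casas", and so on.

Any help?

This is the furthest I've gotten, and I don't know what to do next: link text

Any suggestions?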

Answer 1

Use an HTML parser such as Nokogiri.

Answer 2

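If you want any kind of robustness, you really want a parser, and as Adrian mentioned, Nokogiri is the most popular solution.

If you insist on doing without one, knowing that you may run into madness as the page gets more complex, the following may help:

Search for lines that match

/^<b>\d+ letter words/

Then you can dig out the bits like this: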

a = line.split(/<br>/)[1]            # the part after the first <br>
a.gsub!('<p>', '')                   # take out the trailing <p>
res = a.strip.split(/\s*&nbsp;\s*/)  # this is your data; strip drops the trailing newline
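Putting the two steps together over a whole page might look like the sketch below. It uses only the standard library; `extract_words` and the sample `html` string are hypothetical names made up for this example, and the code assumes each heading and its words stay on a single line, as in the source from the question.

```ruby
# Shortened stand-in for the page body returned by the HTTP request.
html = <<~HTML
  <b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; <p>

  <b>4 letter words</b><br>cess &nbsp; secs &nbsp; <p>
  <b>6 letter words</b><br>access &nbsp; <br><br>
  Found 23 words in 0.22962 seconds
HTML

# Returns a hash mapping word length => array of words.
def extract_words(html)
  heading = /^<b>(\d+) letter words/
  html.each_line.with_object({}) do |line, result|
    match = heading.match(line)
    next unless match                      # skip lines that are not result rows

    rest = line.split(/<br>/)[1].to_s      # text after the first <br>
    rest = rest.gsub('<p>', '')            # drop the trailing <p>
    result[match[1].to_i] = rest.strip.split(/\s*&nbsp;\s*/)
  end
end

words_by_length = extract_words(html)
all_words = words_by_length.values.flatten

puts words_by_length.inspect
```

Keying the result by word length is just one convenient shape; `all_words` gives the flat list if that is all you need.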

That said, this isn't code you'd want in production. You'd be surprised how much learning a parser changes the way you see this problem.

Help with regex / Ruby
