Question

I have a table row header element:

<th scope="row" class="u-printHyphensManual row">
            Advan&shy;taged
</th>

How can I get the text without the hyphen? That is, elem.text should return "Advantaged" rather than "Advan-taged".

I am using Capybara.

Answer 1

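Change find('th').text to find('th').text.gsub(/[^A-Za-z]/, '').

This works for this case, but depending on the general problem you are really trying to solve, it may have unintended consequences: the pattern removes every character that is not an ASCII letter, including spaces, digits, and punctuation.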
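To make the caveat concrete, here is a small sketch of what a letters-only pattern such as /[^A-Za-z]/ does to plain Ruby strings. The sample strings are made up for illustration and are not from the question; "\u00AD" stands in for the character that &shy; decodes to.

```ruby
# Hypothetical sample strings (assumptions, not taken from the page in question).
single_word = "Advan\u00ADtaged"
multi_word  = "Dis\u00ADadvan\u00ADtaged group 2"

# Matches anything that is not an ASCII letter.
letters_only = /[^A-Za-z]/

stripped_single = single_word.gsub(letters_only, '')
stripped_multi  = multi_word.gsub(letters_only, '')

puts stripped_single # Advantaged
puts stripped_multi  # Disadvantagedgroup (the spaces and the digit are gone too)
```

For a single-word cell the result is what the question asks for, but any cell containing spaces, digits, or accented letters would be altered as well.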

Answer 2

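You can put a Unicode character into a string by writing its escape sequence in a double-quoted string and calling String#encode, or you can put the escape directly in a regular expression. The Unicode escape for the soft hyphen is \u00AD (code point U+00AD).

text.gsub("\u00AD".encode('utf-8'), '')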

or

text.gsub(/\u00AD/, '')

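If that does not work, try replacing the literal character (the first pair of quotes below contains an actual soft hyphen, which most editors display as nothing):

text.gsub('­', '')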
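The approach in this answer could be wrapped up roughly as follows. This is only a sketch: SOFT_HYPHEN and strip_soft_hyphens are hypothetical names introduced here, and the sample string stands in for whatever find('th').text returns.

```ruby
# Double quotes are needed so that Ruby interprets the \u escape.
SOFT_HYPHEN = "\u00AD"

# Hypothetical helper: removes soft hyphens and leaves everything else intact.
def strip_soft_hyphens(text)
  text.gsub(SOFT_HYPHEN, '')
end

# Stand-in for the element text (assumption: it contains a real U+00AD).
sample  = "Advan\u00ADtaged"
cleaned = strip_soft_hyphens(sample)

puts cleaned        # Advantaged
puts sample.length  # 11
puts cleaned.length # 10
```

With Capybara this would presumably be called as strip_soft_hyphens(find('th').text). Unlike a letters-only pattern, it leaves spaces, digits, and punctuation untouched.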