Question

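I am trying to extract text from an HTML file. The text spans multiple lines, so I am using Perl and its regular expressions (rather than grep, awk, or sed) to extract the data. What I need is the requirement text and its ID, which appear in the source HTML file as shown in the following snippet.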

<tr>
<td class="mappingtable-content-cell"> ID-3367 </td>
<td class="mappingtable-content-cell"> <span class="no-style-cleanup" style="white-space:nowrap;" title="SW compatible feature mask"><a style="font-size:1em;" target="_top" href="#"><span style="color:#000000;">ID-3367</span></a></span> </td>
<td class="mappingtable-content-cell"> SW compatible feature mask </td>
<td class="mappingtable-content-cell"> The application shall 
provide a bit mask (SW&nbsp;compatible feature mask) with same size 
as&nbsp;parameter ID_HW_FEATURE_MASK where each&nbsp;expected feature 
which should be provided by the&nbsp;HW shall be set to 1.<br>
<br>
<span style="font-style: italic;">Note: The context of each bit is defined by&nbsp;parameter HW_FEATURE_MASK.</span> </td>
</tr>

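I am doing this in a bash shell, so I would like to use a Perl one-liner. I also want to put all of the matched text into a bash array so that I can reuse it later in the bash script. My one-liner currently looks like this: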

requirements_ids=($(perl -n0e 'while (m/<td class="mappingtable-content-cell">.*?(ID-[0-9]{2,4}).*?<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>/sg) {print "$1\n";}' $polarion_file))
requirements_polarion_reqs=($(perl -n0e 'while (m/<td class="mappingtable-content-cell">.*?(ID-[0-9]{2,4}).*?<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>.*?<td class="mappingtable-content-cell">(.*?)<\/td>/smg) {print "$4\n";}' $polarion_file))

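This works fine for the IDs, but for the requirement text, every word of every match ends up in its own array element. How can I get the whole matched text into a single array element?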

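When I run the regex "in place" on the file, it works as expected, but I do not actually want to modify the original file. I only want to grab the text and use it in bash for rearranging and exporting.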

Perl regex: extracting multi-line text with a one-liner

0 answers: