I'm parsing a file that contains some HTML tags and converting them to LaTeX commands.
cat text
<Text>A <strong>ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe locfsdafalised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrain. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><strong>asymptomatic</strong></em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
sed -e 's|<strong>\(.*\)</strong>|\\textbf{\1}|g' test
cat out
<Text>A \textbf{ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe locfsdafalised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrain. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><strong>asymptomatic}</em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
In the output I get \textbf{ASDFF .........}, but the expected output is \textbf{ASDFF}. How can I get the expected result?
Regards
Answer 0 (score: 2)
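You probably want a Perl regular expression instead. sed uses POSIX regular expressions, which have no non-greedy quantifier, so `.*` matches from the first <strong> to the last </strong> on the line. In Perl, `.*?` is non-greedy and stops at the first </strong>: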
perl -pe 's|<strong>(.*?)</strong>|\\textbf{\1}|g'
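If Perl is not available, a common sed-only workaround (not part of the original answer) is to replace `.*` with the negated bracket expression `[^<]*`, so the captured text can never run past the next `<`. This is a minimal sketch that assumes the bold text itself never contains another tag; the function name `strong_to_textbf` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert <strong>...</strong> to \textbf{...} with POSIX sed.
# [^<]* cannot cross a '<', so each match ends at the first </strong> after
# its <strong>. Assumption: no nested tags inside <strong>.
strong_to_textbf() {
  sed 's|<strong>\([^<]*\)</strong>|\\textbf{\1}|g'
}

# Shortened version of the question's input line.
printf '%s\n' '<Text>A <strong>ASDFF</strong> is a <em>cerebrovafdfasfscular</em> (<em><strong>asymptomatic</strong></em>)' |
  strong_to_textbf
# prints: <Text>A \textbf{ASDFF} is a <em>cerebrovafdfasfscular</em> (<em>\textbf{asymptomatic}</em>)
```

The trade-off is that input such as `<strong>a <em>b</em></strong>` is left untouched, because `[^<]*` stops at the inner `<em>`; for that kind of input the Perl non-greedy version is the simpler fix.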
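Your question is similar to non-greedy-regex-matching-in-sed. Next time, you may want to simplify your case so it points at the real problem. For example, instead of pasting the raw HTML, use something like: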
fooTEXT1barfooTEXT2bar
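With that simplified input, the difference between greedy and non-greedy matching is easy to see. This is a minimal sketch: the `[`/`]` markers and the function names `greedy_match` and `lazy_match` are illustrative assumptions, and the Perl part runs only if `perl` is installed.

```shell
#!/bin/sh
# Greedy (sed): .* extends to the LAST "bar" on the line, giving one big match.
greedy_match() {
  sed 's/foo\(.*\)bar/[\1]/g'
}

# Non-greedy (Perl): .*? stops at the FIRST "bar" after each "foo".
lazy_match() {
  perl -pe 's/foo(.*?)bar/[$1]/g'
}

echo fooTEXT1barfooTEXT2bar | greedy_match   # prints: [TEXT1barfooTEXT2]
if command -v perl >/dev/null 2>&1; then
  echo fooTEXT1barfooTEXT2bar | lazy_match   # prints: [TEXT1][TEXT2]
fi
```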
Update
If you actually just want the greedy behavior, ignore this.