I have the following HTML code:
<div class="rendering rendering_person rendering_short rendering_person_short">
<h3 class="title"><a rel="Person" href="https://moh-it.pure.elsevier.com/en/persons/paola-alberti" class="link person"><span>Paola Alberti</span></a></h3>
<ul class="relations email">
<li class="email"><a href="mailto:paola.alberti@istitutotumori.mi.it" class="link"><span>paola.alberti@istitutotumori.mi.it</span></a></li>
</ul>
<ul class="relations organisations">
<li><a rel="Organisation" href="https://moh-it.pure.elsevier.com/en/organisations/fondazione-irccs-istituto-nazionale-dei-tumori" class="link organisation"><span>Fondazione IRCCS Istituto Nazionale dei Tumori</span></a></li>
</ul>
<p class="type"><span class="family">Person: </span>Academic</p>
</div>
How can I get the email address from the span tag above, i.e. this element:
<span>paola.alberti@istitutotumori.mi.it</span>
Answer 0 (score: 1)
You can use XPath:
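A minimal sketch of the XPath approach, assuming Python and only the standard library. The snippet in the question happens to be well-formed XML, so `xml.etree.ElementTree` (which supports a limited XPath subset) can parse it directly. The helper name `extract_emails` and the shortened `HTML` constant are illustrative, not part of the original post. Real-world HTML is often not well-formed, in which case an HTML-aware parser would be needed instead.

```python
import xml.etree.ElementTree as ET

# The relevant part of the markup from the question (well-formed, so it parses as XML).
HTML = """
<div class="rendering rendering_person rendering_short rendering_person_short">
<h3 class="title"><a rel="Person" href="https://moh-it.pure.elsevier.com/en/persons/paola-alberti" class="link person"><span>Paola Alberti</span></a></h3>
<ul class="relations email">
<li class="email"><a href="mailto:paola.alberti@istitutotumori.mi.it" class="link"><span>paola.alberti@istitutotumori.mi.it</span></a></li>
</ul>
</div>
"""


def extract_emails(markup):
    """Return the text of every <span> nested in <li class="email"><a>...</a></li>."""
    root = ET.fromstring(markup.strip())
    # Limited XPath: any <li> whose class attribute is exactly "email",
    # then its <a> child, then that link's <span> child.
    spans = root.findall(".//li[@class='email']/a/span")
    return [span.text.strip() for span in spans if span.text]


emails = extract_emails(HTML)
print(emails)
```

The expression matches on `li[@class='email']` rather than on `span` alone, because the person's name in the `h3` heading is also wrapped in a `span`.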
awk 'NR==FNR{A[$1];next} {if ($1 in A) print; else print "unknown"}' file1 file2
awk 'BEGIN{FS=OFS="\t"}  # define input and output field separators
FNR==NR{                 # while reading the first file, file1
    d[$1] = $1           # store the first field as a key in d
    next                 # move on to the next line
}
{print $1, (($1 in d) ? d[$1] : "unknown")}' file1 file2  # if no match, print $1 followed by "unknown"