Question

Still stuck on this one.

I need to use grep to find the first line below (the one containing ISSN:) and output only the value from the line after it, 0007-9235:

<td nowrap valign="top" align="right"><b>ISSN:</b></td>
<td valign="top"> 0007-9235 </td>

I ran issn=$(grep "ISSN:" $i -A1) and the output is:

<td nowrap valign="top" align="right"><b>ISSN:</b></td> <td valign="top"> 0007-9235 </td>
<td nowrap valign="top" align="right"><b>ISSN:</b></td> <td valign="top"> 0028-4793 </td>
<td nowrap valign="top" align="right"><b>ISSN:</b></td> <td valign="top"> 0009-2665 </td>
<td nowrap valign="top" align="right"><b>ISSN:</b></td> <td valign="top"> 0034-6861 </td>
<td nowrap valign="top" align="right"><b>ISSN:</b></td> <td valign="top"> 0028-0836 </td>

I need it to output only the number (0007-9235), and likewise for every row in that column. Please help, thank you!

Answer 1

You can pipe your command to cut:

grep -Pzo '<td nowrap align="center" bgcolor="FFFFE1"><p align="center">[^>]*>\K\d+(?:\.\d+)?' "$i" |
  cut -d ' ' -f1-2

Or use awk:

grep -Pzo '<td nowrap align="center" bgcolor="FFFFE1"><p align="center">[^>]*>\K\d+(?:\.\d+)?' "$i" |
  awk '{print $1, $2}'
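The two are not quite interchangeable, which may matter depending on how your columns are spaced. A small sketch of the difference, using a made-up line (not your real data) with a double space between the first two values:

```shell
# Hypothetical sample line: two spaces between the first two values.
line='162.500  107.740 99.000'

# cut treats every single space as a delimiter, so the double space
# produces an empty second field and the result is "162.500 ".
with_cut=$(printf '%s\n' "$line" | cut -d ' ' -f1-2)

# awk's default field splitting collapses runs of blanks, so $1 and $2
# are the first two numbers.
with_awk=$(printf '%s\n' "$line" | awk '{print $1, $2}')

printf 'cut: [%s]\nawk: [%s]\n' "$with_cut" "$with_awk"
```

If the values are always separated by exactly one space, either works; otherwise awk is the more forgiving choice.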

Answer 2

You can use a rudimentary state machine in awk to do the first task:

pax> awk 'e==1{print $3;e=0}/ISSN:/{e=1}' inputFile1
0007-9235

It boils down to using an echo flag e to decide if the next line should be printed:

e==1    { print $3;
          e=0
        }
/ISSN:/ { e=1
        }

The first rule states: if e is one (it is uninitialised, and therefore treated as zero, when the script starts), print the third whitespace-separated field ($3, the ISSN), then set e back to zero to turn off echo.

The second rule simply turns on echo if it finds a line containing the ISSN: string so that, for the next line, the first rule will fire.
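To tie this back to the issn=$(...) assignment in the question, here is a minimal sketch that runs the same two rules against the two sample lines and captures the result in a variable. The temporary file stands in for your $i; the variable names are only for illustration.

```shell
# Sample input copied from the question, written to a temporary file
# that stands in for "$i".
sample=$(mktemp)
cat > "$sample" <<'EOF'
<td nowrap valign="top" align="right"><b>ISSN:</b></td>
<td valign="top"> 0007-9235 </td>
EOF

# e is the echo flag: it is set on the ISSN: line and consumed on the
# line that follows it.
issn=$(awk 'e==1 {print $3; e=0} /ISSN:/ {e=1}' "$sample")

echo "$issn"
rm -f "$sample"
```

The order of the rules matters: because the e==1 rule comes first, the ISSN: line itself only sets the flag and is never printed.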

Keep in mind that's for the very specific format you've mentioned. Markup tends to be fairly free-format, so this may break if the layout changes (for example, if the spaces around the value disappear, $3 is no longer the ISSN). In that case, it's probably better to use a proper markup parser of some sort.
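If a full parser feels like overkill, one middle ground is to match on the shape of the value rather than on its field position. This is a rough sketch, assuming an ISSN always looks like four digits, a hyphen, three digits and a final digit or X, and that nothing else on those two lines has that shape. It is still pattern matching rather than real markup parsing, and -A1 and -o are GNU/BSD grep options rather than POSIX.

```shell
# Hypothetical variant of the sample with no spaces around the value,
# a layout where the $3 approach would no longer find the ISSN.
sample2=$(mktemp)
cat > "$sample2" <<'EOF'
<td nowrap valign="top" align="right"><b>ISSN:</b></td>
<td valign="top">0007-9235</td>
EOF

# First grep keeps the ISSN: line plus the one after it; the second
# prints only the part that looks like an ISSN.
issn_by_pattern=$(grep -A1 'ISSN:' "$sample2" |
  grep -Eo '[0-9]{4}-[0-9]{3}[0-9Xx]')

echo "$issn_by_pattern"
rm -f "$sample2"
```

This also happens to stay with grep, which is what the question asked for in the first place.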

For the second task, awk is also the answer. To get the first two columns, just use:

pax> awk '{print $1" "$2}' inputFile2
162.500 107.740
54.420 52.426
45.661 48.832
42.860 52.577
42.351 40.783
41.392 46.174
39.794 40.274
39.207 39.315
39.080 35.620
37.912 41.706
37.231 35.436
36.458 42.584

although in your case it will be:

<your_current_grep_command> | awk '{print $1" "$2}'

Grep Next Line and Modifying grep
