Question

This is the (real-world) text:

<tr>
randomtext
ip_(45.54.58.85)
randomtext..
port(randomtext45)
randomtext random...
</tr>
<tr>
randomtext ran
ip_(5.55.45.8)  
randomtext4
port(other$_text_other_length444)
</tr>
<tr>
randomtext
random
port(other$text52)
</tr>

The output should be:

45.54.58.85 45

5.55.45.8 444

I know how to grep 45.54.58.85 and 5.55.45.8:

awk 'BEGIN{ RS="<tr>"}1' file | grep -oP '(?<=ip_\()[^)]*'

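How do I also account for the random text of varying length inside port( so that only the trailing number is printed?

I included a third record that should not appear in the output, because it has no IP.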

Answer 1

Using GNU Awk:

gawk 'BEGIN { RS = "<tr>" } match($0, /.*\nip_[(]([^)]+).*\nport[(].*[^0-9]+([0-9]+)[)].*/, a) { print a[1], a[2] }' your_file

Another one that is compatible with any Awk:

awk -F '[()]' '$1 == "<tr>" { i = 0 } $1 == "ip_" { i = $2 } $1 == "port" && i { sub(/.*[^0-9]/, "", $2); if (length($2)) print i, $2 }' your_file

Output:

45.54.58.85 45
5.55.45.8 444
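
To try the portable one-liner without a file at hand, here is a minimal sketch that recreates the sample from the question in a temporary file and runs the same awk program, spread over several lines with comments. The wrapper name extract_ip_port and the temporary file are only illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample input from the question (quoted heredoc, so "$_" stays literal).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<tr>
randomtext
ip_(45.54.58.85)
randomtext..
port(randomtext45)
randomtext random...
</tr>
<tr>
randomtext ran
ip_(5.55.45.8)
randomtext4
port(other$_text_other_length444)
</tr>
<tr>
randomtext
random
port(other$text52)
</tr>
EOF

# extract_ip_port is a hypothetical wrapper around the portable one-liner.
extract_ip_port() {
    awk -F '[()]' '
        $1 == "<tr>" { i = 0 }          # new record: forget the previous IP
        $1 == "ip_"  { i = $2 }         # remember the IP of this record
        $1 == "port" && i {             # port line in a record that has an IP
            sub(/.*[^0-9]/, "", $2)     # drop everything up to the last non-digit
            if (length($2)) print i, $2
        }
    ' "$@"
}

extract_ip_port "$sample"
# 45.54.58.85 45
# 5.55.45.8 444
```

The third record prints nothing because i is reset to 0 at its <tr> line and never set again.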

Answer 2

Using GNU awk, grep, and paste:

$ awk 'BEGIN{ RS="<tr>"}/ip_/{print;}' file | grep -oP 'ip_\(\K[^)]*|port\(\D*\K\d+' | paste - -
45.54.58.85 45
5.55.45.8   444

Explanation:

awk 'BEGIN{ RS="<tr>"}/ip_/{print;}' file

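With the record separator set to <tr>, this awk command prints only the records that contain the string ip_.
ip_\(\K[^)]* prints only the text after ip_( up to the next ) character. The \K in the pattern discards the characters matched so far.
| is the logical OR (alternation) operator.
port\(\D*\K\d+ skips any non-digit characters after port( and prints only the digits that follow.
paste - - joins every two lines into one.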
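
To see the grep and paste stages in isolation, here is a small sketch. It assumes GNU grep with PCRE support (-P) and feeds it two hypothetical lines, as they would look after the awk filter; the variable names are only illustrative.

```shell
#!/bin/sh
# Hypothetical input: the ip_ and port lines of one record that passed the awk filter.
filtered=$(printf '%s\n' 'ip_(45.54.58.85)' 'port(randomtext45)')

# grep -o prints every match on its own line, so the IP and the port come out separately.
matches=$(printf '%s\n' "$filtered" | grep -oP 'ip_\(\K[^)]*|port\(\D*\K\d+')
printf '%s\n' "$matches"
# 45.54.58.85
# 45

# paste - - reads two lines at a time from stdin and joins them with a tab.
joined=$(printf '%s\n' "$matches" | paste - -)
printf '%s\n' "$joined"
```

Because paste simply pairs consecutive lines, this relies on every record that reaches grep having exactly one ip_ line and one port line.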

Answer 3

Here is another awk:

awk -F"[()]" '/^ip/ {ip=$2;f=NR} f && NR==f+2 {n=split($2,a,"[a-z]+");print ip,a[n]}' file
45.54.58.85 45
5.55.45.8 444

How it works:

awk -F"[()]" '              # Set field separator to "(" or ")"
/^ip/ {                     # If line starts with "ip" do
    ip=$2                   # Set "ip" to field $2
    f=NR}                   # Set "f" to the current line number
f && NR==f+2 {              # Two lines below the "ip" line:
    n=split($2,a,"[a-z]+")  # Split second part to get port
    print ip,a[n]           # Print "ip" and "port"
    }' file                 # Read the file
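
The split step is the least obvious part, so here is a minimal sketch of it on its own. The helper name last_piece is hypothetical; it splits its argument on runs of lowercase letters and prints the last piece, which is what a[n] holds above.

```shell
#!/bin/sh
# last_piece is a hypothetical helper: split the text on runs of lowercase
# letters and print the last resulting piece.
last_piece() {
    awk -v s="$1" 'BEGIN { n = split(s, a, "[a-z]+"); print a[n] }'
}

last_piece 'randomtext45'                  # prints 45
last_piece 'other$_text_other_length444'   # prints 444
last_piece '8080tcp'                       # prints an empty line
```

The last call shows the assumption behind this approach: the digits have to come at the very end of the text inside port( ), directly after a lowercase letter. The NR==f+2 test similarly assumes the port line is always exactly two lines below the ip_ line.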

Answer 4

With any modern awk:

$ awk -F'[()]' '
    $1=="ip_"   { ip=$2 }
    $1=="port"  { sub(/.*[^[:digit:]]/,"",$2); port=$2 }
    $1=="</tr>" { if (ip) print ip, port; ip="" }
' file
45.54.58.85 45
5.55.45.8 444

IMHO it doesn't get much simpler or clearer than that.
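
One thing to be aware of: the script above clears ip at each </tr> but not port, so a record that has an ip_ line but no port line would be printed with the port of the previous record. If that can happen in your data, a possible variant is sketched below. The wrapper name extract_pairs and the sample input are hypothetical, and it uses [0-9] in place of [[:digit:]], which should behave the same for this kind of input.

```shell
#!/bin/sh
# extract_pairs is a hypothetical wrapper; it prints a pair only when the
# record contained both an ip_ line and a port line.
extract_pairs() {
    awk -F'[()]' '
        $1 == "ip_"   { ip = $2 }
        $1 == "port"  { sub(/.*[^0-9]/, "", $2); port = $2 }
        $1 == "</tr>" {
            if (ip != "" && port != "") print ip, port
            ip = port = ""              # reset both values for the next record
        }
    ' "$@"
}

# Hypothetical input: the second record has an IP but no port.
printf '%s\n' '<tr>' 'ip_(1.2.3.4)' 'port(abc80)' '</tr>' \
              '<tr>' 'ip_(5.6.7.8)' '</tr>' | extract_pairs
# 1.2.3.4 80
```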
