How do I extract content from text?

Time: 2012-04-28 12:53:27

Tags: php regex preg-match

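I have a script that returns information about a given IP address.

I want to extract the country from that text.

In the text below, the country appears as "Country: US".

I want to display only US.

The text is: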

[Querying whois.arin.net]
[whois.arin.net]
#
# Query terms are ambiguous.  The query is assumed to be:
#     "n 173.194.74.100"
#
# Use "?" to get help.
#

#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=173.194.74.100?showDetails=true&showARIN=false&ext=netref2
#

NetRange:       173.194.0.0 - 173.194.255.255
CIDR:           173.194.0.0/16
OriginAS:       AS15169
NetName:        GOOGLE
NetHandle:      NET-173-194-0-0-1
Parent:         NET-173-0-0-0-0
NetType:        Direct Allocation
RegDate:        2009-08-17
Updated:        2012-02-24
Ref:            http://whois.arin.net/rest/net/NET-173-194-0-0-1


OrgName:        Google Inc.
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2000-03-30
Updated:        2011-09-24
Ref:            http://whois.arin.net/rest/org/GOGL

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google Inc
OrgTechPhone:  +1-650-253-0000 
OrgTechEmail:  arin-contact@google.com
OrgTechRef:    http://whois.arin.net/rest/poc/ZG39-ARIN

OrgAbuseHandle: ZG39-ARIN
OrgAbuseName:   Google Inc
OrgAbusePhone:  +1-650-253-0000 
OrgAbuseEmail:  arin-contact@google.com
OrgAbuseRef:    http://whois.arin.net/rest/poc/ZG39-ARIN

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#

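5 answers:

Answer 0: (score: 2)

If all you need is the regular expression, try this one. The country code will be in the first capture group.

Country:\s*([A-Z]{2})
  • Country: - matches the literal text
  • \s* - matches any amount of whitespace (spaces, tabs, etc.)
  • ([A-Z]{2}) - matches and captures exactly two uppercase letters

If you need every occurrence of this pattern, use preg_match_all.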
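As a minimal sketch of how that pattern might be applied, the snippet below runs it through both preg_match (first occurrence) and preg_match_all (every occurrence). The helper names firstCountryCode and allCountryCodes and the shortened $sample string are assumptions made for illustration; they are not part of the answer.

```php
<?php
// Hypothetical helper: returns the first two-letter country code, or null.
function firstCountryCode(string $text): ?string
{
    if (preg_match('/Country:\s*([A-Z]{2})/', $text, $match)) {
        return $match[1]; // capture group 1 holds the code
    }
    return null;
}

// Hypothetical helper: returns every country code found in the text.
function allCountryCodes(string $text): array
{
    preg_match_all('/Country:\s*([A-Z]{2})/', $text, $matches);
    return $matches[1]; // all group-1 captures, in order of appearance
}

// Shortened stand-in for the whois response shown in the question.
$sample = "StateProv:      CA\nCountry:        US\nRegDate:        2000-03-30\n";

echo firstCountryCode($sample), PHP_EOL;
print_r(allCountryCodes($sample));
```

Returning null when nothing matches avoids reading an undefined $match[1] on responses that carry no Country line.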

Answer 1: (score: 2)

With preg_match you can do the following:

if (preg_match('/^Country:\s*([A-Z]{2,3})$/m', $str, $match)) {
    echo $match[1];
}

Answer 2: (score: 1)

There is a phpwhois library for working with whois data. It gives you the response as an array.
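To illustrate what an array-shaped response makes possible, here is a rough hand-rolled sketch that splits "Key: value" lines into an associative array. It does not use or reproduce the phpwhois API; the function name parseWhoisFields and the $sample text are assumptions for illustration only.

```php
<?php
// Hypothetical parser: maps each field name to the list of values seen for it.
function parseWhoisFields(string $text): array
{
    $fields = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines, "#" comments, and "[...]" server banners.
        if ($line === '' || $line[0] === '#' || $line[0] === '[') {
            continue;
        }
        // Split on the first colon only, so URLs in values stay intact.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        // Fields such as RegDate can repeat, so collect values in a list.
        $fields[trim($parts[0])][] = trim($parts[1]);
    }
    return $fields;
}

// Shortened stand-in for the whois response shown in the question.
$sample = "#\n# comment\nCountry:        US\nRegDate:        2009-08-17\n"
        . "RegDate:        2000-03-30\nRef:            http://whois.arin.net/rest/org/GOGL\n";

$fields = parseWhoisFields($sample);
echo $fields['Country'][0], PHP_EOL;
```

With the fields in an array, the country becomes a plain lookup instead of a one-off regular expression.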

Answer 3: (score: 0)

Use preg_match to extract it:

// Capture everything after "Country:" up to the end of that line.
if (preg_match('/^Country:(.*)$/mi', $str, $match)) {
    echo trim($match[1]);
}

Answer 4: (score: 0)

$regex = "/country:[\ \t\r\n\f][A-Z]+\s/";

$txt = "descr: NCC#200X44704917
country: FR
admin-c: ACPSA223-RIPE
tech-c: TCWQQP8-RIPE";

preg_match($regex, $txt, $result);

print_r($result);

------------------------------------
Array ( [0] => country: FR )
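The pattern above is written for the lowercase, single-space "country: FR" style of its sample text, so it would not match the "Country:        US" line in the question. One possible variant that accepts both styles is sketched below; the function name findCountry and the two sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: finds a two-letter country code regardless of the
// case of the field name or the amount of padding after the colon.
function findCountry(string $text): ?string
{
    // m: ^ and $ match at line boundaries; i: case-insensitive field name.
    if (preg_match('/^country:[ \t]*([A-Z]{2})[ \t]*\r?$/mi', $text, $match)) {
        return strtoupper($match[1]);
    }
    return null;
}

// Sample from this answer, plus a shortened stand-in for the question's text.
$ripeStyle = "descr: NCC#200X44704917\ncountry: FR\nadmin-c: ACPSA223-RIPE";
$arinStyle = "StateProv:      CA\nCountry:        US\nRegDate:        2000-03-30\n";

echo findCountry($ripeStyle), PHP_EOL;
echo findCountry($arinStyle), PHP_EOL;
```

Capturing only the code in group 1 means the result needs no further trimming, unlike the full match printed by print_r above.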