Using re with rsplit

Time: 2017-09-07 14:29:13

Tags: python regex

I am trying to extract the names of all the name servers from whois.com output.

This gets every server except the last one:

name_servers = (re.split(re.compile('Name Server',re.I),info))[1:-1]

This is how I am trying to get the last server (it does not work):

name_servers_end = (info.rsplit(re.compile('Name Server', re.I).pattern, 1)[1]).splitlines()[0]

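I know that re.compile cannot be used with rsplit. Is there a regex alternative to rsplit? Or is there a better way to do what I am trying to do overall?

Sample content from the info string: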

Domain Name: google.com
Registry Domain ID: 2138514_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2015-06-12T10:38:52-0700
Creation Date: 1997-09-15T00:00:00-0700
Registrar Registration Expiration Date: 2020-09-13T21:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: @markmonitor.com
Registrar Abuse Contact Phone: +1.2083895740
Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
Registry Registrant ID: 
Registrant Name: Dns Admin
Registrant Organization: Google Inc.
Registrant Street: Please contact @google.com, 1600 Amphitheatre Parkway
Registrant City: Mountain View
Registrant State/Province: CA
Registrant Postal Code: 94043
Registrant Country: US
Registrant Phone: +1.6502530000
Registrant Phone Ext: 
Registrant Fax: +1.6506188571
Registrant Fax Ext: 
Registrant Email: @google.com
Registry Admin ID: 
Admin Name: DNS Admin
Admin Organization: Google Inc.
Admin Street: 1600 Amphitheatre Parkway
Admin City: Mountain View
Admin State/Province: CA
Admin Postal Code: 94043
Admin Country: US
Admin Phone: +1.6506234000
Admin Phone Ext: 
Admin Fax: +1.6506188571
Admin Fax Ext: 
Admin Email: @google.com
Registry Tech ID: 
Tech Name: DNS Admin
Tech Organization: Google Inc.
Tech Street: 2400 E. Bayshore Pkwy
Tech City: Mountain View
Tech State/Province: CA
Tech Postal Code: 94043
Tech Country: US
Tech Phone: +1.6503300100
Tech Phone Ext: 
Tech Fax: +1.6506181499
Tech Fax Ext: 
Tech Email: @google.com
Name Server: ns4.google.com
Name Server: ns3.google.com
Name Server: ns1.google.com
Name Server: ns2.google.com
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2017-09-06T06:29:02-0700 <<<

The Data in MarkMonitor.com's WHOIS database is provided by MarkMonitor.com for
information purposes, and to assist persons in obtaining information about or
related to a domain name registration record.  MarkMonitor.com does not guarantee
its accuracy.  By submitting a WHOIS query, you agree that you will use this Data
only for lawful purposes and that, under no circumstances will you use this Data to:
 (1) allow, enable, or otherwise support the transmission of mass unsolicited,
     commercial advertising or solicitations via e-mail (spam); or
 (2) enable high volume, automated, electronic processes that apply to
     MarkMonitor.com (or its systems).
MarkMonitor.com reserves the right to modify these terms at any time.
By submitting this query, you agree to abide by this policy.

MarkMonitor is the Global Leader in Online Brand Protection.

MarkMonitor Domain Management(TM)
MarkMonitor Brand Protection(TM)
MarkMonitor AntiPiracy(TM)
MarkMonitor AntiFraud(TM)
Professional and Managed Services

Visit MarkMonitor at http://www.markmonitor.com
Contact us at +1.8007459229
In Europe, at +44.02032062220

For more information on Whois status codes, please visit
 https://www.icann.org/resources/pages/epp-status-codes-2014-06-16-en
--

The number of name servers is assumed to be unknown, and they must be stripped out using a regular expression.

1 answer:

Answer 0: (score: 2)

You don't need to split anything; just search for the lines that begin with "Name Server:". The right tool is re.findall:

"Return all non-overlapping matches of pattern in string, as a list of strings."
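As a small illustration of that behaviour, here is a minimal sketch on a made-up string (the sample text and the variable names are assumptions, not taken from the question). It shows that re.findall returns every match in order, and that when the pattern contains one capture group, only the group's text is returned.

```python
import re

# Hypothetical sample text, shortened to a few WHOIS-style lines.
sample = (
    "Name Server: ns1.example.com\n"
    "Name Server: ns2.example.com\n"
    "DNSSEC: unsigned\n"
)

# No groups in the pattern: each list item is the whole match.
whole = re.findall(r"Name Server: \S+", sample)

# One capture group: each list item is just the text of that group.
hosts = re.findall(r"Name Server: (\S+)", sample)

print(whole)
print(hosts)
```

Because the result is an ordinary list, the last server is simply hosts[-1], so nothing like rsplit is needed.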

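Here is the regex I used. It looks for text that is preceded by "Name Server: " and then matches any number of non-newline characters. The lookbehind expression (?<=...) (there is no direct link, unfortunately, but go to the documentation and press Ctrl+F -> "lookbehind") says that I don't want that part in the result.

results = re.findall(r"(?<=Name Server: )[^\n]*", info)

Output:

['ns4.google.com', 'ns3.google.com', 'ns1.google.com', 'ns2.google.com']

Also, it is generally better to compile a regex before using it, since re compiles it anyway. Compiling it beforehand and calling the function as a method of the compiled pattern can save time in a loop. Here, it also lets you specify flags such as re.IGNORECASE:

regex = re.compile(r"(?<=Name Server: )[^\n]*", re.IGNORECASE)
results = regex.findall(info)

These flags can also be passed directly to the module-level functions.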
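To illustrate passing a flag directly to the module-level function, here is a minimal sketch. The shortened info string and the second, line-anchored pattern are assumptions added for illustration; they are not part of the answer's method, and real WHOIS output varies by registrar.

```python
import re

# Hypothetical excerpt with mixed-case labels.
info = (
    "Tech Email: @google.com\n"
    "Name Server: ns4.google.com\n"
    "name server: ns3.google.com\n"
    "DNSSEC: unsigned\n"
)

# Flag passed straight to re.findall instead of to re.compile.
results = re.findall(r"(?<=Name Server: )[^\n]*", info, re.IGNORECASE)

# Variant (an assumption, not the answer's pattern): anchor at the start
# of each line and capture only the host name, allowing extra spaces or
# tabs after the colon.
anchored = re.findall(
    r"^Name Server:[ \t]*(\S+)", info, re.IGNORECASE | re.MULTILINE
)

print(results)
print(anchored)
```

The anchored variant trades the lookbehind for a capture group, which avoids the fixed-width restriction on lookbehind patterns when the spacing after the colon is not always a single space.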