I refined this expression in the Python console:
texts = re.findall(r"text[^>]*\>(?P<text>(:?[^<]|</\s?[^tT])*)\</text", text)
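It works well: when I run it in the console it finishes almost instantly, but when I put it in my code and run it through the interpreter it seems to hang.
I tested it in the console again, and again it finished in under a second.
I verified that the statement that blocks is the regex execution, and that the text is the same in every run.
What is going on?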
---------------------------------------- Code ----------------------------------------
import re


class Wiki:
    # Regex definition
    search_text_regex = re.compile(r"text[^>]*\>(?P<text>(:?[^<]|</\s?[^tT])*)\</text")

    def search_by_title(self, name, text):
        """ Search the slice (the last) of the text that contains the exact name and return the slice index.
        """
        print("Backoff Launched:")
        # extract the text from Wikipedia pages
        print("\tExtracting Texts from pages...")
        texts = self.search_text_regex.findall(text)  # <= The Regex Launch
        # find the name in the text
        print("\tFinding names on text...")
        for index, text in enumerate(texts):
            if name in text:
                return index
        return None
---------------------------------------- Source ----------------------------------------
<page><title>Andrew Johnson</title><id>1624</id><revision><id>244612901</id><timestamp>2008-10-11T18:30:44Z</timestamp><contributor><username>Excirial</username><id>5499713</id></contributor><minor/><comment>Reverted edits by [[Special:Contributions/71.113.103.209|71.113.103.209]] to last version by Soliloquial ([[WP:HG|HG]])</comment><text xml:space="preserve">{{otherpeople2|Andrew Johnson (disambiguation)}}
{{Infobox President
|name=Andrew Johnson
|nationality=American
|image=Andrew Johnson - 3a53290u.png
|caption=President Andrew Johnson, taken in 1865 by [[Mathew Brady|Matthew Brady]].
|order=17th [[President of the United States]]
|vicepresident=none
|term_start=April 15, 1865
|term_end=March 4, 1869
|predecessor=[[Abraham Lincoln]]
|successor=[[Ulysses S. Grant]]
|birth_date={{birth date|mf=yes|1808|12|29}}
|birth_place=[[Raleigh, North Carolina]]
|death_date={{death date and age|mf=yes|1875|7|31|1808|12|29}}
|death_place=[[Elizabethton, Tennessee]]
|spouse=[[Eliza McCardle Johnson]]
|occupation=[[Tailor]]
|party=[[History of the Democratic Party (United States)|Democratic]] until 1864 and after 1869; elected Vice President in 1864 on a [[National Union Party (United States)|National Union]] ticket; no party affiliation 1865–1869
|signature=Andrew Johnson Signature.png
|order2=16th [[Vice President of the United States]]
|term_start2=March 4, 1865
|term_end2=April 15, 1865
|president2=[[Abraham Lincoln]]
|predecessor2=[[Hannibal Hamlin]]
|successor2=[[Schuyler Colfax]]
|jr/sr3=United States Senator
|state3=[[Tennessee]]
|term_start3=October 8, 1857
|term_end3=March 4, 1862
|preceded3=[[James C. Jones]]
|succeeded3=[[David T. Patterson]]
|term_start4=March 4, 1875
|term_end4=July 31, 1875
|preceded4=[[William Gannaway Brownlow|William G. Brownlow]]
|succeeded4=[[David M. Key]]
|order5=17th
|title5=[[Governor of Tennessee]]
|term_start5=October 17, 1853
|term_end5=November 3, 1857
|predecessor5=[[William B. Campbell]]
|successor5=[[Isham G. Harris]]
|religion=[[Christian]] (no denomination; attended Catholic and Methodist services)<ref>[http://www.adherents.com/people/pj/Andrew_Johnson.html Adherents.com: The Religious Affiliation of Andrew Johnson]</ref>
}}
Johnson was nominated for the [[Vice President of the United States|Vice President]] slot in 1864 on the [[National Union Party (United States)|National Union Party]] ticket. He and Lincoln were [[United States presidential election, 1864|elected in November 1864]]. Johnson succeeded to the Presidency upon Lincoln's assassination on April 15, 1865.
==Bibliography==
{{portal|Tennessee}}
{{portal|United States Army|United States Department of the Army Seal.svg}}
{{portal|American Civil War}}
* Howard K. Beale, ''The Critical Year. A Study of Andrew Johnson and Reconstruction'' (1930). ISBN 0-8044-1085-2
* Winston; Robert W. ''Andrew Johnson: Plebeian and Patriot'' (1928) [http://www.questia.com/PM.qst?a=o&d=3971949 online edition]
===Primary sources===
* Ralph W. Haskins, LeRoy P. Graf, and Paul H. Bergeron et al, eds. ''The Papers of Andrew Johnson'' 16 volumes; University of Tennessee Press, (1967–2000). ISBN 1572330910.) Includes all letters and speeches by Johnson, and many letters written to him. Complete to 1875.
* [http://www.impeach-andrewjohnson.com/ Newspaper clippings, 1865–1869]
* [http://www.andrewjohnson.com/09ImpeachmentAndAcquittal/ImpeachmentAndAcquittal.htm Series of [[Harper's Weekly]] articles covering the impeachment controversy and trial]
*[http://starship.python.net/crew/manus/Presidents/aj2/aj2obit.html Johnson's obituary, from the ''New York Times'']
==Notes==
{{reflist|2}}
==External links==
{{sisterlinks|s=Author:Andrew Johnson}}
*{{gutenberg author|id=Andrew+Johnson | name=Andrew Johnson}}
{{s-start}}
{{s-par|us-hs}}
{{s-aft|after=[[Ulysses S. Grant]]}}
{{s-par|us-sen}}
{{s-bef|before=[[James C. Jones]]}}
{{s-ttl|title=[[List of United States Senators from Tennessee|Senator from Tennessee (Class 1)]]|years=October 8, 1857{{ndash}} March 4, 1862|alongside=[[John Bell (Tennessee politician)|John Bell]], [[Alfred O. P. Nicholson]]}}
{{s-vac|next=[[David T. Patterson]]|reason=[[American Civil War|Secession of Tennessee from the Union]]}}
{{s-bef|before=[[William Gannaway Brownlow|William G. Brownlow]]}}
{{s-ttl|title=[[List of United States Senators from Tennessee|Senator from Tennessee (Class 1)]]| years=March 4, 1875{{ndash}} July 31, 1875|alongside=[[Henry Cooper (U.S. Senator)|Henry Cooper]]}}
{{s-aft|after=[[David M. Key]]}}
{{s-ppo}}
{{s-bef|before=[[Hannibal Hamlin]]}}
{{s-ttl|title=[[List of United States Republican Party presidential tickets|Republican Party¹ vice presidential candidate]]|years=[[U.S. presidential election, 1864|1864]]}}
{{Persondata
|NAME= Johnson, Andrew
|ALTERNATIVE NAMES=
|SHORT DESCRIPTION= seventeenth [[President of the United States]]<br/> [[Union (American Civil War)|Union]] [[Union Army|Army]] [[General officer|General]]
|DATE OF BIRTH={{birth date|mf=yes|1808|12|29|mf=y}}
|PLACE OF BIRTH= [[Raleigh, North Carolina]]
|DATE OF DEATH={{death date|mf=yes|1875|7|31|mf=y}}
|PLACE OF DEATH= [[Greeneville, Tennessee]]
}}
{{Lifetime|1808|1875|Johnson, Andrew}}
[[Category:Presidents of the United States]]
[[vi:Andrew Johnson]]
[[tr:Andrew Johnson]]
[[uk:Ендрю Джонсон]]
[[ur:انڈریو جانسن]]
[[yi:ענדרו זשאנסאן]]
[[zh:安德鲁·约翰逊]]</text></revision></page>
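Answer 0 (score: 1)
I solved it. The code has a pipeline that cleans the text, and it was removing some markup that the regex needs in order to match. Because the text is so long, searching for a match that cannot exist took far too much time.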
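To see why a match that cannot exist is so expensive, here is a minimal sketch, assuming CPython's backtracking `re` engine. It runs the pattern from the question against a made-up input whose closing tag is missing. The `time_failed_search` helper and the colon-only input are hypothetical, chosen for illustration. In `(:?[^<]|...)` a colon can be consumed either as the optional `:` or as `[^<]`, so a run of colons can be split in many ways, and all of them may be tried before the search gives up.

```python
import re
from timeit import default_timer

# The pattern from the question, unchanged. Note that "(:?" opens a capturing
# group that starts with an optional colon; it is not the non-capturing "(?:".
PATTERN = re.compile(r"text[^>]*\>(?P<text>(:?[^<]|</\s?[^tT])*)\</text")


def time_failed_search(colons):
    """Time a search over text whose closing </text> tag is missing."""
    broken = "<text>" + ":" * colons  # hypothetical input, no closing tag
    start = default_timer()
    match = PATTERN.search(broken)
    return match, default_timer() - start


if __name__ == "__main__":
    # Keep the counts small: the cost can grow quickly with each extra colon.
    for colons in (8, 16, 24):
        match, elapsed = time_failed_search(colons)
        print(colons, match, "%.4f s" % elapsed)
```

If the timings climb steeply as the colon count grows, that is the same effect as the apparent hang on a full page. A Wikipedia page contains many colons (`[[Category:...]]`, `[[vi:...]]`), so a failed match can take a very long time.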
Answer 1 (score: 0)
I would use this:
result = re.findall(r"(?s)<text[^>]*>(?P<text>(?:(?!</?text>).)*)</text>", subject)
(?:(?!</?text>).)*
consumes one character at a time, but only after the lookahead has verified that the character is not the start of a <text> or </text> tag.
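A small sketch of that pattern in use, assuming a miniature two-page input made up for illustration (the `sample` string and the `TEXT_RE` name are not from the question):

```python
import re

# Pattern from this answer; (?s) lets "." match newlines as well.
TEXT_RE = re.compile(r"(?s)<text[^>]*>(?P<text>(?:(?!</?text>).)*)</text>")

# Hypothetical miniature of a dump, for illustration only.
sample = (
    '<page><title>A</title><text xml:space="preserve">first\nbody</text></page>'
    '<page><title>B</title><text>second <ref>x</ref> body</text></page>'
)

# With a single capturing group, findall returns the group contents directly.
bodies = TEXT_RE.findall(sample)

if __name__ == "__main__":
    for body in bodies:
        print(repr(body))
```

Each character can be consumed in only one way, so when the closing tag is missing the search should fail without the combinatorial backtracking of the original pattern. Note that the lookahead only recognises the bare forms `<text>` and `</text>`. An opening tag with attributes inside the body would not stop the scan.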