Regex to capture the part between 2 quotations

Time: 2013-10-21 12:16:57

Tags: python regex string nlp quotations

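I can't seem to get my regex right when I try to capture the text that sits between two quotations. For example, the part shown in bold (note: the input has text both before and after it):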

  

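"I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."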

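I tried to grab the text before and after each quotation, but I can't get the desired output. There must be some way to group the regex so that I can capture the string between the quotations as well as the two quotations around it.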

My attempt:

import re

def get_quotes(paragraph):
    quote_rx = r'''([""])(?:(?=(\\?))\2.)*?\1'''
    return [i.group(0) for i in \
           re.finditer(quote_rx, paragraph, re.S)]

def get_said(paragraph, quote):
    quote_start = paragraph.index(quote)
    quote_end = quote_start + len(quote)
    before = paragraph[:quote_start]
    after = paragraph[quote_end:]
    return before, after


paragraphs = ['''I smiled and shook my head. "I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''', 
'''Such was the remarkable narrative to which I listened on that April evening -- a narrative which would have been utterly incredible to me had it not been confirmed by the actual sight of the tall, spare figure and the keen, eager face, which I had never thought to see again. In some manner he had learned of my own sad bereavement, and his sympathy was shown in his manner rather than in his words. "Work is the best antidote to sorrow, my dear Watson," said he, "and I have a piece of work for us both to-night which, if we can bring it to a successful conclusion, will in itself justify a man's life on this planet." In vain I begged him to tell me more. "You will hear and see enough before morning," he answered. "We have three years of the past to discuss. Let that suffice until half-past nine, when we start upon the notable adventure of the empty house."''']

for p in paragraphs:
    saids = set()
    for i in get_quotes(p):
        b,a = get_said(p,i)
        print(b)
        print(a)
        print()

Desired output:

in-btw: I said.
quotes: ["I can quite understand your thinking so.","Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"]
section: "I can quite understand your thinking so." **I said.** "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"


in-btw: --I picked up the morning paper from the ground--
quotes: ['''"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"''', '''"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."''']
section: "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"**--I picked up the morning paper from the ground--**"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."

1 answer:

Answer 0 (score: 2)

This is fairly simple. The regex you need is r'^("[^"]+")([^"]+)("[^"]+")', applied to each segment separately:
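To make the three capture groups easier to read, here is a sketch of the same pattern written with re.VERBOSE. The name TRIPLE_RX and the short sample string are illustrative only and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spelled out group by group.
# TRIPLE_RX is a hypothetical name chosen for this sketch.
TRIPLE_RX = re.compile(r'''
    ^           # anchor at the start of the segment
    ("[^"]+")   # group 1: first quotation, including its double quotes
    ([^"]+)     # group 2: the text between the two quotations
    ("[^"]+")   # group 3: second quotation, including its double quotes
''', re.VERBOSE)

m = TRIPLE_RX.match('"Hello." I said. "Goodbye."')
first, said, second = m.groups()
print(first)
print(said)
print(second)
```

Because of the ^ anchor and the use of match(), a segment has to begin with a quotation for this pattern to succeed.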

import re

s = """
"I can quite understand your thinking so." I said. "Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"

"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"--I picked up the morning paper from the ground--"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."
"""

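for segment in s.splitlines():
    if not segment:
        continue
    # match() returns None when a line does not start with two quotations
    m = re.match(r'^("[^"]+")([^"]+)("[^"]+")', segment)
    if m is None:
        continue
    first, said, second = m.groups()
    print(first)
    print(said)
    print(second)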

>>> 
"I can quite understand your thinking so."
 I said. 
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
"Of course, in your position of unofficial adviser and helper to everybody who is absolutely puzzled, throughout three continents, you are brought in contact with all that is strange and bizarre. But here"
--I picked up the morning paper from the ground--
"let us put it to a practical test. Here is the first heading upon which I come. 'A husband's cruelty to his wife.' There is half a column of print, but I know without reading it that it is all perfectly familiar to me. There is, of course, the other woman, the drink, the push, the blow, the bruise, the sympathetic sister or landlady. The crudest of writers could invent nothing more crude."