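Searching for strings with Python

Time: 2017-02-26 16:45:40

Tags: python regex

I'd like to know how to search for specific strings with Python. I have opened a Markdown file that contains the following table: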

| --------- | -------- | --------- |
|**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. |
|**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|
|**unscrewed**| - | -Slowly and very carefully he unscrewed the ink bottle, dipped his quill into it, and began to write,|
|**downtrodden**| - | -For years, Aunt Petunia and Uncle Vernon had hoped that if they kept Harry as downtrodden as possible, they would be able to squash the magic out of him.|
|**sheets,**| - | -As long as he didn’t leave spots of ink on the sheets, the Dursleys need never know that he was studying magic by night.|
|**flinch**| - | -But he hoped she’d be back soon — she was the only living creature in this house who didn’t flinch at the sight of him.|

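From each row I need to get the string that is decorated with "** ** |", like:

  1. propped
  2. Pointless
  3. unscrewed
  4. downtrodden
  5. sheets
  6. flinch

I tried using regular expressions but could not extract them.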

4 answers:

Answer 0: (score: 2)

import re

y = r'(?<=\|\*{2}).+?(?=,{0,1}\*{2}\|)'  # raw string, so the backslashes reach the regex engine unchanged
reg = re.compile(y)
a = '| --------- | -------- | --------- | |**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. | |**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|'
print(reg.findall(a))  # ['propped', 'Pointless']

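The regular expression above (y) breaks down as follows:

(?<=\|\*{2}) - a positive lookbehind: it succeeds only at a position that is immediately preceded by a match for \|\*{2}, i.e. |**

.+? - matches any character (except a newline) one or more times. Adding ? after the quantifier makes the match non-greedy (minimal), so as few characters as possible are matched.

(?=,{0,1}\*{2}\|) - a positive lookahead: ?= requires that the text right after the current position matches the given expression, without consuming it. Here the expression is ,{0,1}\*{2}\|, which means zero or one comma, then two asterisks, then a closing |.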
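To see the three parts working together, here is a minimal sketch that runs the same pattern over two rows from the question. The sentences are shortened for readability, and the names `rows`, `pattern`, and `words` are only illustrative.

```python
import re

# Two rows from the question's table; the sentences are shortened here.
rows = (
    "|**propped**| - | -a flashlight in one hand |\n"
    "|**sheets,**| - | -spots of ink on the sheets, |\n"
)

# Lookbehind for "|**", lazy body, lookahead for an optional comma plus "**|".
pattern = re.compile(r"(?<=\|\*{2}).+?(?=,?\*{2}\|)")

words = pattern.findall(rows)
print(words)  # ['propped', 'sheets']
```

Because the comma is matched inside the lookahead, it is required to be there but never becomes part of the result, which is why "sheets," comes back as "sheets".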

Answer 1: (score: 1)

Try the following regular expression:

(?<=\|)(?!\s).*?(?!\s)(?=\|)

See the demo / explanation.
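As a quick check (a sketch of my own, not part of the answer), this pattern returns the whole first cell, asterisks included, because it only requires that the cell does not start with whitespace. The asterisks would still need to be stripped afterwards; the sample rows are shortened, and `cells` and `words` are illustrative names.

```python
import re

pattern = re.compile(r"(?<=\|)(?!\s).*?(?!\s)(?=\|)")

# Header separator plus two shortened rows from the question.
table = (
    "| --------- | -------- | --------- |\n"
    "|**propped**| - | -a flashlight in one hand |\n"
    "|**flinch**| - | -who didn't flinch at the sight of him.|\n"
)

# Only cells that start right after a pipe with a non-space character match.
cells = pattern.findall(table)

# Remove the surrounding asterisks to get the bare words.
words = [cell.strip("*") for cell in cells]
print(cells)
print(words)
```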

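Answer 2: (score: 0)

If the text you are searching contains the asterisks and you don't want the comma after sheets, the pattern is a pipe, followed by two asterisks, followed by anything that is not an asterisk or a comma.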

\|\*{2}([^*,]+)

If you can live with the comma, or would rather remove commas yourself afterwards:

\|\*{2}([^*]+)

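Use either pattern with re.findall or re.finditer to capture the text you want.

If you use the second pattern, you will need to iterate over the groups and remove any unwanted commas.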
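A minimal sketch of both variants, assuming the table text is already in a string. The rows are shortened, and `strict`, `loose`, and the result names are hypothetical.

```python
import re

# Two rows from the question's table; the sentences are shortened here.
table = (
    "|**propped**| - | -a flashlight in one hand |\n"
    "|**sheets,**| - | -spots of ink on the sheets, |\n"
)

strict = re.compile(r"\|\*{2}([^*,]+)")  # stops at an asterisk or a comma
loose = re.compile(r"\|\*{2}([^*]+)")    # stops at an asterisk only

# With one capture group, findall returns the group contents directly.
strict_words = strict.findall(table)

# With the looser pattern, trim the trailing comma from each captured group.
loose_words = [m.group(1).rstrip(",") for m in loose.finditer(table)]

print(strict_words)  # ['propped', 'sheets']
print(loose_words)   # ['propped', 'sheets']
```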

Answer 3: (score: 0)

I wrote the program below to produce the desired output. I created a file named string_test and copied all of the original strings into it:

import re

# Capture the word between "|**" at the start of a line and the next "*" or ","
a = re.compile(r"^\|\*\*([^*,]+)")
with open("string_test", "r") as file1:
    for i in file1:
        match = a.search(i)
        if match:
            print(match.group(1))