Question

I would like to know how to search for specific strings using Python. I opened a Markdown file that contains the following table:

| --------- | -------- | --------- |
|**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. |
|**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|
|**unscrewed**| - | -Slowly and very carefully he unscrewed the ink bottle, dipped his quill into it, and began to write,|
|**downtrodden**| - | -For years, Aunt Petunia and Uncle Vernon had hoped that if they kept Harry as downtrodden as possible, they would be able to squash the magic out of him.|
|**sheets,**| - | -As long as he didn’t leave spots of ink on the sheets, the Dursleys need never know that he was studying magic by night.|
|**flinch**| - | -But he hoped she’d be back soon — she was the only living creature in this house who didn’t flinch at the sight of him.|

I need to extract the string wrapped in "** **|" from each row, like this:

propped
Pointless
unscrewed
downtrodden
sheets
flinch

I tried using a regular expression but could not extract them.

Answer 1

import re

# Raw string so the backslashes reach the regex engine unchanged
y = r'(?<=\|\*{2}).+?(?=,{0,1}\*{2}\|)'
reg = re.compile(y)
a = '| --------- | -------- | --------- | |**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. | |**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|'
print(reg.findall(a))  # ['propped', 'Pointless']

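The regular expression above (y) works as follows:

(?<=\|\*{2}) - a positive lookbehind: it matches only at a position that is immediately preceded by \|\*{2}, that is, |**.

.+? - matches any character (except a newline) one or more times. Adding ? after the quantifier makes the match non-greedy (minimal): as few characters as possible are matched.

(?=,{0,1}\*{2}\|) - a positive lookahead: ?= asserts that the current position is immediately followed by the given expression, without consuming it. Here the expression is ,{0,1}\*{2}\|, which means zero or one comma, then two asterisks, then a closing |.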
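To see the optional-comma lookahead at work, here is a minimal sketch that runs the same pattern over a few rows from the question, including the `sheets,` row. The variable names and the shortened third column are assumptions made for illustration.

```python
import re

# Same pattern as in the answer, written as a raw string.
pattern = re.compile(r'(?<=\|\*{2}).+?(?=,{0,1}\*{2}\|)')

# Three rows taken from the question's table (third cell shortened).
table = (
    "|**propped**| - | -a flashlight in one hand |\n"
    "|**sheets,**| - | -spots of ink on the sheets, |\n"
    "|**flinch**| - | -flinch at the sight of him.|\n"
)

# The lookahead lets the lazy .+? stop before an optional trailing comma.
words = pattern.findall(table)
print(words)  # ['propped', 'sheets', 'flinch']
```

Because both lookarounds are zero-width, the surrounding `|**` and `**|` never end up in the result.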

Answer 2

Try the following regular expression:

(?<=\|)(?!\s).*?(?!\s)(?=\|)

See the demo / explanation.
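As far as I can tell, this pattern matches the whole first cell, asterisks included, so a post-processing step is probably needed. The sketch below assumes that reading; the row data is shortened from the question and the names are hypothetical.

```python
import re

# Pattern from the answer: a cell between pipes that does not start with whitespace.
cell_pattern = re.compile(r'(?<=\|)(?!\s).*?(?!\s)(?=\|)')

rows = [
    "|**propped**| - | -a flashlight in one hand |",
    "|**flinch**| - | -flinch at the sight of him.|",
]

# The raw match still carries the ** markers.
raw = cell_pattern.search(rows[0]).group(0)

found = []
for row in rows:
    match = cell_pattern.search(row)
    if match:
        # Strip the surrounding asterisks from the matched cell.
        found.append(match.group(0).strip('*'))

print(raw)    # **propped**
print(found)  # ['propped', 'flinch']
```

For a cell such as `**sheets,**`, a trailing comma would remain after `strip('*')`; stripping `'*,'` instead would remove it as well.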

Answer 3

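If the text you are searching for contains no asterisks and you do not want the comma after sheets, the pattern is a pipe, followed by two asterisks, followed by one or more characters that are neither an asterisk nor a comma: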

\|\*{2}([^*,]+)

If you can live with the comma, or there may be commas that you will remove afterwards:

\|\*{2}([^*]+)

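Use either pattern with re.findall or re.finditer to capture the desired text.

If you use the second pattern, you will need to iterate over the groups and remove any unwanted commas.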
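A minimal sketch of both approaches, assuming the table has already been read into a string (the sample rows are shortened from the question and the variable names are illustrative):

```python
import re

table = (
    "|**propped**| - | -a flashlight in one hand |\n"
    "|**sheets,**| - | -spots of ink on the sheets, |\n"
)

# First pattern: the capture group stops at the first asterisk or comma.
strict = re.findall(r'\|\*{2}([^*,]+)', table)

# Second pattern: the group stops only at an asterisk,
# so any trailing comma is removed in a separate step.
loose = [m.group(1).rstrip(',') for m in re.finditer(r'\|\*{2}([^*]+)', table)]

print(strict)  # ['propped', 'sheets']
print(loose)   # ['propped', 'sheets']
```

With a single capture group, re.findall returns the group contents directly, while re.finditer yields match objects from which group(1) is read.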

Answer 4

I wrote the program below to produce the required output. I created a file, string_test, containing a copy of all the original lines:

import re

# Capture the text after a leading |** up to the first asterisk or comma
a = re.compile(r"^\|\*\*([^*,]+)")
with open("string_test", "r") as file1:
    for i in file1:
        match = a.search(i)
        if match:
            print(match.group(1))
