I would like to know how to search for a particular string using Python. I have opened a markdown file that contains the following table:
| --------- | -------- | --------- |
|**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. |
|**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|
|**unscrewed**| - | -Slowly and very carefully he unscrewed the ink bottle, dipped his quill into it, and began to write,|
|**downtrodden**| - | -For years, Aunt Petunia and Uncle Vernon had hoped that if they kept Harry as downtrodden as possible, they would be able to squash the magic out of him.|
|**sheets,**| - | -As long as he didn’t leave spots of ink on the sheets, the Dursleys need never know that he was studying magic by night.|
|**flinch**| - | -But he hoped she’d be back soon — she was the only living creature in this house who didn’t flinch at the sight of him.|
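From each row, I need to get the string wrapped in "**" and "**|" in the first cell (for example, propped from the first row).
I tried using a regular expression but could not extract it.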
Answer 0 (score: 2)
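import re

# Lookbehind for |**, a lazy match, then a lookahead for an optional comma and **|.
# A raw string keeps the backslashes from being treated as string escapes.
y = r'(?<=\|\*{2}).+?(?=,{0,1}\*{2}\|)'
reg = re.compile(y)
a = '| --------- | -------- | --------- | |**propped**| - | -a flashlight in one hand and a large leather-bound book (A History of Magic by Bathilda Bagshot) propped open against the pillow. | |**Pointless**| - | -“Witch Burning in the Fourteenth Century Was Completely Pointless — discuss.”|'
print(reg.findall(a))  # ['propped', 'Pointless']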
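The regular expression above (y), explained:
(?<=\|\*{2}) - a lookbehind: it matches only if the current position in the string is preceded by a match for \|\*{2}, i.e. |**.
.+? - matches any character (except a newline) one or more times. Adding ? after the quantifier makes the match non-greedy (minimal), so as few characters as possible are matched.
(?=,{0,1}\*{2}\|) - ?= is a lookahead: it matches only if the current position is followed by the given expression, without consuming it. Here that expression is ,{0,1}\*{2}\|, which means zero or one comma, then two *, then a closing |.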
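To see how the lookbehind and lookahead behave on rows that end with a comma, here is a minimal sketch that applies the same pattern row by row. The rows are shortened, hypothetical stand-ins for the table in the question, and the names pattern, rows and words are only for illustration.

```python
import re

# Same pattern as in the answer; ",?" is equivalent to ",{0,1}".
pattern = re.compile(r'(?<=\|\*{2}).+?(?=,?\*{2}\|)')

# Hypothetical, shortened rows in the same shape as the question's table.
rows = [
    '| --------- | -------- | --------- |',
    '|**propped**| - | -propped open against the pillow. |',
    '|**sheets,**| - | -spots of ink on the sheets, |',
]

words = []
for row in rows:
    match = pattern.search(row)
    if match:  # the separator row has no |**...**| cell, so it is skipped
        words.append(match.group())

print(words)  # ['propped', 'sheets']
```

Because the comma is matched inside the lookahead rather than by .+?, it is left out of the result for the sheets row.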
Answer 1 (score: 1)
Answer 2 (score: 0)
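If the text you are searching for is marked with asterisks and you don't want the comma after sheets, the pattern is a pipe, followed by two asterisks, then anything after that which is not an asterisk or a comma.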
\|\*{2}([^*,]+)
If you can live with the comma, or are happy to remove it afterwards:
\|\*{2}([^*]+)
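Use either pattern with re.findall or re.finditer to capture the desired text.
If you use the second pattern, you will need to iterate over the groups and remove any unwanted commas.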
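A rough sketch of both patterns with re.findall, assuming a short made-up text in the same shape as the question's table (the variable names are only for illustration):

```python
import re

# Hypothetical two-row sample in the same shape as the question's table.
text = (
    "|**propped**| - | -propped open against the pillow. |\n"
    "|**sheets,**| - | -spots of ink on the sheets, |\n"
)

# First pattern: the capture stops at the first asterisk or comma.
no_comma = re.findall(r'\|\*{2}([^*,]+)', text)

# Second pattern: the capture stops only at an asterisk, so commas are kept...
with_comma = re.findall(r'\|\*{2}([^*]+)', text)
# ...and have to be stripped afterwards.
cleaned = [word.rstrip(',') for word in with_comma]

print(no_comma)    # ['propped', 'sheets']
print(with_comma)  # ['propped', 'sheets,']
print(cleaned)     # ['propped', 'sheets']
```

When a pattern contains a single capture group, re.findall returns the group contents rather than the whole match, which is why the leading |** does not appear in the results.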
Answer 3 (score: 0)
I wrote the program below to produce the desired output. I created a file, string_test, into which I copied all of the original strings:
import re

# Match the text after a leading |** up to the first asterisk or comma.
a = re.compile(r"^\|\*\*([^*,]+)")
with open("string_test", "r") as file1:
    for i in file1:
        match = a.search(i)
        if match:
            print(match.group(1))
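The same idea can be tried without creating a file first. This is only a sketch: extract_terms and CELL are hypothetical names, and io.StringIO stands in for the opened string_test file with a few shortened rows.

```python
import io
import re

# Same pattern as above: text after a leading |** up to the first * or comma.
CELL = re.compile(r"^\|\*\*([^*,]+)")

def extract_terms(lines):
    """Return the bolded first-cell text from each table row in `lines`."""
    terms = []
    for line in lines:
        match = CELL.search(line)
        if match:
            terms.append(match.group(1))
    return terms

# io.StringIO behaves like an open text file, so it can be iterated line by line.
sample = io.StringIO(
    "| --------- | -------- | --------- |\n"
    "|**flinch**| - | -who didn't flinch at the sight of him.|\n"
    "|**sheets,**| - | -spots of ink on the sheets,|\n"
)
print(extract_terms(sample))  # ['flinch', 'sheets']
```

Since the function accepts any iterable of lines, passing an open file object, for example the result of open("string_test"), should work the same way.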