I have the following org-mode syntax:
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I want to extract the items under a heading, for example:
getitems "Hardware"
I should get:
- [ ] adapt a programmable motor to a tripod to be used for panning
If I ask for "Reading - Health", I should get:
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I am using the following pattern:
pattern = re.compile("\*\* "+ head + " (.+?)\*?$", re.DOTALL)
The output when asking for "Reading - Technology" is:
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
I also tried:
pattern = re.compile("\*\* "+ head + " (.+?)[\*|\z]", re.DOTALL)
This second pattern works for every heading except the last one.
When asking for "Reading - Health", the output is:
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
As you can see, it does not match the last line.
I am using Python 2.7 and findall.
Answer 0 (score: 1)
If you are sure the items never contain a * character, you can use:
re.compile(r"\*\* "+head+r" \[\d+/\d+\]\n([^*]+)\*?")
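As a minimal sketch of how that pattern could be used with findall, the snippet below wraps it in a function. The names `ORG_TEXT` and the `text` parameter are assumptions made for illustration, and `re.escape` is added here so that characters in the heading are treated literally; neither is part of the answer above.

```python
import re

# Sample data copied from the question (ORG_TEXT is a hypothetical name).
ORG_TEXT = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
"""


def getitems(head, text=ORG_TEXT):
    # Heading, then the "[n/m]" counter, then everything up to the next "*".
    pattern = re.compile(
        r"\*\* " + re.escape(head) + r" \[\d+/\d+\]\n([^*]+)\*?"
    )
    matches = pattern.findall(text)
    # findall returns a list of the captured groups; empty if nothing matched.
    return matches[0] if matches else None


print(getitems("Reading - Health"))
```

Because the capture group is `[^*]+`, an item that itself contains an asterisk (for example org-mode bold markup) would cut the result short, which is the caveat stated above.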
Answer 1 (score: 1)
You can achieve this with:
import re
string = """
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Technology [1/6]
- [X] Introduction to Networking - Charles Severance
- [ ] A Tour of C++ - Bjarne Stroustrup
- [ ] C++ How to Program - Paul Deitel
- [X] Computer Systems - Randal Bryant
- [ ] The C programming language - Brian Kernighan
- [ ] Beginning Linux Programming -Matthew and Stones
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
- [X] Supple Leopard - Kelly Starrett
- [X] Convict Conditioning 1 and 2
"""
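def getitems(section):
    # Match the heading line for `section`, then capture everything up to
    # the next line that starts with "**" (or the end of the string).
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    match = rx.search(string)
    # search() returns None when the section does not exist
    return match.group('block') if match else None

items = getitems('Reading - Technology')
print(items)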
The core of the code is the (condensed) expression:
^\*{2}.+[\n\r] # match the beginning of the line, followed by two stars, anything else in between and a newline
(?P<block> # open group "block"
(?: # non-capturing group
(?!^\*{2}) # a neg. lookahead, making sure no ** follows at the beginning of a line
[\s\S] # any character...
)+ # ...at least once
) # close group "block"
In the actual code, the search string is inserted after the **. See a demo for Reading - Technology on regex101.com.
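The commented expression can be compiled almost as written by using `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. The following is only a sketch: `build_block_regex` and `SAMPLE` are hypothetical names, and the literal space after the two stars is written as `[ ]` because verbose mode would otherwise discard it.

```python
import re

# Shortened sample in the question's format (SAMPLE is a hypothetical name).
SAMPLE = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
"""


def build_block_regex(section):
    # re.escape also escapes spaces, so the heading survives verbose mode.
    head = r"^\*{2}[ ]" + re.escape(section)
    body = r"""
        .+[\n\r]            # rest of the heading line and its line break
        (?P<block>          # open group block
            (?:             # non-capturing group
                (?!^\*{2})  # stop before a line that starts with two stars
                [\s\S]      # any character, line breaks included
            )+              # at least once
        )                   # close group block
    """
    return re.compile(head + body, re.MULTILINE | re.VERBOSE)


match = build_block_regex("Reading - Health").search(SAMPLE)
print(match.group("block"))
```

Keeping the comments inside the pattern means the explanation and the regex cannot drift apart, at the cost of a longer function.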
To return only the checked items, pass selected=True:
def getitems(section, selected=None):
    rx = re.compile(r'^\*{2} ' + re.escape(section) + r'.+[\n\r](?P<block>(?:(?!^\*{2})[\s\S])+)', re.MULTILINE)
    match = rx.search(string)
    if match is None:
        # no such section
        return None
    items = match.group('block')
    if selected:
        # Titles of the checked ("[X]") items only; optional leading
        # whitespace is allowed so indented and unindented lists both match.
        rxi = re.compile(r'^[ \t]*- \[X\] (.+)', re.MULTILINE)
        return rxi.findall(items)
    return items

items = getitems('Reading - Health', selected=True)
print(items)
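Answer 2 (score: 0)
I'm not sure a regex is needed for the whole match. I would just use a regex to match the ** heading line, and then collect lines until the next ** line is seen.
Something like:
pattern = re.compile(r"\*\* " + re.escape(head))
start = False
output = []
for line in my_file:
    if pattern.match(line):
        start = True
        continue
    elif start and line.startswith("**"):
        # reached the next heading after the one we wanted
        break
    if start:
        output.append(line)
# now `output` should have the lines you want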
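The same line-by-line idea can be extended to collect every section in a single pass, which may be convenient when several headings are needed. This is a sketch under assumptions: `parse_sections`, `HEADING`, and `SAMPLE` are hypothetical names, and headings are assumed to look like `** Title [n/m]` as in the question.

```python
import re
from collections import OrderedDict

# Heading line: two stars, a title, and an optional "[n/m]" progress counter.
HEADING = re.compile(r"^\*\* (?P<title>.+?)(?: \[\d+/\d+\])?\s*$")

# Shortened sample in the question's format (hypothetical name).
SAMPLE = """\
** Hardware [0/1]
- [ ] adapt a programmable motor to a tripod to be used for panning
** Reading - Health [3/4]
- [ ] Patrick McKeown - The Oxygen Advantage
- [X] Total Knee Health - Martin Koban
"""


def parse_sections(lines):
    """Map each heading title to the list of lines that follow it."""
    sections = OrderedDict()
    current = None
    for line in lines:
        m = HEADING.match(line)
        if m:
            # Start (or resume) the list for this heading.
            current = sections.setdefault(m.group("title"), [])
        elif current is not None:
            # Lines before the first heading are ignored.
            current.append(line.rstrip("\n"))
    return sections


sections = parse_sections(SAMPLE.splitlines(True))
print(sections["Reading - Health"])
```

Any iterable of lines works as input, so an open file object can be passed in place of `SAMPLE.splitlines(True)`.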