I'm trying to use Python to extract the text between the following headers:
@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
The exact text of @HEADER1 and @othertext may change over time, so the solution needs to be dynamic.
Also, HEADER2 is a word that starts with '@'. Could I use the startswith function for this, or a regular expression?
Something like this:
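capturing = False
for line in file:                  # `file` is assumed to be an open text file
    line = line.rstrip("\r\n")
    if line == "@HEADER1":
        capturing = True           # print from the next line onward
        continue
    if line == "@othertext":
        break
    if capturing:
        print(line)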
Answer 0 (score: 4)
This does the job:
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
print('"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2]))
Output:
"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"
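Because the question says the marker text may change, the same regex idea can be wrapped in a function that takes the markers as arguments. This is only a sketch: the function name `extract_between` and its parameters are made up for illustration, and it assumes each marker sits on its own line with at least one line of text in between.

```python
import re

def extract_between(text, start_marker, end_marker):
    """Return the text between two marker lines, or None if no match."""
    pattern = (
        re.escape(start_marker) + r"\r?\n"   # start marker and its line break
        + r"(.*?)"                            # the body, matched lazily
        + r"\r?\n" + re.escape(end_marker)    # line break before the end marker
    )
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None

sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_between(sample, "@HEADER1", "@othertext"))
```

re.escape keeps any regex metacharacters in the markers from being interpreted as pattern syntax.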
Answer 1 (score: 2)
Something like this?
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""
for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a)
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
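If it helps to know which header each block came from, a variant of the pattern above can capture the header as well. This is a sketch under assumptions: `SECTION` and `sections` are hypothetical names, every block is assumed to be closed by a line starting with '@', and a repeated header name would overwrite the earlier entry in the dict.

```python
import re

# Same idea as the pattern above, with two changes: the opening header is
# captured too, and the line break before the closing marker is left out
# of the captured body.
SECTION = re.compile(r"(@\w+)\r?\n(.*?)\r?\n@\w+", re.DOTALL)

def sections(text):
    """Return a dict mapping each opening header to the text that follows it."""
    return dict(SECTION.findall(text))

sample = "@HEADER1\nA\nB\n@othertext\n@HEADER2\nC\n@othertext"
result = sections(sample)
print(result["@HEADER1"])
print(result["@HEADER2"])
```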
Answer 2 (score: 0)
Without re:
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
You can use str.find together with string slicing, like this:
print(string[string.find("\n") + 1:string.find("\n@")])  # + 1 skips the line break after the header
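Or you can split the string into a list of lines, keep the elements you need, and join them back together:
lines = string.split("\n")
lines = lines[1:-1]  # drop the header (first line) and the end marker (last line)
print("\n".join(lines))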
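The question also asks whether startswith could be used. A line-by-line scan along those lines, still without re, might look like the sketch below. The name `extract_block` is hypothetical, and the code assumes the block ends at the next line that begins with '@', whatever that marker happens to be called.

```python
def extract_block(text, header):
    """Collect the lines after `header` up to the next line starting with '@'."""
    collected = []
    capturing = False
    for line in text.splitlines():
        if capturing:
            if line.startswith("@"):
                break               # next marker reached: stop collecting
            collected.append(line)
        elif line == header:
            capturing = True        # start collecting from the next line
    return "\n".join(collected)

sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_block(sample, "@HEADER1"))
```

Unlike the slicing approaches, this does not rely on the header being the first line or the end marker being the last one.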
Answer 3 (score: 0)
I use the partition() method in cases like this:
text_to_extract = "@HEADER1\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\n@othertext"
extracted = text_to_extract.partition('@HEADER1')[2].partition('@othertext')[0]
print(extracted.strip())  # strip() drops the line breaks left next to the two markers
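One thing to watch with partition(): when the separator is not found, it returns the whole string followed by two empty strings, so a missing marker gives a wrong result without raising an error. A sketch that checks for this follows; the function name `between` is made up for illustration.

```python
def between(text, start_marker, end_marker):
    """Return the text between the markers, or None if either one is missing."""
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None                 # start marker not present
    body, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None                 # end marker not present after the start
    return body.strip("\r\n")       # drop the line breaks next to the markers

sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(between(sample, "@HEADER1", "@othertext"))
```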