I have a huge text string like the one below (I'm running Spyder 3 under Anaconda with Python 3):
search = "germany"
text = """germany's gabriel denies report he is eyeing finmin post
berlin (reuters) - german foreign minister sigmar gabriel on saturday denied
a report that said the social democrat, whose party has agreed to enter
talks with chancellor angela merkel's conservatives on forming a coalition,
was eyeing the post of finance minister.
13.5 hours ago
— reuters
iit-kharagpur gets over 1,000 placement offers in eight days
quantiphi analytics emerged as the largest recruiter of the season till date
offering 34 jobs, followed by intel at 33
13.5 hours ago
— business standard"""
I can check whether the keyword occurs in the text with the following condition:
if search in text:
    print("Found")
else:
    print("Not Found")
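But what I really need is to get the whole related news item. For "germany", say, that means everything from "germany's gabriel denies report ..." up to "the post of finance minister", whenever germany is found in the text.
Any ideas on how to accomplish this? Thanks in advance for all your answers.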
Answer 0 (score: 1)
This is easy, but you should read up on regular expressions (regex), because I don't know the full structure of your data:
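import re
search = input("Insert keyword: ")
text = "............."
# Match from the keyword up to the first blank line ("\n\n").
# re.DOTALL lets "." cross line breaks; re.escape treats the keyword literally.
match = re.search(r'%s(.*?)\n\n' % re.escape(search), text, re.DOTALL)
if match is None:
    print("Sorry, nothing found")
else:
    news = match.group()
    print(news)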
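Note that the pattern above only matches if a blank line follows the item, and it returns text starting at the keyword rather than at the start of the item. If the items really are separated by blank lines (an assumption, since the full data isn't shown), a simpler sketch is to split on blank lines and keep the blocks that mention the keyword. The `find_items` helper and the shortened `sample` text are made up for illustration:

```python
import re

def find_items(search, text):
    """Return every blank-line-separated block that contains `search`."""
    # Assumption: news items are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Compare in lowercase so the match is case-insensitive.
    return [b for b in blocks if search.lower() in b.lower()]

sample = (
    "germany's gabriel denies report he is eyeing finmin post\n"
    "was eyeing the post of finance minister.\n"
    "\n"
    "iit-kharagpur gets over 1,000 placement offers in eight days\n"
    "offering 34 jobs, followed by intel at 33\n"
)

for item in find_items("germany", sample):
    print(item)
```

If your items are delimited some other way (for example by the "— source" line), the split pattern would need to change accordingly.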
Answer 1 (score: 0)
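Instead of searching for "Germany", search for "German", which covers both cases. You may also need to convert everything to lowercase or uppercase so that the substring is found regardless of case.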
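As an alternative to lowercasing both strings, the `re` module can do the case-insensitive comparison itself through the `re.IGNORECASE` flag. A small sketch, using a made-up mixed-case `text` for illustration:

```python
import re

# Hypothetical mixed-case sample, not the original data
text = "Germany's Gabriel denies report\n(Reuters) - German foreign minister"
search = "german"

# re.escape keeps any special characters in the keyword literal;
# re.IGNORECASE matches regardless of upper/lower case.
matches = [m.group() for m in re.finditer(re.escape(search), text, flags=re.IGNORECASE)]
print(matches)
```

The matched strings keep the casing of the original text, which lowercasing the whole text would lose.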
You can first use re.finditer() to get the positions of all matching substrings:
import re
search="German"
text = """germany's gabriel denies report he is eyeing finmin postberlin
(reuters) - german foreign minister sigmar gabriel on saturday denied
a report that said the social democrat, whose party has agreed to enter
talks with chancellor angela merkel's conservatives on forming a coalition,
was eyeing the post of finance minister."""
# Convert both strings to lowercase to make the search case-insensitive
sub_locs = [s.start() for s in re.finditer(search.lower(), text.lower())]
print(sub_locs)
This gives:
[0, 75]
You can then slice text at the indices in sub_locs and collect the substrings:
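substrings = []
# Slice from each match position up to the next one
for start, end in zip(sub_locs[:-1], sub_locs[1:]):
    substrings.append(text[start:end])
# Get last substring (from the final match to the end of the text);
# indexing sub_locs directly also works when there is only one match
if sub_locs:
    substrings.append(text[sub_locs[-1]:])
print("GERMAN SUBSTRINGS:")
for i, substr in enumerate(substrings):
    print("{0} -> {1}\n".format(i + 1, substr))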
Which outputs:
GERMAN SUBSTRINGS:
1 -> germany's gabriel denies report he is eyeing finmin postberlin
(reuters) - 
2 -> german foreign minister sigmar gabriel on saturday denied
a report that said the social democrat, whose party has agreed to enter
talks with chancellor angela merkel's conservatives on forming a coalition,
was eyeing the post of finance minister.
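The two steps can also be folded into one small helper that copes with zero or one match. This is only a sketch: `split_at_matches` is a name I made up, and it uses `re.IGNORECASE` instead of lowercasing the text.

```python
import re

def split_at_matches(search, text):
    """Split `text` into pieces that each start at an occurrence of `search`."""
    # Start index of every case-insensitive occurrence of the keyword
    starts = [m.start() for m in re.finditer(re.escape(search), text, flags=re.IGNORECASE)]
    # Each piece ends where the next one starts; the last piece runs to the end
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

example = "germany's gabriel\n(reuters) - german foreign minister"
for i, piece in enumerate(split_at_matches("German", example), start=1):
    print("{0} -> {1}".format(i, piece))
```

When the keyword does not occur at all, the function returns an empty list instead of raising an error.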