I have a text file containing the following data:
History
The term "data science" (originally used interchangeably with "datalogy") has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of application
Application
In the 2010–2011 time frame, data science software reached an inflection point where open source software started supplanting proprietary software.[30] The use of open source software enables modifying and extending the software, and it allows sharing of the resulting algorithms
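Now I want to extract the paragraphs or specific sections that contain a given set of words, e.g. {"Software", "opensource"}.
I have tried a regexp and a loop with if checks, but I could not extract the required output. Can anyone help me?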
Answer 0: (score: 1)
Use a regular expression; the matching paragraphs end up in the list paragraph_list:
import re
my_string = """History
The term "data science" (originally used interchangeably with "datalogy") has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of application
Application
In the 2010–2011 time frame, data science software reached an inflection point where open source software started supplanting proprietary software.[30] The use of open source software enables modifying and extending the software, and it allows sharing of the resulting algorithms
"""
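# Match whole lines (paragraphs) that contain a keyword; the raw string keeps \s intact.
pattern = r'^.*(?:software|open\s?source).*$'
# MULTILINE anchors ^ and $ at every line; IGNORECASE also matches "Software".
paragraph_list = re.findall(pattern, my_string, flags=re.MULTILINE | re.IGNORECASE)
print(paragraph_list)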
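Edit:
If you want the keywords to be dynamic, or supplied as a list/tuple, the pattern can be built from that collection rather than hard-coded.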
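The code for the dynamic-keyword case did not survive in this answer, so the following is only a minimal sketch of one way to do it. The helper name find_paragraphs and the sample text are made up for illustration; the only library calls used are re.escape, re.compile and findall.

```python
import re

def find_paragraphs(text, keywords):
    """Return every line (paragraph) of text that contains at least one keyword."""
    if not keywords:
        # An empty alternation would match every line, so bail out early.
        return []
    # Escape each keyword so characters such as '.' or '[' are matched literally.
    alternation = "|".join(re.escape(k) for k in keywords)
    # ^ and $ anchor at line boundaries under MULTILINE; '.' never crosses a newline.
    pattern = re.compile(rf"^.*(?:{alternation}).*$", re.MULTILINE | re.IGNORECASE)
    return pattern.findall(text)

# Hypothetical sample data, shortened from the question's file.
sample = "History\nThe term data science is old.\nApplication\nOpen source software spread.\n"
paragraph_list = find_paragraphs(sample, ["software", "open source"])
print(paragraph_list)  # ['Open source software spread.']
```

Because the keywords are escaped, "open source" is matched literally; a keyword written as "opensource" would not match text that has a space in it.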
Answer 1: (score: 0)
You can easily check whether a substring is part of a larger string:
>>> text = 'In the 2010–2011 time frame, data science software reached an inflection point where open source software started supplanting proprietary software.[30] The use of open source software enables modifying and extending the software, and it allows sharing of the resulting algorithms'
>>> "software" in text
True
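Note that the in operator is case-sensitive, so the question's keyword "Software" would not be found this way. A minimal sketch of a case-insensitive check over several keywords follows; contains_any is a hypothetical helper name, not something from the question.

```python
def contains_any(text, keywords):
    """Return True if text contains at least one keyword, ignoring case."""
    lowered = text.casefold()
    return any(k.casefold() in lowered for k in keywords)

# Keywords as written in the question; "opensource" has no space, so it
# does not match "open source", but "Software" matches "software".
keywords = ["Software", "opensource"]
print(contains_any("open source software enables sharing", keywords))  # True
print(contains_any("a substitute for computer science", keywords))     # False
```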
You can extract the lines of a file that contain a particular word:
>>> with open('yourfile.txt', 'r') as f:
...     result = [line for line in f if 'software' in line]
...
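The same filter can be extended to a set of keywords. This is a sketch under the assumption that each paragraph sits on its own line, as in the question's file; filter_lines is a hypothetical helper, and a plain list stands in for the open file so the example runs without yourfile.txt.

```python
def filter_lines(lines, keywords):
    """Return the lines that contain at least one keyword, ignoring case."""
    wanted = [k.casefold() for k in keywords]
    return [line for line in lines if any(k in line.casefold() for k in wanted)]

# A list stands in for the open file here; both are iterables of lines.
lines = ["History\n", "Open source software spread.\n", "Application\n"]
result = filter_lines(lines, ["Software", "open source"])
print(result)  # ['Open source software spread.\n']
```

With a real file, the file object can be passed directly inside a with open('yourfile.txt') as f: block, since iterating over it yields one line at a time.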