I am using some code to extract a piece of information (extractedInfo) from a text file. The information sits between my search terms searchKey and searchEndKey:
data1 = mytextfile
searchKey = "https://cars/"
searchEndKey = "/ford/"
extractedInfo = data1[data1.find(searchKey)+len(searchKey):data1.find(searchEndKey,data1.find(searchKey)+len(searchKey)+1)]
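This works as expected if there is only one instance of the search keys, but if there are multiple instances it captures everything from the first searchKey all the way to the last searchEndKey.
For example, if the text file contains:
blah blah https://cars/123456/ford/ blah blah
the value I get back is 123456.
But if the text file contains:
blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah
the value I get back is: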
123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456
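So is there any way to tell Python to stop once it has captured the text between the first pair of search keys it finds?
Answer 0 (score: 2)
Would a simple extraction by splitting the text work for you?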
with open('a.txt', 'r') as myfile:
    data = myfile.read()  # read the file into a string
searchKey = "https://cars/"
searchEndKey = "/ford/"
extracted = data.split(searchKey)[1].split(searchEndKey)[0]
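This returns only the first occurrence. Note that split() cuts the string at every occurrence of the key, so it is not very efficient for very long strings.
Input: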
blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah
Output:
123456
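If cutting the whole string into pieces is a concern for large files, one option is str.partition(), which splits only at the first occurrence of the separator. A minimal sketch, assuming the file contents are already in a string. The helper name extract_first is made up for illustration:

```python
def extract_first(data, search_key, search_end_key):
    """Return the text between the first search_key and the next search_end_key, or None."""
    # partition() splits at the first occurrence only and returns (before, sep, after)
    _, found_start, rest = data.partition(search_key)
    if not found_start:
        return None  # search_key is not in the text
    value, found_end, _ = rest.partition(search_end_key)
    if not found_end:
        return None  # no search_end_key after the first search_key
    return value


sample = ("blah blah https://cars/123456/ford/ blah blah blah blah "
          "https://cars/123456/ford/ blah blah")
print(extract_first(sample, "https://cars/", "/ford/"))  # 123456
```

Unlike data.split(searchKey)[1], which raises IndexError when the start key is missing, this version returns None in that case.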
Answer 1 (score: 1)
You can also do this with a regular expression, using re.search(). Something like this:
import re
s = 'blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah'
patt = re.compile(r'https:\/\/cars\/([^\/]*)\/ford\/')
result = patt.search(s)
print(result.group(1))
# OUTPUT
# 123456
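If the keys are held in variables, as in the question, the pattern could also be built from them rather than written out by hand. A rough sketch, assuming the value never spans a line break (by default . does not match a newline). The second ID, 789, is invented here just to make the two matches distinguishable:

```python
import re

search_key = "https://cars/"
search_end_key = "/ford/"
s = ("blah blah https://cars/123456/ford/ blah blah blah blah "
     "https://cars/789/ford/ blah blah")  # 789 is a made-up second ID

# re.escape() makes the keys match literally; (.*?) is non-greedy,
# so it stops at the first search_end_key after search_key
patt = re.compile(re.escape(search_key) + r"(.*?)" + re.escape(search_end_key))

match = patt.search(s)
first = match.group(1) if match else None
print(first)  # 123456

# findall() returns the captured group for every non-overlapping match
all_values = patt.findall(s)
print(all_values)  # ['123456', '789']
```

search() is enough when only the first value is wanted. findall() is shown only in case every occurrence is needed later.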