How do I get only the first instance of a substring from a string?

Time: 2018-11-09 19:56:13

Tags: python string

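I am using some code to pull a piece of information (extractedInfo) out of a text file. The information sits between my two search keys, searchKey and searchEndKey: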

data1 = mytextfile
searchKey = "https://cars/"
searchEndKey = "/ford/" 
extractedInfo = data1[data1.find(searchKey)+len(searchKey):data1.find(searchEndKey,data1.find(searchKey)+len(searchKey)+1)]

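It works as expected when there is only one instance of the search keys, but when there are several it captures everything from the first searchKey all the way to the last searchEndKey.

For example, if the text file contains: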

  

blah blah https://cars/123456/ford/ blah blah

The value I get back is 123456

But if the text file contains:

  

blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah

The value I get back is: 123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456

So is there a way to tell Python to stop once it has extracted the information between the first pair of search keys it finds?

2 answers:

Answer 0: (score: 2)

Would a simple split-based extraction work for you?

with open('a.txt', 'r') as myfile:
    data = myfile.read() # read your file into a string

searchKey = "https://cars/"
searchEndKey = "/ford/"

extracted = data.split(searchKey)[1].split(searchEndKey)[0]

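This only picks up the first occurrence. Of course, it is not very efficient for very long strings, because split scans the whole string and builds a list of every piece. Note that it also raises an IndexError if searchKey does not appear in the data at all.

Input: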

blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah

Output:

123456
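
If the string is long, or the keys might be missing, a variant based on str.partition may be worth considering: it only splits at the first occurrence of each key. This is a minimal sketch, not part of the original answer; the function name extract_first and the choice to return None when a key is absent are assumptions made for illustration.

```python
def extract_first(data, search_key, search_end_key):
    """Return the text between the first search_key and the next
    search_end_key, or None if either key is missing.
    (Hypothetical helper, named here for illustration only.)"""
    # partition() splits at the first occurrence only and returns
    # (before, separator, after); separator is '' when not found.
    _, found_start, rest = data.partition(search_key)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(search_end_key)
    if not found_end:
        return None
    return value


sample = ("blah blah https://cars/123456/ford/ blah blah blah blah "
          "https://cars/123456/ford/ blah blah")
print(extract_first(sample, "https://cars/", "/ford/"))  # 123456
```

Because the second partition runs only on the text after the first searchKey, the result stops at the first searchEndKey that follows it.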

Answer 1: (score: 1)

You can also do this with a regular expression, using re.search(), which returns only the first match. Like this:

import re

s = 'blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah blah blah https://cars/123456/ford/ blah blah'

patt = re.compile(r'https:\/\/cars\/([^\/]*)\/ford\/')
result = patt.search(s)

print(result.group(1))
# OUTPUT
# 123456
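
If the two keys are held in variables, as in the question, the pattern could be built from them rather than written by hand. The following is a sketch under assumptions: the helper name extract_first_re is hypothetical, and it uses a non-greedy group (.*?) instead of the [^\/]* class above, so unlike the answer's pattern it would also accept a value that contains slashes.

```python
import re


def extract_first_re(text, search_key, search_end_key):
    """Return the text between the first search_key and the nearest
    following search_end_key, or None if there is no match.
    (Hypothetical helper, named here for illustration only.)"""
    # re.escape() makes the keys match literally, so characters such
    # as '.' or '/' in them are not treated as regex syntax.
    pattern = re.compile(
        re.escape(search_key) + r"(.*?)" + re.escape(search_end_key),
        re.DOTALL,  # let '.' also match newlines in multi-line files
    )
    match = pattern.search(text)  # search() stops at the first match
    return match.group(1) if match else None


s = ("blah blah https://cars/123456/ford/ blah blah blah blah "
     "https://cars/123456/ford/ blah blah")
print(extract_first_re(s, "https://cars/", "/ford/"))  # 123456
```

Checking the match for None before calling group(1) avoids an AttributeError when the keys are not present.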