Question

I have a list of data that I want to search. Each row of this list is structured like this:

Name, address dob family members age height etc.

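I want to search the rows of data so that the search stops at the ',' that follows the name, in order to optimise the search. I believe I want to use this method: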

str.find(sub[, start[, end]])

I am having trouble writing the code for this structure. Any tips on how to get the string find to work?

Here is some sample data:

Bennet, John, 17054099","5","156323558","-","0", 714 // 
Menendez, Juan,7730126","5","158662525" 11844 // 
Brown,  Jamal,"9","22966592","+","0",,"4432 //

The idea is that I want my program to search only up to the first ',' and not through the rest of each long line.

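EDIT. So here is my code.

I want to search the lines in completedataset only up to the first comma. I am still confused about how to implement these suggestions in my existing code.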

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    for t in matchedLines:
        if t in line:
            smallerdataset.write(line)

Answer 1

You can do it directly:

s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
# Find the index of the first ',' and stop the search there.
print(s.find('John', 0, s.index(',')))
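
To make the effect of the start and end arguments concrete, here is a small sketch in Python 3 syntax. The helper name find_in_first_field is made up for illustration, and the choice to search the whole line when it contains no comma is an assumption (s.index(',') would raise ValueError in that case).

```python
def find_in_first_field(line, needle):
    """Return the index of needle in the text before the first comma, or -1."""
    comma = line.find(',')
    if comma == -1:
        # Assumption: with no comma, treat the whole line as the first field.
        comma = len(line)
    # The end argument limits the search to line[0:comma].
    return line.find(needle, 0, comma)


s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
print(find_in_first_field(s, 'Bennet'))  # 0
print(find_in_first_field(s, 'John'))    # -1, 'John' comes after the first comma
```

Note that 'John' is not found here: it sits after the first comma, which is exactly the part of the line the bounded search skips.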

Answer 2

If I understand your specs correctly,

for thestring in listdata:
    firstcomma = thestring.find(',')
    # Search for name only in the part before the first comma.
    havename = thestring.find(name, 0, firstcomma)
    if havename >= 0:
        print("found name: " + thestring[:firstcomma])

Edit: given the OP's edit to the Q, this would become:

counter = 1
for line in completedataset:
    print(counter)
    counter += 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

Of course, maintaining counter by hand is rather low-level, and a better equivalent would be

for counter, line in enumerate(completedataset):
    print(counter + 1)
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)

but this does not affect the question.
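
For anyone who wants to run this end to end, here is a self-contained sketch in Python 3. The contents of completedataset and matchedLines are stand-ins built from the sample data in the question, and an in-memory io.StringIO plays the role of the smallerdataset file. Two choices are assumptions rather than part of the answer: lines without a comma are skipped, and each line is written at most once even if several names match.

```python
import io

# Stand-ins for the OP's objects (assumed shapes: a list of lines,
# a list of names, and a writable file-like object).
completedataset = [
    'Bennet, John, 17054099","5","156323558","-","0", 714 //\n',
    'Menendez, Juan,7730126","5","158662525" 11844 //\n',
    'Brown,  Jamal,"9","22966592","+","0",,"4432 //\n',
]
matchedLines = ['Bennet', 'Jamal']
smallerdataset = io.StringIO()

for counter, line in enumerate(completedataset, start=1):
    firstcomma = line.find(',')
    if firstcomma == -1:
        continue  # assumption: ignore lines that have no comma
    for t in matchedLines:
        # Only line[0:firstcomma] is searched.
        if line.find(t, 0, firstcomma) >= 0:
            smallerdataset.write(line)
            break  # assumption: write each matching line only once

result = smallerdataset.getvalue()
print(result)
```

Only the Bennet line is written: 'Jamal' does occur in the third line, but after its first comma, so the bounded search does not see it.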

Answer 3

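You are probably searching every line, so you can split each one on ', ' and then search only the first element:

for line in file:
    name = line.split(', ')[0]
    # find returns -1 when there is no match, so compare explicitly.
    if name.find('smth') >= 0:
        break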
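
Because str.find returns an index rather than a boolean, its result needs an explicit comparison with -1 (or the in operator can be used instead) whenever it appears in a condition. A quick sketch of why, in Python 3 syntax; the helper first_field is a made-up name for illustration:

```python
def first_field(line):
    """Return the text before the first comma (the whole line if there is none)."""
    return line.split(',', 1)[0]


name = first_field('Bennet, John, 17054099","5","156323558","-","0", 714 //')

print(name.find('Bennet'))   # 0  (a match at position 0 is falsy in an if)
print(name.find('smth'))     # -1 (no match, yet -1 is truthy in an if)
print('Bennet' in name)      # True
print('smth' in name)        # False
```

Splitting with a maxsplit of 1 also avoids cutting the rest of the line into pieces that are never used.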

Answer 4

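Any reason you have to use find? Why not just do this:

if line.split(",", 1)[0] == search_string:
    ...

EDIT: Just thought I would point out that I was testing this, and the split method seems to be just as fast as find, if not faster. Test the performance of both methods with the timeit module and see what you get.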

Try:

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.split(',',1)[0] == 'Bennet'"

and then compare with:

python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.find('Bennet', 0, a.find(','))"

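Make the name longer (e.g. "BennetBennetBennetBennetBennetBennet") and you will see that find suffers more than split.

Note: I am calling split with its second argument (maxsplit) set to 1, so it stops after the first comma.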
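
The same comparison can also be run from inside a script. This is only a sketch in Python 3: the function names via_split and via_find are made up, the timings depend on the machine and Python version, and the two checks are not strictly equivalent (split tests for equality with the first field, find tests for a substring of it).

```python
import timeit

line = "Bennet, John, 17054099','5','156323558','-','0', 714"


def via_split():
    # Split once at the first comma and compare the first field.
    return line.split(',', 1)[0] == 'Bennet'


def via_find():
    # Search for the name only in the text before the first comma.
    return line.find('Bennet', 0, line.find(',')) >= 0


split_time = timeit.timeit(via_split, number=10000)
find_time = timeit.timeit(via_find, number=10000)
print("split: %.6f s, find: %.6f s" % (split_time, find_time))
```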

Answer 5

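If you are checking many names against each line, it seems the biggest optimization may simply be to locate the comma in each line only once!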

for line in completedataset:
    i = line.index(',')
    first_field = line[:i]
    for name in matchedNames:
        if name in first_field:
            smalldataset.append(name)
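
A runnable variant of the same idea, in Python 3, might look like the sketch below. The data is taken from the question's samples and the names in matchedNames are made up. It differs from the snippet above in two assumed ways: it collects the matching lines (as the question's code does) rather than the names, and it uses str.partition so that a line without a comma is skipped instead of raising ValueError.

```python
completedataset = [
    'Bennet, John, 17054099","5","156323558","-","0", 714 //',
    'Menendez, Juan,7730126","5","158662525" 11844 //',
    'Brown,  Jamal,"9","22966592","+","0",,"4432 //',
]
matchedNames = ['Bennet', 'Brown']  # hypothetical names to look for
smalldataset = []

for line in completedataset:
    # partition finds the first comma once per line.
    first_field, sep, _rest = line.partition(',')
    if not sep:
        continue  # assumption: skip lines that contain no comma
    # Every name is then checked against the short first field only.
    if any(name in first_field for name in matchedNames):
        smalldataset.append(line)

print(smalldataset)
```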

Confusion about string find?
