I want to find an efficient way to take the words listed in a file and then check long strings, which contain no spaces, for those words:
Example:
FileOfWords.txt
THE
HOUSE
DOG
ON
LINE
string1 = " ASASASASASATHEHFGFDFGDFDFDDOGFDFDF"
string2 = "DOGLINEJSDKJSDJKSDKJSDTHECVCVVCV"
string3 = "UHFDUIHKDFSHUIDSFUIHDSFHUSDSHUIS"
Compare the words in FileOfWords.txt to each string
Output:
Words in string1 found: THE, DOG
Words in string2 found: DOG, LINE
Words in string3 found:
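What is the best way to do this?
Answer 0 (score: 0)
A simple approach is to use the in operator to test whether each word occurs in the larger string, i.e.: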
# fname is the path to the word file, e.g. 'FileOfWords.txt'
with open(fname) as f:
    wrds = f.read().strip().split('\n')
# one list of matches per string
mtches = [[] for x in range(3)]
for w in wrds:
    if w in string1: mtches[0].append(w)
    if w in string2: mtches[1].append(w)
    if w in string3: mtches[2].append(w)
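The same idea extends to any number of strings. A minimal sketch, assuming the words are already in a list; find_words is a hypothetical helper name, not part of the answer above:

```python
def find_words(words, strings):
    """Return, for each string, the words that occur in it as substrings."""
    return [[w for w in words if w in s] for s in strings]


words = ["THE", "HOUSE", "DOG", "ON", "LINE"]
strings = [
    " ASASASASASATHEHFGFDFGDFDFDDOGFDFDF",
    "DOGLINEJSDKJSDJKSDKJSDTHECVCVVCV",
    "UHFDUIHKDFSHUIDSFUIHDSFHUSDSHUIS",
]

# Print one result line per string, in the format used in the question.
for n, found in enumerate(find_words(words, strings), start=1):
    print("Words in string{0} found: {1}".format(n, ", ".join(found)))
```

Words are reported in the order they appear in the word list, not in the order they appear in the string.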
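Answer 1 (score: 0)
The most intuitive approach is brute force: enumerate every substring of length k (a k-mer) of the string, with k ranging from 1 up to the length of the string itself, and check whether that substring is in your file.
The first thing to do is define a function (a generator) that returns every possible k-mer: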
def all_kmers(sequence, k):
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]
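To see what the generator produces, here is a small illustration on a short made-up string (the input "DOGLINE" is chosen only for the example):

```python
def all_kmers(sequence, k):
    # Same generator as above, repeated so this snippet runs on its own.
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


# Every substring of length 3, from left to right.
print(list(all_kmers("DOGLINE", 3)))  # ['DOG', 'OGL', 'GLI', 'LIN', 'INE']

# When k is longer than the string, the range is empty and nothing is yielded.
print(list(all_kmers("DOGLINE", 8)))  # []
```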
Now import the file and the three strings:
# open the file, read its lines with readlines() and then close it
fileIN = open('FileOfWords.txt', 'r')
myWords = fileIN.readlines()
fileIN.close()
# remove the trailing newline characters
for i in range(len(myWords)):
    myWords[i] = myWords[i].strip()
# load the strings
string1 = " ASASASASASATHEHFGFDFGDFDFDDOGFDFDF"
string2 = "DOGLINEJSDKJSDJKSDKJSDTHECVCVVCV"
string3 = "UHFDUIHKDFSHUIDSFUIHDSFHUSDSHUIS"
Now it is time to rock:
# try every k-mer length from 1 up to the full length of the string
print("In string 1:")
for k in range(1, len(string1) + 1):
    for kmer in all_kmers(string1, k):
        if kmer in myWords:
            print(kmer)
print("\nIn string 2:")
for k in range(1, len(string2) + 1):
    for kmer in all_kmers(string2, k):
        if kmer in myWords:
            print(kmer)
print("\nIn string 3:")
for k in range(1, len(string3) + 1):
    for kmer in all_kmers(string3, k):
        if kmer in myWords:
            print(kmer)
This code returns:
In string 1:
THE
DOG
In string 2:
DOG
THE
LINE
In string 3:
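Note: by putting the three strings in a list, you can avoid the three separate steps and handle everything in a single loop over that list.
So basically, after the string definitions, the code is: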
myList = [string1, string2, string3]
for l in range(len(myList)):
    StringUnderTest = myList[l]
    print("String #" + str(l + 1))
    # try every k-mer length from 1 up to the full length of the string
    for k in range(1, len(StringUnderTest) + 1):
        for kmer in all_kmers(StringUnderTest, k):
            if kmer in myWords:
                print(kmer)
    print("")
This code returns:
String #1
THE
DOG
String #2
DOG
THE
LINE
String #3
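Since the question asks for efficiency, note that testing a k-mer against a list scans the whole list each time, and most values of k can never match any word. A sketch of one possible refinement, assuming the word list fits in memory: keep the words in a set and only try the lengths that actually occur among the words. The name find_words_kmer is hypothetical, and unlike the loops above it reports each word once even if it occurs several times.

```python
def all_kmers(sequence, k):
    # Same generator as above, repeated so this snippet runs on its own.
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def find_words_kmer(sequence, words):
    """Return the words found in sequence, shortest first, each reported once."""
    word_set = set(words)                         # average O(1) membership test
    lengths = sorted({len(w) for w in word_set})  # only these k can match
    found = []
    for k in lengths:
        for kmer in all_kmers(sequence, k):
            if kmer in word_set and kmer not in found:
                found.append(kmer)
    return found


words = ["THE", "HOUSE", "DOG", "ON", "LINE"]
print(find_words_kmer("DOGLINEJSDKJSDJKSDKJSDTHECVCVVCV", words))
# ['DOG', 'THE', 'LINE']
```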
Answer 2 (score: 0)
A simple approach is to load the strings into a list and search with two nested loops:
strings = [" ASASASASASATHEHFGFDFGDFDFDDOGFDFDF","DOGLINEJSDKJSDJKSDKJSDTHECVCVVCV","UHFDUIHKDFSHUIDSFUIHDSFHUSDSHUIS"]
words = ['THE','HOUSE','DOG','ON','LINE']
for i, string in enumerate(strings):
    # collect the words that occur in this string
    found = []
    for word in words:
        idx = string.find(word)
        if idx != -1:
            found.append(word)
    # joining a list keeps the trailing ": " intact when nothing was found
    print("Words in String{0} found: {1}".format(i + 1, ', '.join(found)))
Or load the words from the text file, as in @guiscri's answer.
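A minimal sketch of that loading step, assuming the file holds one word per line as in the question; load_words is a hypothetical helper name:

```python
def load_words(path):
    """Read one word per line, stripping whitespace and skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Usage, assuming FileOfWords.txt is in the current directory:
# words = load_words("FileOfWords.txt")
```

The returned list can replace the hard-coded words list in the loop above.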