I have a file, test.txt, containing a word list, and I want to look up strings and substrings in it:
<aardwolf>
<Aargau>
<Aaronic>
<aac>
<akac>
<abaca>
<abactinal>
<abacus>
The file test.py:
import sys # the sys module
import os
import re
def hasattr(str,list):
    expr = re.compile(str)
    # yield the elements
    return [elem for elem in list if expr.match(elem)]
isword = {}
FH = open(sys.argv[1],'r',encoding="ISO-8859-1")
for strLine in FH.readlines():
    isword.setdefault(''.join(sorted(strLine[1:strLine.find('>')].upper())),[]).append(strLine[:-1])
print (isword)
basestring=str()
for ARGV in sys.argv[2:]:
    print ("\n*** %s\n" %ARGV )#print Argv
    diffpatletters = re.compile(u'[a-zA-Z]').findall(ARGV.upper())
    #print (diffpatletters)
    diffpat = '.*' + '(.*)'.join(sorted(diffpatletters)) + '.*'
    #print (diffpat)
    for KEY in hasattr(diffpat,isword.keys()):
        # print (KEY)
        SUBKEY = KEY
        for X in diffpatletters:
            #print (X)
            SUBKEY1 = SUBKEY.replace(X,'')
            #print (SUBKEY)
        if SUBKEY1 in isword:
            #print (SUBKEY)
            basestring+= "%s -> %s" %(isword[KEY], isword[SUBKEY1])
    print (basestring + "\n")
This is how I run the file from the command line:
python test.py test.txt aack aadfl
I expect it to find the matching strings and substrings for every argument from the second one onward, but my basestring is not being printed.
Answer 0 (score: 7)
Do you have to use regular expressions? If not, is this the kind of result you want?
with open('test.txt', 'r') as f:
    s = f.read()
    s = s.split('\n')
s
Out[1]:
['<aardwolf>',
'<Aargau>',
'<Aaronic>',
'<aac>',
'<akac>',
'<abaca>',
'<abactinal>',
'<abacus> ']
ARGVs = ['aard', 'onic', 'abacu']
matches = [x for x in s for arg in ARGVs if arg.lower() in x.lower()]
print(matches)
Out[2]:
['<aardwolf>', '<Aaronic>', '<abacus> ']
ARGVs = ['aard', 'onic', 'abacu', 'aaro', 'ac']
{key:[x for x in s if key in x] for key in ARGVs if len([x for x in s if key in x]) != 0}
Out[3]:
{'aard': ['<aardwolf>'],
'onic': ['<Aaronic>'],
'abacu': ['<abacus> '],
'ac': ['<aac>', '<akac>', '<abaca>', '<abactinal>', '<abacus> ']}
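The dictionary comprehension above is case-sensitive, which is why 'aaro' finds nothing even though '<Aaronic>' is in the list, and it keeps the trailing space on the last entry. A minimal sketch of a case-insensitive variant follows; the function name find_substring_matches and the inline word list are only illustrative assumptions, not part of the original script.

```python
def find_substring_matches(lines, patterns):
    """Map each pattern to the lines containing it, ignoring case.

    Hypothetical helper: blank lines are skipped and surrounding
    whitespace is stripped before matching.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    result = {}
    for pattern in patterns:
        needle = pattern.lower()
        hits = [line for line in cleaned if needle in line.lower()]
        if hits:  # only keep patterns that matched something
            result[pattern] = hits
    return result


# Same entries as test.txt, including the trailing space on the last one.
words = ['<aardwolf>', '<Aargau>', '<Aaronic>', '<aac>', '<akac>',
         '<abaca>', '<abactinal>', '<abacus> ']
result = find_substring_matches(words, ['aard', 'onic', 'abacu', 'aaro', 'ac'])
for pattern, hits in result.items():
    print(pattern, '->', hits)
```

To use it from the command line as in the question, the lines could be read from the file named by sys.argv[1] and the patterns taken from sys.argv[2:].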