I have two strings and I want to find all the common words. For example,
s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
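Consider matching s1 against s2.
'Today is' matches 'today is', but 'Today is a' does not match anything in s2. Therefore 'Today is' is one of the common consecutive sequences. Similarly, we have 'a good day', 'is', 'a good', and 'have a walk'. So the common words are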
common = ['today is', 'a good day', 'is', 'a good', 'have a walk']
Can we use regular expressions?
Thanks a lot.
Answer 0 (score: 2)
import string

s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
z = []
# Remove punctuation
strip_punct = str.maketrans('', '', string.punctuation)
s1 = s1.translate(strip_punct)
s2 = s2.translate(strip_punct)
print(s1)
print(s2)
sw1 = s1.lower().split()  # split into words
sw2 = s2.lower().split()
print(sw1, sw2)
i = 0
# Two loops to detect common strings. A while loop is used so that
# the value of i can be changed inside the loop itself.
while i < len(sw1):
    x = 0   # number of words in the current run
    r = ""  # buffer holding the current run
    d = i   # remember where this scan started
    for j in range(len(sw2)):
        # The bounds check keeps sw1[i] from running past the end of sw1
        if i < len(sw1) and sw1[i] == sw2[j]:
            r = r + ' ' + sw2[j]  # words match: keep adding to the buffer
            x += 1
            i += 1
        else:
            if x > 0:  # mismatch: if the buffer holds a run, add it to the result (z)
                z.append(r)
                i = d
                r = ""
                x = 0
    if x > 0:  # end case of the loop above
        z.append(r)
        r = ""
        i = d
        x = 0
    i += 1
print(list(set(z)))
# O(n^3)
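For comparison, here is a minimal sketch of the same word-level idea written as a greedy longest-match scan for Python 3. The names `tokenize` and `common_phrases` are hypothetical, and treating `\w+` as "a word" is an assumption. Note that the regular expression only does the splitting; the matching itself is still a plain loop.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words (assumes a word is a \\w+ run)."""
    return re.findall(r"\w+", text.lower())


def common_phrases(s1, s2):
    """Scan s1 left to right; at each position take the longest run of
    words that also appears as a consecutive run somewhere in s2."""
    w1, w2 = tokenize(s1), tokenize(s2)
    result = []
    i = 0
    while i < len(w1):
        best = 0  # length of the longest run starting at w1[i]
        for j in range(len(w2)):
            k = 0
            while (i + k < len(w1) and j + k < len(w2)
                   and w1[i + k] == w2[j + k]):
                k += 1
            best = max(best, k)
        if best:
            result.append(' '.join(w1[i:i + best]))
            i += best  # skip past the words just consumed
        else:
            i += 1     # no match for this word, move on
    return result


s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(common_phrases(s1, s2))
# ['today is', 'a good day', 'is', 'a good', 'have a walk']
```

Because each matched run is consumed before the scan continues, the second 'is' in s1 is reported on its own, which is what the question's expected list assumes.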
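Answer 1 (score: 1)
Taking Find common substring between two strings as a reference, I modified a few lines and added a few.
The modification: answer = "NULL" is the default return value if no common substring is found.
The addition: keep searching until you get NULL, and store each result in a list.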
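def longestSubstringFinder(string1, string2):
    # "NULL" is the default return value when nothing is found.
    # Since len("NULL") == 4, only matches longer than 4 characters are kept.
    answer = "NULL"
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
        # Also record a match that runs to the very end of string2
        if (len(match) > len(answer)): answer = match
    return answer

mylist = []

def call():
    s1 = 'Today is a good day, it is a good idea to have a walk.'
    s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
    s1 = s1.lower()
    s2 = s2.lower()
    x = longestSubstringFinder(s2, s1)
    # Keep searching until nothing is found, storing each result in the list
    while (longestSubstringFinder(s2, s1) != "NULL"):
        x = longestSubstringFinder(s2, s1)
        print(x)
        mylist.append(x)
        s2 = s2.replace(x, ' ')  # blank out the match so it is not found again

call()
print('[%s]' % ','.join(map(str, mylist)))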
Output
[ a good day, , have a walk,today is , good]
Difference from the expected output
common = ['today is', 'a good day', 'is', 'a good', 'have a walk']
Your expectation of a second 'is' is wrong, because, as you can see, there is only one 'is' in s2.
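The "find the longest match, remove it, repeat" loop can also be sketched with the standard library's `difflib.SequenceMatcher.find_longest_match`. This is a swapped-in technique, not the code of this answer. The function name `repeated_longest_matches`, the `min_len` cutoff (standing in for the length threshold implied by "NULL"), and the `'\0'` sentinel (assumed not to occur in either string) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def repeated_longest_matches(a, b, min_len=5):
    """Repeatedly take the longest common substring of a and b,
    then blank it out of a so it cannot be found again."""
    found = []
    while True:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size < min_len:  # nothing long enough is left
            break
        piece = a[m.a:m.a + m.size]
        found.append(piece)
        # Assumption: '\0' appears in neither string, so the
        # replacement cannot create new matches.
        a = a.replace(piece, '\0')
    return found


s1 = 'Today is a good day, it is a good idea to have a walk.'.lower()
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'.lower()
print(repeated_longest_matches(s2, s1))
# [' a good day, ', ' have a walk', 'today is ']
```

Like the answer's code, this works on characters rather than words, so the pieces keep their surrounding spaces and punctuation. Using a sentinel instead of a space is a design choice: replacing with `' '` can produce fresh matches such as `' good'`.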