我尝试将decoding algorithm wanted问题细分为较小的问题。这是第一部分。
问题:
示例1:
s1 = "12 November 2010 - 1 visitor"
s2 = "6 July 2010 - 100 visitors"
the identical parts are "2010", "-", "1" and "visitor"
示例2:
s1 = "Welcome, John!"
s2 = "Welcome, Peter!"
the identical parts are "Welcome," and "!"
示例3 :(澄清“!”示例)
s1 = "Welcome, Sam!"
s2 = "Welcome, Tom!"
the identical parts are "Welcome," and "m!"
Python和Ruby首选。感谢
答案 0 :(得分:3)
例如1
>>> s1 = 'November 2010 - 1 visitor'
>>> s2 = '6 July 2010 - 100 visitors'
>>>
>>> [i for i in s1.split() if any(j for j in s2.split() if i in j)]
['2010', '-', '1', 'visitor']
>>>
对于两者
>>> s1 = "Welcome, John!"
>>> s2 = "Welcome, Peter!"
>>> [i for i in s1.replace('!',' !').split() if any(j for j in s2.replace('!',' !').split() if i in j)]
['Welcome,', '!']
>>>
注意:以上代码不适用于示例3,即刚添加的OP
答案 1 :(得分:2)
编辑:更新此示例以使用所有示例,包括#1:
def scan(s1, s2):
# Find the longest match where s1 starts with s2
# Returns None if no matches
l = len(s1)
while 1:
if not l:
return None
elif s1[:l] == s2[:l]:
return s1[:l]
else:
l -= 1
def contains(s1, s2):
D = {} # Remove duplicates using a dict
L1 = s1.split(' ')
L2 = s2.split(' ')
# Don't add results which have already
# been processed to satisfy example #1!
DProcessed = {}
for x in L1:
yy = 0
for y in L2:
if yy in DProcessed:
yy += 1
continue
# Scan from the start to the end of the words
a = scan(x, y)
if a:
DProcessed[yy] = None
D[a] = None
break
# Scan from the end to the start of the words
a = scan(x[::-1], y[::-1])
if a:
DProcessed[yy] = None
D[a[::-1]] = None
break
yy += 1
return list(D.keys())
print contains("12 November 2010 - 1 visitor",
"6 July 2010 - 100 visitors")
print contains("Welcome, John!",
"Welcome, Peter!")
print contains("Welcome, Sam!",
"Welcome, Tom!")
哪个输出:
['1', 'visitor', '-', '2010']
['Welcome,', '!']
['Welcome,', 'm!']
答案 2 :(得分:1)
s1 = "12 November 2010 - 1 visitor"
s2 = "6 July 2010 - 100 visitors"
l1 = s1.split()
for item in l1:
if item in s2:
print item
这分裂在空格上。
同样在单词边界上拆分的解决方案(为了捕获示例2中的!
)在Python中不起作用,因为re.split()
不会在零长度匹配上拆分。
第三个例子,即使字的任何子字符串都是潜在的匹配,也会因为许多可能的变化而使事情变得复杂得多(对于1234
,我必须检查1234
},123
,234
,12
,23
,34
,1
,2
,3
和4
,每增加一个数字,排列数量呈指数增长。
答案 3 :(得分:1)
完整的Ruby解决方案:
def start_similar(i, j)
front = ''
for ix in (0...([i.size, j.size].min))
if i[ix] == j[ix] then
front += i[ix].chr
else
break
end
end
return front
end
def back_similar(i, j)
back = ''
for ix in (0...([i.size, j.size].min)).to_a.reverse
dif = i.size < j.size ? j.size - i.size : i.size - j.size
ci = i[ i.size < j.size ? ix : ix + dif ]
cj = j[ i.size > j.size ? ix : ix + dif ]
if ci == cj then
back = ci.chr + back
else
break
end
end
return back
end
def scan(x, y)
a, b = x.split(' '), y.split(' ')
result = []
for i in a do
for j in b do
result << start_similar(i, j)
result << back_similar(i, j)
end
end
return result.uniq.select do |r| not r.empty? end
end
puts scan(
"12 November 2010 - 1 visitor",
"6 July 2010 - 100 visitors"
).inspect
# ["1", "2010", "0", "-", "visitor"]
puts scan(
"Welcome, John!",
"Welcome, Peter!"
).inspect
# ["Welcome,", "!"]
puts scan(
"Welcome, Sam!",
"Welcome, Tom!"
).inspect
# ["Welcome,", "m!"]