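I am working through a GeeksforGeeks article on the problem "Sentence that contains all the given phrases".
Details: given a list of sentences and a list of phrases, the task is to find which sentences contain all the words of a phrase and, for every phrase, print the numbers of the sentences that contain it.
Example:
Input: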
sent = ["Strings are an array of characters",
"Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]
Output:
Phrase1:
1 2
Phrase2:
NONE
Code:
# Python program to find the sentence
# that contains all the given phrases
def getRes(sent, ph):
    sentHash = dict()

    # Loop for adding hashed sentences to sentHash
    for s in range(1, len(sent)+1):
        sentHash[s] = set(sent[s-1].split())

    # For Each Phrase
    for p in range(0, len(ph)):
        print("Phrase"+str(p + 1)+":")

        # Get the list of Words
        wordList = ph[p].split()
        res = []

        # Then Check in every Sentence
        for s in range(1, len(sentHash)+1):
            wCount = len(wordList)

            # Every word in the Phrase
            for w in wordList:
                if w in sentHash[s]:
                    wCount -= 1

            # If every word in phrase matches
            if wCount == 0:

                # add Sentence Index to result Array
                res.append(s)
        if(len(res) == 0):
            print("NONE")
        else:
            print('% s' % ' '.join(map(str, res)))

# Driver Function
def main():
    sent = ["Strings are an array of characters",
            "Sentences are an array of words"]
    ph = ["an array of", "sentences are strings"]
    getRes(sent, ph)

main()
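This works correctly. But I would like to know how to optimize it to reduce the time complexity or make it run faster. I am also working on a similar problem, which is why I am asking. I would really appreciate your help.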
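Answer 0: (score: 1)
Your current algorithm runs in roughly O(|sent| * |phrase| * k), where k is the average number of words in a sentence. Patrik's answer reduces k to the average number of words in a phrase, which in your case should be under 10, so that is a big improvement.
It may not be possible to improve the worst case, but we can still improve the average case. The idea is to build an index whose keys are all the words that appear in the sentences and whose values are lists of the indices of the sentences containing that word.
That way, for a given phrase we can check how many sentences each of its words appears in and iterate over only the shortest of those lists. For example, if one of the words in your phrase appears in no sentence at all, we avoid iterating over the sentences for that phrase entirely.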
from collections import Counter
from collections import defaultdict

def containsQty(sentence, phrase):
    # How many times the phrase's words fit into the sentence's word counts
    qty = 100000
    for word in phrase:
        qty = min(qty, sentence[word] // phrase[word])
        if qty == 0:
            break
    return qty

sent = ["bob and alice like to text each other",
        "bob does not like to ski but does not like to fall",
        "alice likes to ski"]
ph = ["bob alice", "alice", "like"]

sent = [Counter(word.lower() for word in sentence.split()) for sentence in sent]
ph = [Counter(word.lower() for word in sentence.split()) for sentence in ph]

# Index: word -> list of the (1-based) sentences that contain it
indexByWords = defaultdict(list)
for index, counter in enumerate(sent, start = 1):
    for word in counter.keys():
        indexByWords[word].append(index)

for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))
    best = []  # stays empty for an empty phrase, so the loop below is skipped
    minQty = len(sent) + 1
    # Pick the phrase word that occurs in the fewest sentences
    for word in phrase.keys():
        if minQty > len(indexByWords[word]):
            minQty = len(indexByWords[word])
            best = indexByWords[word]
    matched = False
    # Only the sentences in the shortest list can possibly match
    for index in best:
        qty = containsQty(sent[index - 1], phrase)
        if qty > 0:
            matched = True
            print((str(index) + ' ') * qty)
    if not matched:
        print("NONE")
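If you only care whether each phrase word occurs at least once (and not how many times), the same index idea can be sketched with sets that are intersected smallest first. This is only a rough sketch under that assumption; the names `build_index` and `sentences_for` are made up for illustration and are not part of the code above.

```python
from collections import defaultdict


def build_index(sentences):
    """Map each lowercased word to the set of 1-based sentence numbers containing it."""
    index = defaultdict(set)
    for number, sentence in enumerate(sentences, start=1):
        for word in sentence.lower().split():
            index[word].add(number)
    return index


def sentences_for(phrase, index):
    """Return the sorted sentence numbers that contain every word of the phrase."""
    words = set(phrase.lower().split())
    if not words:
        return []
    # .get avoids adding unknown words to the defaultdict during lookup.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        if not result:
            break  # a rarer word already ruled out every sentence
        result &= posting
    return sorted(result)


sent = ["bob and alice like to text each other",
        "bob does not like to ski but does not like to fall",
        "alice likes to ski"]
ph = ["bob alice", "alice", "like"]

index = build_index(sent)
for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))
    matches = sentences_for(phrase, index)
    if matches:
        print(*matches)
    else:
        print("NONE")
```

Starting from the smallest set keeps the intermediate result small, and a phrase word that appears in no sentence makes the result empty right away.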
Answer 1: (score: 0)
By using the Counter class from the collections module, you can make your logic much simpler:
from collections import Counter

def contains(sentence, phrase):
    return all(sentence[word] >= phrase[word] for word in phrase)

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]

sent = [Counter(word.lower() for word in sentence.split()) for sentence in sent]
ph = [Counter(word.lower() for word in sentence.split()) for sentence in ph]

for i, phrase in enumerate(ph, start=1):
    print("Phrase{}:".format(i))
    matches = [j for j, sentence in enumerate(sent, start=1) if contains(sentence, phrase)]
    if not matches:
        print("NONE")
    else:
        print(*matches)
This lets us count the words in each sentence just once, rather than once per phrase.
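As a side note, the same containment check can probably also be written with Counter subtraction, which keeps only positive counts. A minimal sketch, where `contains_all` is a hypothetical name used only for this example:

```python
from collections import Counter


def contains_all(sentence_counts, phrase_counts):
    """True if every phrase word occurs at least as often in the sentence."""
    # Counter subtraction drops zero and negative counts, so the difference
    # is empty exactly when the sentence covers the phrase.
    return not (phrase_counts - sentence_counts)


sentence = Counter("bob does not like to ski but does not like to fall".split())
print(contains_all(sentence, Counter("like like".split())))       # True
print(contains_all(sentence, Counter("like like like".split())))  # False
print(contains_all(sentence, Counter("bob alice".split())))       # False
```

Whether this reads better than the `all(...)` version is a matter of taste; both do one pass over the words of the phrase.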
Answer 2: (score: 0)
I am trying to do it in O(n^2) using the following code:
import time

millis = int(round(time.time() * 1000))

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of","sentences are strings"]

s2 = [c.split() for c in ph]
s1=[d.split() for d in sent]
print(s2)
print(s1)

for i in s2:
    z=[]
    phcount=set(i)
    # Compare against the number of distinct words, so that a phrase
    # with a repeated word can still match
    x = len(phcount)
    for idx1,j in enumerate(s1):
        sentcount=set(j)
        y = phcount.intersection(sentcount)
        if len(y)==x:
            z.append(idx1)
    if len(z)>0:
        print(z)
    else:
        print("NONE")

millis2 = int(round(time.time() * 1000))
print (millis2-millis)
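A possible variant of the same set-based idea, sketched under two assumptions that are mine and not from the code above: matching is case-insensitive, and sentence numbers are 1-based as in the question's expected output. It builds each sentence's word set once, uses the subset operator `<=` instead of comparing intersection sizes, and times the loop with `time.perf_counter`, which is better suited to short intervals than `time.time`.

```python
import time

sent = ["Strings are an array of characters",
        "Sentences are an array of words"]
ph = ["an array of", "sentences are strings"]

# Build each sentence's word set once, outside the phrase loop.
sentence_sets = [set(s.lower().split()) for s in sent]

start = time.perf_counter()
results = []
for phrase in ph:
    words = set(phrase.lower().split())
    # 1-based numbers of the sentences whose word set covers the phrase
    matches = [n for n, s in enumerate(sentence_sets, start=1) if words <= s]
    results.append(matches)
    if matches:
        print(*matches)
    else:
        print("NONE")
elapsed_ms = (time.perf_counter() - start) * 1000

print("elapsed: {:.3f} ms".format(elapsed_ms))
```

On input this small the timing is dominated by the print calls, so it is only meaningful on much larger lists.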