Checking whether two sentences contain common plagiarized words in Python

Time: 2019-01-26 13:11:26

Tags: python loops

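Hi, I am writing a plagiarism detection program.

Description

Basically, I am writing a function that takes 2 strings as input. The function should find out whether a run of 5 or more consecutive words appears in both strings. The strings contain only lowercase letters and spaces: there is no punctuation and there are no uppercase letters.

Requirements

If such a run of words exists, return the longest such string (longest by number of words, not by character length). If there is none, return the Boolean value False.

My progress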

def check_plagiarism(str1,str2):

    list1=str1.split()
    list2=str2.split()

    new1=[]
    new2=[]
    for i in list1:
        if (i in list2):
            new1.append(i)
    for j in list2:
        if (j in list1):
            new2.append(j)
    ans=[]


    for i in range(0,len(new1)-1):

        for j in range(0,len(new2)-1):

            while new1[i]==new2[j]:
                val=new1[i]
                ans.append(val)
                i+=1
                j+=1
                if i==len(new1) or j==len(new2):
                    return False
            if len(ans)>=5:
                value=" ".join(ans)
                return value
            else:
                ans=[]
    value=" ".join(ans)
    return value   

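I was able to write this function. I know it is long and very inefficient, but it sort of works.

Input

I gave the function the following inputs.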

a="i took a walk around the world to ease my troubled mind i left my body lying somewhere in the sands of time i watched the world float to the dark side of the moon i feel there is nothing i can do yeah i watched the world float to the dark side of the moon after all i knew it had to be something to do with you i really dont mind what happens now and then as long as youll be my friend at the end if i go crazy then will you still call me superman if im alive and well will you be there holding my hand ill keep you by my side with my superhuman might kryptonite"
b="i dont care if i go crazy then one two three four five switch crazy go i if care dont i five four three two one and switch"
c="when i was young i took a walk around the woods i found that i was both taller and smaller than the trees returning to my home i set out for the desert i journeyed for long days and nights my spirit left my body and i left my body lying somewhere in the sands of time unburdened by physical form i watched the world float away from me and into the vast moonlight"

print(check_plagiarism(a,b))
print(check_plagiarism(a,c))
print(check_plagiarism(b,c))

Output received

if i go crazy then
took a walk around the
False

Expected output

if i go crazy then
i left my body lying somewhere in the sands of time
False

Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 1)

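You have other problems that I am not going to go into, but to answer the specific question of why you get a short answer when a better, longer one exists: it is because you use "return". The function returns as soon as it finds the first run of 5 or more words, so it never gets to see a longer run that appears later.

If the code otherwise worked, you could find the longest match by brute force. With a few adjustments to the inner loop and one major bug fix I can get it to print the expected values, but you will just have to trust me on that.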

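def check_plagiarism_revised(str1,str2):
    # set up data structures
    new1 = str1.split()
    new2 = str2.split()
    best_answer = []

    # logic to find candidates
    for i in range(0,len(new1)-1):
        for j in range(0,len(new2)-1):
            ans = []

            # do stuff: collect into ans the run of matching words
            # that starts at new1[i] and new2[j]

            # remember the candidate instead of returning it right away
            if len(ans)>=len(best_answer):
                best_answer = list(ans)

    # return only after every candidate has been examined;
    # the question asks for 5 or more words, and False otherwise
    if len(best_answer) >= 5:
        return " ".join(best_answer)
    return False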

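Think of it like picking out a candy bar. You really want dark chocolate, and I have one. I agree to give you one candy bar, and I show you all of my candy bars one at a time. Then you can pick the one you want.

But suppose that the moment I hold up a Snickers, you stop me, grab it out of my hand, eat it, and run off. Well, you did not get the dark chocolate, did you? That is what the return statement does.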
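To put the candy analogy in code, here is a small sketch. The two helper names and the `candidates` list are made up for illustration and are not part of the original function; the candidates stand in for the matching runs the loops would find, in the order they are found.

```python
def first_long_enough(candidates, minimum=5):
    """Return the first candidate with at least `minimum` words (early return)."""
    for words in candidates:
        if len(words) >= minimum:
            # Stops here: later, longer candidates are never looked at.
            return " ".join(words)
    return False


def longest_long_enough(candidates, minimum=5):
    """Look at every candidate, keep the longest, and return after the loop."""
    best = []
    for words in candidates:
        if len(words) > len(best):
            best = words
    return " ".join(best) if len(best) >= minimum else False


# Hypothetical candidates, listed in the order a left-to-right scan meets them.
candidates = [
    "took a walk around the".split(),
    "i left my body lying somewhere in the sands of time".split(),
]

print(first_long_enough(candidates))    # grabs the Snickers
print(longest_long_enough(candidates))  # waits for the dark chocolate
```

The first function hands back the 5-word run because it was seen first; the second one only decides once it has seen everything.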

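I have posted an answer to your specific question, but you should consider scrapping all of it and starting over (although this may be a bit advanced): https://en.wikipedia.org/wiki/Longest_common_substring_problem

Answer 1: (score: 0)

With the help of @Kenny Ostrom, I was able to remove all the errors and fix my code.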
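For reference, a minimal sketch of the dynamic-programming approach described on that page, applied to words instead of characters. The function name `longest_common_run` and the `minimum` parameter are my own choices for illustration, and this is one possible way to write it rather than the only one.

```python
def longest_common_run(str1, str2, minimum=5):
    """Return the longest run of consecutive words shared by both strings,
    or False if the longest shared run has fewer than `minimum` words."""
    words1 = str1.split()
    words2 = str2.split()

    # prev[j] holds the length of the common run that ends at
    # words1[i - 2] and words2[j - 1] (the previous row of the DP table).
    prev = [0] * (len(words2) + 1)
    best_len = 0
    best_end = 0  # index in words1 just past the end of the best run

    for i in range(1, len(words1) + 1):
        curr = [0] * (len(words2) + 1)
        for j in range(1, len(words2) + 1):
            if words1[i - 1] == words2[j - 1]:
                # Extend the run that ended one word earlier in both lists.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr

    if best_len >= minimum:
        return " ".join(words1[best_end - best_len:best_end])
    return False
```

This does a constant amount of work per pair of words, so the running time grows with the product of the two word counts, and it only keeps two rows of the table in memory.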

The final code is:

def check_plagiarism(str1,str2):

    new1=str1.split()
    new2=str2.split()
    # set up data structures
    best_answer = []

    # logic to find candidates
    for i in range(0,len(new1)):
        for j in range(0,len(new2)):
            ans = []
            if new1[i]==new2[j]:
                # extend the match word by word from positions i and j
                n=i
                m=j
                while new1[n]==new2[m]:
                    ans.append(new1[n])
                    if n<len(new1)-1 and m<len(new2)-1:
                        n+=1
                        m+=1
                    else:
                        # reached the end of one of the word lists
                        break

            # keep the longest run seen so far instead of returning early
            if len(ans)>=len(best_answer):
                best_answer = list(ans)

    # only runs of 5 or more words count
    if len(best_answer) >= 5:
        return " ".join(best_answer)
    return False
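As a cross-check, the same result can probably be obtained from the standard library. The sketch below uses `difflib.SequenceMatcher.find_longest_match` on the two word lists; the function name `check_plagiarism_difflib` and the `minimum` parameter are only illustrative, and I am assuming that whitespace-separated words are the right unit of comparison, as in the code above.

```python
from difflib import SequenceMatcher


def check_plagiarism_difflib(str1, str2, minimum=5):
    """Return the longest shared run of words, or False if it is shorter
    than `minimum` words."""
    words1 = str1.split()
    words2 = str2.split()

    # autojunk=False turns off the heuristic that ignores very frequent
    # elements in long sequences, so every word takes part in the match.
    matcher = SequenceMatcher(None, words1, words2, autojunk=False)
    match = matcher.find_longest_match(0, len(words1), 0, len(words2))

    if match.size >= minimum:
        return " ".join(words1[match.a:match.a + match.size])
    return False
```

When several runs tie for the longest, `find_longest_match` reports the one that starts earliest in the first list, whereas the loop above keeps the last one it meets because of the `>=` comparison, so the two versions can return different runs of the same length.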

Thanks for all the help.