Given a phrase without spaces, add spaces to make a proper sentence

Time: 2013-01-27 01:11:52

Tags: string algorithm dynamic-programming

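This is what I have in mind, but it is O(n^2):

For example, the input is "thisisawesome". We need to check whether adding the current character makes the previously found set of words any longer and still meaningful. But to see how far back we need to go, we have to traverse all the way to the beginning. For example, "awe" and "some" are proper words, but "awesome" makes the bigger word. Please suggest how we can improve the complexity. Here is the code: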

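#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Dictionary of valid words, filled elsewhere (the container type is assumed).
unordered_set<string> dict;

void update(string in)
{
   int len = in.length();
   // DS[k] = length of the longest dictionary word ending at index k (0 if none).
   // A vector replaces the original variable-length array, which is not standard C++.
   vector<size_t> DS(len, 0);
   string word;

   for(int i=0; i<len; i++)
        for(int j=i+1; j<=len; j++)
        {
            word = in.substr(i,j-i);
            if(dict.find(word)!=dict.end())
                   DS[j-1] = max(DS[j-1], word.length());
        }
}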

2 answers:

Answer 0 (score: 3)

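There is a dynamic programming solution that at first looks like O(n^2), but for sufficiently large n and a fixed-size dictionary it turns out to be only O(n).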

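Work through the string from left to right. At the i-th stage, you need to decide whether there is a solution for the first i characters. To do that, consider every possible way of splitting those i characters into two chunks. If the second chunk is a word and the first chunk can be broken into words, then there is a solution. You can check the first requirement with the dictionary. You can check the second requirement by looking up whether you already found an answer for the first j characters, where j is the length of the first chunk.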

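This would be O(n^2), because for each of the lengths 1, 2, 3, ..., n you consider every possible split. However, if you know the length of the longest word in the dictionary, you know there is no point in considering splits that make the second chunk longer than that. So for each of the lengths 1, 2, 3, ..., n you consider at most w possible splits, where w is the length of the longest word in the dictionary. That is O(n·w) dictionary lookups in total, which is O(n) when the dictionary (and therefore w) is fixed.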
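The bounded-split idea above can be sketched as follows. This is only a minimal illustration, not the answerer's code: the function name `wordBreak`, the `split` table, and the use of `std::unordered_set` for the dictionary are assumptions made for the example. It tries the longest candidate last word first and returns just one segmentation, which is not necessarily the most natural one.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Returns one segmentation of `s` into dictionary words,
// or an empty vector if no segmentation exists.
std::vector<std::string> wordBreak(const std::string& s,
                                   const std::unordered_set<std::string>& dict) {
    const std::size_t n = s.size();

    // w = length of the longest dictionary word; no chunk can be longer.
    std::size_t w = 0;
    for (const std::string& word : dict) w = std::max(w, word.size());

    // split[i] = start index of the last word in a segmentation of the
    // first i characters, or NONE if those characters cannot be segmented.
    const std::size_t NONE = static_cast<std::size_t>(-1);
    std::vector<std::size_t> split(n + 1, NONE);
    split[0] = 0;  // the empty prefix is trivially segmented

    for (std::size_t i = 1; i <= n; ++i) {
        // Only consider second chunks of at most w characters.
        const std::size_t lo = (i > w) ? i - w : 0;
        for (std::size_t j = lo; j < i; ++j) {
            if (split[j] != NONE && dict.count(s.substr(j, i - j)) > 0) {
                split[i] = j;  // first j chars are solvable, s[j, i) is a word
                break;
            }
        }
    }

    // Walk the split points backwards to recover the words.
    std::vector<std::string> words;
    if (split[n] == NONE) return words;
    for (std::size_t i = n; i > 0; i = split[i])
        words.push_back(s.substr(split[i], i - split[i]));
    std::reverse(words.begin(), words.end());
    return words;
}
```

With a dictionary containing "this", "is", "awe", "some" and "awesome", calling `wordBreak("thisisawesome", dict)` yields "this", "is", "awesome", because the inner loop starts from the longest allowed last chunk.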

Answer 1 (score: 2)

I coded my solution today and will put it on the web tomorrow. Anyway, the method is as follows:

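  1. Arrange the dictionary in a trie.

    A trie makes it possible to do multiple matches quickly, because all dictionary words that start with the same letters can be matched at the same time.

    (e.g. "chairman" matches both "chair" and "chairman" in the trie.)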

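  2. Use Dijkstra's algorithm to find the best match.

    (e.g. for "chairman", if we count "c" as position 0, we have the relations 0->5, 0->8, 1->5, 2->5, 5->8, which correspond to the words "chair", "chairman", "hair", "air" and "man". These relations form a nice network for Dijkstra's algorithm.)

    (Note: where are the edge weights? See the next point.)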

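  3. Assign weights to the dictionary words.

    Without weights, bad matches will overshadow good ones. (e.g. "iamahero" should become "i am a hero", not a split built from rarer dictionary words.)

    The SCOWL dictionaries at http://app.aspell.net/create serve the purpose well, because they come in different sizes. Those sizes (10, 20, etc.) are a good choice for the weights.

    After a few tries, I found that I needed to reduce the weight of words ending in "s", so that "eyesandme" becomes "eyes and me" rather than "eye sand me".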

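  4. I was able to split a paragraph in milliseconds. The algorithm has linear complexity in the length of the string to be split, so it scales well as long as there is enough memory.

    Here is the dump (sorry for bragging). (The paragraph chosen is "Novel" from Wikipedia.)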

    D:\GoogleDrive\programs\WordBreaker>"word breaker"<novelnospace.txt>output.txt
    
    D:\GoogleDrive\programs\WordBreaker>type output.txt
    Number of words after reading words-10.txt : 4101
    Number of words after reading words-20.txt : 11329
    Number of words after reading words-35.txt : 43292
    Number of words after reading words-40.txt : 49406
    Number of words after reading words-50.txt : 87966
    
    Time elapsed in reading dictionary: 0.956782s
    
    Enter the string to be broken into words:
    
    Result:
    a novel is along narrative normally in prose which describes fictional character
    s and events usually in the form of a sequential story while i an watt in the ri
    se of the novel 1957 suggests that the novel came into being in the early 18 th
    century the genre has also been described as possessing a continuous and compreh
    ensive history of about two thousand years with historical roots in classical gr
    eece and rome medieval early modern romance and in the tradition of the novel la
    the latter an italian word used to describe short stories supplied the present g
    eneric english term in the 18 th century miguel de cervantes author of don quixo
    te is frequently cited as the first significant europe an novelist of the modern
     era the first part of don quixote was published in 1605 while a more precise de
    finition of the genre is difficult the main elements that critics discuss are ho
    w the narrative and especially the plot is constructed the themes settings and c
    haracterization how language is used and the way that plot character and setting
     relate to reality the romance is a related long prose narrative w alter scott d
    efined it as a fictitious narrative in prose or verse the interest of which turn
    s upon marvellous and uncommon incidents whereas in the novel the events are acc
    ommodated to the ordinary train of human events and the modern state of society
    however many romances including the historical romances of scott emily brontes w
    u the ring heights and her man melvilles mo by dick are also frequently called n
    ovels and scott describes romance as a kind red term romance as defined here sho
    uld not be confused with the genre fiction love romance or romance novel other e
    urope an languages do not distinguish between romance and novel a novel isle rom
     and err o ma nil roman z o
    
    Time elapsed in splitting: 0.00495095s
    
    D:\GoogleDrive\programs\WordBreaker>type novelnospace.txt
    Anovelisalongnarrativenormallyinprosewhichdescribesfictionalcharactersandeventsu
    suallyintheformofasequentialstoryWhileIanWattinTheRiseoftheNovel1957suggeststhat
    thenovelcameintobeingintheearly18thcenturythegenrehasalsobeendescribedaspossessi
    ngacontinuousandcomprehensivehistoryofabouttwothousandyearswithhistoricalrootsin
    ClassicalGreeceandRomemedievalearlymodernromanceandinthetraditionofthenovellaThe
    latteranItalianwordusedtodescribeshortstoriessuppliedthepresentgenericEnglishter
    minthe18thcenturyMigueldeCervantesauthorofDonQuixoteisfrequentlycitedasthefirsts
    ignificantEuropeannovelistofthemodernerathefirstpartofDonQuixotewaspublishedin16
    05Whileamoreprecisedefinitionofthegenreisdifficultthemainelementsthatcriticsdisc
    ussarehowthenarrativeandespeciallytheplotisconstructedthethemessettingsandcharac
    terizationhowlanguageisusedandthewaythatplotcharacterandsettingrelatetorealityTh
    eromanceisarelatedlongprosenarrativeWalterScottdefineditasafictitiousnarrativein
    proseorversetheinterestofwhichturnsuponmarvellousanduncommonincidentswhereasinth
    enoveltheeventsareaccommodatedtotheordinarytrainofhumaneventsandthemodernstateof
    societyHowevermanyromancesincludingthehistoricalromancesofScottEmilyBrontesWuthe
    ringHeightsandHermanMelvillesMobyDickarealsofrequentlycallednovelsandScottdescri
    besromanceasakindredtermRomanceasdefinedhereshouldnotbeconfusedwiththegenreficti
    onloveromanceorromancenovelOtherEuropeanlanguagesdonotdistinguishbetweenromancea
    ndnovelanovelisleromanderRomanilromanzo
    D:\GoogleDrive\programs\WordBreaker>
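
Steps 1 to 3 of this answer can be sketched roughly as follows. This is not the author's program, which is not shown here: `TrieNode`, `insert`, `segment`, and the per-word costs are hypothetical names and values chosen for illustration, and the input is assumed to be lowercase a-z only. A lower cost stands for a more common word, in the spirit of the SCOWL size levels.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <limits>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Trie over lowercase a-z. cost < 0 means "no word ends at this node".
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> next;
    int cost = -1;
};

// Adds `word` with a non-negative cost (lower cost = preferred word).
void insert(TrieNode& root, const std::string& word, int cost) {
    TrieNode* node = &root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');
        std::unique_ptr<TrieNode>& child = node->next[c - 'a'];
        if (!child) child.reset(new TrieNode());
        node = child.get();
    }
    node->cost = cost;
}

// Dijkstra over positions 0..n, where an edge i -> j exists when s[i, j)
// is a dictionary word and the edge weight is that word's cost.
// Returns the cheapest segmentation, or an empty vector if none exists.
std::vector<std::string> segment(const TrieNode& root, const std::string& s) {
    const int n = static_cast<int>(s.size());
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(n + 1, INF), prev(n + 1, -1);
    typedef std::pair<int, int> Item;  // (distance, position)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[0] = 0;
    pq.push(Item(0, 0));

    while (!pq.empty()) {
        const int d = pq.top().first;
        const int i = pq.top().second;
        pq.pop();
        if (d > dist[i]) continue;  // stale queue entry
        if (i == n) break;          // end of string reached at minimum cost

        // One walk down the trie finds every word starting at position i.
        const TrieNode* node = &root;
        for (int j = i; j < n; ++j) {
            if (s[j] < 'a' || s[j] > 'z') break;
            node = node->next[s[j] - 'a'].get();
            if (!node) break;
            if (node->cost >= 0 && d + node->cost < dist[j + 1]) {
                dist[j + 1] = d + node->cost;
                prev[j + 1] = i;
                pq.push(Item(dist[j + 1], j + 1));
            }
        }
    }

    // Follow the predecessor links back from the end to recover the words.
    std::vector<std::string> words;
    if (dist[n] == INF) return words;
    for (int j = n; j > 0; j = prev[j])
        words.insert(words.begin(), s.substr(prev[j], j - prev[j]));
    return words;
}
```

For instance, after inserting "chair" with cost 20 and "chairman", "hair", "air" and "man" with cost 10 each, `segment(root, "chairman")` picks the single word "chairman" (total cost 10) over "chair" + "man" (total cost 30). Because every edge points forward in the string, a plain left-to-right pass over the positions would find the same shortest path; the priority queue is kept here only to mirror the answer's description.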