Question

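I'm looking for a way to take a phrase with no spaces (such as a trending topic on Twitter) and insert spaces at the appropriate positions based on the words it contains. Presumably some kind of comparison against a dictionary would work?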

For example: I have a function that takes the phrase 'septemberwish' (a current trend on Twitter) and should return 'september wish'.

Answer 1

This is going to be tricky. You will easily run into ambiguous concatenations, where several different splits all produce valid words.

That said, you could use a spell checker. See the pspell extension.

An algorithm could split the phrase at different points until it produces two valid words. For example:

septem berwish    (split at floor(length/2); both invalid)
septemb erwish    (split at floor(length/2)+1; both invalid)
septe mberwish    (split at floor(length/2)-1; both invalid)
septembe rwish    (split at floor(length/2)+2; both invalid)
sept emberwish    (split at floor(length/2)-2; first valid, second invalid)
september wish    (split at floor(length/2)+3; both valid; stop)
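
A minimal Python sketch of the middle-outward search traced above. The `WORDS` set is a hypothetical stand-in for a real dictionary or spell checker such as pspell, and `split_in_two` is an illustrative name, not part of any library.

```python
# Hypothetical word list; a real implementation would query a dictionary
# or spell checker instead.
WORDS = {"sept", "ember", "september", "wish", "two"}

def split_in_two(phrase, words=WORDS):
    """Try split points from the middle outward; return the first valid pair."""
    mid = len(phrase) // 2
    for offset in range(len(phrase)):
        # Order of attempts: mid, mid+1, mid-1, mid+2, mid-2, ...
        for pos in (mid + offset, mid - offset):
            if 0 < pos < len(phrase):
                left, right = phrase[:pos], phrase[pos:]
                if left in words and right in words:
                    return left, right
    return None  # no split yields two valid words

print(split_in_two("septemberwish"))
```

With this word list the search stops at the sixth attempt, exactly as in the trace, because 'sept' is valid but 'emberwish' is not.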

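For more than two words, a different approach is needed. One possibility is to remove characters from the end until you have a valid word, then do the same with the unmatched remainder: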

septemberwishtwo (invalid)
septemberwishtw  (invalid)
...
september        (valid; got the first)
wishtwo          (invalid)
...
wish             (valid; got the second)
two              (valid; got the third)
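
The steps above amount to a greedy longest-prefix match. Here is a rough Python sketch, again assuming a small hypothetical `WORDS` set in place of a real dictionary; `greedy_longest` is an illustrative name.

```python
# Hypothetical word list standing in for a dictionary lookup.
WORDS = {"sept", "ember", "september", "wish", "two"}

def greedy_longest(phrase, words=WORDS):
    """Repeatedly take the longest valid prefix of what is left."""
    result = []
    rest = phrase
    while rest:
        # Drop characters from the end until the prefix is a valid word.
        for end in range(len(rest), 0, -1):
            if rest[:end] in words:
                result.append(rest[:end])
                rest = rest[end:]
                break
        else:
            return None  # no prefix of the remainder is a word
    return result

print(" ".join(greedy_longest("septemberwishtwo")))
```

This sketch never backtracks, so it gives up if an early greedy choice leaves a remainder that cannot be matched, even when a different split would have worked.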

Answer 2

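In the most naive implementation, start adding letters from the beginning of the string and compare each growing prefix against a dictionary, for example: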

s // no match
se // no match
sep // no match
...
september // match! add space, add to output
// continue where we left off
w // no match
wi // no match
wis // no match
wish // match! add space, add to output
end of string, terminate
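
A small Python sketch of this naive scan, under the assumption that the dictionary is just a set of strings passed in by the caller (`naive_split` is a made-up name for illustration).

```python
def naive_split(phrase, words):
    """Grow a prefix letter by letter; emit it as soon as it is a word."""
    output = []
    current = ""
    for ch in phrase:
        current += ch
        if current in words:
            output.append(current)  # match: add to output, start over
            current = ""
    if current:
        return None  # leftover letters never formed a word
    return output

# Reproduces the trace only if the dictionary lacks shorter prefixes.
print(naive_split("septemberwish", {"september", "wish"}))
# A richer dictionary changes the result, because 'sept' matches first.
print(naive_split("septemberwish", {"sept", "ember", "september", "wish"}))
```

Because it commits to the shortest matching prefix, the outcome depends heavily on which short words the dictionary contains.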

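The tricky bit: there may be strings that can be parsed into several different phrases (the-site-that-shall-not-be-named is one). In fact, your example is one too (although one of the words is uncommon):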

sept ember wish vs september wish

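I guess you could try a smaller dictionary of common words first, or go for the longest words first, or favor splits that use fewer, more common words.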
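
One way to act on the "longest words first" idea is to enumerate every valid split and keep the one with the fewest words. The Python sketch below does that; the `WORDS` set and both function names are assumptions made for illustration, and the plain recursion can become slow on long inputs.

```python
# Hypothetical word list that makes the example ambiguous.
WORDS = {"sept", "ember", "september", "wish"}

def all_segmentations(phrase, words=WORDS):
    """Return every way to cut phrase into dictionary words."""
    if not phrase:
        return [[]]
    results = []
    for end in range(1, len(phrase) + 1):
        head = phrase[:end]
        if head in words:
            for tail in all_segmentations(phrase[end:], words):
                results.append([head] + tail)
    return results

def best_segmentation(phrase, words=WORDS):
    """Prefer the split with the fewest (and therefore longest) words."""
    candidates = all_segmentations(phrase, words)
    return min(candidates, key=len) if candidates else None

print(all_segmentations("septemberwish"))
print(best_segmentation("septemberwish"))
```

Swapping the `key` for a score based on word frequency would be one way to try the "common words" idea instead.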

Inserting spaces based on words/a dictionary
