I want to:
Additionally, I want to:
If I have a string:
Welcome\r\n About\r\n Hello, I'm John Van der Lyn and welcome to our website. We try to tailor our services to your specific needs, provide personal attention and someone to call with answers to your tax and financial questions and issues throughout the year. We believe in establishing long-term relationships with our clients and in providing good ole fashion service.\r\n \r\n\r\n We provide all levels of services for individuals with their tax and financial needs as well as Personal Representatives of Estates, or Trustees or beneficiaries of
An acceptable array result would look like this:
["Welcome", "About", "Hello", "I'm John Van der Lyn", etc.]
A better, more ideal result would look like this:
["Welcome About Hello", "Welcome About Hello I'm", "About Hello I'm John", "Hello I'm John Van", "I'm John Van der Lyn", etc.]
A perfect and special (though much more complex) result would look like this:
["Welcome About Hello", "I'm John Van der Lyn", "We try to", etc.]
I tried this, but I could not work out how to pass a regular expression to split so that the string is split according to the pattern's rules. I also could not work out how to make each element a group of four words rather than a single word.
Answer 0 (score: 1)
# Extract the words, keeping apostrophes and hyphens inside them.
words = str.scan(/[\w'-]+/)
>> ["Welcome", "About", "Hello", "I'm", "John", "Van", "der", "Lyn", "and", "welcome", "to", "our", "website", "We", "try", "to", "tailor", "our", "services", "to", "your", "specific", "needs", "provide", "personal", "attention", "and", "someone", "to", "call", "with", "answers", "to", "your", "tax", "and", "financial", "questions", "and", "issues", "throughout", "the", "year", "We", "believe", "in", "establishing", "long-term", "relationships", "with", "our", "clients", "and", "in", "providing", "good", "ole", "fashion", "service", "We", "provide", "all", "levels", "of", "services", "for", "individuals", "with", "their", "tax", "and", "financial", "needs", "as", "well", "as", "Personal", "Representatives", "of", "Estates", "or", "Trustees", "or", "beneficiaries", "of"]
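word_groups = []
words.each_with_index do |word, i|
  # Only capitalized words anchor a group.
  next unless word.match?(/\A[A-Z]/)
  # Take up to two words on each side. Clamp the start so that a negative
  # index does not wrap around; a range past the end is simply truncated.
  first = [i - 2, 0].max
  word_groups << words[first..i + 2]
end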
>> [["Welcome", "About", "Hello"], ["Welcome", "About", "Hello", "I'm"], ["Welcome", "About", "Hello", "I'm", "John"], ["About", "Hello", "I'm", "John", "Van"], ["Hello", "I'm", "John", "Van", "der"], ["I'm", "John", "Van", "der", "Lyn"], ["Van", "der", "Lyn", "and", "welcome"], ["our", "website", "We", "try", "to"], ["the", "year", "We", "believe", "in"], ["fashion", "service", "We", "provide", "all"], ["well", "as", "Personal", "Representatives", "of"], ["as", "Personal", "Representatives", "of", "Estates"], ["Representatives", "of", "Estates", "or", "Trustees"], ["Estates", "or", "Trustees", "or", "beneficiaries"]]
word_groups.map { |grp| grp.join(' ') }
>> ["Welcome About Hello", "Welcome About Hello I'm", "Welcome About Hello I'm John", "About Hello I'm John Van", "Hello I'm John Van der", "I'm John Van der Lyn", "Van der Lyn and welcome", "our website We try to", "the year We believe in", "fashion service We provide all", "well as Personal Representatives of", "as Personal Representatives of Estates", "Representatives of Estates or Trustees", "Estates or Trustees or beneficiaries"]
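The steps above can be gathered into one self-contained method. This is only a sketch: the method name `capitalized_word_groups` and the `radius` keyword are made up for illustration, and the sample text is a shortened version of the string from the question.

```ruby
# Hypothetical helper: build a window of words around every capitalized word.
# `radius` (an assumed parameter) is how many neighbours to keep on each side.
def capitalized_word_groups(text, radius: 2)
  words = text.scan(/[\w'-]+/)
  anchors = words.each_index.select { |i| words[i].match?(/\A[A-Z]/) }
  anchors.map do |i|
    first = [i - radius, 0].max          # clamp so a negative index never wraps around
    words[first..i + radius].join(' ')   # a range running past the end is truncated
  end
end

sample = "Welcome\r\n About\r\n Hello, I'm John Van der Lyn and welcome to our website."
groups = capitalized_word_groups(sample)
puts groups
```

Changing `radius` is the easiest way to get shorter or longer phrases; with `radius: 1` each group holds at most three words.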
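Answer 1 (score: 0)
This might not have a general solution.
If you have a strict pattern for what counts as a name, though, it is more or less solvable.
Let's pretend we have a name matcher. In our case the rule will be: a name consists of at most 4 words, of which at least 2 are capitalized (the first and the last), and a name cannot contain odd symbols such as ".".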
matcher = ->(words) do
  words.first =~ /\A\p{Lu}/ &&             # first word is capitalized
    words.last =~ /\A\p{Lu}/ &&            # last word is capitalized
    words.all?(&/\A\p{L}+\z/.method(:=~))  # every word consists of letters only
end
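Here we use proper Unicode character matchers (`\p{Lu}` for an uppercase letter, `\p{L}` for any letter). Now we can sift our input, collecting every run of 2, 3 and 4 consecutive words that satisfies the matcher:
(2..4).map { |i| input.split(/\s+/).each_cons(i).select(&matcher) }
      .reduce(&:|)
The above returns: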
#⇒ [["Welcome", "About"], ["John", "Van"],
# ["Personal", "Representatives"], ["Van", "der", "Lyn"],
# ["John", "Van", "der", "Lyn"]]
Now we could remove the "weak" duplicates, but I am leaving that as homework.
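One possible take on that homework, as a sketch only. It assumes that a match is "weak" when it appears as a contiguous run of words inside a longer match, which is an interpretation and not something the answer spells out. The helper names `contained_in?` and `drop_weak_duplicates` are hypothetical.

```ruby
# Assumption: a match is "weak" when it occurs as a contiguous run inside a longer match.
def contained_in?(short, long)
  short.length < long.length &&
    long.each_cons(short.length).any? { |window| window == short }
end

# Hypothetical helper: keep only the matches that are not contained in a longer one.
def drop_weak_duplicates(matches)
  matches.reject { |m| matches.any? { |other| contained_in?(m, other) } }
end

matches = [["Welcome", "About"], ["John", "Van"],
           ["Personal", "Representatives"], ["Van", "der", "Lyn"],
           ["John", "Van", "der", "Lyn"]]

strong = drop_weak_duplicates(matches)
p strong
# prints [["Welcome", "About"], ["Personal", "Representatives"], ["John", "Van", "der", "Lyn"]]
```

Note that this compares the words themselves, not their positions in the text, so a short phrase that also occurs on its own elsewhere in the input would be dropped as well.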