Suppose we have several strings.
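I want to extract the words they all have in common. For example, from the three strings below I want to extract Tommy, very, and child: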
Tommy is a very good child
Tommy has a very wonderful child
Tommy loves his very child
How can I do this? Thanks.
Answer 0 (score: 2)
For simplicity, I'm using lodash here:
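var _ = require('lodash'); // Node.js; in a browser, lodash is loaded as the global `_`
var a = 'Hello world'.split(' ');
var b = 'Hello again world!'.split(' ');
var c = 'Hello tomorrow'.split(' ');
var commonWords = _.intersection(a, b, c); // words present in every array
// => ['Hello']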
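I use lodash only because it offers a succinct way to express what you are actually trying to do: an intersection of the word lists, computed after choosing (for example) a separator and a transformation.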
The idea of an intersection is language-agnostic; only the way it is implemented varies with the language you choose.
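You can wrap it in a function that lets you define the separator (e.g., do I split on spaces?) and the transformation (e.g., must words have the same case to match?).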
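A minimal sketch of such a function, written without lodash so it has no dependencies. The name `commonWords` and its `separator` and `transform` parameters are hypothetical, and the intersection is computed with plain JavaScript `Set`s instead of `_.intersection`; empty pieces produced by repeated separators are dropped.

```javascript
// Hypothetical helper (not part of lodash): returns the words that occur in
// every sentence. `separator` controls how sentences are split, and
// `transform` normalizes each word before comparison.
function commonWords(sentences, separator = " ", transform = (word) => word) {
  if (sentences.length === 0) return [];
  // One Set of normalized words per sentence; empty pieces are dropped.
  const sets = sentences.map(
    (sentence) =>
      new Set(
        sentence
          .split(separator)
          .filter((word) => word !== "")
          .map(transform)
      )
  );
  // A word is common if every sentence's Set contains it; iterating the
  // first Set keeps the words in their first-sentence order.
  return [...sets[0]].filter((word) => sets.every((set) => set.has(word)));
}

const sentences = [
  "Tommy is a very good child",
  "Tommy has a very wonderful child",
  "Tommy loves his very child",
];
console.log(commonWords(sentences)); // logs the array ['Tommy', 'very', 'child']

// Case-insensitive matching, splitting on any run of whitespace:
console.log(commonWords(["Hello World", "hello  again WORLD"], /\s+/, (w) => w.toLowerCase()));
// logs the array ['hello', 'world']
```

Passing a regular expression as `separator` works because `String.prototype.split` accepts either a string or a regex.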
Answer 1 (score: 2)
You can use a data structure called an inverted index.
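First, assign each input string a unique integer. Then, for each word in the input strings, build a list of the integers of the strings in which that word occurs; a single pass over all input strings is enough to do this. In your case, the words that occur in every string are exactly those whose occurrence list has as many entries as there are input strings (count each string at most once, so a word repeated within one string is not over-counted).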
For more details, see here:
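A minimal sketch of the inverted-index approach, assuming whitespace-separated, case-sensitive words. The function names `buildInvertedIndex` and `wordsInAllStrings` are hypothetical; each string's array index serves as its unique integer, and a `Set` stands in for the occurrence list so that each string is counted at most once per word.

```javascript
// Map each word to the Set of string ids (array indices) it occurs in.
function buildInvertedIndex(strings) {
  const index = new Map(); // word -> Set of string ids
  strings.forEach((str, id) => {
    for (const word of str.split(/\s+/)) {
      if (word === "") continue; // skip empty pieces from leading whitespace
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id); // a Set records each string only once
    }
  });
  return index;
}

// A word is common to all strings when its Set has one id per string.
function wordsInAllStrings(strings) {
  const index = buildInvertedIndex(strings);
  const result = [];
  for (const [word, ids] of index) {
    if (ids.size === strings.length) result.push(word);
  }
  return result;
}

const sentences = [
  "Tommy is a very good child",
  "Tommy has a very wonderful child",
  "Tommy loves his very child",
];
console.log(wordsInAllStrings(sentences)); // logs the array ['Tommy', 'very', 'child']
```

Because the index is built once, it can also answer other questions afterwards (for example, which strings contain a given word) without rescanning the input.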
Answer 2 (score: 1)
Edit: I just noticed @Joce's comment. I wrote my answer in JavaScript, but it can easily be adapted to other languages; if you aren't using JavaScript, treat it as pseudocode.
Edit 2: Wow! It worked on the first try! See the working example on JSFiddle.net.
This may be a rather script-heavy answer, but here goes:
Take the original sentences as an array of strings:
var sentences = [
  "Tommy is a very good child",
  "Tommy has a very wonderful child",
  "Tommy loves his very child"
];
You can create an array of words from each sentence and store these arrays in a two-dimensional array:
var split = [];
for (var i = 0; i < sentences.length; i++) {
  split[i] = sentences[i].split(" ");
}
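You could also remove duplicate words at this point. I don't know how to do that off the top of my head, but you can probably find a simple algorithm for it. Unless, of course, you want to allow repeated words.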
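One way to do that deduplication, sketched under the assumption that words are separated by single spaces (as in the `split(" ")` call above), is to pass each word array through a `Set`. The name `splitUnique` is hypothetical. A `Set` keeps only the first occurrence of each value and iterates in insertion order, so the original word order is preserved.

```javascript
// Hypothetical helper: split each sentence on single spaces and drop
// repeated words, keeping the first occurrence of each.
function splitUnique(sentences) {
  return sentences.map(function (sentence) {
    return Array.from(new Set(sentence.split(" ")));
  });
}

console.log(splitUnique(["the cat saw the dog", "a dog"]));
// logs the array [['the', 'cat', 'saw', 'dog'], ['a', 'dog']]
```

It can stand in for the loop above: `var split = splitUnique(sentences);`.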
Then you can create another array to hold the common words and fill it as follows:
var same = [];
for (var i = 0; i < split.length; i++) { // loop through sentences
  for (var j = 0; j < split[i].length; j++) { // go through each sentence for new words
    var word = split[i][j];
    if (same.indexOf(word) === -1) { // if not already found
      var inAll = true;
      for (var k = 0; k < split.length; k++) { // check if in every other sentence
        if (k === i) continue;
        if (split[k].indexOf(word) === -1) { // if not found, make `inAll` false
          inAll = false;
          break;
        }
      }
      if (inAll) same.push(word); // if found in all other sentences, add to array `same`
    }
  }
}
Sorry that this is a convoluted answer, but it should show the logic behind the algorithm. Feel free to try changing the strings on JSFiddle.
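Since a word that appears in every sentence must in particular appear in the first one, the outer loop over all sentences could be restricted to the first sentence's words. Below is a self-contained sketch of that simplification, wrapped in a hypothetical function `findCommonWords`; like the code above, it assumes single-space separators and case-sensitive comparison.

```javascript
// Hypothetical helper: same check as the nested loops, but only the first
// sentence's words are used as candidates.
function findCommonWords(sentences) {
  var split = sentences.map(function (s) { return s.split(" "); });
  var same = [];
  if (split.length === 0) return same;
  for (var j = 0; j < split[0].length; j++) {
    var word = split[0][j];
    if (same.indexOf(word) !== -1) continue; // already recorded
    // Keep the word only if every sentence contains it (the first trivially does).
    var inAll = split.every(function (words) { return words.indexOf(word) !== -1; });
    if (inAll) same.push(word);
  }
  return same;
}

var sentences = [
  "Tommy is a very good child",
  "Tommy has a very wonderful child",
  "Tommy loves his very child"
];
console.log(findCommonWords(sentences)); // logs the array ['Tommy', 'very', 'child']
```

The result is the same as scanning every sentence, because any word missing from the first sentence can never be common to all of them.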