Question

I know AS3 has some powerful new text-search features, especially when combined with regular expressions.

I'm not even sure this is possible, but I'd like to search any block of text and return all of its nouns, adjectives, and verbs.

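What is the best (most efficient) way to do this? Are regular expressions an option? Or would I need some kind of open-source dictionary, like the ones spell checkers use, to compare against?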

Once I've extracted all the nouns, adjectives, and verbs, I need to count them and rank them by frequency.
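For the counting-and-ranking step, here is a minimal sketch in TypeScript, chosen because its syntax maps closely onto AS3 (a Map stands in for an AS3 Dictionary). It assumes the words have already been extracted by some other means; `rankByFrequency` is a hypothetical helper name, not an existing API.

```typescript
// Count how often each extracted word occurs, then rank the words by that count.
// Assumes extraction (e.g. finding the nouns) has already happened.
function rankByFrequency(words: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const word of words) {
    // Map.get returns undefined for an unseen word, so start from 0.
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const result: Array<[string, number]> = [];
  counts.forEach((count, word) => {
    result.push([word, count]);
  });
  // Highest count first; ties broken alphabetically so the order is stable.
  return result.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const ranked = rankByFrequency(["lamb", "snow", "lamb", "fleece", "snow", "lamb"]);
// ranked is [["lamb", 3], ["snow", 2], ["fleece", 1]]
```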

Any suggestions welcome...

Answer 1

Regular expressions have no concept of grammar or parts of speech. A regular expression is just a way of searching for patterns in strings.
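To make that concrete, here is a small TypeScript sketch (AS3 strings also have a `match()` method that takes a RegExp). The pattern finds word-shaped tokens, but "book" comes back as the same string whether the sentence uses it as a verb or as a noun; nothing in the match says which.

```typescript
// A regular expression finds word-shaped substrings; it cannot tell
// a noun from a verb. Here "book" appears once as a verb, once as a noun.
const text = "I will book a table and read a book.";

// \b[A-Za-z]+\b matches each run of ASCII letters between word boundaries.
const tokens: string[] = text.match(/\b[A-Za-z]+\b/g) ?? [];

// Both occurrences of "book" are identical tokens: the part of speech is lost.
console.log(tokens);
```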

To do what you want, you'd need to plug in "some kind of open-source dictionary". That could be a substantial amount of work.
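A sketch of what "plugging in" such a dictionary might look like, in TypeScript. The file format here (one `word<TAB>tag` entry per line, with tag `N`, `V` or `A`) is purely hypothetical; real word lists use their own formats. The point the sketch makes is that one word can carry several tags, so a lookup alone cannot always decide the part of speech.

```typescript
type Pos = "noun" | "verb" | "adjective";

// Hypothetical tag codes for a hypothetical "word<TAB>tag" lexicon format.
const TAGS = new Map<string, Pos>([
  ["N", "noun"],
  ["V", "verb"],
  ["A", "adjective"],
]);

// Build a lookup table from lowercase word to the set of parts of speech it can have.
function parseLexicon(source: string): Map<string, Set<Pos>> {
  const lexicon = new Map<string, Set<Pos>>();
  for (const line of source.split("\n")) {
    const [word = "", tag = ""] = line.trim().split("\t");
    const pos = TAGS.get(tag);
    if (word === "" || pos === undefined) continue; // skip blank or malformed lines
    const key = word.toLowerCase();
    if (!lexicon.has(key)) lexicon.set(key, new Set<Pos>());
    // A word may be listed more than once ("book" is both a noun and a verb).
    lexicon.get(key)!.add(pos);
  }
  return lexicon;
}

const lexicon = parseLexicon("book\tN\nbook\tV\nread\tV\nquick\tA\n");
```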

Answer 2

I came across this open-source full-text search engine:

http://www.servebox.org/actionscript-foundry/actionscript-foundry-documentation/full-text-search-tree/

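As I see it, the sequence of steps is:

1) Create or obtain a list of all English nouns, verbs, and adjectives (any tips on getting or creating this list are much appreciated!)

2) Search the data source for the first dictionary word to see whether there is a match.

3) If there is a match, build an index entry that records the number of occurrences.

4) Move on to the second word in the dictionary and repeat steps 2 and 3.

5) Repeat until every word in the dictionary has been used as a search term.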
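The steps above can be sketched in TypeScript roughly as follows. This is an illustration of the dictionary-driven loop as described, not the API of the search engine linked above; `buildIndex` is a hypothetical name, and the text is assumed to be plain ASCII English.

```typescript
// Dictionary-driven index: for each dictionary word, count its occurrences
// in the source text and record it only if it occurs at least once.
function buildIndex(source: string, dictionary: string[]): Map<string, number> {
  // Tokenize once so dictionary words are compared against whole words only.
  const tokens: string[] = source.toLowerCase().match(/[a-z]+/g) ?? [];
  const index = new Map<string, number>();
  for (const entry of dictionary) {
    // Steps 2, 4 and 5: take each dictionary word in turn and search for it.
    const word = entry.toLowerCase();
    let count = 0;
    for (const token of tokens) {
      if (token === word) count++;
    }
    // Step 3: on a match, store the number of occurrences.
    if (count > 0) index.set(word, count);
  }
  return index;
}

const index = buildIndex("Mary had a little lamb, little lamb", ["lamb", "little", "cup"]);
```

Because this scans the whole text once per dictionary word, the cost grows with dictionary size times text length. Tokenizing the text once and looking each token up in a hash-based set needs only one lookup per token, which matters once the dictionary is a full English word list.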

Answer 3

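So @Robusto is right: you need some kind of dictionary data that contains words and tags each one as a noun, verb, or adjective. If you can find such data or build it yourself (which may take a while), you can use the AS3 Dictionary object to build the result arrays: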

import flash.utils.Dictionary;

//dummy data
var testString:String = "Mary had a little lamb her fleece was white as snow";
var testString2:String = "The blue zebra had a rad jacket";

var nouns:Array = ['cup', 'Mary', 'phone', 'lamb', 'jacket', 'fleece', 'snow', 'zebra'];
var verbs:Array = ['had', 'was', 'ran', 'jumped', 'read'];
var adj:Array =   ['awesome', 'rad', 'little', 'tall', 'white', 'blue', 'red'];

//SETUP
//Create the dictionaries, in a more complex setting you might load data in from an XML file
//here I'm just pulling the data from the arrays created above
var nounDict:Dictionary = createDictionary( nouns );
var verbDict:Dictionary = createDictionary( verbs );
var adjDict:Dictionary =  createDictionary( adj );

//Creates a dictionary based on an Array of words
function createDictionary( wordData:Array ):Dictionary {
    var dict:Dictionary = new Dictionary( true );

    for(var i:uint = 0; i < wordData.length; i++) {

        //add the word as a key to the dictionary
        dict[ wordData[i] ] = wordData[i];

    }

    return dict;
}


//SEARCHING
//str is the string you want to search through
//dict is the dictionary you want to use to search against the string
function searchDictionary( str:String, dict:Dictionary ):Array {

    //break up the words by the spaces (punctuation isn't handled: "snow." won't match "snow")
    var words:Array = str.split(' ');
    //store the matching words in the matches array
    var matches:Array = [];

    for( var i:uint = 0; i < words.length; i++) {


        //check the dictionary for the word
        if(dict[ words[i] ]) {
            matches.push(words[i]);
        }

    }
    return matches;

}


//TEST IT OUT
trace( searchDictionary( testString, nounDict ) );  // Mary,lamb,fleece,snow
trace( searchDictionary( testString, verbDict ) );  // had,was
trace( searchDictionary( testString, adjDict )  );  // little,white

trace( searchDictionary( testString2, nounDict ) ); // zebra,jacket
trace( searchDictionary( testString2, verbDict ) ); // had
trace( searchDictionary( testString2, adjDict ) );  // blue,rad

You can drop this code into a new FLA file to see how it works.
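The AS3 code above splits on single spaces and compares case-sensitively, so "snow." or "The" would not match. Here is a minimal TypeScript sketch of the same lookup that also ignores punctuation and case, assuming ASCII-only English text. A Set plays the role of the Dictionary used as a set of keys, and `createWordSet`/`searchWords` are illustrative names only.

```typescript
// Store the word list lowercased so lookups are case-insensitive.
function createWordSet(words: string[]): Set<string> {
  return new Set<string>(words.map((w) => w.toLowerCase()));
}

// Return the words of `text` found in `words`, keeping their original spelling.
function searchWords(text: string, words: Set<string>): string[] {
  // Split on any run of non-letters, so punctuation never sticks to a word,
  // and drop the empty strings a leading/trailing separator produces.
  const tokens = text.split(/[^A-Za-z]+/).filter((t) => t.length > 0);
  return tokens.filter((t) => words.has(t.toLowerCase()));
}

const nounSet = createWordSet(["Mary", "lamb", "fleece", "snow"]);
const found = searchWords("Mary had a little lamb; its fleece was white as snow.", nounSet);
// found is ["Mary", "lamb", "fleece", "snow"]
```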

Answer 4

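Thanks for the suggestions!

Another approach I'm considering is to first remove all pronouns and prepositions from the source collection, and then index all the remaining words.

What should be left is an indexed list of all the nouns, verbs, and adverbs.

I think the complete list of all pronouns and prepositions (and conjunctions?) is much smaller than the complete list of all nouns, verbs, and adverbs, so for any given collection this elimination-style search should be much faster...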
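The elimination idea can be sketched in TypeScript like this. The stop list below is a tiny illustrative sample of pronouns, prepositions, conjunctions, and articles, not a complete one, and `indexContentWords` is a hypothetical name.

```typescript
// Tiny illustrative stop list: a real one would be much longer.
const STOP_WORDS = new Set<string>([
  "i", "you", "he", "she", "it", "we", "they", "her", "his", "its", // pronouns
  "in", "on", "at", "of", "to", "with", "as", "by",                 // prepositions
  "and", "or", "but",                                               // conjunctions
  "a", "an", "the",                                                 // articles
]);

// Drop every stop word, then count whatever remains.
function indexContentWords(text: string): Map<string, number> {
  const index = new Map<string, number>();
  const tokens: string[] = text.toLowerCase().match(/[a-z]+/g) ?? [];
  for (const token of tokens) {
    if (STOP_WORDS.has(token)) continue; // eliminated
    index.set(token, (index.get(token) ?? 0) + 1);
  }
  return index;
}

const remaining = indexContentWords("Mary had a little lamb, her fleece was white as snow");
```

What survives is simply every word that is not on the stop list: adverbs, numbers, interjections, and any function word missing from the list are kept too, so the result still needs a tagged word list if it must contain only the wanted parts of speech.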

Advanced text search in ActionScript - return all nouns, adjectives and verbs
