Matching strings where some characters are considered the same

Asked: 2013-12-03 07:33:30

Tags: objective-c performance search character

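Please help me solve this problem.

I want to check whether targetString matches keyword. Some characters may differ but are considered equivalent, and in that case the check should still return true.

Example: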

targetString = @"@ß<"
keyword = @"abc", @"∂B(", @"@Aß<"
result: all must return true. 

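(They match: targetString and all of the keywords are considered the same.)

Suppose I have an array listing the sets of characters that count as the same:

NSArray *variants = [NSArray arrayWithObjects:@"aA@∂", @"bBß", @"c©C<(", nil];

With this rule applied during matching, the examples above all match.

This is what I have so far (using recursion):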

- (BOOL) test:(NSString*)aString include:(NSString*) keyWord doTrim:(BOOL)doTrim {
//    break recursion.
    if([aString length] < [keyWord length]) return false;

// First, loop through each keyword's character
    for (NSUInteger i = 0; i < [keyWord length]; i++) {

// Get @"aA@∂", @"bBß", @"c©C<("  or only the character itself.
// like, if the keyword's character is A, return the string @"aA@∂". 
// If the character is not in the variants set, eg. P, return @"P"

        char c = [keyWord characterAtIndex:i];
        NSString *rs = [self variantsWithChar:c];

//       Check if rs (@"aA@∂" or @"P") contains aString[i] character           
        if([rs rangeOfString:[NSString stringWithCharacters:[aString characterAtIndex:i] length:1]].location == NSNotFound) {
//        If not the same char, remove first char in targetString (aString), recursion to match again.

            return [self test:[aString substringFromIndex:1] include:keyWord doTrim:NO];
        }
    }
 // If all match with keyword, return true.
    return true;
}

- (NSString *) variantsWithChar:(char) c {
    for (NSString *s in self.variants) {
        if ([s rangeOfString:[NSString stringWithFormat:@"%c",c]].location != NSNotFound) {
            return s;
        }
    }
    return [NSString stringWithFormat:@"%c", c];
}

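The main problem is that variantsWithChar: does not return the correct string. I don't know which data type and which function I should be using here. Please help.

For those who know Ruby, here is the same thing in Ruby. It works perfectly!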

require 'test/unit/assertions'

include Test::Unit::Assertions

class String
  def matching?(keyword)
    length >= keyword.length && (keyword.chars.zip(chars).all? { |cs| variants(cs[0]).include?(cs[1]) } || slice(1, length - 1).matching?(keyword))
  end

  private

  VARIANTS = ["aA@∂", "bBß", "c©C<("]

  def variants(c)        
      VARIANTS.find { |cs| cs.include?(c) } || c        
  end
end

assert "abc".matching?("@ß<")

PS: In reality, the data contains sets of Japanese characters that sound the same (such as あア, いイ, ...), for those who know Japanese.

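PS 2: Please feel free to edit this question, because my English is poor and I may not have expressed what I mean.

PS 3: Some people may want to comment on performance. For example, searching about 10,000 target words with nearly 100 variant sets, each containing at most 4 equivalent characters.

3 Answers:

Answer 0 (score: 1)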

First, ignore the comments about ASCII and stop using char. NSString and CFString use unichar.
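A minimal sketch of why char goes wrong here, assuming Foundation and a typical compiler such as clang. characterAtIndex: returns a 16-bit unichar, so storing it in an 8-bit char keeps only the low byte of a character like ∂ (U+2202). The names GRStringFromUnichar and GRDemoTruncation are hypothetical and exist only for this illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: wrap one UTF-16 code unit in an NSString
// without ever passing through `char`.
static NSString *GRStringFromUnichar(unichar c) {
    return [NSString stringWithCharacters:&c length:1];
}

// Hypothetical demo: log what survives when a unichar is narrowed to char.
static void GRDemoTruncation(void) {
    NSString *s = @"∂";                      // U+2202
    unichar wide = [s characterAtIndex:0];   // full 16-bit value
    char narrow = (char)wide;                // high byte is lost on typical compilers
    NSLog(@"unichar: 0x%04X, char: 0x%02X",
          (unsigned)wide, (unsigned)(unsigned char)narrow);
    // Building the string from the unichar preserves the character.
    NSLog(@"round trip ok: %d", [GRStringFromUnichar(wide) isEqualToString:s]);
}
```

Applied to the question's code, this suggests declaring c as unichar and building the one-character string with stringWithCharacters:length: (or the %C format) instead of %c.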

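If what you really want to do is convert between hiragana and katakana, you can use CFStringTransform(). It wraps the ICU library that ships with OS X and iOS, and it makes this very simple. Search for that function and you will find examples of how to use it.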
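A minimal sketch of that call, assuming ARC (hence the __bridge cast) and that folding everything to katakana is acceptable for the comparison. The helper name GRKatakanaFolded is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: return a copy of `input` with hiragana converted to
// katakana, so that あ and ア compare equal afterwards.
static NSString *GRKatakanaFolded(NSString *input) {
    NSMutableString *buffer = [input mutableCopy];
    // A NULL range transforms the whole string; reverse = false selects
    // the hiragana -> katakana direction.
    CFStringTransform((__bridge CFMutableStringRef)buffer, NULL,
                      kCFStringTransformHiraganaKatakana, false);
    return buffer;
}
```

Folding both the target and the keyword this way and then calling rangeOfString: would cover the kana pairs, but not arbitrary groups such as a/A/@/∂, which still need their own table.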

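Answer 1 (score: 1)

After spending some time (a day) on the code above, I finally got it working, but I don't know how well it performs. Could someone comment and help me improve the performance? Thanks.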

- (BOOL) test:(NSString*)aString include:(NSString*) keyWord doTrim:(BOOL)doTrim {
//    break recursion.
    if([aString length] < [keyWord length]) return false;

// First, loop through each keyword's character
    for (NSUInteger i = 0; i < [keyWord length]; i++) {

// Get @"aA@∂", @"bBß", @"c©C<("  or only the character itself.
// like, if the keyword's character is A, return the string @"aA@∂". 
// If the character is not in the variants set, eg. P, return @"P"

        NSString *c = [NSString stringWithFormat:@"%C", [keyWord characterAtIndex:i]];
        NSString *rs = [self variantsWithChar:c];
        NSString *theTargetChar = [NSString stringWithFormat:@"%C", [aString characterAtIndex:i]];

//       Check if rs (@"aA@∂" or @"P") contains aString[i] character           
        if([rs rangeOfString:theTargetChar].location == NSNotFound) {
//        If not the same char, remove first char in targetString (aString), recursion to match again.
            return [self test:[aString substringFromIndex:1] include:keyWord doTrim:NO];
        }
    }
 // If all match with keyword, return true.
    return true;
}

If you remove all the comments, it is quite short...

////////////////////////////////////////

- (NSString *) variantsWithChar:(NSString *) c{
    for (NSString *s in self.variants) {
        if ([s rangeOfString:c].location != NSNotFound) {
            return s;
        }
    }
    return c;
}
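On the performance question, one possible alternative to the recursive scan is to rewrite every character to a canonical representative of its variant group once, and then use an ordinary literal substring search. This is a different technique from the code above, sketched under two assumptions: the variant groups do not overlap, and every character fits in a single unichar. The names GRBuildCanonicalMap, GRCanonicalize and GRMatches are hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: map every character of every variant group to the
// first character of its group, e.g. A, @ and ∂ all map to a.
static NSDictionary *GRBuildCanonicalMap(NSArray *variants) {
    NSMutableDictionary *map = [NSMutableDictionary dictionary];
    for (NSString *group in variants) {
        if (group.length == 0) continue;
        NSNumber *canonical = @([group characterAtIndex:0]);
        for (NSUInteger i = 0; i < group.length; i++) {
            map[@([group characterAtIndex:i])] = canonical;
        }
    }
    return map;
}

// Hypothetical helper: rewrite a string so that equivalent characters
// become identical; characters outside every group are kept as they are.
static NSString *GRCanonicalize(NSString *s, NSDictionary *map) {
    NSMutableString *out = [NSMutableString stringWithCapacity:s.length];
    for (NSUInteger i = 0; i < s.length; i++) {
        unichar c = [s characterAtIndex:i];
        NSNumber *mapped = map[@(c)];
        unichar r = mapped ? mapped.unsignedShortValue : c;
        [out appendString:[NSString stringWithCharacters:&r length:1]];
    }
    return out;
}

// YES if `keyword` occurs somewhere in `target` once both are canonicalized.
static BOOL GRMatches(NSString *target, NSString *keyword, NSDictionary *map) {
    if (keyword.length == 0) return YES;  // same result as the recursive version
    NSString *t = GRCanonicalize(target, map);
    NSString *k = GRCanonicalize(keyword, map);
    return [t rangeOfString:k options:NSLiteralSearch].location != NSNotFound;
}
```

The map is built once from self.variants. For a search over roughly 10,000 target words, the canonical form of each target could also be computed once and cached, so that each query only canonicalizes the keyword and runs rangeOfString:options:.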

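Answer 2 (score: -1)

You could try comparing the ASCII value of the Japanese character against the ASCII value of each character in the variants. These Japanese characters are not treated like ordinary characters or strings, so string functions such as rangeOfString do not work on them.

To be more precise, look at the code below. It searches for "∂" in the string "aA@∂":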
NSString *string = @"aA@∂";

NSMutableSet *listOfAsciiValuesOfString = [self getListOfAsciiValuesForString:string]; //method definition given below

NSString *charToSearch = @"∂";

NSNumber *ascii = [NSNumber numberWithInt:[charToSearch characterAtIndex:0]];

int countBeforeAdding = [listOfAsciiValuesOfString count],countAfterAdding = 0;

[listOfAsciiValuesOfString addObject:ascii];

countAfterAdding = [listOfAsciiValuesOfString count];


if(countAfterAdding == countBeforeAdding){ //element found
    NSLog(@"element exists"); //return string
}else{
    NSLog(@"Doesnt exists"); //return char
}

===================================

-(NSMutableSet*)getListOfAsciiValuesForString:(NSString*)string{
    NSMutableSet *set = [[NSMutableSet alloc] init];
    for(int i=0;i<[string length];i++){

        NSNumber *ascii = [NSNumber numberWithInt:[string characterAtIndex:i]];

        [set addObject:ascii];
    }

    return set;
}
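The check above detects membership by adding the value and comparing the counts, which also modifies the set as a side effect. A minimal sketch of a direct membership test, assuming Foundation's NSCharacterSet is acceptable here; the helper name GRGroupContains is hypothetical. Note that the values returned by characterAtIndex: are UTF-16 code units rather than ASCII values.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `c` is one of the characters in `group`.
static BOOL GRGroupContains(NSString *group, unichar c) {
    NSCharacterSet *set =
        [NSCharacterSet characterSetWithCharactersInString:group];
    return [set characterIsMember:c];
}
```

With the NSMutableSet from the code above, [listOfAsciiValuesOfString containsObject:ascii] would give the same answer without changing the set.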