Question

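I'm new to Objective-C and need help solving this problem:

Write a function that takes two parameters:

1. A string representing a text document
2. An integer giving the number of items to return

Implement the function so that it returns a list of strings sorted by word frequency, with the most frequently occurring word first. Use your best judgment to decide how words are separated. Your solution should run in O(n) time, where n is the number of characters in the document. Implement this function as you would for a production/commercial system. You may use any standard data structure.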

What I have tried so far (work in progress):

// Function in progress

// -(NSString *) wordFrequency:(int)itemsToReturn inDocument:(NSString *)textDocument;

#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // Get the desktop directory (where the text document is)
        NSURL *desktopDirectory = [[NSFileManager defaultManager] URLForDirectory:NSDesktopDirectory inDomain:NSUserDomainMask appropriateForURL:nil create:NO error:nil];

        // Create the full path to the file
        NSURL *fullPath = [desktopDirectory URLByAppendingPathComponent:@"document.txt"];

        // Load the string
        NSString *content = [NSString stringWithContentsOfURL:fullPath encoding:NSUTF8StringEncoding error:nil];
        // Optional confirmation: check that the file is there and print its content to the console
        // NSLog(@"The string is: %@", content);

        // Create an array with the words contained in the string
        NSArray *myWords = [content componentsSeparatedByString:@" "];

        // Optional confirmation: print the content of the array to the console
        // NSLog(@"array: %@", myWords);

        // Build an NSCountedSet from the array, then sort its objects in descending order by count
        NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:myWords];
        NSMutableArray *dictArray = [NSMutableArray array];
        [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
            [dictArray addObject:@{@"word": obj,
                                   @"count": @([countedSet countForObject:obj])}];
        }];

        NSLog(@"Words sorted by count: %@", [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]]);
    }
    return 0;
}

Answer 1

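This is a classic job for map-reduce. I am not very familiar with Objective-C, but as far as I know these concepts are easy to implement in it.

The first map-reduce step counts the number of occurrences.
This step basically groups the elements by word and then counts them.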

map(text):
    for each word in text:
        emit(word, 1)
reduce(word, counts: list<number>):
    emit(word, sum(counts))

An alternative to map-reduce is an iterative pass with a hash map, which acts as a histogram counting the number of occurrences of each word.
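A minimal sketch of the hash-map variant in Objective-C. The helper name `CountWords` is hypothetical, and the definition of a word (a maximal run of letters or digits, compared case-insensitively) is an assumption, since the question leaves word splitting to your judgment.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): builds a word -> count histogram.
// Assumption: a word is a maximal run of letters or digits, compared case-insensitively.
static NSDictionary<NSString *, NSNumber *> *CountWords(NSString *text) {
    NSMutableDictionary<NSString *, NSNumber *> *histogram = [NSMutableDictionary dictionary];
    // Everything that is not alphanumeric is treated as a separator.
    NSCharacterSet *separators = [[NSCharacterSet alphanumericCharacterSet] invertedSet];
    NSArray<NSString *> *tokens = [text.lowercaseString componentsSeparatedByCharactersInSet:separators];
    for (NSString *token in tokens) {
        if (token.length == 0) continue;  // adjacent separators produce empty tokens
        histogram[token] = @(histogram[token].unsignedIntegerValue + 1);
    }
    return histogram;
}

int main(void) {
    @autoreleasepool {
        NSDictionary<NSString *, NSNumber *> *counts = CountWords(@"The cat, the hat and THE bat.");
        NSLog(@"%@", counts);  // "the" is counted three times, every other word once
    }
    return 0;
}
```

Splitting and counting each make one pass over the input, so this step should take expected linear time in the document length, assuming constant-time dictionary operations.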

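Once you have the list of words and their occurrence counts, all you have to do is take the top k of them. This thread explains that nicely: Store the largest 5000 numbers from a stream of numbers.
Here, the 'comparator' is the number of occurrences of each word, as computed in the previous step.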

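The basic idea is to use a min-heap and store the first k elements in it.
Then iterate over the remaining elements: if a new element is larger than the top (the smallest element in the heap), remove the top and replace it with the new element.

At the end you have a heap containing the k largest elements. Because they are already in a heap, popping them one at a time yields them in sorted order (least frequent first, which is the reverse of what you want, but that is fairly easy to handle).

The complexity is O(n log k).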
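The min-heap steps above could be sketched as follows. Foundation has no heap class, so this sketch keeps a small binary min-heap by hand in an `NSMutableArray`. The name `TopKWords` and the input dictionary of word counts are assumptions for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the k most frequent words, most frequent first,
// using a size-k binary min-heap of words ordered by their counts.
static NSArray<NSString *> *TopKWords(NSDictionary<NSString *, NSNumber *> *counts, NSUInteger k) {
    if (k == 0) return @[];
    NSMutableArray<NSString *> *heap = [NSMutableArray arrayWithCapacity:k];
    // YES if the word in slot i has a smaller count than the word in slot j.
    BOOL (^less)(NSUInteger, NSUInteger) = ^BOOL(NSUInteger i, NSUInteger j) {
        return [counts[heap[i]] compare:counts[heap[j]]] == NSOrderedAscending;
    };
    // Restores the heap property below slot i, looking only at the first `size` slots.
    void (^siftDown)(NSUInteger, NSUInteger) = ^(NSUInteger i, NSUInteger size) {
        for (;;) {
            NSUInteger smallest = i, left = 2 * i + 1, right = left + 1;
            if (left < size && less(left, smallest)) smallest = left;
            if (right < size && less(right, smallest)) smallest = right;
            if (smallest == i) return;
            [heap exchangeObjectAtIndex:i withObjectAtIndex:smallest];
            i = smallest;
        }
    };
    for (NSString *word in counts) {
        if (heap.count < k) {
            // Fill phase: append the word, then sift it up.
            [heap addObject:word];
            NSUInteger i = heap.count - 1;
            while (i > 0 && less(i, (i - 1) / 2)) {
                [heap exchangeObjectAtIndex:i withObjectAtIndex:(i - 1) / 2];
                i = (i - 1) / 2;
            }
        } else if ([counts[word] compare:counts[heap[0]]] == NSOrderedDescending) {
            // The new word beats the current minimum: replace the top.
            heap[0] = word;
            siftDown(0, heap.count);
        }
    }
    // Move the minimum to the end repeatedly; the array ends up in descending order.
    for (NSUInteger size = heap.count; size > 1; size--) {
        [heap exchangeObjectAtIndex:0 withObjectAtIndex:size - 1];
        siftDown(0, size - 1);
    }
    return [heap copy];
}

int main(void) {
    @autoreleasepool {
        NSDictionary<NSString *, NSNumber *> *counts = @{@"the": @3, @"and": @2, @"cat": @1, @"hat": @1};
        NSLog(@"%@", TopKWords(counts, 2));  // the, and
    }
    return 0;
}
```

Words with equal counts come out in arbitrary order here; a production version would probably want a deterministic tie-break, for example alphabetical.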

To achieve O(n + k log k), you can use a selection algorithm instead of the min-heap solution to get the top k, and then sort only the retrieved elements.
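As a rough sketch of the selection variant, the code below uses quickselect with a random pivot and a three-way partition (so that many words sharing the same count do not slow it down), then sorts only the first k entries. Quickselect is one possible selection algorithm, picked here as an assumption. Its linear bound is an expected bound, not a worst-case one. The name `TopKBySelection` is hypothetical, and `arc4random_uniform` assumes an Apple platform.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: quickselect the k most frequent words, then sort only those k.
static NSArray<NSString *> *TopKBySelection(NSDictionary<NSString *, NSNumber *> *counts, NSUInteger k) {
    NSMutableArray<NSString *> *words = [counts.allKeys mutableCopy];
    k = MIN(k, words.count);
    if (k == 0) return @[];
    NSUInteger lo = 0, hi = words.count - 1, target = k - 1;
    while (lo < hi) {
        // Pick a random pivot count from the current range.
        NSUInteger pivotIndex = lo + arc4random_uniform((uint32_t)(hi - lo + 1));
        NSUInteger pivot = counts[words[pivotIndex]].unsignedIntegerValue;
        // Three-way partition: [lo, gt) > pivot, [gt, lt) == pivot, [lt, hi] < pivot.
        NSUInteger gt = lo, i = lo, lt = hi + 1;
        while (i < lt) {
            NSUInteger c = counts[words[i]].unsignedIntegerValue;
            if (c > pivot) {
                [words exchangeObjectAtIndex:i withObjectAtIndex:gt];
                gt++;
                i++;
            } else if (c < pivot) {
                lt--;
                [words exchangeObjectAtIndex:i withObjectAtIndex:lt];
            } else {
                i++;
            }
        }
        if (target < gt) {
            hi = gt - 1;   // the k-th word lies among the larger counts
        } else if (target >= lt) {
            lo = lt;       // the k-th word lies among the smaller counts
        } else {
            break;         // the k-th word has the pivot count: selection is done
        }
    }
    // Indices 0..k-1 now hold the k most frequent words in arbitrary order.
    NSArray<NSString *> *top = [words subarrayWithRange:NSMakeRange(0, k)];
    return [top sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        return [counts[b] compare:counts[a]];  // descending by count
    }];
}

int main(void) {
    @autoreleasepool {
        NSDictionary<NSString *, NSNumber *> *counts = @{@"the": @3, @"and": @2, @"cat": @1, @"hat": @1};
        NSLog(@"%@", TopKBySelection(counts, 2));  // the, and
    }
    return 0;
}
```

The selection phase touches each distinct word a constant number of times in expectation, and only the k selected words are sorted, which is where the k log k term comes from.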

Print the most frequent words in a file (string), Objective-C
