Question

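I got bored of studying, so I decided to put my C knowledge to use and write a program that grabs a random tweet from a file I have saved and displays it to me.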

The text file is organized like this:

@username
§
tweet1
§
tweet2
§
@username2

The idea is that when I run the program, it picks a random user and then a random tweet from that user.

The only ways I can think of to randomize the user are:

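Go through the whole text file, and every time I see a username, save that line and increment a counter. Then I pick a random number up to the counter and take the matching username.
Avoid scanning the whole text file by splitting each user into a separate text file. Then just get the file names in some folder and randomize from there (if that is possible).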

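Then the same problem comes up again: how do I randomize a tweet? I know where each tweet starts and ends, but for picking a random one, the only approach I can think of is the first one mentioned above.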

Would you suggest a smarter way?

Thanks a lot!

Answer 1

Here are some comments from code I wrote, which contain useful information:

/*
** From Wikipedia on Reservoir Sampling
** https://en.wikipedia.org/wiki/Reservoir_sampling
**
** Algorithm R
** The most common example was labelled Algorithm R by Jeffrey Vitter in
** his paper on the subject.  This simple O(n) algorithm as described in
** the Dictionary of Algorithms and Data Structures consists of the
** following steps (assuming k < n and using one-based array indexing):
**
**    // S has items to sample, R will contain the result
**    ReservoirSample(S[1..n], R[1..k])
**        // fill the reservoir array
**        for i = 1 to k
**            R[i] := S[i]
**
**        // replace elements with gradually decreasing probability
**        for i = k+1 to n
**            j := random(1, i)   // important: inclusive range
**            if j <= k
**                R[j] := S[i]
**
** Alternatively: https://stackoverflow.com/questions/232237
** What's the best way to return one random line in a text file
**
**      count = 0;
**      while (fgets(line, length, stream) != NULL)
**      {
**          count++;
**          // if ((rand() * count) / RAND_MAX == 0)
**          if ((rand() / (float)RAND_MAX) <= (1.0 / count))
**              strcpy(keptline, line);
**      }
**
** From Perl perlfaq5:
** Here's a reservoir-sampling algorithm from the Camel Book:
**
**      srand;
**      rand($.) < 1 && ($line = $_) while <>;
**
** This has a significant advantage in space over reading the whole file
** in.  You can find a proof of this method in The Art of Computer
** Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
*/
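
As a minimal sketch of the `fgets` loop quoted in the comment above, the function below keeps one line chosen uniformly at random in a single pass. The names `pick_random_line` and `chance_one_in` are made up for this illustration, and the code assumes every line fits in 1023 bytes (a longer line would be counted as several lines).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 with probability about 1/count, else 0. */
static int chance_one_in(long count)
{
    assert(count > 0);
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so the product lies in [0, count). */
    return (long)(rand() / (RAND_MAX + 1.0) * count) == 0;
}

/*
** Hypothetical helper: read fp to the end and leave one line, chosen
** uniformly at random, in kept (a buffer of size bytes).  The line keeps
** its trailing newline, as delivered by fgets().  Returns the number of
** lines read; kept is left untouched if the stream is empty.
** Assumption: each line is shorter than the local buffer.
*/
long pick_random_line(FILE *fp, char *kept, size_t size)
{
    char line[1024];
    long count = 0;

    assert(fp != NULL && kept != NULL && size > 0);
    while (fgets(line, sizeof line, fp) != NULL)
    {
        count++;
        /* The count-th line replaces the kept line with probability 1/count. */
        if (chance_one_in(count))
        {
            strncpy(kept, line, size - 1);
            kept[size - 1] = '\0';
        }
    }
    return count;
}
```

A caller would typically seed the generator once with `srand((unsigned)time(NULL))` (from `<time.h>`) and then call `pick_random_line(fp, buffer, sizeof buffer)` on an open file.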

You need to decide what counts as a random selection in your case.

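If your file has 12 tweeters, each with (for the sake of discussion) between 1 and 12 tweets, do you want to choose each tweeter with probability 1/12 and then pick one of that tweeter's tweets at random (from the set belonging to that tweeter)? Or do you have some other scheme in mind, for example: if there are 66 tweets in total, choose any given tweet with probability 1/66, so that the tweeters who post the most are more likely to show up than those who tweeted only once?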
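
The two schemes can be sketched in one pass each, using the same 1-in-n replacement idea. This is only an illustration under stated assumptions: the names `Pick`, `Mode`, `pick_tweet`, and `chance_one_in` are hypothetical, the file is assumed to be UTF-8, every line beginning with `@` is treated as a username (so tweets must not start with `@`), and each tweet is assumed to be a single line shorter than 1023 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_LINE = 1024 };
enum Mode { BY_USER, BY_TWEET };

typedef struct
{
    char user[MAX_LINE];
    char tweet[MAX_LINE];
} Pick;

/* "§" encoded as UTF-8 (assumption about the file's encoding). */
static const char SEPARATOR[] = "\xC2\xA7";

/* Returns 1 with probability about 1/count, else 0. */
static int chance_one_in(long count)
{
    assert(count > 0);
    return (long)(rand() / (RAND_MAX + 1.0) * count) == 0;
}

/*
** BY_USER:  every user is equally likely, then every tweet of that user.
** BY_TWEET: every tweet in the file is equally likely.
** Returns 1 if a tweet was picked, 0 otherwise (for example, an empty
** file, or a chosen user who has no tweets).
*/
int pick_tweet(FILE *fp, enum Mode mode, Pick *out)
{
    char line[MAX_LINE];
    char current_user[MAX_LINE] = "";
    long users = 0;     /* usernames seen so far */
    long tweets = 0;    /* tweets counted in the current sampling population */
    int in_chosen = 0;  /* BY_USER only: inside the chosen user's block? */

    assert(fp != NULL && out != NULL);
    out->user[0] = out->tweet[0] = '\0';
    while (fgets(line, sizeof line, fp) != NULL)
    {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
        if (line[0] == '\0' || strcmp(line, SEPARATOR) == 0)
            continue;                           /* blank or separator line */
        if (line[0] == '@')                     /* assumption: a username */
        {
            strcpy(current_user, line);
            users++;
            if (mode == BY_USER && (in_chosen = chance_one_in(users)) != 0)
            {
                /* New chosen user: restart the tweet sampling for them. */
                strcpy(out->user, line);
                out->tweet[0] = '\0';
                tweets = 0;
            }
            continue;
        }
        if (mode == BY_USER && !in_chosen)
            continue;                           /* tweet of a non-chosen user */
        tweets++;
        if (chance_one_in(tweets))
        {
            strcpy(out->user, current_user);
            strcpy(out->tweet, line);
        }
    }
    return out->tweet[0] != '\0';
}
```

With `BY_USER`, a prolific tweeter is no more likely to appear than anyone else; with `BY_TWEET`, a tweeter's chance of appearing is proportional to how many tweets they have in the file.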

Once you have decided which rules to follow, the coding is fairly straightforward based on the information above.

C: Get a random string from a file that starts with a specific character
