Question

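I want to write a method that counts the letter "a" or "A". The "a" can be at the beginning of the string followed by a space, or anywhere in the string surrounded by spaces. The result should be 2, but my code returns 5. How can I modify the code so that it checks for the spaces before and after the letter?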

library(tm)
data("acq")
data("crude")
m1 <- DocumentTermMatrix(acq)
m2 <- DocumentTermMatrix(crude)
Zipf_plot(m1, col = "red")
par(new=T)
Zipf_plot(m2, col="blue")
Zipf_plot_multi <- function (xx, type = "l", cols = rainbow(length(xx)), ...) {
    stopifnot(is.list(xx) & length(xx)==length(cols))
    for (idx in seq_along(xx)) {
      x <- xx[[idx]]
      if (inherits(x, "TermDocumentMatrix")) 
          x <- t(x)
      y <- log(sort(slam::col_sums(x), decreasing = TRUE))
      x <- log(seq_along(y))
      m <- lm(y ~ x)
      dots <- list(...)
      if (is.null(dots$xlab)) 
          dots$xlab <- "log(rank)"
      if (is.null(dots$ylab)) 
          dots$ylab <- "log(frequency)"
      if (idx==1) {
        do.call(plot, c(list(x, y, type = type, col = cols[idx]), dots))
      } else {
        lines(x, y, col = cols[idx])
      }
      abline(m, col = cols[idx], lty = "dotted")
      print(coef(m))
    }
}
Zipf_plot_multi(list(m1, m2), xlim=c(0, 7), ylim=c(0,6))

Answer 1

I suggest using a regular expression to count all the matches; something like this:

  using System.Text.RegularExpressions;

  ... 

  string t1 = "A book was lost. There is a book on the table. Is that the book?";

  int count = Regex.Matches(t1, @"\bA\b", RegexOptions.IgnoreCase).Count;
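
Note that `\b` matches at any word boundary, so an "a" followed by punctuation (as in "a.") is counted too. If the "a" must be bounded strictly by whitespace or by the ends of the string, as the question describes, lookarounds are one option. The following is a minimal sketch under that assumption; the class name `WhitespaceArticleCounter` is made up for illustration.

```csharp
using System.Text.RegularExpressions;

static class WhitespaceArticleCounter
{
    // (?<!\S): the previous character, if any, is whitespace (also holds at the start of the string).
    // (?!\S):  the next character, if any, is whitespace (also holds at the end of the string).
    private static readonly Regex Article =
        new Regex(@"(?<!\S)a(?!\S)", RegexOptions.IgnoreCase);

    public static int Count(string text)
    {
        return Article.Matches(text).Count;
    }
}
```

Calling `WhitespaceArticleCounter.Count(t1)` on the sample sentence should give the same result as the `\b` version; the two only differ on inputs where "a" touches punctuation.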

If you insist on a for loop, you have to check for whitespace yourself:

  static int CountArticles(string text)
  {
      int count = 0;

      for (int i = 0; i < text.Length; ++i)
      {
          if (text[i] == 'a' || text[i] == 'A')
          {
             // So we have a or A, now we have to check for spaces:
             if (((i == 0) || char.IsWhiteSpace(text[i - 1])) &&
                 ((i == text.Length - 1) || char.IsWhiteSpace(text[i + 1])))
                ++count;
           }
       }            

       return count;
  }

Answer 2

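Personally, I'm a big fan of simple DFA state machines. That may sound strange, so I'll explain why... It all comes down to a few reasons:

DFAs are very fast; if you do parsing like I do, you will most likely throw a lot of data at this code. Performance matters.
DFAs are very easy to unit test; the only thing you have to do is make sure you test all states and transitions.
Code coverage reports on a DFA are very useful. They don't guarantee that your design is correct, but if it is, the code will work. You definitely get more information from them than from coverage of a regular expression.

The main drawbacks are:

They take more work to build. (*)
You should think them through on a piece of paper (and document them for others).

Once you get the idea, it's easy to build a DFA. Take a piece of paper, think about the possible states of your program (draw circles) and the transitions between them (arrows between the circles). Finally, think about what should happen on each transition.

The translation to code is almost 1:1. Using a switch is just one implementation; there are other ways to do this. Anyway, without further ado, here it is: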

enum State
{
    SpaceEncountered,
    ArticleEncountered,
    Default
};

static int CountArticles(string text)
{
    int count = 0;
    State state = State.SpaceEncountered; // start of line behaves the same

    for (int i = 0; i < text.Length; ++i)
    {
        switch (state)
        {
            case State.SpaceEncountered:
                if (text[i] == 'a' || text[i] == 'A')
                {
                    state = State.ArticleEncountered;
                }
                else if (!char.IsWhiteSpace(text[i]))
                {
                    state = State.Default;
                }
                break;

            case State.ArticleEncountered:
                if (char.IsWhiteSpace(text[i]))
                {
                    ++count;
                    state = State.SpaceEncountered;
                }
                else
                {
                    state = State.Default;
                }
                break;
            case State.Default: // inside some other word; wait for the next whitespace
                if (char.IsWhiteSpace(text[i]))
                {
                    state = State.SpaceEncountered;
                }
                break;
        }
    }

    // if we're in state ArticleEncountered, the next is EOF and we should count one extra
    if (state == State.ArticleEncountered)
    {
        ++count;
    }
    return count;
}

static void Main(string[] args)
{
    Console.WriteLine(CountArticles("A book was lost. There is a book on the table. Is that the book?"));
    Console.ReadLine();
}
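
The switch is only one way to write this machine. As an illustration of another, here is a minimal table-driven sketch of the same three states. The class name `TableDrivenArticleCounter`, the integer state numbering, and the `Classify` helper are assumptions made for this example, not part of the code above.

```csharp
static class TableDrivenArticleCounter
{
    // States mirror the enum: 0 = SpaceEncountered, 1 = ArticleEncountered, 2 = Default.
    // Input classes: 0 = whitespace, 1 = 'a' or 'A', 2 = any other character.
    private static readonly int[,] Next =
    {
        // ws  a   other
        {  0,  1,  2 },   // SpaceEncountered
        {  0,  2,  2 },   // ArticleEncountered
        {  0,  2,  2 },   // Default
    };

    private static int Classify(char c)
    {
        if (char.IsWhiteSpace(c)) return 0;
        if (c == 'a' || c == 'A') return 1;
        return 2;
    }

    public static int Count(string text)
    {
        int count = 0;
        int state = 0; // the start of the text behaves like "just saw whitespace"

        foreach (char c in text)
        {
            int input = Classify(c);
            // Leaving ArticleEncountered on whitespace means a lone "a" just ended.
            if (state == 1 && input == 0) ++count;
            state = Next[state, input];
        }

        // A lone "a" at the very end of the text has no trailing whitespace.
        if (state == 1) ++count;
        return count;
    }
}
```

With the transitions in a table, a unit test can walk every cell of `Next` directly, which is one way to cover all states and transitions.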

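(*) Now, I can see people thinking: that's a lot of code for such a simple problem. Yes, that's very true, which is why there are ways to generate DFAs. The most common ways are to construct a lexer or a regular expression. That's a bit much for this toy problem, but perhaps your real problem is a bit bigger...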

Answer 3

Use String.Split like this (Count requires using System.Linq):

int count = text.Split(' ').Count(c => c == "a" || c == "A");
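
Splitting on `' '` treats only the space character as a separator, so an "a" next to a tab or a newline would be missed. A variant that splits on any whitespace could look like the sketch below; the class name `SplitArticleCounter` is hypothetical, and the case-insensitive comparison is a stylistic choice rather than something the one-liner above needs.

```csharp
using System;
using System.Linq;

static class SplitArticleCounter
{
    public static int Count(string text)
    {
        // A null separator array makes Split break on any whitespace character;
        // RemoveEmptyEntries drops the empty tokens produced by repeated whitespace.
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => string.Equals(word, "a", StringComparison.OrdinalIgnoreCase));
    }
}
```

Like the one-liner, this counts only tokens that are exactly "a" or "A", so "a." with trailing punctuation is not counted.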

Answer 4

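You can also use the TextInfo class to convert the string to title case, so that a letter at the beginning of the string or following a space becomes upper case:

A Book Was Lost. There Is A Book On The Table. Is That The Book?

Now you can use the CountArticles function to count the characters: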

using System;
using System.Globalization;

namespace Hi
{
    class Program
    {
        static void Main(string[] args)
        {
            string t1 = "A book was lost. There is a book on the table. Is that the book?";

            Console.WriteLine(t1);
            Console.WriteLine(" - Found {0} articles, should be 2.", CountArticles(t1));
            Console.ReadKey();
        }

        static int CountArticles(string text)
        {
            int count = 0;

            // Convert the string to title case, so that an 'a' at the beginning
            // of the string or right after a space becomes 'A'.
            TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
            text = textInfo.ToTitleCase(text);

            // Caveat: this counts every capital 'A', so any other word that
            // starts with "a" (for example "an") is counted as well.
            for (int i = 0; i < text.Length; ++i)
            {
                if (text[i] == 'A')
                {
                    ++count;
                }
            }
            return count;
        }
    }
}

Find a letter at the beginning of a string, followed by a space

4 Answers: