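I want to write a method that counts the letter "a" or "A". The "a" may sit at the start of the string followed by a space, or anywhere in the string surrounded by spaces. The result should be 2, but my code returns 5. How can I modify the code so that it checks for the spaces before and after?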
Answer 0: (score: 5)
I suggest using a regular expression to count all the matches; something like this:
using System.Text.RegularExpressions;
...
string t1 = "A book was lost. There is a book on the table. Is that the book?";
int count = Regex.Matches(t1, @"\bA\b", RegexOptions.IgnoreCase).Count;
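One caveat, offered as a hedged aside: `\b` is a word boundary rather than a whitespace test, so `\bA\b` also counts an "a" that touches punctuation, as in "a." or "a-b". If the rule from the question (start of string, or whitespace on both sides) has to hold strictly, a lookaround pattern is one possible alternative. The class and method names below are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ArticleRegexSketch
{
    // (?<!\S) succeeds at the start of the string or right after whitespace;
    // (?!\S) succeeds at the end of the string or right before whitespace.
    public static int CountStrict(string text)
    {
        return Regex.Matches(text, @"(?<!\S)a(?!\S)", RegexOptions.IgnoreCase).Count;
    }

    // The word-boundary pattern from the answer, kept here for comparison.
    public static int CountWordBoundary(string text)
    {
        return Regex.Matches(text, @"\bA\b", RegexOptions.IgnoreCase).Count;
    }
}
```

For the sample sentence both methods return 2. They differ on input such as "Plan a-b or a.", where the word-boundary version counts both "a" characters and the strict version counts none.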
If you insist on a for loop, you have to check for the whitespace yourself:
static int CountArticles(string text)
{
    int count = 0;
    for (int i = 0; i < text.Length; ++i)
    {
        if (text[i] == 'a' || text[i] == 'A')
        {
            // So we have a or A, now we have to check for spaces:
            if (((i == 0) || char.IsWhiteSpace(text[i - 1])) &&
                ((i == text.Length - 1) || char.IsWhiteSpace(text[i + 1])))
                ++count;
        }
    }
    return count;
}
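Answer 1: (score: 1)
Personally, I'm a big fan of simple DFA state machines. That may feel odd, so I'll explain why... It all comes down to a few reasons:
The main drawback is:
Once you get the idea, building a DFA is easy. Take a sheet of paper, think about the possible states of your program (draw circles) and the transitions between them (arrows between the circles). Finally, think about what should happen on each transition.
The translation to code is almost 1:1. Using a switch is just one implementation; there are other ways to do it. Anyway, without further ado, here it is: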
enum State
{
    SpaceEncountered,
    ArticleEncountered,
    Default
};
static int CountArticles(string text)
{
    int count = 0;
    State state = State.SpaceEncountered; // start of line behaves the same
    for (int i = 0; i < text.Length; ++i)
    {
        switch (state)
        {
            case State.SpaceEncountered:
                if (text[i] == 'a' || text[i] == 'A')
                {
                    state = State.ArticleEncountered;
                }
                else if (!char.IsWhiteSpace(text[i]))
                {
                    state = State.Default;
                }
                break;
            case State.ArticleEncountered:
                if (char.IsWhiteSpace(text[i]))
                {
                    ++count;
                    state = State.SpaceEncountered;
                }
                else
                {
                    state = State.Default;
                }
                break;
            case State.Default: // inside a word that is not a lone article
                if (char.IsWhiteSpace(text[i]))
                {
                    state = State.SpaceEncountered;
                }
                break;
        }
    }
    // if we're in state ArticleEncountered, the next is EOF and we should count one extra
    if (state == State.ArticleEncountered)
    {
        ++count;
    }
    return count;
}
static void Main(string[] args)
{
    Console.WriteLine(CountArticles("A book was lost. There is a book on the table. Is that the book?"));
    Console.ReadLine();
}
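(*) Now, I can see people thinking: that's a lot of code for such a simple problem. Yes, that's very true, which is why there are ways to generate DFAs. The most common ones are building a lexer or using a regular expression. That's a bit much for this toy problem, but perhaps your real problem is somewhat bigger...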
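As a rough sketch of one of the "other ways" besides a switch, the same three-state machine can be driven by a transition table. This is an illustration under assumptions, not the answer's own code: the class name, the integer encoding of states, and the `Classify` helper are all invented here.

```csharp
// Hypothetical table-driven variant of the three-state DFA.
public static class ArticleDfaTableSketch
{
    // States:        0 = SpaceEncountered, 1 = ArticleEncountered, 2 = Default
    // Input classes: 0 = whitespace,       1 = 'a' or 'A',         2 = anything else
    private static readonly int[,] Next =
    {
        // whitespace, a/A, other
        { 0, 1, 2 }, // from SpaceEncountered
        { 0, 2, 2 }, // from ArticleEncountered
        { 0, 2, 2 }  // from Default
    };

    // Maps a character to its input class (column index in the table).
    private static int Classify(char c)
    {
        if (char.IsWhiteSpace(c)) return 0;
        if (c == 'a' || c == 'A') return 1;
        return 2;
    }

    public static int CountArticles(string text)
    {
        int count = 0;
        int state = 0; // the start of the text behaves like "just saw whitespace"
        foreach (char c in text)
        {
            int next = Next[state, Classify(c)];
            // Leaving ArticleEncountered through whitespace completes one article.
            if (state == 1 && next == 0) ++count;
            state = next;
        }
        // An article at the very end of the text has no trailing whitespace.
        if (state == 1) ++count;
        return count;
    }
}
```

The table holds only the transitions; the counting action stays in the loop. Adding a state or an input class then means adding a row or a column rather than another switch case.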
Answer 2: (score: 0)
Use String.Split like this:
int count = text.Split(' ').Count(c => c == "a" || c == "A");
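Note that `Split(' ')` breaks only on the space character, and the one-liner needs `using System.Linq;` for `Count`. If tabs or line breaks can also separate words, a variant along these lines may be closer to the whitespace rule in the question. It is a minimal sketch, and the class name is made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class ArticleSplitSketch
{
    public static int CountArticles(string text)
    {
        // A null separator array makes Split break on any whitespace character;
        // RemoveEmptyEntries drops the empty tokens left by repeated whitespace.
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => string.Equals(word, "a", StringComparison.OrdinalIgnoreCase));
    }
}
```

Both versions allocate an array holding every word, so for very long inputs the loop-based answers are lighter on memory.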
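Answer 3: (score: 0)
You can also use the TextInfo class to convert the string to title case, so that an "a" at the start of the string or following a space becomes "A":
A Book Was Lost. There Is A Book On The Table. Is That The Book?
Now you can use the CountArticles function to count the "A" characters: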
using System;
using System.Globalization;

namespace Hi
{
    class Program
    {
        static void Main(string[] args)
        {
            string t1 = "A book was lost. There is a book on the table. Is that the book?";
            Console.WriteLine(t1);
            Console.WriteLine(" - Found {0} articles, should be 2.", CountArticles(t1));
            Console.ReadKey();
        }

        static int CountArticles(string text)
        {
            int count = 0;
            // Title-case the string so that an article at the beginning of the
            // string or after a space always appears as 'A'.
            TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
            text = textInfo.ToTitleCase(text);
            for (int i = 0; i < text.Length; ++i)
            {
                // Title casing also capitalizes words such as "And", so an 'A'
                // only counts when it stands alone between whitespace or the
                // ends of the string.
                if (text[i] == 'A' &&
                    (i == 0 || char.IsWhiteSpace(text[i - 1])) &&
                    (i == text.Length - 1 || char.IsWhiteSpace(text[i + 1])))
                {
                    ++count;
                }
            }
            return count;
        }
    }
}