Question

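I'm working through a list of project ideas, and my goal is to complete all of them, hoping that by then I'll be reasonably good at C#. I wrote a program that counts the number of words in a given file. It works, but it has a bug.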

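How it works:

The program prompts for a file name, and the user enters the file path or name.
The file contents are then run through the regular expression "[a-zA-Z]+" to split the words into an array.
The length of the array is then computed.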

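The only problem I have is that an apostrophe (') splits a word into two. For example, if I read this from a file: this is a test of my program and now I'm going to test it again, to see what happens... it outputs 20 when it should output 19, because it splits I'm into two words. Is there a way to make the regex account for proper grammar usage, or is there a way to do this without regex?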

Source:

using System;
using System.IO;
using Reg = System.Text.RegularExpressions.Regex;

namespace count
{
    class CountWordsInString
    {
        static string Count(string list)
        {
            string[] arrStr = Reg.Split(list, "[a-zA-Z]+");
            int length = arrStr.Length - 1;

            return length.ToString();
        }

        static void Main(string[] args)
        {
            Console.Write("Enter file path: ");
            var file = Console.ReadLine();

            var info = File.ReadAllText(file);

            Console.WriteLine(Count(info));
        }
    }
}

Answer 1

One way to do this is to match anything that isn't whitespace (spaces, tabs, etc.). This can be done with a negated character class like this:

[^\s]+

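The leading ^ makes this a negated character class, which matches any character except the ones listed inside it. Of course, this assumes that your definition of a "word" is a string delimited by whitespace.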
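As a minimal sketch of how that pattern could be used for counting (the class and method names here are illustrative, not from the question), each match of [^\s]+ is treated as one word:

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceWordCount
{
    // Hypothetical helper: counts maximal runs of non-whitespace characters.
    // Assumption: a "word" is anything delimited by whitespace, so a
    // free-standing punctuation mark such as " - " also counts as a word.
    public static int CountWords(string text)
    {
        return Regex.Matches(text, @"[^\s]+").Count;
    }

    static void Main()
    {
        string sample = "this is a test of my program and now I'm going to "
                      + "test it again, to see what happens...";
        Console.WriteLine(CountWords(sample)); // 19: I'm stays in one piece
    }
}
```

Counting matches with Regex.Matches also avoids the Length - 1 adjustment that Regex.Split needs for its extra empty entry.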

Try it here.

Answer 2

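In my opinion, if you only want to count words, you don't need RegEx. RegEx is a large library, and if you aren't careful about how you use it, it can consume a lot of resources.

The Split function is a better choice. Load the text into a variable and apply the Split method like this: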

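string[] separators = { " ", "\r\n", "\n" };
string value = "the string that will be word counted";
// Drop the empty entries produced by consecutive separators.
string[] words = value.Split(separators, StringSplitOptions.RemoveEmptyEntries);
// Arrays expose Length; Count is not a property of string[].
Console.WriteLine(words.Length);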
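The separator list above covers spaces and line breaks but not tabs. A variation, sketched here under the assumption that any whitespace should delimit words, passes a null separator array, which String.Split documents as meaning "split on white-space characters". The class and method names are illustrative only:

```csharp
using System;

static class SplitWordCount
{
    // Hypothetical helper: counts whitespace-delimited words without Regex.
    public static int CountWords(string text)
    {
        // A null separator array makes Split treat every white-space
        // character (space, tab, CR, LF, ...) as a delimiter.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return words.Length;
    }

    static void Main()
    {
        Console.WriteLine(CountWords("one\ttwo  three\r\nfour")); // 4
    }
}
```

To count the words in a file, the result of File.ReadAllText can be passed to CountWords in the same way.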

Answer 3

If you want "words" to be able to contain optional apostrophes, you can use the regular expression

[A-Za-z]+('[A-Za-z]+)*

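That will match words containing apostrophes, as long as each apostrophe is surrounded by letters. So it will match fo'c's'le (a word according to the Ubuntu dictionary), but it will not match a''b or 'Twas as a whole. For word counting, leading and trailing apostrophes make no difference, since 'Twas is counted as one word either way. But if you want to do something with the words, such as spell checking, you will need a more sophisticated approach that handles 'Twas correctly while still extracting the word Go from: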

"Start running when I say 'Go!'," he said.
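A short sketch of this pattern applied with Regex.Matches may help. The class and method names are made up for illustration, and the sample sentences are the ones quoted in this thread:

```csharp
using System;
using System.Text.RegularExpressions;

static class ApostropheWordCount
{
    // Letters, optionally followed by groups of (apostrophe + letters).
    const string WordPattern = @"[A-Za-z]+('[A-Za-z]+)*";

    // Hypothetical helper: one regex match is counted as one word.
    public static int CountWords(string text)
    {
        return Regex.Matches(text, WordPattern).Count;
    }

    static void Main()
    {
        string sample = "this is a test of my program and now I'm going to "
                      + "test it again, to see what happens...";
        Console.WriteLine(CountWords(sample)); // 19: I'm is a single match

        // The quoted sentence yields Start, running, when, I, say, Go, he, said.
        string quoted = "\"Start running when I say 'Go!',\" he said.";
        foreach (Match m in Regex.Matches(quoted, WordPattern))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

Note that 'Twas is matched as Twas here, with the leading apostrophe dropped. The count is unaffected, but the extracted text is not the original word.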

Answer 4

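using System.Text.RegularExpressions; // Regex
using System.IO; // File reading

#region //Return the count of words in a file
public int wordamount(string filename)
{
    // Match runs of word characters, optionally joined by one apostrophe.
    // The apostrophe form must come first: with \w+ listed first, "I'm"
    // would be matched as "I" and "m", because alternation tries the left branch first.
    return Regex.Matches(File.ReadAllText(filename), @"\w+'\w+|\w+").Count;
}
#endregion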

This reads the file and counts the words in it.
