Question

I am learning LINQ, and I want to read a text file (say, an e-book) word by word using LINQ.

This is what I could come up with:

static void Main()
{
    string[] content = File.ReadAllLines("text.txt");

    var query = (from c in content
                 select content);

    foreach (var line in content)
    {
        Console.Write(line + "\n");
    }
}

This reads the file line by line. If I change ReadAllLines to ReadAllText, the file is read character by character instead.

Any ideas?

Answer 1

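string[] content = File.ReadAllLines("text.txt");
// Flatten the per-line word arrays into a single sequence of words
var words = content.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
foreach (string word in words)
{
    // process each word here
}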

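Add whatever other whitespace characters you need to the separator array. Using StringSplitOptions to deal with consecutive spaces is cleaner than the Where clause I originally used.

In .NET 4 you can use File.ReadLines for lazy evaluation, which lowers RAM usage when working with large files.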
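
A minimal sketch of that lazy variant, assuming .NET 4 or later. The class name, the ReadWords helper, and the sample file written to the temp directory are all made up for illustration; only File.ReadLines and SelectMany come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyWords
{
    // Hypothetical helper: streams words without loading the whole file.
    public static IEnumerable<string> ReadWords(string path)
    {
        // File.ReadLines yields one line at a time as the sequence is enumerated
        return File.ReadLines(path)
            .SelectMany(line => line.Split(new[] { ' ', '\t' },
                                           StringSplitOptions.RemoveEmptyEntries));
    }

    static void Main()
    {
        // Assumed sample file, written only so the sketch runs on its own
        string path = Path.Combine(Path.GetTempPath(), "lazy-words-sample.txt");
        File.WriteAllLines(path, new[] { "the quick  brown fox", "jumps over" });

        // Take(3) stops the enumeration early, so later lines are never split
        foreach (string word in ReadWords(path).Take(3))
        {
            Console.WriteLine(word);
        }
    }
}
```

Because the query is deferred, nothing is read from disk until the foreach starts pulling words.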

Answer 2

string str = File.ReadAllText("text.txt");   // ReadAllText requires a path
char[] separators = { '\n', '\r', ',', '.', ' ', '"', '\t' };    // add your own
var words = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);

Answer 3

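string content = File.ReadAllText("Text.txt");

var words = from word in content.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries)
            select word;

You need to define the array of whitespace characters with your own values, like so:

// Split takes a char[]; Environment.NewLine is a string, so list '\r' and '\n' separately
char[] WhiteSpace = { '\r', '\n', ' ', '\t' };

This code assumes that punctuation (such as a comma) is part of the word.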
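
If punctuation should not count as part of a word, one option is to trim it off after splitting. The following is only a sketch: the class name, the CleanWords helper, and the particular set of trimmed characters are assumptions chosen for illustration, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PunctuationTrim
{
    // Hypothetical helper: splits on whitespace, then strips leading and
    // trailing punctuation from each word.
    public static IEnumerable<string> CleanWords(string content)
    {
        char[] whiteSpace = { '\r', '\n', ' ', '\t' };
        // Assumed punctuation set; extend it to fit your text
        char[] punctuation = { ',', '.', ';', ':', '!', '?', '"' };

        return content.Split(whiteSpace, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(punctuation))
            .Where(w => w.Length > 0);   // drop tokens that were only punctuation
    }

    static void Main()
    {
        foreach (string word in CleanWords("Hello, world. This is LINQ!"))
        {
            Console.WriteLine(word);
        }
    }
}
```

Trim only touches the ends of each token, so punctuation inside a word (a hyphen or apostrophe, for example) is kept.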

Answer 4

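It is probably better to read all the text with ReadAllText() and then use a regular expression to extract the words. Using the space character as a delimiter can cause trouble, because the resulting words will still carry punctuation (commas, periods, and so on). For example: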

// text holds the file contents, e.g. from File.ReadAllText
Regex re = new Regex("[a-zA-Z0-9_-]+", RegexOptions.Compiled); // You'll need to change the RE to fit your needs
Match m = re.Match(text);
while (m.Success)
{
    // The pattern has no capture groups, so read the whole match
    string word = m.Value;

    // do your processing here

    m = m.NextMatch();
}
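
Since the question asks about LINQ, the same matches can also be exposed as a sequence. This is a sketch under the same assumptions as the loop above (the pattern is illustrative); the class name and the Extract helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexWords
{
    // Same illustrative pattern as in the loop above; adjust it to your needs
    static readonly Regex WordPattern =
        new Regex("[a-zA-Z0-9_-]+", RegexOptions.Compiled);

    // Hypothetical helper: returns every match as a word
    public static IEnumerable<string> Extract(string text)
    {
        // MatchCollection is non-generic on older frameworks, hence Cast<Match>()
        return WordPattern.Matches(text).Cast<Match>().Select(m => m.Value);
    }

    static void Main()
    {
        foreach (string word in Extract("Hello, world. This is LINQ!"))
        {
            Console.WriteLine(word);
        }
    }
}
```

The result can be chained with other operators such as Where, Distinct, or GroupBy.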

Answer 5

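The following uses an iterator block and therefore loads lazily. The other solutions load the whole file into memory before you can iterate over the words.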

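static IEnumerable<string> GetWords(string path){

    // File.ReadLines streams the file one line at a time
    foreach (var line in File.ReadLines(path)){
        // A null separator array means "split on any whitespace character"
        foreach (var word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)){
            yield return word;
        }
    }
}

(Split with a null separator splits on whitespace; RemoveEmptyEntries drops the empty strings that consecutive whitespace would otherwise produce.)

Use it like this: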

foreach (var word in GetWords(@"text.txt")){
    Console.WriteLine(word);
}

It also works with the standard LINQ goodness:

GetWords(@"text.txt").Take(25);
GetWords(@"text.txt").Where(w => w.Length > 3);

Of course, error handling and the like is left out for the sake of learning.

Answer 6

You could write content.ToList().ForEach(p => p.Split(' ').ToList().ForEach(Console.WriteLine)), but that isn't much LINQ.

Reading a text file word by word with LINQ
