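How to find and delete lines in a large text document

Date: 2016-11-01 03:52:46

Tags: c# string indexing line large-files

I am trying to figure out how to delete specific strings from a large text document of 500,000 lines. I need to find a line by its content and, at the same time, get its index in the document's line order (which must not be disturbed), then delete the found line together with the line after or before it. In other words, I want to find the nearest line by index and delete both from the large document. Every approach I have tried that uses File.WriteAllLines hangs on a file of this size. There are active requests against this file, so it seems I need to find some other way. For example, the file content is: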

1. line 1
2. line 2
3. line 3
4. line 4
5. line 5

and the line to find and delete is:

string input = "line 3" 

I want this result, deleting the found line and the next line (index + 1) if the found line's index is odd:

line 1
line 2
line 5

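At the same time, it should be able to delete the found line and the previous line (index - 1) if the found line's index is even. For the search string: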

string input = "line 4" 

the result should be:

line 1
line 2
line 5

I also need to know whether the line exists in the text document.

The result should be written to the same file.

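3 answers:

Answer 0 (score: 1)

If you are processing a very large file, you should use a FileStream to avoid loading everything into memory.

To meet your last requirement, you can read the lines two at a time. It actually makes your code simpler.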

var inputFileName = @"D:\test-input.txt";
var outputFileName = Path.GetTempFileName();

var search = "line 4";

using (var strInp = File.Open(inputFileName, FileMode.Open))
using (var strOtp = File.Open(outputFileName, FileMode.Create))
using (var reader = new StreamReader(strInp))
using (var writer = new StreamWriter(strOtp))
{
    while (reader.Peek() >= 0)
    {
        var lineOdd = reader.ReadLine();
        var lineEven = (string)null;
        if (reader.Peek() >= 0)
            lineEven = reader.ReadLine();

        if(lineOdd != search && lineEven != search)
        {
            writer.WriteLine(lineOdd);

            if(lineEven != null)
                writer.WriteLine(lineEven);
        }
    }    
}

// At this point the operation has succeeded:
// replace the original file with the temp file.
File.Delete(inputFileName);
File.Move(outputFileName, inputFileName);
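
The question also asks to know whether the line exists, which the loop above does not report. A minimal sketch of the same pair-wise streaming idea wrapped in a method that returns a flag might look like this. The names PairRemover and RemovePairContaining, and the ".tmp" suffix, are made up for illustration, and the sketch assumes the default UTF-8 encoding is acceptable for the file.

```csharp
using System;
using System.IO;

static class PairRemover
{
    // Hypothetical helper: streams the file two lines at a time and drops the
    // pair that contains `search`. Returns true if the line was found.
    public static bool RemovePairContaining(string path, string search)
    {
        // Assumption: a temp file next to the original is acceptable.
        var tempPath = path + ".tmp";
        var found = false;

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string odd;
            while ((odd = reader.ReadLine()) != null)
            {
                // null when the file has an odd number of lines
                var even = reader.ReadLine();

                if (odd == search || even == search)
                {
                    found = true; // skip both lines of this pair
                    continue;
                }

                writer.WriteLine(odd);
                if (even != null)
                    writer.WriteLine(even);
            }
        }

        if (found)
        {
            File.Delete(path);
            File.Move(tempPath, path);
        }
        else
        {
            File.Delete(tempPath); // nothing matched; leave the original untouched
        }
        return found;
    }
}
```

Calling PairRemover.RemovePairContaining(@"D:\test-input.txt", "line 4") on the five-line sample would leave line 1, line 2 and line 5 and return true. A search string that is not in the file returns false and leaves the file as it was.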

Answer 1 (score: 0)

Let the input file be inputFile.txt. You can use the File.ReadAllLines() method to get all the lines in that file, then use the IndexOf() method to find the index of a specific line in that list; it returns -1 if the line is not found. Then use RemoveAt() to remove the line at that index. Consider the code:

List<string> linesInFile = File.ReadAllLines(filePath).ToList(); // gives you the list of lines (ToList needs System.Linq)
string input = "line 3";
int lineIndex = linesInFile.IndexOf(input);
if (lineIndex != -1)
{
    linesInFile.RemoveAt(lineIndex);
}

// If the line may match more than once, you can use this instead:

linesInFile.RemoveAll(x => x == input);

If you want to write the result back to the file, use File.WriteAllLines().
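
The snippet above removes only the matching line. As a rough sketch of how the odd/even rule from the question could be layered on the same in-memory approach, the following helper removes the neighbouring line as well and reports whether the line was found. The names InMemoryRemover and RemoveWithNeighbour are hypothetical, and since the whole file is loaded and rewritten, this is only a sketch for files that fit comfortably in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class InMemoryRemover
{
    // Hypothetical helper: removes the first line equal to `input` plus its
    // neighbour, following the odd/even rule described in the question.
    public static bool RemoveWithNeighbour(string filePath, string input)
    {
        List<string> lines = File.ReadAllLines(filePath).ToList();
        int index = lines.IndexOf(input);   // 0-based; -1 when the line is absent
        if (index == -1)
            return false;

        int lineNumber = index + 1;         // 1-based, as counted in the question
        // Odd line number: remove this line and the next one.
        // Even line number: remove the previous line and this one.
        int start = (lineNumber % 2 == 1) ? index : index - 1;
        int count = Math.Min(2, lines.Count - start); // the last odd line has no partner
        lines.RemoveRange(start, count);

        File.WriteAllLines(filePath, lines);
        return true;
    }
}
```

With the five-line sample, both "line 3" and "line 4" lead to the same result: line 1, line 2 and line 5 remain.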

Answer 2 (score: 0)

Use System.IO.StreamReader:

private static void RemoveLines(string lineToRemove, bool skipPrevious, bool skipNext)
{
    string previousLine = string.Empty;
    string currentLine;
    bool isNext = false;
    using (StreamWriter sw = File.CreateText(@"output.txt"))
    {
        using (StreamReader sr = File.OpenText(@"input.txt"))
        {
            while ((currentLine = sr.ReadLine()) != null)
            {
                if (isNext)
                {
                    currentLine = string.Empty;
                    isNext = false;
                }

                if (currentLine == lineToRemove)
                {
                    if (skipPrevious)
                    {
                        previousLine = string.Empty;
                    }

                    if (skipNext)
                    {
                        currentLine = string.Empty;
                        isNext = true;
                    }
                }

                if (previousLine != string.Empty && previousLine != lineToRemove)
                {
                    sw.WriteLine(previousLine);
                }
                previousLine = currentLine;
            }
        }
        if (previousLine != string.Empty && previousLine != lineToRemove)
        {
            sw.WriteLine(previousLine);
        }
    }
}

I have not tested this, but it should give you the necessary pointers.
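
One thing to watch in the code above: because string.Empty doubles as the "nothing to write" marker, blank lines in the input are dropped from the output. A possible variant, sketched here under the assumption that blank lines should be kept, uses null as the marker and also returns whether the line was found. The class name NeighbourRemover, the path parameters and the local names pending and dropNext are illustrative only.

```csharp
using System.IO;

static class NeighbourRemover
{
    // Hypothetical variant: `pending` holds the previous line that has not been
    // written yet; null means there is nothing waiting to be written.
    public static bool RemoveLines(string inputPath, string outputPath,
                                   string lineToRemove, bool skipPrevious, bool skipNext)
    {
        string pending = null;
        bool dropNext = false;
        bool found = false;

        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string current;
            while ((current = reader.ReadLine()) != null)
            {
                if (dropNext)
                {
                    dropNext = false;
                    continue; // this is the line right after a match
                }

                if (current == lineToRemove)
                {
                    found = true;
                    // Keep the previous line only when it is not meant to be skipped.
                    if (!skipPrevious && pending != null)
                        writer.WriteLine(pending);
                    pending = null;      // the matching line itself is never written
                    dropNext = skipNext;
                    continue;
                }

                if (pending != null)
                    writer.WriteLine(pending);
                pending = current;
            }

            if (pending != null)
                writer.WriteLine(pending);
        }
        return found;
    }
}
```

The output goes to a separate file here, as in the answer above; replacing the original afterwards would be a separate step.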