Remove lines from the hosts file when the same line exists in whitelist.txt (VB.NET)

Asked: 2019-02-24 10:59:44

Tags: .net vb.net

I have two text files: a hosts file and whitelist.txt

hosts

google.com
facebook.com
x.com
y.com
z.com
youtube.com
duckduckgo.com
stackoverflow.com
w.com

whitelist.txt

w.com
x.com
y.com
z.com

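When I click a button, the lines that appear in whitelist.txt must be removed from the hosts file.

For example: x.com is in whitelist.txt, so it should be removed from the hosts file.

Expected output: hosts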

google.com
facebook.com
youtube.com
duckduckgo.com
stackoverflow.com

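The hosts file will be a fairly large file, 3-6 MB in size.

So this will be a heavy operation.

The matching lines do not need to be removed from both files, only from the hosts file.

Edit: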

Public Sub RemoveLines(file1path As String, file2path As String)
        Dim s1 As String() = IO.File.ReadAllLines(file1path)
        Dim s2 As String() = IO.File.ReadAllLines(file2path)
        Dim l As List(Of String) = New List(Of String)
        l.AddRange(s1.ToList)
        l.AddRange(s2.ToList)
        If s1.ToList = s2.ToList Then
            RemoveLines = s1.text
        End If
End Sub

2 Answers:

Answer 0 (score: 0)

GitHub sample project filter-lines

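From a previously deleted post, my approach is:

  1. Load the 2 files into memory as arrays of lines (lists);
  2. Create 2 hash sets, one for each file
  3. For each line in a file, create a hash entry (hash, line)
  4. Compute the MD5 hash of each string and write half of its bits (64 bits) to the hash
  5. Extract the key from each hash pair (hash, _)
  6. Extract the lines that pass the filter condition (keys2 does not contain the hash of the current line)
  7. Write the result to the file
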
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Runtime.CompilerServices;
using System.Diagnostics;

namespace filter_lines
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var md5 = MD5.Create();

            // Hash one line with MD5 and keep the first 64 bits as its key.
            (long, string) Hash(string input, Encoding encoding)
            {
                var bytes = (Span<byte>)stackalloc byte[encoding.GetByteCount(input)];
                var destination = (Span<byte>)stackalloc byte[md5.HashSize / 8];
                encoding.GetBytes(input, bytes);
                return md5.TryComputeHash(bytes, destination, out int _bytesWritten)
                    ? (BitConverter.ToInt64(destination.ToArray()), input) : (0, null);
            }

            async Task<IEnumerable<string>> ReadFileAsync(string fileName) =>
                await File.ReadAllLinesAsync(fileName).ConfigureAwait(false);

            // args[0] is the file to filter (hosts), args[1] is the whitelist.
            var dict1 = await ReadFileAsync(args[0]).ConfigureAwait(false);
            var dict2 = await ReadFileAsync(args[1]).ConfigureAwait(false);

            // hashes1 is lazy: its (hash, line) pairs are computed once, during the write below.
            var hashes1 = dict1.Select(_ => Hash(_, Encoding.UTF8));
            var keys2 = new HashSet<long>(dict2.Select(_ => Hash(_, Encoding.UTF8).Item1));

            // Keep only the lines whose hash is not in keys2 and overwrite the first file.
            var stopwatch = Stopwatch.StartNew();
            File.WriteAllLines(args[0], hashes1
                .Where(_ => !keys2.Contains(_.Item1))
                .Select(_ => _.Item2));
            stopwatch.Watch();
        }
    }

    public static class StopwatchExtensions
    {
        public static void Watch(this Stopwatch stopwatch, string message = "",
        [CallerMemberName] string memberName = "",
        [CallerFilePath] string sourceFilePath = "",
        [CallerLineNumber] int sourceLineNumber = 0) =>
        Console.WriteLine(
            $"{stopwatch.Elapsed} " +
            $"{message} " +
            $"{memberName} " +
            $"{sourceFilePath}:{sourceLineNumber}");
    }
}

Result: 44 ms to filter a test all.txt of 1,000,000 lines

44ms - 1000000 lines of all.txt, .NET Core 3.0, C# 8.0

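Answer 1 (score: 0)

Using the File class to retrieve the data requires Imports System.IO.

The StringBuilder requires Imports System.Text at the top of the file. A StringBuilder saves the code from creating a new String every time the string changes (strings are immutable). .ReadAllLines returns an array of the lines in the text file.

Loop through each line in hosts.txt and use .Exists to check whether that line is in the array of lines from whitelist.txt. If it is not there, add it to the StringBuilder.

Finally, convert the StringBuilder to a String and use the File.WriteAllText method to write all the text to hosts.txt.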

Private Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click
    Dim linesFromHost = File.ReadAllLines("hosts.txt") 'You will need to add the full path 
    Dim linesFromWhiteList = File.ReadAllLines("whitelist.txt")
    Dim sb As New StringBuilder
    For Each line As String In linesFromHost
        'If the line from linesFromHost is not found in the linesFromWhiteList then add it to the StringBuilder
        If Not Array.Exists(linesFromWhiteList, Function(x) x = line) Then
            sb.AppendLine(line)
        End If
    Next
    File.WriteAllText("hosts.txt", sb.ToString)
    MessageBox.Show("Done")
End Sub

I am sure the code provided by @ArturMustafin is much faster, but this might get you started.
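
As a rough middle ground between the two answers, here is a minimal VB.NET sketch, assuming exact (case-sensitive) line matching is enough. It loads whitelist.txt into a HashSet(Of String), so each lookup takes roughly constant time instead of scanning the whole whitelist array with Array.Exists. The names WhitelistFilter, FilterLines and RemoveWhitelistedLines are hypothetical and do not come from either answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module WhitelistFilter
    ' Hypothetical helper: returns the host lines that do not appear in the whitelist.
    ' Assumption: lines are compared exactly (no trimming, case-sensitive).
    Public Function FilterLines(hostLines As IEnumerable(Of String),
                                whitelistLines As IEnumerable(Of String)) As List(Of String)
        ' A HashSet gives average O(1) membership checks.
        Dim whitelist As New HashSet(Of String)(whitelistLines)
        Return hostLines.Where(Function(line) Not whitelist.Contains(line)).ToList()
    End Function

    ' Hypothetical wrapper: filters the hosts file and overwrites it in place.
    ' ReadAllLines loads and closes each file before the hosts file is rewritten.
    Public Sub RemoveWhitelistedLines(hostsPath As String, whitelistPath As String)
        Dim kept = FilterLines(File.ReadAllLines(hostsPath), File.ReadAllLines(whitelistPath))
        File.WriteAllLines(hostsPath, kept)
    End Sub
End Module
```

From the button handler this could be called as RemoveWhitelistedLines("hosts.txt", "whitelist.txt"), with full paths as needed. If the entries should match regardless of case, passing StringComparer.OrdinalIgnoreCase as a second argument to the HashSet constructor is one option.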