I have two text files: a hosts file and whitelist.txt.
hosts
google.com
facebook.com
x.com
y.com
z.com
youtube.com
duckduckgo.com
stackoverflow.com
w.com
whitelist.txt
w.com
x.com
y.com
z.com
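When I click a button, every line that appears in whitelist.txt must be removed from the hosts file.
For example, x.com is in whitelist.txt, so it should be removed from the hosts file.
Expected output (hosts):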
google.com
facebook.com
youtube.com
duckduckgo.com
stackoverflow.com
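The hosts file will be fairly large, 3-6 MB in size.
So this will be a heavy operation.
The matching lines do not need to be removed from both files, only from the hosts file.
Edit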
Public Sub RemoveLines(file1path As String, file2path As String)
    Dim s1 As String() = IO.File.ReadAllLines(file1path)
    Dim s2 As String() = IO.File.ReadAllLines(file2path)
    Dim l As List(Of String) = New List(Of String)
    l.AddRange(s1.ToList)
    l.AddRange(s2.ToList)
    If s1.ToList = s2.ToList Then
        RemoveLines = s1.text
    End If
End Sub
Answer 0 (score: 0)
GitHub sample project: filter-lines
Taken from a previously deleted post, my approach is:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

namespace filter_lines
{
    class Program
    {
        // args[0]: file to filter (rewritten in place); args[1]: file whose lines are removed.
        static async Task Main(string[] args)
        {
            var md5 = MD5.Create();

            // Maps a line to a 64-bit key (the first 8 bytes of its MD5 digest) plus the line itself.
            // Two different lines with the same key would be treated as equal (unlikely, but possible).
            (long, string) Hash(string input, Encoding encoding = null)
            {
                encoding = encoding ?? Encoding.UTF8; // fall back to UTF-8 when no encoding is given
                Span<byte> bytes = stackalloc byte[encoding.GetByteCount(input)];
                Span<byte> destination = stackalloc byte[md5.HashSize / 8];
                encoding.GetBytes(input, bytes);
                return md5.TryComputeHash(bytes, destination, out _)
                    ? (BitConverter.ToInt64(destination), input) : (0, null);
            }

            async Task<IEnumerable<string>> ReadFileAsync(string fileName) =>
                await File.ReadAllLinesAsync(fileName).ConfigureAwait(false);

            var lines1 = await ReadFileAsync(args[0]).ConfigureAwait(false);
            var lines2 = await ReadFileAsync(args[1]).ConfigureAwait(false);

            // hashes1 is lazy, so the first file is hashed while it is being written below.
            var hashes1 = lines1.Select(line => Hash(line, Encoding.UTF8));
            var keys2 = new HashSet<long>(lines2.Select(line => Hash(line, Encoding.UTF8).Item1));

            var stopwatch = Stopwatch.StartNew();
            // Keep only the lines whose key does not occur in the second file.
            File.WriteAllLines(args[0], hashes1
                .Where(pair => !keys2.Contains(pair.Item1))
                .Select(pair => pair.Item2));
            stopwatch.Watch();
        }
    }

    public static class StopwatchExtensions
    {
        // Prints the elapsed time together with the caller's member name, file, and line number.
        public static void Watch(this Stopwatch stopwatch, string message = "",
            [CallerMemberName] string memberName = "",
            [CallerFilePath] string sourceFilePath = "",
            [CallerLineNumber] int sourceLineNumber = 0) =>
            Console.WriteLine(
                $"{stopwatch.Elapsed} " +
                $"{message} " +
                $"{memberName} " +
                $"{sourceFilePath}:{sourceLineNumber}");
    }
}
Result: 44 ms to filter all.txt, a test file of 1,000,000 lines.
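Answer 1 (score: 0)
The File class, which is used to retrieve the data, needs Imports System.IO at the top of the file. StringBuilder needs Imports System.Text at the top of the file. A StringBuilder saves the code from creating and throwing away a new String every time the string changes (strings are immutable). .ReadAllLines returns an array of the lines in the text file.
Loop through each line in hosts.txt and check whether that line .Exists in the array of lines from whitelist.txt. If it does not exist, add it to the StringBuilder.
Finally, convert the StringBuilder to a String and use the File.WriteAllText method to write all the text to hosts.txt.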
'At the top of the file:
Imports System.IO
Imports System.Text

'Inside the form class:
Private Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click
    Dim linesFromHost = File.ReadAllLines("hosts.txt") 'You will need to add the full path
    Dim linesFromWhiteList = File.ReadAllLines("whitelist.txt")
    Dim sb As New StringBuilder
    For Each line As String In linesFromHost
        'If the line from linesFromHost is not found in linesFromWhiteList then add it to the StringBuilder
        If Not Array.Exists(linesFromWhiteList, Function(x) x = line) Then
            sb.AppendLine(line)
        End If
    Next
    File.WriteAllText("hosts.txt", sb.ToString)
    MessageBox.Show("Done")
End Sub
I'm sure the code provided by @ArturMustafin is much faster, but this may get you started.
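Array.Exists scans the whole whitelist once for every line of the hosts file, which may become slow when both files are large. A possible middle ground is to load the whitelist into a HashSet(Of String) so that each lookup takes roughly constant time. The following is only a minimal sketch under that assumption: the module name HostsFilter and the helpers FilterLines and RemoveWhitelistedLines are hypothetical names that do not come from either answer, and lines are compared exactly, as in the loop above.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module HostsFilter
    ' Hypothetical helper (not from the original posts): returns the lines of
    ' hostLines that do not occur in whitelistLines, keeping their original order.
    Public Function FilterLines(hostLines As IEnumerable(Of String),
                                whitelistLines As IEnumerable(Of String)) As List(Of String)
        ' The set is built once; Contains is then a hash lookup instead of a scan.
        Dim whitelist As New HashSet(Of String)(whitelistLines)
        Return hostLines.Where(Function(line) Not whitelist.Contains(line)).ToList()
    End Function

    ' Hypothetical wrapper: reads both files, filters, and overwrites the hosts file only.
    Public Sub RemoveWhitelistedLines(hostsPath As String, whitelistPath As String)
        Dim kept = FilterLines(File.ReadAllLines(hostsPath), File.ReadAllLines(whitelistPath))
        File.WriteAllLines(hostsPath, kept)
    End Sub
End Module
```

From the button handler this could be called as RemoveWhitelistedLines("hosts.txt", "whitelist.txt"), again with full paths. If matching should ignore case, the HashSet constructor also accepts a comparer such as StringComparer.OrdinalIgnoreCase as a second argument.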