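In this example I first walk the input directory to collect every file that has not been renamed yet. I find them by matching the current year in the file name (the names are generated from a timestamp). Once that is done, the program extracts those files to a temporary directory. From there I want to read the index page inside each one and write it to a text file, so that I can rename the previously extracted zip based on the information in the index. I am stuck because I cannot come up with logic that is accurate enough to isolate the part of the HTML I want to extract. What I know is: 1. The information sits between the second pair of tags, and the first three words after the tag are "Person record for". Any help isolating and writing out this tag would be greatly appreciated.
-Method I have already tried: stripping out all of the HTML, but I found that inconsistent and cumbersome.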
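The tag described above could be isolated with a regular expression rather than by stripping every tag. The following is only a minimal sketch, assuming the heading is an `<h2>` element laid out like the one in the sample page further down; `HeadingSketch` and `FindPersonHeading` are hypothetical names, not part of the original program.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HeadingSketch
    ' Hypothetical helper: returns the text of the first <h2> whose content
    ' starts with "Person record", or Nothing when no such heading exists.
    Function FindPersonHeading(html As String) As String
        Dim m As Match = Regex.Match(html, "<h2[^>]*>\s*(Person record.*?)\s*</h2>",
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Sample input copied from the heading in the sample page.
        Dim sample As String = "<body><h2> Person record for Jon D. Doe ( PID: 2813308004 ) </h2></body>"
        Dim heading As String = FindPersonHeading(sample)
        If heading Is Nothing Then
            Console.WriteLine("No matching heading found")
            Return
        End If

        ' Split the heading into name and PID; this layout is assumed from the sample.
        Dim parts As Match = Regex.Match(heading, "Person record for\s+(.+?)\s*\(\s*PID:\s*(\d+)\s*\)")
        Console.WriteLine(parts.Groups(1).Value)   ' the person's name
        Console.WriteLine(parts.Groups(2).Value)   ' the PID digits
    End Sub
End Module
```

For the sample heading this prints `Jon D. Doe` and then `2813308004`. Reading the whole page with `File.ReadAllText` and passing it to the helper avoids depending on how the heading is split across lines, which a line-by-line loop would be sensitive to.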
Option Explicit On
Option Strict Off
Imports System
Imports System.Text
Imports System.IO
Imports System.Xml
Imports System.Diagnostics
Imports System.IO.Compression
Module Module1
Sub Main()
Dim year As String
year = Date.Today.Year
Dim sc As New Shell32.Shell()
'Dim EFMD As String() = Directory.GetFiles("C:\Users\Pepper\Desktop\In")
Dim di As DirectoryInfo = New DirectoryInfo("C:\Users\Pepper\Desktop\In")
For Each fi In di.GetFiles("*" + year + "*")
'Dim startPath As String = "c:\example\start"
Dim zipPath As String = "C:\Users\Pepper\Desktop\In\" + fi.ToString
Dim extractPath As String = "C:\Users\Pepper\Desktop\Out\" + fi.ToString + "\"
ZipFile.ExtractToDirectory(zipPath, extractPath)
Console.WriteLine(fi)
Next
Console.WriteLine()
Dim di_t As DirectoryInfo = New DirectoryInfo("C:\Users\Pepper\Desktop\Out")
For Each fi In di_t.GetFiles("*" + year + "*")
Dim g As String = "C:\Users\Pepper\Desktop\OUT\" + fi.ToString + "\INDEX.HTM"
Dim h As String = "C:\Users\Pepper\Desktop\OUT\" + fi.ToString + "\INDEX" + fi.ToString + ".TXT"
Dim sw As StreamWriter
Dim sr As StreamReader = New StreamReader(g)
sw = New StreamWriter(h)
Dim line As String
Do
line = sr.ReadLine
sw.WriteLine(line)
Loop Until line.Trim = 'need logic here
sw.WriteLine(line)
sr.Close()
Next
End Sub
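End Module
'Unzip the file to confirm the person's identity; once the name and person ID are verified, rename the source file.
'Go back and delete the extracted files; the files are placed in a temporary folder.
'Read/write the file until we reach the person's name, so we can use it for the rename.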
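The rename and clean-up steps listed in the comments above might look roughly like this. It is a sketch under stated assumptions: the `Name_PID.zip` naming scheme is invented for illustration, and `RenameSketch`, `BuildZipName` and `RenameAndCleanUp` are hypothetical names that do not appear in the original code.

```vbnet
Imports System
Imports System.IO

Module RenameSketch
    ' Hypothetical helper: builds a file name from the person's name and PID,
    ' replacing any character that is not allowed in a file name.
    Function BuildZipName(personName As String, pid As String) As String
        Dim safeName As String = personName.Trim()
        For Each c As Char In Path.GetInvalidFileNameChars()
            safeName = safeName.Replace(c, "_"c)
        Next
        Return safeName & "_" & pid & ".zip"
    End Function

    ' Renames the zip inside its own folder, then deletes the temporary extraction folder.
    Sub RenameAndCleanUp(zipPath As String, extractPath As String,
                         personName As String, pid As String)
        Dim target As String = Path.Combine(Path.GetDirectoryName(zipPath),
                                            BuildZipName(personName, pid))
        File.Move(zipPath, target)                 ' throws if the target already exists
        If Directory.Exists(extractPath) Then
            Directory.Delete(extractPath, True)    ' True = remove the folder's contents too
        End If
    End Sub

    Sub Main()
        ' Only the name-building step is exercised here, so no files are touched.
        Console.WriteLine(BuildZipName("Jon D. Doe", "2813308004"))
    End Sub
End Module
```

Because the new name no longer contains the current year, a renamed zip would not be picked up again by the `"*" + year + "*"` filter on a later run, which matches the goal of only processing files that have not been renamed yet.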
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:voc="urn:hl7-org:v3/voc" xmlns:n2="urn:hl7-org:v3/meta/voc" xmlns:n1="urn:hl7-org:v3">
<head>
<title>Person Identification Sheet</title>
<style type="text/css">
body
{
font-family: Arial, Helvetica, sans-serif;
font-size: 12px;
color: black;
}
td
{
font-size: 12px;
}
h2
{
font-size: 14px;
}
.dHeader
{
background-color: #FFFFFF;
}
.dFooter
{
background-color: #e4e7f1;
}
.dSectionTitle
{
font-weight: bold;
font-size: 14px;
}
.dTable
{
border: 0px #ffffff solid;
border-collapse: collapse;
}
.dTableHeading
{
font-weight: bold;
font-size: 12px;
text-align: left;
}
.dTableHeadingCell
{
padding-right: 20px;
}
.dTableRow0
{
background-color: #f6f6f6;
}
.dTableRow1
{
}
.dTableCell
{
padding-right: 20px;
}
.pHeader
{
background-color: #2f3c6e;
}
.pHeaderLabel
{
font-weight: normal;
color: white;
}
.pHeaderValue
{
font-weight: bold;
color: white;
}
.pHeaderName
{
font-size: 22px;
font-weight: bold;
color: white;
}
</style>
</head>
<body>
<h2> Person record for Jon D. Doe ( PID: 2813308004 ) </h2>
<b> Gender: </b>Male<b> DOB: </b>June 10, 2011<br />