Extracting data using regular expressions

Time: 2016-11-07 13:02:17

Tags: regex excel powershell

Hi, I have a file in the following format:
 [stuff not needed]Type:A1[stuff not needed]
 [stuff not needed]Name:B1[stuff not needed]
 Row:Sampletext
 Row:Sampletext
 [stuff not needed]Type:A2[stuff not needed]
 [stuff not needed]Name:B2[stuff not needed]
 Row:Sampletext2
 Row:Sampletext2
 Row:Sampletext2

I am using Select-String in PowerShell to extract the data.

I am using a pattern like Regex1|Regex2|Regex3 and saving the output to a file.

The output format is:

A1
B1
Sampletext
Sampletext
A2
B2
Sampletext2
Sampletext2
Sampletext2

I would like the format to be:

A1 B1 Sampletext
A1 B1 Sampletext
A2 B2 Sampletext2
A2 B2 Sampletext2
A2 B2 Sampletext2

I am new to PowerShell. Is there a way to do this?

Here is my exact code:

$input_path = 'idx.txt'
$output_file = 'output.txt'
$regex = 'Type:\s([A-Za-z]*)|Name:\s\s([A-Za-z]*)|[A-Za-z][a-z0-9A-Z_]*(?:\s*[0-6]\s*[0-4]\s\s[\s\d]\d\s*0)'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file

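The data is too large to post here, so I'll just create a sample data set. The regex is working; it may be crude, but it captures the data I need. For the sake of this example, we can take Type:([A-Za-z]*)|Name:([A-Za-z]*)|Row:([A-Za-z]*) as the regex.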

1 answer:

Answer 0: (score: 1)

Check each line for Type or Name and only set the corresponding variable; if the line has Row, output the type and name variables along with the current row content.

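The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the function name Convert-IdxLines and its -Lines parameter are hypothetical, and the patterns assume the simplified Type:/Name:/Row: sample format from the question rather than the real data.

```powershell
# Hypothetical helper (name and parameter are assumptions for illustration).
function Convert-IdxLines {
    param([string[]]$Lines)

    $type = $name = ''
    foreach ($line in $Lines) {
        if ($line -match 'Type:\s*(\w+)') {
            $type = $Matches[1]            # remember the most recent type
        }
        elseif ($line -match 'Name:\s*(\w+)') {
            $name = $Matches[1]            # remember the most recent name
        }
        elseif ($line -match 'Row:\s*(\w+)') {
            "$type $name $($Matches[1])"   # emit one combined output line per row
        }
    }
}

# Sample data copied from the question.
$sample = @(
    '[stuff not needed]Type:A1[stuff not needed]'
    '[stuff not needed]Name:B1[stuff not needed]'
    'Row:Sampletext'
    'Row:Sampletext'
    '[stuff not needed]Type:A2[stuff not needed]'
    '[stuff not needed]Name:B2[stuff not needed]'
    'Row:Sampletext2'
    'Row:Sampletext2'
    'Row:Sampletext2'
)

Convert-IdxLines -Lines $sample
```

For the real file, something like Convert-IdxLines -Lines (Get-Content 'idx.txt') | Set-Content 'output.txt' should work, with the three patterns swapped for the ones that match the actual data.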

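Notes:

  • We use a faster loop expression rather than the slower approach of piping into foreach with a script block.
  • \w in a regular expression means any word character, including a-z, A-Z, 0-9 and _, plus some more.
  • Regex matching and string comparison are case-insensitive by default in PowerShell.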
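The last two notes can be checked with a few one-liners. This is only a small illustration; the variable names and the sample string are made up for the example.

```powershell
# -match ignores case by default, and \w covers letters, digits and underscore.
$ignoresCase = 'Row:Sample_text2' -match 'row:(\w+)'   # True
$captured    = $Matches[1]                             # Sample_text2

# -cmatch is the case-sensitive variant of -match.
$caseSensitive = 'Row:Sample_text2' -cmatch 'row:(\w+)'   # False

# The same default applies to plain string comparison.
$eqDefault = 'abc' -eq 'ABC'     # True
$eqCase    = 'abc' -ceq 'ABC'    # False
```

If case matters for the real data, the c-prefixed operators (-cmatch, -ceq) or the -CaseSensitive switch of Select-String can be used instead.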