Easiest way to put each word of an email (text file) into an array in C#

Posted: 2014-05-05 19:38:12

Tags: c# arrays streamreader

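I'm building a phishing scanner for a class project, and I'm stuck getting an email that is saved in a text file copied correctly into an array for later processing. I want each word at its own array index.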

Here is my sample email:

Subject: Insufficient Funds Notice
Date: September 25, 2013

Insufficient Funds Notice
Unfortunately, on 09/25/2013 your available balance in your Wells Fargo account XXXXXX4653 was insufficient to cover one or more of your checks, Debit Card purchases, or other transactions. 
An important notice regarding one or more of your payments is now available in your Messages & Alerts inbox. 
To read the message, click here, and first confirm your identity. 
Please make deposits to cover your payments, fees, and any other withdrawals or transactions you have initiated. If you have already taken care of this, please disregard this notice. 
We appreciate your business and thank you for your prompt attention to this matter. 
If you have questions after reading the notice in your inbox, please refer to the contact information in the notice. Please do not reply to this automated email. 
Sincerely, 
Wells Fargo Online Customer Service 
wellsfargo.com | Fraud Information Center
4f57e44c-5d00-4673-8eae-9123909604b6

I don't want any punctuation, only the words and numbers.

Here is the code I have written so far:

    // Read the whole email file into a single string.
    using (StreamReader sr1 = new StreamReader(lblDisplaySelectedFilePath.Text))
    {
        string line = sr1.ReadToEnd();
        // Split on the space character only.
        words = line.Split(' ');
    }
    // Lower-case every entry in place.
    for (int i = 0; i < words.Length; i++)
    {
        words[i] = words[i].ToLower();
    }

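The problem with the code above is that I keep getting words strung together and/or entries containing "\r" or "\n" in the array. Here is an example of an entry I don't want: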

"notice\r\ndate:" I don't want the \r, the \n, or the colon, and these two words should be at separate indexes.
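To see where that entry comes from, here is a minimal sketch that reproduces the question's `Split(' ')` on a few lines of the sample email. The class and method names (`SplitOnSpaceDemo`, `SplitOnSpace`) are hypothetical. Only the space character acts as a separator, so the line break between "Notice" and "Date:" stays inside a single token.

```csharp
using System;

public static class SplitOnSpaceDemo
{
    // Mirrors the question's approach: lower-case, then split on ' ' only.
    // "\r\n" and punctuation are ordinary characters here, so they stay in the tokens.
    public static string[] SplitOnSpace(string text)
    {
        return text.ToLowerInvariant().Split(' ');
    }
}
```

For example, `SplitOnSpace("Insufficient Funds Notice\r\nDate: September")` returns four entries, and the third one is "notice\r\ndate:" with real carriage-return and line-feed characters in it.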

3 answers:

Answer 0 (score: 3)

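The regular expression \W matches any single non-word character (anything other than a letter, a decimal digit, or a connector character such as '_'). Splitting on it therefore breaks the string at spaces, line breaks and punctuation, which leaves a list of words. Adjacent separators such as ": " or "\r\n" produce empty strings, and the Where clause filters those out. The snippet needs using System.Linq and using System.Text.RegularExpressions.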

    string[] words = Regex.Split(inputString, "\\W")
        .Where(x => !string.IsNullOrWhiteSpace(x)).ToArray();
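Combined with reading the file and lower-casing as in the question, a minimal sketch could look like the following. It uses `\W+` instead of `\W`, so a run of separators counts as one split point and only a leading or trailing separator can leave an empty string. The class `EmailWords` and the helper `ReadWords` are hypothetical names; in the question you would pass `lblDisplaySelectedFilePath.Text` as the path.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailWords
{
    // Split on runs of non-word characters. In .NET, \w covers letters, decimal
    // digits and connector punctuation such as '_', so spaces, "\r\n", ':' and ','
    // all act as separators.
    public static string[] SplitIntoWords(string text)
    {
        return Regex.Split(text, @"\W+")
                    .Where(w => w.Length > 0)          // a separator at the start or end yields ""
                    .Select(w => w.ToLowerInvariant()) // lower-case, as the question's loop does
                    .ToArray();
    }

    // Hypothetical convenience wrapper: read the whole email file, then split it.
    public static string[] ReadWords(string path)
    {
        return SplitIntoWords(File.ReadAllText(path));
    }
}
```

With this pattern, "Subject: Insufficient Funds Notice\r\nDate: September 25, 2013" becomes subject, insufficient, funds, notice, date, september, 25, 2013. Note that a date such as 09/25/2013 turns into three entries (09, 25, 2013) and wellsfargo.com into two (wellsfargo, com); if the scanner needs those whole, the separator pattern has to be narrowed.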

Answer 1 (score: 2)

using System;
using System.Text.RegularExpressions;

public class Example
{
    static string CleanInput(string strIn)
    {
        // Replace invalid characters with empty strings. 
        try {
           return Regex.Replace(strIn, @"[^\w\.@-]", "", 
                                RegexOptions.None, TimeSpan.FromSeconds(1.5)); 
        }
        // If we timeout when replacing invalid characters,  
        // we should return Empty. 
        catch (RegexMatchTimeoutException) {
           return String.Empty;   
        }
    }
}
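The character class `[^\w\.@-]` also matches spaces and line breaks, so calling CleanInput on the whole email would delete them and glue all the words together. One way to apply it to this question (a sketch that is not part of the original answer; `CleanInputUsage` and `ToCleanWords` are hypothetical names) is to split on white space first and then clean each token:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CleanInputUsage
{
    // Same logic as CleanInput above: remove everything except word characters,
    // '.', '@' and '-', and give up after 1.5 seconds.
    static string CleanInput(string strIn)
    {
        try
        {
            return Regex.Replace(strIn, @"[^\w\.@-]", "",
                                 RegexOptions.None, TimeSpan.FromSeconds(1.5));
        }
        catch (RegexMatchTimeoutException)
        {
            return String.Empty;
        }
    }

    // Split on any white space first (null separator), then clean and lower-case each token.
    public static string[] ToCleanWords(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Select(CleanInput)
                   .Where(w => w.Length > 0)           // tokens made only of removed characters
                   .Select(w => w.ToLowerInvariant())
                   .ToArray();
    }
}
```

Because the pattern keeps '.', '@' and '-', a sentence-final token such as "transactions." keeps its period, while "wellsfargo.com" stays a single entry.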

Answer 2 (score: 1)

Calling line.Split(null) splits on white space. From the C# String.Split method documentation:


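If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.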
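With a null separator alone, "\r\n" and double spaces still produce empty entries ("Notice\r\nDate:" splits into "Notice", "", "Date:"), and punctuation stays attached to words such as "Date:" or "25,". A minimal sketch, assuming you want the same lower-cased words as the question, passes StringSplitOptions.RemoveEmptyEntries and then trims non-alphanumeric characters from both ends of each token. `WhitespaceSplit`, `SplitWords` and `TrimNonAlphanumeric` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class WhitespaceSplit
{
    // Remove characters that are not letters or digits from both ends of a token,
    // leaving inner characters (as in "09/25/2013") untouched.
    static string TrimNonAlphanumeric(string token)
    {
        int start = 0;
        int end = token.Length - 1;
        while (start <= end && !char.IsLetterOrDigit(token[start])) start++;
        while (end >= start && !char.IsLetterOrDigit(token[end])) end--;
        return token.Substring(start, end - start + 1);
    }

    public static string[] SplitWords(string text)
    {
        // A null separator means "split on any Char.IsWhiteSpace character";
        // RemoveEmptyEntries drops the empty strings left by "\r\n" or double spaces.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Select(TrimNonAlphanumeric)
                   .Where(w => w.Length > 0)           // e.g. a lone "|" trims to ""
                   .Select(w => w.ToLowerInvariant())
                   .ToArray();
    }
}
```

Trimming only the ends keeps 09/25/2013 and wellsfargo.com as single entries, which differs from splitting on \W; which behavior is better depends on what the phishing scanner looks for.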