Replacing characters in C#

Time: 2009-11-09 05:33:08

Tags: character replace

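I have a requirement.

I have a text that can contain any characters.

a) I must keep only alphanumeric characters. b) If the word "The" is found with a space before or after it, it needs to be removed.

e.g.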

CASE 1:

 Input:  The Company Pvt Ltd. 

 Output: Company Pvt Ltd

But 

     Input:  TheCompany Pvt Ltd. 

     Output: TheCompany Pvt Ltd

because there is no space between the words "The" and "Company".

CASE 2:

Similarly, Input:  Company Pvt Ltd.  The 

     Output: Company Pvt Ltd

But Input:  Company Pvt Ltd.The 

     Output: Company Pvt Ltd

CASE 3:

Input: Company@234 Pvt; Ltd.

Output: Company234 Pvt Ltd

No commas, periods, or any other special characters.

Basically, I set the data on some variable, like

 _company.ShortName = _company.CompanyName.ToUpper();

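So I can't do anything at save time. I only need to apply this filter when I fetch the data from the database. The data comes from _company.CompanyName

and I have to apply the filter to it.

So far I have done this: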

public string ReplaceCharacters(string words)
{
    words = words.Replace(",", " ");
    words = words.Replace(";", " ");
    words = words.Replace(".", " ");
    words = words.Replace("THE ", " ");
    words = words.Replace(" THE", " ");
    return words;
}

private void button1_Click(object sender, EventArgs e)
{
    MessageBox.Show(ReplaceCharacters(textBox1.Text.ToUpper()));
}

Thanks in advance. I am using C#.

2 answers:

Answer 0: (score: 10)

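Here is a basic regular expression that matches the cases you supplied. With the caveat that, as Kobi says, your supplied cases are inconsistent, so I have taken the periods out of the first four tests. If you need both behaviours, please add a comment.

This handles all the cases you need, but the rapid proliferation of edge cases makes me think you should reconsider the original problem.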

    [TestMethod]
    public void RegexTest()
    {
        Assert.AreEqual("Company Pvt Ltd", RegexMethod("The Company Pvt Ltd"));
        Assert.AreEqual("TheCompany Pvt Ltd", RegexMethod("TheCompany Pvt Ltd"));
        Assert.AreEqual("Company Pvt Ltd", RegexMethod("Company Pvt Ltd. The"));
        Assert.AreEqual("Company Pvt LtdThe", RegexMethod("Company Pvt Ltd.The"));
        Assert.AreEqual("Company234 Pvt Ltd", RegexMethod("Company@234 Pvt; Ltd."));
        // Two new tests for new requirements
        Assert.AreEqual("CompanyThe Ltd", RegexMethod("CompanyThe Ltd."));
        Assert.AreEqual("theasdasdatheapple", RegexMethod("the theasdasdathe the the the ....apple,,,, the"));
        // And the case where you have THETHE at the start
        Assert.AreEqual("CCC", RegexMethod("THETHE CCC"));
    }

    public string RegexMethod(string input)
    {   
        // Old method before new requirement          
        //return Regex.Replace(input, @"The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);  
        // New method that anchors the first the          
        //return Regex.Replace(input, @"^The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);            
        // And a third method that does look behind and ahead for the last test
        return Regex.Replace(input, @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
    }

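I have also added a test method to my example that exercises RegexMethod, which contains the regular expression. To use it in your code you only need the second method.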
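The pattern in RegexMethod packs four alternatives into a single line, which makes it hard to see what each part does. As a reading aid, here is a sketch of the same pattern written with RegexOptions.IgnorePatternWhitespace so that each alternative can carry its own comment. The names CompanyNameFilter, Noise and Clean are made up for this illustration and are not part of the answer. The literal space before the trailing "The" is written as [ ] because unescaped spaces are ignored in this mode.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only for this illustration.
public static class CompanyNameFilter
{
    // Same four alternatives as the one-line pattern, one per line.
    private static readonly Regex Noise = new Regex(@"
          ^(The)+\s                                 # one or more 'The' at the start, then one whitespace
        | \s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])  # whitespace, then 'The' not followed by a letter or digit
        | [ ]The$                                   # a space and 'The' at the very end
        | [^A-Z0-9\s]                               # any character that is not a letter, digit or whitespace
        ", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Removes every match of the pattern from the input.
    public static string Clean(string input)
    {
        return Noise.Replace(input, string.Empty);
    }
}
```

Building the Regex once in a static field also avoids re-parsing the pattern on every call, which may matter if the filter runs for each row read from the database.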

Answer 1: (score: 2)

string company = "Company; PvtThe Ltd.The  . The the.the";
company = Regex.Replace(company, @"\bthe\b", "", RegexOptions.IgnoreCase);
company = Regex.Replace(company, @"[^\w ]", "");
company = Regex.Replace(company, @"\s+", " ");
company = company.Trim();
// company == "Company PvtThe Ltd"

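These are the steps. Steps 1 and 2 could be combined, but this is clearer.

  1. Remove "the" as a whole word (this also works for ".the").
  2. Remove anything that isn't a word character or a space (\w matches letters, digits, and the underscore).
  3. Collapse runs of adjacent whitespace into a single space.
  4. Trim whitespace from the edges.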
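The four steps can be wrapped in a single helper so they are easy to apply when the name is read back from the database. This is only a sketch: the class name CompanyNameCleaner and the method name Clean are hypothetical, and step 2 uses the stricter class [^A-Za-z0-9 ] instead of [^\w ], on the assumption that underscores and non-ASCII letters should also be dropped to satisfy the "alphanumeric only" requirement.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CompanyNameCleaner
{
    public static string Clean(string name)
    {
        // 1. Drop "the" when it stands alone as a whole word (any casing).
        name = Regex.Replace(name, @"\bthe\b", "", RegexOptions.IgnoreCase);
        // 2. Keep only ASCII letters, digits and spaces (assumption: stricter than \w).
        name = Regex.Replace(name, @"[^A-Za-z0-9 ]", "");
        // 3. Collapse runs of whitespace into a single space.
        name = Regex.Replace(name, @"\s+", " ");
        // 4. Trim whitespace from both ends.
        return name.Trim();
    }
}
```

With this sketch, CompanyNameCleaner.Clean("Company@234 Pvt; Ltd.") returns "Company234 Pvt Ltd", and "TheCompany Pvt Ltd." keeps its "The" because there is no word boundary inside "TheCompany". Note that "Company Pvt Ltd.The" becomes "Company Pvt Ltd" here, since the period creates a word boundary before "The"; the first answer's tests expect "Company Pvt LtdThe" for the same input.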