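I have a requirement.
I have a piece of text that can contain any characters.
a) I must keep only the alphanumeric characters. b) If the word "The" is found with a space before it or a space after it, it needs to be removed.
For example: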
CASE 1:
Input: The Company Pvt Ltd.
Output: Company Pvt Ltd
But
Input: TheCompany Pvt Ltd.
Output: TheCompany Pvt Ltd
because there is no space between The & Company words.
CASE 2:
Similarly, Input: Company Pvt Ltd. The
Output: Company Pvt Ltd
But Input: Company Pvt Ltd.The
Output: Company Pvt Ltd
Case 3:
Input: Company@234 Pvt; Ltd.
Output: Company234 Pvt Ltd
No commas, periods, or any other special characters.
I basically assign the data to some variable, like
_company.ShortName = _company.CompanyName.ToUpper();
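So I can't do anything at save time. I need to apply this filter only when I fetch the data from the database. The data comes from _company.CompanyName
and I have to apply the filter to it.
So far I have done this: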
public string ReplaceCharacters(string words)
{
    words = words.Replace(",", " ");
    words = words.Replace(";", " ");
    words = words.Replace(".", " ");
    words = words.Replace("THE ", " ");
    words = words.Replace(" THE", " ");
    return words;
}

private void button1_Click(object sender, EventArgs e)
{
    MessageBox.Show(ReplaceCharacters(textBox1.Text.ToUpper()));
}
Thanks in advance. I am using C#.
Answer 0 (score: 10)
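Here is a basic regular expression that matches the cases you supplied. One caveat: as Kobi says, your supplied cases are inconsistent, so I have taken the period out of the first four tests. If you need both, please add a comment.
This handles all the cases you need, but the rapid proliferation of edge cases makes me think you should reconsider the original problem.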
// Requires: using System.Text.RegularExpressions;
//           using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestMethod]
public void RegexTest()
{
    Assert.AreEqual("Company Pvt Ltd", RegexMethod("The Company Pvt Ltd"));
    Assert.AreEqual("TheCompany Pvt Ltd", RegexMethod("TheCompany Pvt Ltd"));
    Assert.AreEqual("Company Pvt Ltd", RegexMethod("Company Pvt Ltd. The"));
    Assert.AreEqual("Company Pvt LtdThe", RegexMethod("Company Pvt Ltd.The"));
    Assert.AreEqual("Company234 Pvt Ltd", RegexMethod("Company@234 Pvt; Ltd."));
    // Two new tests for new requirements
    Assert.AreEqual("CompanyThe Ltd", RegexMethod("CompanyThe Ltd."));
    Assert.AreEqual("theasdasdatheapple", RegexMethod("the theasdasdathe the the the ....apple,,,, the"));
    // And the case where you have THETHE at the start
    Assert.AreEqual("CCC", RegexMethod("THETHE CCC"));
}

public string RegexMethod(string input)
{
    // Old method before new requirement
    //return Regex.Replace(input, @"The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
    // New method that anchors the first the
    //return Regex.Replace(input, @"^The | The|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
    // And a third method that does look behind and ahead for the last test
    return Regex.Replace(input, @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]", string.Empty, RegexOptions.IgnoreCase);
}
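I have also included a test method in my example that exercises RegexMethod, which contains the regular expression. To use this in your code, you only need the second method.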
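If the filter runs every time a company is loaded, it may be worth building the regular expression once and reusing it. The following is only a sketch: the class name CompanyNameFilter and the method name Apply are hypothetical, and returning an empty string for null input is an assumption rather than something the question specifies. The pattern itself is the one from RegexMethod.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class CompanyNameFilter
{
    // Same pattern as RegexMethod, constructed once and reused for every call.
    private static readonly Regex Unwanted = new Regex(
        @"^(The)+\s|\s(?<![A-Z0-9])[\s]*The[\s]*(?![A-Z0-9])| The$|[^A-Z0-9\s]",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the filtered name. Assumption: null or empty input yields "".
    public static string Apply(string companyName)
    {
        if (string.IsNullOrEmpty(companyName))
        {
            return string.Empty;
        }
        return Unwanted.Replace(companyName, string.Empty);
    }
}
```

Assuming the cleaned value belongs in ShortName, the assignment from the question could then become _company.ShortName = CompanyNameFilter.Apply(_company.CompanyName).ToUpper();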
Answer 1 (score: 2)
string company = "Company; PvtThe Ltd.The . The the.the";
company = Regex.Replace(company, @"\bthe\b", "", RegexOptions.IgnoreCase);
company = Regex.Replace(company, @"[^\w ]", "");
company = Regex.Replace(company, @"\s+", " ");
company = company.Trim();
// company == "Company PvtThe Ltd"
Those are the steps. Steps 1 and 2 can be combined, but this way is clearer.
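For comparison, here is a sketch of what combining steps 1 and 2 might look like, using a single alternation. The class name CompanyNameCleaner and the method name Clean are hypothetical. Because the whole-word match runs against the original text, "The" directly after a period is still recognized as a separate word and removed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper; the names are illustrative only.
public static class CompanyNameCleaner
{
    public static string Clean(string company)
    {
        // Steps 1 and 2 in one pass: drop the whole word "the" (any case)
        // and every character that is neither a word character nor a space.
        company = Regex.Replace(company, @"\bthe\b|[^\w ]", "", RegexOptions.IgnoreCase);
        // Step 3: collapse the whitespace runs left behind by the removals.
        company = Regex.Replace(company, @"\s+", " ");
        // Step 4: remove leading and trailing spaces.
        return company.Trim();
    }
}
```

Note that \w also keeps underscores and non-ASCII letters. If only ASCII letters and digits should survive, [^A-Za-z0-9 ] could be used in place of [^\w ].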