I have some strings that contain emoji codes, such as :grinning:, :kissing_heart:, or :bouquet:. I want to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
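However, there are 856 different emoji codes that I have to remove (which, with this approach, would take 856 calls to Replace()). Is there another way to accomplish this?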
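Answer 0 (score: 27)
You can use a regular expression to match the word between the colons in :anything:. Using the Replace overload that takes a function, you can perform additional validation on each match.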
string pattern = @":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you prefer not to use the Replace overload that takes a lambda expression, you can use \w, as @yorye-nathan mentioned, to match only word characters.
string pattern = @":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
Answer 1 (score: 16)
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
I would solve it this way:
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
Update, in response to the detailed comments
Another approach: replace only the emoji that actually occur in the text
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, @":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
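But I'm not sure it makes a big difference for such short chat strings, and keeping the code easy to read and maintain matters more. The OP's question was about reducing the redundant Text.Replace().Text.Replace() chain, not about finding the most efficient solution.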
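If the list of codes is known up front, another way to avoid the chain of Replace() calls is to fold the whole list into one alternation pattern and call Replace once. The following is only a minimal sketch of that idea: the class name EmojiAlternation and its two helper methods are hypothetical, and it assumes the list is non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class EmojiAlternation
{
    // Builds one pattern such as ":kissing_heart:|:bouquet:|:grinning:".
    // Assumes emojiCodes is non-empty; an empty list yields an empty pattern.
    public static Regex BuildPattern(IEnumerable<string> emojiCodes)
    {
        // Regex.Escape protects codes that contain regex metacharacters.
        string alternation = string.Join("|", emojiCodes.Select(code => Regex.Escape(code)));
        return new Regex(alternation, RegexOptions.Compiled);
    }

    // Removes every listed code in a single pass over the text.
    public static string RemoveEmoji(string text, Regex pattern)
    {
        return pattern.Replace(text, string.Empty);
    }
}
```

Building the Regex once and reusing it for every message keeps the per-message cost to a single scan. Whether that is measurably faster than the loop above for short chat strings has not been benchmarked here.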
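Answer 2 (score: 8)
I would use a combination of some of the techniques already suggested. First, I'd store the 800+ emoji strings in a database and load them at runtime. Keep them in memory in a HashSet so that lookups take O(1) time (very fast). Use a regex to pull all potential pattern matches out of the input, then compare each one against the hashed emoji, removing the valid emoji and leaving any non-emoji patterns the user typed themselves...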
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = @":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
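This results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Answer 3 (score: 7)
You don't need to replace all 856 emoji. You only need to replace the ones that actually appear in the string. So have a look at: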
Finding a substring using C# with a twist
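Basically, you extract all the tokens, i.e. the strings between : and :, and then replace them with string.Empty.
If you are worried that the search will return strings that are not emoji, such as :some other text:, you can do a hash table lookup to make sure that replacing the found token is appropriate.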
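The token extraction and the hash lookup described above can be combined into a single Regex.Replace call with a match evaluator. This is a rough sketch under a few assumptions: the class name EmojiTokenFilter is hypothetical, the set holds emoji names without the surrounding colons, and a token is taken to be one or more word characters between two colons.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class EmojiTokenFilter
{
    // Candidate token: a colon, one or more word characters, a colon.
    private static readonly Regex Token = new Regex(@":(\w+):");

    // emojiNames holds names without colons, e.g. "grinning".
    public static string RemoveEmoji(string input, ISet<string> emojiNames)
    {
        // Drop the token only when its name is a known emoji; otherwise keep it verbatim.
        return Token.Replace(input, m =>
            emojiNames.Contains(m.Groups[1].Value) ? string.Empty : m.Value);
    }
}
```

One limitation of this sketch: a non-emoji token consumes its closing colon, so in "Tricky test:123:grinning:" the match :123: is kept and the following grinning: is no longer recognized as an emoji code.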
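Answer 4 (score: 5)
Finally got around to writing something. I'm combining a couple of the ideas mentioned earlier with the requirement that we should loop over the string only once. Given those requirements, this sounds like the perfect job for Linq.
You should probably cache the HashSet. Other than that, this has O(n) performance and goes over the list only once. It would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straightforward.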
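1. Load all the emoji into a HashSet so we can look them up quickly.
2. Split the string at each : with input.Split(':').
3. Decide whether to keep the current element. The first element, and any element that follows a removed emoji, is kept as-is. Otherwise, if the element is a known emoji it is dropped, and if it is not, the : is appended back and the element is kept.
4. Rebuild the string with a StringBuilder.
using System;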
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
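Edit: I did something cool with the Rx library, but then realized that Aggregate is the IEnumerable counterpart of the corresponding Rx operator, which simplified the code even further.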
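For the advice about caching the HashSet, one possible shape is a lazily initialized static set that is built once and then shared by every call. This is only a sketch: EmojiCache and LoadEmojiNames are hypothetical names, and the hard-coded array stands in for whatever database or file actually holds the full list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache class, not part of the original answer.
public static class EmojiCache
{
    // Lazy<T> builds the set on first access and is thread-safe by default.
    private static readonly Lazy<HashSet<string>> Cached =
        new Lazy<HashSet<string>>(() => new HashSet<string>(LoadEmojiNames()));

    // The shared, already-built set of emoji names (without colons).
    public static ISet<string> Emoji
    {
        get { return Cached.Value; }
    }

    // Placeholder loader (assumption): replace with the real data source.
    private static IEnumerable<string> LoadEmojiNames()
    {
        return new[] { "kissing_heart", "bouquet", "grinning" };
    }
}
```

With this in place, the extension method above could be called as input.RemoveEmoji(':', EmojiCache.Emoji), so the set is not rebuilt for every string.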
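Answer 5 (score: 3)
If efficiency is a concern, and to avoid processing "false positives", consider rewriting the string with a StringBuilder while skipping the special emoji tokens: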
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
if (colonIndex < 0) //No colons at all, so there is nothing to remove
return input;
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//Add the rest of the text
sb.Append(input.Substring(startIndex));
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
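//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
//No more colons: keep the text that follows this emoji
if (colonIndex < 0)
sb.Append(input.Substring(startIndex));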
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Output:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Answer 6 (score: 3)
I think the code below will solve your problem.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
Answer 7 (score: 3)
I would use an extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This approach performs better than the Regex-based one.
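Since strings are immutable in .NET, appending with += inside a loop allocates a new string on every iteration. A variant of the same toggle idea that writes into a StringBuilder avoids that. This is a sketch only: HelperSketch and StripBetween are hypothetical names, and it is written as a plain static method rather than an extension method.

```csharp
using System;
using System.Text;

// Hypothetical variant of the extension method above.
public static class HelperSketch
{
    // Removes each separator and everything between a pair of separators.
    public static string StripBetween(string dirty, char separator)
    {
        var sb = new StringBuilder(dirty.Length);
        bool skipping = false;
        foreach (char c in dirty)
        {
            if (c == separator)
            {
                // Each separator toggles between keeping and skipping characters.
                skipping = !skipping;
                continue;
            }
            if (!skipping)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Like the original, this removes anything between two colons, whether or not it is a real emoji code, and an unmatched colon causes the rest of the string to be dropped.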
Answer 8 (score: 0)
I would split the text on ':' and then build the string without the emoji names that were found.