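I'm trying to extract the main words from a large set of very long strings, to simplify displaying them.
Suppose we have this array of strings as output: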
Something One
Something [ABC] Two
Something [ABC] Three
Something Four Section 1
Something Four Section 2
Something Five
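How can I remove the non-constant repeated words, such as Something and [ABC], so that only each string's unique identifier remains, such as One, Two, Three, and output the resulting list?
Knowing that:
A duplicate is any word that is repeated multiple times in the list.
{"One", "Two", "Three", ..} as stated above are not constants; they are only examples and could be anything else, such as {"Alpha", "Bravo", "Charlie"} or {"Nu", "Xi", "Pi"}, as long as they do not repeat.
If a certain word is present (in this case "Section 1"), the word before it is preserved, so that "Something Four Section 1" becomes "Four Section 1".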
Answer 0 (score: 1)
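Apart from certain words such as "Section 1", this solution assumes you know nothing (just like Jon Snow). It works on arbitrary string input. It has two main parts.
1) FindRepeatedWords is a method that fills the UniqueWords hashset and the Repeats hashset. UniqueWords, as the name suggests, holds every word in the list; Repeats holds the words that occur more than once.
2) CleanUpWordsAndDoNotChangeList is the main method that does what you want. It removes a repeated word unless that word is the last one in the string or is immediately followed by one of the certain words.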
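Point 1 relies on the fact that HashSet<T>.Add returns false when the element is already in the set, so a single pass can fill both sets. A minimal sketch of just that idea, assuming C# 9 or later (top-level statements); the local names `lines`, `uniqueWords` and `repeats` and the shortened sample input are only for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample input; any list of strings works.
var lines = new List<string>
{
    "Something One",
    "Something [ABC] Two",
    "Something [ABC] Three"
};

var uniqueWords = new HashSet<string>(); // every distinct word seen so far
var repeats = new HashSet<string>();     // words seen at least twice

foreach (var line in lines)
{
    foreach (var word in line.Split(' '))
    {
        // Add returns false when the word is already present,
        // so a false result marks the word as a repeat.
        if (!uniqueWords.Add(word))
        {
            repeats.Add(word);
        }
    }
}

// repeats now holds "Something" and "[ABC]" (set order is not guaranteed).
Console.WriteLine(string.Join(", ", repeats));
```

The full program below does the same thing, except that it first splits each string so that a phrase such as "Section 1" counts as a single word.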
namespace StackOverflow {
    using System;
    using System.Collections.Generic;
    using System.Linq;

    internal class Program {
        // Every distinct word found in the input.
        private static readonly HashSet<string> UniqueWords = new HashSet<string>();
        // Words that occur more than once in the input.
        private static readonly HashSet<string> Repeats = new HashSet<string>();
        // Phrases that are treated as a single word and protect the word before them.
        private static readonly List<string> CertainWords = new List<string> { "Section 1", "Section 2" };
        private static readonly List<string> Words = new List<string> {
            "Something One",
            "Something [ABC] Two",
            "Something [ABC] Three",
            "Something Four Section 1",
            "Something Four Section 2",
            "Something Five"
        };

        private static void Main(string[] args) {
            FindRepeatedWords();
            var result = CleanUpWordsAndDoNotChangeList();
            result.ForEach(Console.WriteLine);
            Console.ReadKey();
        }

        /// <summary>
        /// Cleans up the words and does not change the original list.
        /// </summary>
        /// <returns>A new list with the repeated words removed.</returns>
        private static List<string> CleanUpWordsAndDoNotChangeList() {
            var newList = new List<string>();
            foreach (var t in Words) {
                var sp = SeparateStringByString(t);
                for (var index = 0; index < sp.Count; index++) {
                    if (!Repeats.Contains(sp[index])) { continue; }
                    // Keep a repeated word if it is the last word or is followed by a certain word.
                    var fixedToCheck = sp.ElementAtOrDefault(index + 1);
                    if (fixedToCheck == null || CertainWords.Contains(fixedToCheck)) { continue; }
                    sp.RemoveAt(index);
                    index = index - 1; // re-check the element that moved into this position
                }
                newList.Add(string.Join(" ", sp));
            }
            return newList;
        }

        /// <summary>
        /// Finds unique and repeated words.
        /// </summary>
        private static void FindRepeatedWords() {
            foreach (var eachWord in Words) {
                foreach (var element in SeparateStringByString(eachWord)) {
                    // Add returns false when the word has been seen before.
                    if (!UniqueWords.Add(element)) { Repeats.Add(element); }
                }
            }
        }

        /// <summary>
        /// Separates a string into words, keeping each certain word as a single element.
        /// </summary>
        /// <param name="source">Source string</param>
        /// <returns>The list of words in the source string.</returns>
        private static List<string> SeparateStringByString(string source) {
            var separated = new List<string>();
            foreach (var certainWord in CertainWords) {
                var indexOf = source.IndexOf(certainWord, StringComparison.Ordinal);
                if (indexOf < 0) { continue; }
                // Split the text before the certain word, then add the certain word itself.
                var a = source.Substring(0, indexOf).Trim().Split(' ');
                separated.AddRange(a);
                separated.Add(certainWord);
            }
            // No certain word found: split on spaces only.
            if (separated.Count < 1) { separated.AddRange(source.Split(' ')); }
            return separated;
        }
    }
}
Answer 1 (score: 0)
I'm not sure this is what you want, but here is my code.
Quick code:
// Requires: using System; using System.Collections.Generic; using System.Linq;
string itemName = "";
List<string> destinationArray = new List<string>();
List<string> inputArrayList = new List<string>();
inputArrayList.Add("Something One");
inputArrayList.Add("Something [ABC] Two");
inputArrayList.Add("Something [ABC] Three");
inputArrayList.Add("Something Four Section 1");
inputArrayList.Add("Something Four Section 2");
inputArrayList.Add("Something Five");
inputArrayList.Add("Other Text");

// Collect every space-separated word from every input string.
List<string> allWordList = new List<string>();
foreach (var item in inputArrayList)
{
    allWordList.AddRange(item.Split(' '));
}

// Words that occur more than once across the whole input.
List<string> searchingArrayList = allWordList.GroupBy(x => x)
    .Where(group => group.Count() > 1)
    .Select(group => group.Key).ToList();

// Strip every repeated word from each input string.
// Note: Replace matches substrings and leaves the surrounding spaces in place.
foreach (var itemInput in inputArrayList)
{
    itemName = itemInput;
    foreach (var itemSearching in searchingArrayList)
    {
        itemName = itemName.Replace(itemSearching, "");
    }
    destinationArray.Add(itemName);
}

destinationArray.ForEach(x => Console.WriteLine(x));
Console.ReadKey();
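One thing to keep in mind with the snippet above: string.Replace matches substrings rather than whole words and leaves the surrounding spaces behind. Because "Four" and "Section" each occur twice, they are stripped as well, so "Something Four Section 1" ends up as little more than "1". A word-level variant might look like the sketch below. It is only a sketch under several assumptions: C# 9 or later (top-level statements), words separated by spaces, and a hypothetical `keepPhrases` list of phrases that appear at the end of a string and protect the word before them. The names `Simplify`, `repeated` and `results` are also made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var input = new List<string>
{
    "Something One", "Something [ABC] Two", "Something [ABC] Three",
    "Something Four Section 1", "Something Four Section 2", "Something Five"
};

// Assumption: the phrases to protect are known in advance.
var keepPhrases = new List<string> { "Section 1", "Section 2" };
var separators = new[] { ' ' };

// Words that occur more than once across the whole input.
var repeated = new HashSet<string>(
    input.SelectMany(s => s.Split(separators, StringSplitOptions.RemoveEmptyEntries))
         .GroupBy(w => w)
         .Where(g => g.Count() > 1)
         .Select(g => g.Key));

string Simplify(string line)
{
    // Assumption: a protected phrase, if present, sits at the end of the line.
    var phrase = keepPhrases.FirstOrDefault(
        p => line.EndsWith(" " + p, StringComparison.Ordinal));
    var head = phrase == null ? line : line.Substring(0, line.Length - phrase.Length);
    var words = head.Split(separators, StringSplitOptions.RemoveEmptyEntries);

    var kept = new List<string>();
    for (var i = 0; i < words.Length; i++)
    {
        // Keep non-repeated words, plus the word directly before a protected phrase.
        var precedesPhrase = phrase != null && i == words.Length - 1;
        if (!repeated.Contains(words[i]) || precedesPhrase)
        {
            kept.Add(words[i]);
        }
    }
    if (phrase != null)
    {
        kept.Add(phrase);
    }
    return string.Join(" ", kept);
}

var results = input.Select(Simplify).ToList();
results.ForEach(Console.WriteLine);
```

Filtering whole words and rejoining them with string.Join avoids both the leftover spaces and accidental matches inside longer words.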