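I want to extract all rows from a table where one column (a string) contains at least one word that starts with a specified character. Example: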
Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'
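If the specified character is T, I would extract all 3 rows. If the specified character is S, I would extract only the second row, and so on.
Please help me.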
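Answer 0 (score: 0)
Assuming that by "word" you mean "a sequence of characters delimited by spaces, or by the start or end of the string", you can split on the delimiter and test whether any of the pieces match: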
var src = new[] {
"this is the first row",
"this is th second row",
"this is the third row"
};
var findChar = 'S';
// char has no instance ToLower method; use the static char.ToLower instead
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));
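The LINQ Enumerable.Any method tests a sequence to see if any element matches, so you can split each string into a sequence of words and check whether any word begins with the desired letter, lowercasing both sides to compensate for case.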
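As a minimal sketch of the same idea wrapped in a reusable method: the class name WordStartFilter and the method name RowsWithWordStartingWith are hypothetical and not part of the answer above, and using the invariant culture for lowercasing is an assumption about what "compensating for case" should mean.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordStartFilter
{
    // Hypothetical helper: keeps the rows that contain at least one
    // space-separated word starting with findChar, ignoring case.
    public static List<string> RowsWithWordStartingWith(IEnumerable<string> rows, char findChar)
    {
        // Assumption: invariant-culture lowercasing is good enough for the data.
        char lowerFindChar = char.ToLowerInvariant(findChar);
        return rows
            .Where(row => row
                // RemoveEmptyEntries guarantees every word has at least one character
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Any(word => char.ToLowerInvariant(word[0]) == lowerFindChar))
            .ToList();
    }
}
```

With the three rows from the question, calling WordStartFilter.RowsWithWordStartingWith(rows, 'S') should keep only the row containing "second".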
Answer 1 (score: 0)
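Try this:
rows.Where(r => Regex.IsMatch(r, @"(^| )[Tt]"))
You can replace Tt with Ss (assuming you want to match either upper or lower case). The (^| ) part lets a word at the very start of the row match as well as a word that follows a space.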
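If the character is only known at run time, the pattern can be built instead of hard-coded. The following is a sketch under assumptions: the class and method names are hypothetical, and it treats any whitespace (not only a space) as a word separator. Regex.Escape keeps characters such as '.' or '$' from being interpreted as regex syntax, and RegexOptions.IgnoreCase replaces the explicit [Tt] character class.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordStartFilter
{
    // Hypothetical helper: keeps the rows in which findChar appears at the
    // start of the string or directly after whitespace, ignoring case.
    public static List<string> RowsWithWordStartingWith(IEnumerable<string> rows, char findChar)
    {
        // (^|\s) matches the start of the string or a whitespace character.
        string pattern = @"(^|\s)" + Regex.Escape(findChar.ToString());
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return rows.Where(row => regex.IsMatch(row)).ToList();
    }
}
```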
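Answer 2 (score: 0)
The problem is, of course: what is a "word"?
According to your definition, is the character sequence "word" in the sentence above a word? It is not preceded by a space, or by any whitespace at all.
A definition of a word could be: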
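Define wordCharacter: something like A-Z, a-z.
Define word:
- a non-empty sequence of wordCharacters at the start of the string, followed by a non-wordCharacter
- or a non-empty sequence of wordCharacters at the end of the string, preceded by a non-wordCharacter
- or any non-empty sequence of wordCharacters inside the string that is preceded and followed by a non-wordCharacter
Define start of word: the first character of a word.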
String: "Some strange characters: 'A', 9, äll, B9 C$X?"
- Words: Some, strange, characters, A
- Not words: 9, äll, B9, C$X?
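So you first have to specify precisely what you mean by a word; only then can you define the functions.
I'll write it as an extension method on string that returns an IEnumerable<string>. Usage will look similar to LINQ. See Extension Methods Demystified.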
using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // Example definition of a word character: A-Z and a-z.
    // Replace this with your own definition of a word character.
    private static bool IsWordCharacter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int startIndex = 0;
        while (startIndex != text.Length)
        {   // not at end of string. Find the beginning of the next word:
            while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
            {
                ++startIndex;
            }
            // now startIndex points to the first character of the next word
            // or to the end of the text
            if (startIndex != text.Length)
            {   // found the beginning of a word.
                // the first character after the word is either the first non-word character,
                // or the end of the string
                int indexAfterWord = startIndex + 1;
                while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
                {
                    ++indexAfterWord;
                }
                // all characters from startIndex to indexAfterWord-1 are word characters,
                // so together they form a word
                int wordLength = indexAfterWord - startIndex;
                yield return text.Substring(startIndex, wordLength);
                // continue the search right after this word
                startIndex = indexAfterWord;
            }
        }
    }
}
Now that you have a procedure that splits any string into words according to your definition, your query is simple:
IEnumerable<string> texts = ...
char specifiedChar = 'T';
// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
// split the text into words
// keep only the words that start with specifiedChar
// if there is such a word: keep the text
.Where(text => text.SplitIntoWords()
.Where(word => word.Length > 0 && word[0] == specifiedChar)
.Any());
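If the word definition really is "a run of A-Z or a-z characters", a regular expression can stand in for the hand-written loop. This is a swapped-in technique, not the approach of the answer above, and the names RegexWordSplitter, SplitIntoWordsWithRegex and HasWordStartingWith are hypothetical. Note that it treats every maximal run of letters as a word, so the "B" in "B9" counts as a word here; adjust the pattern if your definition differs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordSplitter
{
    // Assumption: a word character is A-Z or a-z, as in the example definition.
    private static readonly Regex WordPattern = new Regex("[A-Za-z]+");

    // Returns every maximal run of word characters in text, in order.
    public static IEnumerable<string> SplitIntoWordsWithRegex(this string text)
    {
        return WordPattern.Matches(text).Cast<Match>().Select(m => m.Value);
    }

    // True if at least one word in text starts with specifiedChar (case-sensitive).
    public static bool HasWordStartingWith(this string text, char specifiedChar)
    {
        return text.SplitIntoWordsWithRegex().Any(word => word[0] == specifiedChar);
    }
}
```

A query could then read texts.Where(text => text.HasWordStartingWith(specifiedChar)). The comparison is case-sensitive, like the query above.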
Answer 3 (score: 0)
var yourChar = "s";
var texts = new List<string> {
"this is the first row",
"this is th second row",
"this is the third row"
};
var result = texts.Where(p => p.StartsWith(yourChar) || p.Contains(" " + yourChar));
Edit:
An alternative (I'm not sure whether it works in a LINQ query):
var result = texts.Where(p => (" " + p).Contains(" " + yourChar));
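The padding trick works because prefixing each row with a space turns "word at the start of the row" into the same case as "word after a space". Both variants above are case-sensitive; a hedged sketch of a case-insensitive version follows. The class and method names are hypothetical, and the choice of an ordinal, case-insensitive comparison is an assumption about the desired matching rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddedContainsFilter
{
    // Hypothetical helper: keeps the rows containing a word that starts with
    // yourChar, where words are separated by single spaces, ignoring case.
    public static List<string> RowsWithWordStartingWith(IEnumerable<string> rows, string yourChar)
    {
        string needle = " " + yourChar;
        return rows
            // Pad the row so a word at its start is also preceded by a space.
            .Where(row => (" " + row).IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```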