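LINQ: select rows where any word of a string starts with a specific character

Time: 2019-02-06 17:08:33

Tags: linq

I want to extract all rows from a table where one column (a string) contains at least one word that starts with a specified character. Example: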

Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'

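If the specified character is T, I would extract all 3 rows. If the specified character is S, I would extract only the second row. ...

Please help me.

4 answers: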

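Answer 0 (score: 0)

Assuming that by "word" you mean "a sequence of characters delimited by spaces, or by the start or end of the string", you can split on the delimiter and test whether any piece matches: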

var src = new[] {
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

var findChar = 'S';
// normalize the search character to lower case (char.ToLower is a static method)
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));

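The LINQ Enumerable.Any method tests a sequence to see if any element matches, so you can split each string into a sequence of words and check whether any word starts with the desired letter, lower-casing both sides to compensate for case.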
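As a quick check of the Split/Any approach, here is a minimal sketch that runs it against the three sample rows from the question. The helper name HasWordStartingWith is hypothetical (it is not part of the answer), and the code assumes C# 9 or later so that it can be written as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the answer): true if any space-separated
// word in text starts with c, ignoring case.
static bool HasWordStartingWith(string text, char c)
{
    char target = char.ToLowerInvariant(c);
    return text
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Any(w => char.ToLowerInvariant(w[0]) == target);
}

var rows = new[]
{
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

List<string> matchesForS = rows.Where(r => HasWordStartingWith(r, 'S')).ToList();
List<string> matchesForT = rows.Where(r => HasWordStartingWith(r, 'T')).ToList();

Console.WriteLine(string.Join(Environment.NewLine, matchesForS)); // this is th second row
Console.WriteLine(matchesForT.Count);                             // 3
```

With 'S' only the second row is returned, and with 'T' all three are, which matches the behaviour described in the question.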

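Answer 1 (score: 0)

Try this:

rows.Where(r => Regex.IsMatch(r, "(^| )[Tt]"))

You can replace Tt with Ss (assuming you want to match either upper or lower case). The (^| ) part lets the match begin at the start of the string as well as after a space, so the first word of a row is checked too.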
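If the character is only known at run time, the pattern can be built instead of hard-coded. The following is a sketch under a few assumptions: the helper name BuildWordStartRegex is made up for illustration, a "word" is taken to start at the beginning of the string or after any whitespace, and case is ignored through RegexOptions.IgnoreCase. Regex.Escape keeps characters such as '.' or '$' from being treated as regex syntax.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a regex that matches c at the start of the
// string or right after a whitespace character, ignoring case.
static Regex BuildWordStartRegex(char c) =>
    new Regex(@"(^|\s)" + Regex.Escape(c.ToString()), RegexOptions.IgnoreCase);

var rows = new[]
{
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

Regex startsWithS = BuildWordStartRegex('s');
Regex startsWithT = BuildWordStartRegex('T');

string[] sRows = rows.Where(r => startsWithS.IsMatch(r)).ToArray();
string[] tRows = rows.Where(r => startsWithT.IsMatch(r)).ToArray();

Console.WriteLine(sRows.Length); // 1
Console.WriteLine(tRows.Length); // 3
```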

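Answer 2 (score: 0)

The problem is of course: what is a "word"?

Is the character sequence "word" in the sentence above a word according to your definition? It is not preceded by a space, or even by white space.

A definition of a word could be:

Define wordCharacter: something like A-Z, a-z.
Define word:
 - the non-empty sequence of wordCharacters at the beginning of a string, followed by a non-wordCharacter
 - or the non-empty sequence of wordCharacters at the end of a string, preceded by a non-wordCharacter
 - or any non-empty sequence of wordCharacters in the string that is preceded and followed by a non-wordCharacter
Define start of word: the first character of a word.

String: "Some strange characters: 'A', 9, äll, B9 C$X?"
 - Words: Some, strange, characters, A
 - Not words: 9, äll, B9, C$X?

So you first have to specify precisely what you mean by a word; only then can you define the functions.

I'll write it as an extension method of string. Usage will look similar to LINQ. See Extension Methods Demystified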

static class StringExtensions
{
    // Placeholder definition of a word character (A-Z, a-z), an assumption;
    // replace it with your own definition.
    static bool IsWordCharacter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        // TODO: exception if text null
        int startIndex = 0;
        while (startIndex < text.Length)
        {   // not at end of string. Find the beginning of the next word:
            while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
            {
                ++startIndex;
            }

            // now startIndex points to the first character of the next word
            // or to the end of the text

            if (startIndex != text.Length)
            {   // found the beginning of a word.
                // the first character after the word is either the first non-word character,
                // or the end of the string

                int indexAfterWord = startIndex + 1;
                while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
                {
                    ++indexAfterWord;
                }

                // all characters from startIndex to indexAfterWord-1 are word characters,
                // so together they form a word
                int wordLength = indexAfterWord - startIndex;
                yield return text.Substring(startIndex, wordLength);

                // continue searching after the word that was just returned
                startIndex = indexAfterWord;
            }
        }
    }
}

Now that you have a procedure that splits any string into words according to your definition, your query will be simple:

IEnumerable<string> texts = ...
char specifiedChar = 'T';

// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
    // split the text into words
    // keep only the words that start with specifiedChar
    // (compared case-insensitively, as the example in the question expects)
    // if there is such a word: keep the text
    .Where(text => text.SplitIntoWords()
                   .Where(word => word.Length > 0
                               && char.ToUpperInvariant(word[0]) == char.ToUpperInvariant(specifiedChar))
                   .Any());
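If the word definition really is just "a maximal run of the letters A-Z or a-z", a regular expression could stand in for a hand-written splitter. This is only a sketch of that alternative, not the answer's method: the helper name WordsOf and the sample texts are invented for illustration, and the comparison ignores case.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: a word is a maximal run of the letters A-Z or a-z.
// Hypothetical helper that returns every such run found in text.
static string[] WordsOf(string text) =>
    Regex.Matches(text, "[A-Za-z]+").Cast<Match>().Select(m => m.Value).ToArray();

// Illustrative sample data (not from the question).
var texts = new[] { "this is th second row", "alpha;(Sigma)", "no match here" };
char specifiedChar = 'S';

// Keep the texts that contain at least one word starting with specifiedChar.
string[] result = texts
    .Where(t => WordsOf(t).Any(w =>
        char.ToUpperInvariant(w[0]) == char.ToUpperInvariant(specifiedChar)))
    .ToArray();

Console.WriteLine(string.Join(" | ", result)); // this is th second row | alpha;(Sigma)
```

Unlike splitting on spaces, this also finds "Sigma" even though it is preceded by punctuation rather than a space.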

Answer 3 (score: 0)

var yourChar = "s";

var texts = new List<string> {
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

var result = texts.Where(p => p.StartsWith(yourChar) || p.Contains(" " + yourChar));

Edit:

An alternative (I am not sure whether it works in a LINQ query):

var result = texts.Where(p => (" " + p).Contains(" " + yourChar));
  • If you need a case-insensitive check, you can use .ToLower().
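As one possible way to make the check case-insensitive without lower-casing every string, the sketch below uses String.IndexOf with StringComparison.OrdinalIgnoreCase on in-memory data (LINQ to Objects). Whether a database LINQ provider can translate this overload is not something this example establishes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var yourChar = "S";

var texts = new List<string> {
    "this is the first row",
    "this is th second row",
    "this is the third row"
};

// Prepend a space so the first word is treated like every other word,
// then look for " " + yourChar while ignoring case.
List<string> result = texts
    .Where(p => (" " + p).IndexOf(" " + yourChar, StringComparison.OrdinalIgnoreCase) >= 0)
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, result)); // this is th second row
```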