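I am looping through a lot of strings in C#:
"Look, good against remotes is one thing, good against the living, that’s something else."
In these strings, I have a single selected word, determined by an index from a previous function, like the second "good" in the example above.
"Look, good (<- not this one) against remotes is one thing, good (<- this one) against the living, that’s something else."
I want to find the words surrounding the selected word. In the example above, those are thing and against:
"Look, good against remotes is one thing, good against the living, that’s something else."
I have tried taking the string apart with .Split() and with various regular expression approaches, but I can't find a good way to achieve this. I have access to the word (good in the example above) and to its index in the string (41 above).
It would be a huge bonus if it ignored punctuation and commas, so that in the example above my theoretical function would return only against, because there is a comma between thing and good.
Is there a simple way to achieve this? Any help is appreciated.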
Answer 0 (score: 5)
Including the "huge bonus":
string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;
string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;
In this case, because of the comma, before will be an empty string and after will be "against".
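Explanation: when getting before, the first step is to grab only the part of the string that precedes the target word; text.Substring(0, index) does this. We then use the regular expression (\w*)\s*$ to match and capture a word (\w*) followed by any amount of whitespace (\s*) at the end of the string ($). The content of the first capture group is the word we want. If no word can be matched, the regex still matches, but it matches an empty string or only whitespace, and the first capture group contains an empty string.
The logic for getting after is almost the same, except that text.Substring(index + word.Length) is used to get the rest of the string after the target word. The regular expression ^\s*(\w*) is similar, except that it is anchored to the beginning of the string with ^, and \s* comes before \w* because we need to skip the whitespace in front of the word.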
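The two lookups described above can be wrapped in one reusable function. This is only a sketch: the name GetSurroundingWords and the tuple return type are assumptions made for illustration, and it is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (the name is an assumption): wraps the two Regex.Match calls from this answer.
static (string Before, string After) GetSurroundingWords(string text, string word, int index)
{
    // Last run of word characters before the target, with only whitespace in between.
    string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
    // First run of word characters after the target, with only whitespace in between.
    string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;
    return (before, after);
}

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
var (before, after) = GetSurroundingWords(text, "good", 41);
Console.WriteLine($"before=\"{before}\", after=\"{after}\""); // before="", after="against"
```

An empty string on either side means that punctuation, or the start or end of the text, separates the selected word from its neighbor.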
Answer 1 (score: 3)
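// Requires: using System.Linq; (for Last())
string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };
// Element [0] is the selected word itself, [1] is the word after it.
string afterWord = phrase.Substring(selectedPosition)
    .Split(' ')[1]
    .Trim(ignoredSpecialChars);
// Drop the trailing space first; otherwise Last() returns an empty string.
string beforeWord = phrase.Substring(0, selectedPosition)
    .TrimEnd(' ')
    .Split(' ')
    .Last()
    .Trim(ignoredSpecialChars);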
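You can change the ignoredSpecialChars array to get rid of the special characters you don't need.
Update:
This version returns null if there is any special character between your word and the words around it.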
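string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };
string afterWord = phrase.Substring(selectedPosition)
    .Split(' ')[1];
// Keep the word only if it starts with a letter or digit, i.e. nothing separates it from the selection.
afterWord = afterWord.Length > 0 && Char.IsLetterOrDigit(afterWord.First()) ?
    afterWord.TrimEnd(ignoredSpecialChars) :
    null;
// Drop the trailing space first; otherwise Last() returns an empty string.
string beforeWord = phrase.Substring(0, selectedPosition)
    .TrimEnd(' ')
    .Split(' ')
    .Last();
// Keep the word only if it ends with a letter or digit; an empty string means there is no preceding word.
beforeWord = beforeWord.Length > 0 && Char.IsLetterOrDigit(beforeWord.Last()) ?
    beforeWord.TrimStart(ignoredSpecialChars) :
    null;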
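Answer 2 (score: 0)
I haven't tested this, but it should work. You can take the Substring before and after the word and then search for the last or first " " in it. Then you know where the neighboring words start and end.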
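string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;
string before = text.Substring(0, index).Trim(); // everything in front of the word, without the trailing " "
string after = text.Substring(index + word.Length).Trim(); // everything behind the word, without the leading " "
int indexBefore = before.LastIndexOf(' ');
int indexAfter = after.IndexOf(' ');
// LastIndexOf returns -1 when there is no space, so indexBefore + 1 is always a valid start.
string wordBefore = before.Substring(indexBefore + 1);
// If there is no space, the rest of the string is the next word.
string wordAfter = indexAfter < 0 ? after : after.Substring(0, indexAfter);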
Edit:
If you want to ignore punctuation and commas, just remove them from the string.
Answer 3 (score: 0)
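You can use the regular expression [^’a-zA-Z0-9]+ to split the string into words:
string[] words = Regex.Split(text, @"[^’a-zA-Z0-9]+");
How you implement the navigation is up to you. Store the index of the selected word and use it to get the next or previous word:
int index = Array.IndexOf(words, "living");
string next = null, previous = null;
if (index >= 0 && index < words.Length - 1)
    next = words[index + 1]; // that’s
if (index > 0)
    previous = words[index - 1]; // the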
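Array.IndexOf finds a word by value, while the question supplies a character index (41). One possible way to connect the two, sketched here as an assumption and not as part of the answer above, is to match the words with Regex.Matches instead of splitting, because each Match records its position in Index. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
int charIndex = 41; // character position of the selected "good", as given in the question

// Match the words themselves (the complement of the split pattern),
// so that every match carries its character position.
MatchCollection words = Regex.Matches(text, @"[’a-zA-Z0-9]+");

string previous = "", next = "";
for (int i = 0; i < words.Count; i++)
{
    if (words[i].Index != charIndex) continue; // not the selected word
    if (i > 0) previous = words[i - 1].Value;
    if (i < words.Count - 1) next = words[i + 1].Value;
    break;
}
Console.WriteLine($"{previous} / {next}"); // thing / against
```

Like the split-based version, this skips over punctuation, so it reports thing even though a comma separates it from the selected word.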
Answer 4 (score: 0)
Here is a LINQPad program written in VB:
Sub Main
    Dim input As String = "Look, good against remotes is one thing, good against the living, that’s something else."
    Dim words As New List(Of String)(input.Split(" "c))
    Dim index = getIndex(words)
    Dim retVal = GetSurrounding(words, index, "good", 2)
    retVal.Dump()
End Sub
' Cleans each word in place, then maps every word to the list of positions where it occurs.
Function getIndex(words As List(Of String)) As Dictionary(Of String, List(Of Integer))
    For i As Integer = 0 To words.Count - 1
        words(i) = getWord(words(i))
    Next
    'words.Dump()
    Dim index As New Dictionary(Of String, List(Of Integer))(StringComparer.InvariantCultureIgnoreCase)
    For j As Integer = 0 To words.Count - 1
        Dim word = words(j)
        If index.ContainsKey(word) Then
            index(word).Add(j)
        Else
            index.Add(word, New List(Of Integer)({j}))
        End If
    Next
    'index.Dump()
    Return index
End Function
' Returns the leading run of word characters and apostrophes, dropping trailing punctuation.
Function getWord(candidate As String) As String
    Dim pattern As String = "^[\w'’]+"
    Dim match = Regex.Match(candidate, pattern)
    If match.Success Then
        Return match.ToString()
    Else
        Return candidate
    End If
End Function
' position is the 1-based occurrence of word; returns the words to its left and right.
Function GetSurrounding(words As List(Of String), index As Dictionary(Of String, List(Of Integer)), word As String, position As Integer) As Tuple(Of String, String)
    If Not index.ContainsKey(word) Then
        Return Nothing
    End If
    Dim indexEntry = index(word)
    If position > indexEntry.Count Then
        'not enough appearances of word
        Return Nothing
    Else
        Dim left = ""
        Dim right = ""
        Dim positionInWordList = indexEntry(position - 1)
        If positionInWordList > 0 Then
            left = words(positionInWordList - 1)
        End If
        If positionInWordList < words.Count - 1 Then
            right = words(positionInWordList + 1)
        End If
        Return New Tuple(Of String, String)(left, right)
    End If
End Function
Answer 5 (score: 0)
Without regular expressions, you can do this recursively with Array.IndexOf.
// Requires: using System;
public class BeforeAndAfterWordFinder
{
    public string Input { get; private set; }
    private string[] words;
    public BeforeAndAfterWordFinder(string input)
    {
        Input = input;
        words = Input.Split(new string[] { ", ", " " }, StringSplitOptions.None);
    }
    public void Run(int occurance, string word)
    {
        int index = -1; // start at -1 so the first search begins at position 0
        OccuranceAfterWord(occurance, word, ref index);
        Print(index);
    }
    // Recursively advances lastIndex to the requested occurrence of word, or to -1 if there are too few.
    private void OccuranceAfterWord(int occurance, string word, ref int lastIndex, int thisOccurance = 0)
    {
        lastIndex = Array.IndexOf(words, word, lastIndex + 1);
        if (lastIndex != -1)
        {
            thisOccurance++;
            if (thisOccurance < occurance)
            {
                OccuranceAfterWord(occurance, word, ref lastIndex, thisOccurance);
            }
        }
    }
    private void Print(int index)
    {
        if (index < 0)
        {
            Console.WriteLine("Occurrence not found.");
            return;
        }
        // Guard both neighbors against running off the ends of the array.
        string before = index > 0 ? words[index - 1] : "";
        string after = index < words.Length - 1 ? words[index + 1] : "";
        Console.WriteLine("{0} : {1}", before, after);
    }
}
Usage:
string input = "Look, good against remotes is one thing, good against the living, that’s something else.";
var F = new BeforeAndAfterWordFinder(input);
F.Run(2, "good");
Answer 6 (score: -2)
Create a string with the punctuation and commas removed (using Remove). In that string, search for the substring "thing good against", and so on, if needed.
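A rough sketch of this idea follows. Removing characters shifts every later index, so the sketch strips punctuation separately from the text before and after the selected word instead of from the whole string. The helper name NeighborsIgnoringPunctuation and the use of char.IsPunctuation with LINQ are assumptions, not something this answer specifies.

```csharp
using System;
using System.Linq;

// Hypothetical helper: treats punctuation as if it were absent and returns the adjacent words.
static (string Before, string After) NeighborsIgnoringPunctuation(string text, string word, int index)
{
    // Remove punctuation from a fragment; the ’ in "that’s" is also removed.
    string Strip(string s) => new string(s.Where(c => !char.IsPunctuation(c)).ToArray());
    char[] separators = { ' ' };
    string[] left = Strip(text.Substring(0, index))
        .Split(separators, StringSplitOptions.RemoveEmptyEntries);
    string[] right = Strip(text.Substring(index + word.Length))
        .Split(separators, StringSplitOptions.RemoveEmptyEntries);
    // An empty string means there is no word on that side.
    return (left.LastOrDefault() ?? "", right.FirstOrDefault() ?? "");
}

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
var (before, after) = NeighborsIgnoringPunctuation(text, "good", 41);
Console.WriteLine($"{before} good {after}"); // thing good against
```

Because the comma is deleted rather than treated as a boundary, this returns thing as the preceding word, which is not the "bonus" behavior the question asks for.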