Slicing and dicing strings with regular expressions

Date: 2014-03-20 03:05:19

Tags: javascript regex

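I'm interested in a regex pattern that jumps to the first non-whitespace character and returns that index, then goes from there to the next whitespace character and returns that index as well.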

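Basically, I'm trying to pull a single word out of a string. But I also want to keep the indices, because I need to rebuild the string without the word I just pulled out.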

Something like this:

var start = txt.search(/\S/); //this gets the index of the first non whitespace character
var end = txt.search(/\s/); //this gets the index of the first whitespace character
var word = txt.slice(start,end); //get the word
txt = txt.slice(end); //update txt to hold the rest of the string

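The problem with this implementation is that if the first whitespace character comes before the first non-whitespace character, we get an undesired result: end is smaller than start, so slice returns an empty string.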
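To make the failure concrete, here is a small sketch of the same two searches run on an input that starts with whitespace. The sample string is an assumption for illustration and does not come from the original post.

```javascript
// Hypothetical input with two leading spaces.
const sample = "  hello world";

const start = sample.search(/\S/);     // 2: index of "h"
const end = sample.search(/\s/);       // 0: the leading space is found first
const word = sample.slice(start, end); // slice(2, 0) yields an empty string

console.log(start, end, JSON.stringify(word)); // 2 0 ""
```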

It would be super helpful if .search accepted a starting index, but it doesn't, so I'm stuck.

Let me try to put it better:

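I need the index of the first non-whitespace character, and then the index of the first whitespace character that follows it. That would let me grab a single word out of the string.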
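One possible way to get a search with a starting index is to search a slice of the string and add the offset back. The sketch below assumes two hypothetical helpers, searchFrom and firstWord, that are not part of the original post or of any library. Note that a pattern anchored with ^ would match at the start of the slice rather than at the start of the full string.

```javascript
// Hypothetical helper: like String.prototype.search, but starts looking at `from`.
function searchFrom(text, pattern, from) {
    const offset = text.slice(from).search(pattern);
    return offset === -1 ? -1 : from + offset;
}

// Hypothetical helper: returns the first word and its [start, end) indices, or null.
function firstWord(text) {
    const start = searchFrom(text, /\S/, 0);
    if (start === -1) return null;     // empty or all-whitespace input
    let end = searchFrom(text, /\s/, start);
    if (end === -1) end = text.length; // the word runs to the end of the string
    return { word: text.slice(start, end), start, end };
}

console.log(firstWord("  hello world")); // { word: 'hello', start: 2, end: 7 }
```

With start and end in hand, text.slice(0, start) + text.slice(end) gives the string without that word.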

2 answers:

Answer 0: (score: 0)

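If you really want to take the words out of a sentence, you have two options:

1. Trim the sentence. This is easy, but since you want to keep the original sentence, there is the next option.
2. Split the sentence into an array.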

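var txt = " this is a sentence ";
// Split on every non-word character; adjacent separators leave empty strings behind.
var words = txt.split(/\W/);
// Remove the empty strings one at a time.
while(words.indexOf("") !== -1) {
    words.splice(words.indexOf(""), 1);
}

Now words is ["this", "is", "a", "sentence"]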
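One caveat with this approach: /\W/ splits on every non-word character, so punctuation such as an apostrophe also breaks a word apart, and split does not keep the indices the question asks for. If whitespace-delimited words are what is wanted, a sketch along these lines may be closer. The sample string is an assumption for illustration.

```javascript
// Hypothetical sample containing an apostrophe.
const sentence = "  it's a test  ";

// Splitting on non-word characters also splits at the apostrophe.
const byNonWord = sentence.split(/\W/).filter(Boolean);

// Splitting on runs of whitespace keeps "it's" intact;
// filter(Boolean) drops the lone "" produced by an all-whitespace input.
const byWhitespace = sentence.trim().split(/\s+/).filter(Boolean);

console.log(byNonWord);    // [ 'it', 's', 'a', 'test' ]
console.log(byWhitespace); // [ "it's", 'a', 'test' ]
```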

Answer 1: (score: 0)

How about this?

var string = "      Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.",
    matches = [],
    re = /\S+/g,
    match;

while (match = re.exec(string)){
    matches.push(match);
}

/* matches = [
    [0: "Lorem"
     index: 6
     input: "      Lorem Ipsum is simply dummy te..."],
    [0: "Ipsum"
     index: 12
     input: "      Lorem Ipsum is simply dummy te..."],

     ...

]; */
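Each match object carries the index where the word starts, and the end index follows from the length of match[0]. As a rough sketch of how that could be used to rebuild the string without the word, assuming a hypothetical helper named takeFirstWord that is not part of the answer above:

```javascript
// Hypothetical helper: pulls the first whitespace-delimited word out of `text`
// and returns it together with its indices and the remaining string.
function takeFirstWord(text) {
    const match = /\S+/.exec(text);
    if (match === null) {
        return { word: null, start: -1, end: -1, rest: text }; // nothing to remove
    }
    const start = match.index;
    const end = start + match[0].length; // index just past the word
    return {
        word: match[0],
        start,
        end,
        rest: text.slice(0, start) + text.slice(end), // the string without the word
    };
}

const result = takeFirstWord("      Lorem Ipsum is simply");
console.log(result.word, result.start, result.end); // Lorem 6 11
console.log(JSON.stringify(result.rest));
```

The whitespace on both sides of the removed word is kept, so rest still has the original spacing around the gap.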