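How to convert text into a useful data array?

Time: 2019-04-15 19:41:06

Tags: javascript jquery arrays json

I am pulling text from a server, but the data I get is not organized for further use. The text I extract looks like this: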

>>[Extracted] id: 194805284, got 55 points from  jones  (252906152669) date: 15/04/19 08:44:40 you have 30 points remaining 

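I don't want all of this text; I only want the id, points, number and date.

Note: I may extract more than one message at a time.

So, to extract the id, points, number and date, I wrapped every word in a span tag and then used this code: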

var getData = {
    //gets the id, points, date and number respectively
    number1 : $('span:contains("id:")').next().text(),
    amount : $('span:contains("got")').next().text(),
    time : $('span:contains("date:")').next().text(),
    number : $('span:contains("date:")').prev().text()
}

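I use this code because I may automatically extract more than one message, and apart from the id, points, date and number, every word in each extracted message is the same.

I extracted the data I wanted with the code above, but this time there are 2 [Extracted] messages; see below.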

HTML

<p>[Extracted] id: 194805284, got 55 points from  jones  (252906152669)
 date: 15/04/19 08:44:40 you have 30 points remanining  [Extracted] id: 193537533, got 3 points from  Micheal (907794804)
 date: 14/04/19 10:15:32, you have  100 points remaining</p>
<div class="processed-data">
</div>

CSS:

span {
    border: 1px solid red;
}

JS:

// wrap every word with <span> tag
var words = $("p").text().split(" ");
$("p").empty();
$.each(words, function(i, v) {
    $("p").append($("<span>").text(v));
});

//extract the id, points, time and number respectively
var getData = {
    number1: $('span:contains("id:")').next().text(),
    amount: $('span:contains("got")').next().text(),
    //amount : $('span:contains("got")').next().text().substring(1),
    time: $('span:contains("date:")').next().text(),
    number: $('span:contains("date:")').prev().text()
}

// Output the extracted data to .processed-data div
$('.processed-data').append("thisTime = { [id: " + getData.number1 + " amount: " + getData.amount + ", time: " + getData.time + " number: " + getData.number + "]}'");

Here is a JSFiddle.

Output:

thisTime = {[id: 194805284,193537533, amount: 553, time: 15/04/1914/04/19 number: (252906152669) (907794804) ]}'
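The merged values in this output come from jQuery's `.text()`: called on a collection that holds several elements, it returns the combined text of all of them, so both ids and both point values end up in one string. A DOM-free sketch of the effect, assuming plain arrays stand in for the spans matched in the two messages:

```javascript
// Hypothetical stand-ins for the spans that follow "id:" and "got" in the two messages.
const idSpans = ["194805284,", "193537533,"];
const pointSpans = ["55", "3"];

// .text() on a multi-element jQuery set concatenates the text of every element,
// which behaves like joining the pieces with an empty separator.
const mergedIds = idSpans.join("");       // "194805284,193537533,"
const mergedPoints = pointSpans.join(""); // "553"

// Keeping the pieces apart instead preserves one record per message.
const perMessage = idSpans.map((id, i) => ({
  id: id.replace(/,$/, ""), // drop the trailing comma left over from splitting on spaces
  amount: pointSpans[i],
}));

console.log(mergedIds, mergedPoints, perMessage);
```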

The result I expect: each [Extracted] message gets its own array, whether by using a loop or some other way.

Example:

What I get now:

thisTime = {
        [id: 194805284,193537533, // All the ids are stored in 1 array data 
        amount: 553, // All the points are stored in 1 array data e.t.c
        time: 15/04/1914/04/19 
        number: (252906152669) (907794804)]
        }

What I want to get:

thisTime = {
[id: 194805284, 
amount: 55, 
time: 15/04/19
number: (252906152669)],
[id:193537533, 
amount: 3, 
time: 14/04/19 
number: (907794804)]
}

I just want each extracted message to have its own array.
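Note that the structure shown above (`{ [ ... ], [ ... ] }`) is not valid JavaScript or JSON; the closest valid shape is an array holding one object per message. A minimal sketch of a loop that builds it without the DOM, assuming the messages arrive as one string (based on the sample in the question) and each is prefixed by the literal marker `[Extracted]`; `parseMessage` is a hypothetical helper:

```javascript
// Based on the question's sample: two messages, with a line break before "date:".
const text =
  "[Extracted] id: 194805284, got 55 points from  jones  (252906152669)\n date: 15/04/19 08:44:40 you have 30 points remaining  " +
  "[Extracted] id: 193537533, got 3 points from  Micheal (907794804)\n date: 14/04/19 10:15:32, you have  100 points remaining";

// Hypothetical helper: parse one message chunk into an object, or null if it does not match.
function parseMessage(chunk) {
  const m = chunk.match(/id:\s*(\d+),\s*got\s+(\d+)\s+points[\s\S]*?\((\d+)\)\s*date:\s*(\d{2}\/\d{2}\/\d{2})/);
  if (!m) return null;
  return { id: m[1], amount: m[2], time: m[4], number: "(" + m[3] + ")" };
}

const thisTime = text
  .split("[Extracted]") // one chunk per message; the first chunk is empty
  .map(parseMessage)
  .filter(Boolean);     // drop chunks that did not match

console.log(JSON.stringify(thisTime, null, 2));
```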

3 answers:

Answer 0 (score: 1)

You can easily solve this with a regular expression (regex). Is there any particular reason you wrap every word in a span?

The following regular expression should match all the tokens in the string:

id:\s+(\d+),\s+got\s+(\d+)\s+points\s+from\s+.+?\s+\((\d+)\)\s+date:\s+(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)

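I use \s+ here instead of spaces because the spacing in the template above seems inconsistent, and to be safe I like to use \s+ wherever any amount of whitespace may appear.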
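To see why `\s+` is the safer choice here: the sample has two spaces around the name and, in the HTML, a line break before `date:`. A small comparison, assuming a shortened excerpt of the sample:

```javascript
// Excerpt of the sample: two spaces around the name, a line break before "date:".
const sample = "got 55 points from  jones  (252906152669)\n date: 15/04/19";

// Assumes exactly one space between tokens.
const singleSpace = /from (\w+) \((\d+)\) date:/;
// Accepts any run of whitespace, including newlines.
const anyWhitespace = /from\s+(\w+)\s+\((\d+)\)\s+date:/;

console.log(singleSpace.test(sample));      // false
console.log(anyWhitespace.exec(sample)[2]); // 252906152669
```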

You can extract a single message like this...

const regex = /id:\s+(\d+),\s+got\s+(\d+)\s+points\s+from\s+.+?\s+\((\d+)\)\s+date:\s+(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)/; // construct the regex literal
const message = "[Extracted] id: 194805284, got 55 points from  jones  (252906152669) date: 15/04/19 08:44:40 you have 30 points remaining"; // some string matching your "extracted" template

const match = message.match(regex); // now match contains all the data (or null if nothing matched)

// You don't have to destructure, but this is the order of the capturing groups.
const [fullMatch, idString, pointString, numberString, dayString, monthString, yearString, hourString, minuteString, secondString] = match;
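Every capture group comes back as a string. If you need numbers or a real date, here is a sketch of the conversion; it assumes two-digit years belong to the 2000s and keeps the phone-like number as a string. Note that JavaScript's `Date` constructor uses 0-based months:

```javascript
const regex = /id:\s+(\d+),\s+got\s+(\d+)\s+points\s+from\s+.+?\s+\((\d+)\)\s+date:\s+(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)/;
const message = "[Extracted] id: 194805284, got 55 points from  jones  (252906152669) date: 15/04/19 08:44:40 you have 30 points remaining";

// Skip the full match; the remaining groups are in pattern order.
const [, idString, pointString, numberString, day, month, year, hour, minute, second] = message.match(regex);

const record = {
  id: Number(idString),
  amount: Number(pointString),
  number: numberString, // kept as a string: it is an identifier, not a quantity
  // Assumption: "19" means 2019. The month is converted to the 0-based index Date expects.
  time: new Date(2000 + Number(year), Number(month) - 1, Number(day), Number(hour), Number(minute), Number(second)),
};

console.log(record);
```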

You should also be able to match multiple messages by doing the following...

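// exec() only advances through the string when the regex has the g flag;
// without it, this loop would return the first match forever.
const globalRegex = new RegExp(regex.source, "g");
let match;
while ((match = globalRegex.exec(message)) !== null) {
  // now match can be handled the same way as above. You could alternatively push the matches to a list as well here.
}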
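In environments that support `String.prototype.matchAll` (ES2020), the same loop can be written without manual `exec()` calls. A sketch, assuming the same pattern with the `g` flag added (`matchAll` throws a `TypeError` for a non-global regex) and a two-message sample string:

```javascript
const regex = /id:\s+(\d+),\s+got\s+(\d+)\s+points\s+from\s+.+?\s+\((\d+)\)\s+date:\s+(\d+)\/(\d+)\/(\d+)\s+(\d+):(\d+):(\d+)/g;
const message =
  "[Extracted] id: 194805284, got 55 points from  jones  (252906152669) date: 15/04/19 08:44:40 you have 30 points remaining  " +
  "[Extracted] id: 193537533, got 3 points from  Micheal (907794804) date: 14/04/19 10:15:32, you have  100 points remaining";

// One object per message, in the shape the question asks for.
const results = [];
for (const [, id, amount, number, day, month, year] of message.matchAll(regex)) {
  results.push({ id, amount, time: `${day}/${month}/${year}`, number: `(${number})` });
}

console.log(results);
```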

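Answer 1 (score: 1)

I suggest solving this with a regular expression; I think it is a better approach than the jQuery method you are using.

Here is a possible regex solution: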

var text = '[Extracted] id: 194805284, got 55 points from  jones  (252906152669)  date: 15/04/19 08:44:40 you have 30 points remanining  [Extracted] id: 193537533, got 3 points from  Micheal (907794804)  date: 14/04/19 10:15:32, you have  100 points remaining';
var textArray = text.split('[Extracted]');

var regularExpression = /id:\s+([0-9]+).+got\s+([0-9]+).+[^\(]+\(([0-9]+)\)\s+date:\s+([0-9\/\s:]+)/i;
var output = [];
var item;
for(var i = 1; i < textArray.length;  i++){
	item = textArray[i].match(regularExpression);
	output.push({
		id: item[1].trim(),
		amount: item[2].trim(),
		time: item[4].trim(),
		number: item[3].trim()
	});
}

console.log(output);

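Answer 2 (score: 1)

Your problem is getData: each selector in it matches the spans of every message at once. I suggest splitting the string on Extracted and then on whitespace. After that, you can select the child spans grouped by sentence and filter them to build an array containing one or more objects.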
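The same word-neighbour idea works without building spans at all. A DOM-free sketch, assuming the text has already been read into a string; `wordNear` is a hypothetical helper, and it compares whole words where the answer's `:contains()` does substring matching:

```javascript
const text =
  "[Extracted] id: 194805284, got 55 points from  jones  (252906152669)\n date: 15/04/19 08:44:40 you have 30 points remaining  " +
  "[Extracted] id: 193537533, got 3 points from  Micheal (907794804)\n date: 14/04/19 10:15:32, you have  100 points remaining";

// Hypothetical helper: the word at `offset` positions from `keyword` (1 = next, -1 = previous).
function wordNear(words, keyword, offset) {
  const i = words.indexOf(keyword);
  return i === -1 ? "" : words[i + offset] || "";
}

// One sentence per "[Extracted]" marker; slice(1) drops the empty piece before the first marker.
const thisTime = text.split("[Extracted]").slice(1).map((sentence) => {
  const words = sentence.trim().split(/\s+/);
  return {
    id: wordNear(words, "id:", 1).replace(/,$/, ""), // strip the trailing comma
    amount: wordNear(words, "got", 1),
    time: wordNear(words, "date:", 1),
    number: wordNear(words, "date:", -1),
  };
});

console.log(JSON.stringify({ thisTime }));
```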

JS:

var sentences = $("p").text().split("[Extracted]").slice(1); // split on the literal marker; drop the empty piece before the first one
$("p").empty();
$.each(sentences, function(i, v) {
    var words = ['Extracted'].concat(v.trim().split(/ +/));
    $.each(words, function(idx, word) {
        $("p").append($("<span/>", {text: word.trim()}));
    });
});

var result  = {thisTime: $("p span:contains(Extracted)").map(function(idx, txt) {
    var x = $(this).nextUntil('span:contains(Extracted)');
    return {id: x.filter('span:contains("id:")').next().text(),
        amount: x.filter('span:contains("got")').next().text(),
        time: x.filter('span:contains("date:")').next().text(),
        number: x.filter('span:contains("date:")').prev().text()};
}).get()};
$('.processed-data').append(JSON.stringify(result));

CSS:

span {
    border: 1px solid red;
}

HTML:

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

<p>[Extracted] id: 194805284, got 55 points from  jones  (252906152669)
    date: 15/04/19 08:44:40 you have 30 points remanining  [Extracted] id: 193537533, got 3 points from  Micheal (907794804)
    date: 14/04/19 10:15:32, you have  100 points remaining</p>
<div class="processed-data">
</div>