Split text based on the first number in JavaScript

Time: 2015-07-22 19:29:42

Tags: javascript html

how to get numeric value from string?

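From the question linked above I learned how to extract numbers from a string. But I also need to extract the text that follows each number, up to the next number. I have many texts similar to this one, and I need to extract every timestamp entry. The text comes from the YouTube API.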

Information Technology- Lecture #1
June 4, 2015
Professor Vasarhelyi
Please visit our website at http://raw.rutgers.edu
Time Stamps:

00:00:28 What is ASEC?
00:02:59 Continuous Monitoring & Continuous Accounting
00:03:43 Assurance
00:07:25 Predictive v. Preventive (Traditional Audit)
00:10:36 Audit Data Standard (ADS)
00:16:37 XBRL and XML
00:20:13 How is technology changing our brains?
00:21:36 Singularity: Artificial Intelligence vs. Human Intelligence
00:37:57 Big Data
00:40:39 NSA Snooping
00:47:59 Internet Trends
00:59:58 E-Education: What will change?
01:08:42 What do you need to know in the age of Google?
01:13:45 Delivery, Assessment, and Granting
01:17:00 Automatic Student Learning Management System
01:20:49 A Degree’s Role in Society
01:23:02 Summary
01:28:52 Primary Priorities for Maintaining Relevance 
01:30:01 GAAP
Summary:
In this lecture, Professor Vasarhelyi introduces what the course will talk about in future sessions while reviewing key and basic concepts with the class.  He also discusses how the Internet changes the way that we think and whether or not robots will soon replace humans in the work force.
Please subscribe to our channel to get the latest updates on the RU Digital Library.

My current approach has hit its limits, so I would like to know whether there is another way to extract only this information:

00:00:28 What is ASEC?
00:02:59 Continuous Monitoring & Continuous Accounting
00:03:43 Assurance
00:07:25 Predictive v. Preventive (Traditional Audit)
00:10:36 Audit Data Standard (ADS)
00:16:37 XBRL and XML
00:20:13 How is technology changing our brains?
00:21:36 Singularity: Artificial Intelligence vs. Human Intelligence
00:37:57 Big Data
00:40:39 NSA Snooping
00:47:59 Internet Trends
00:59:58 E-Education: What will change?
01:08:42 What do you need to know in the age of Google?
01:13:45 Delivery, Assessment, and Granting
01:17:00 Automatic Student Learning Management System
01:20:49 A Degree’s Role in Society
01:23:02 Summary
01:28:52 Primary Priorities for Maintaining Relevance 
01:30:01 GAAP

I also need to wrap each timestamp line in a <span> tag, including the closing tag. So the desired output is:

<span>00:00:28 What is ASEC?</span>
<span>00:02:59 Continuous Monitoring & Continuous Accounting</span>
<span>00:03:43 Assurance</span>
<span>00:07:25 Predictive v. Preventive (Traditional Audit)</span>
<span>00:10:36 Audit Data Standard (ADS)</span>
<span>00:16:37 XBRL and XML</span>
<span>00:20:13 How is technology changing our brains?</span>
<span>00:21:36 Singularity: Artificial Intelligence vs. Human Intelligence</span>
<span>00:37:57 Big Data</span>
<span>00:40:39 NSA Snooping</span>
<span>00:47:59 Internet Trends</span>
<span>00:59:58 E-Education: What will change?</span>
<span>01:08:42 What do you need to know in the age of Google?</span>
<span>01:13:45 Delivery, Assessment, and Granting</span>
<span>01:17:00 Automatic Student Learning Management System</span>
<span>01:20:49 A Degree’s Role in Society</span>
<span>01:23:02 Summary</span>
<span>01:28:52 Primary Priorities for Maintaining Relevance</span>
<span>01:30:01 GAAP</span>

2 answers:

Answer 0: (score: 1)

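Here is another approach using a regular expression and String.match. Define one function to extract the timestamp lines from the text and another to output them. The regex passed to the first function is /\n\d.*(?=\n)/g, which means: globally, find every newline that is followed by a digit as the first character of the line, and match the rest of that line up to (but not including) the next newline. Because each match starts with that leading newline, the code drops it with slice(1). Note that a timestamp on the very first or very last line of the text would not match, since the pattern requires a newline on both sides. See the snippet below for a demo.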

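Note: if you can also grab the date on the second line (June 4, 2015), you could even add a date property to each object and construct a JavaScript Date (which can be converted to a numeric timestamp) simply by doing result[i].date = new Date('June 4, 2015' + ' ' + result[i].time) inside the findTimestamps function.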
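The note above can be sketched as follows. This is only an illustration: toDate and dateText are hypothetical names, the date string is assumed to have been read from the description already, and parsing a non-ISO date string is implementation-dependent (it works in V8-based engines such as Chrome and Node.js).

```javascript
// Assumption: dateText was taken from the second line of the description.
var dateText = 'June 4, 2015';

// Hypothetical helper: combine the date with one "HH:MM:SS" value.
function toDate(dateText, time) {
  // Non-ISO strings are parsed in an engine-specific way and as local time.
  return new Date(dateText + ' ' + time);
}

var d = toDate(dateText, '00:02:59');
var millis = d.getTime(); // milliseconds since 1970-01-01T00:00:00Z
console.log(d.toString(), millis);
```

Keep in mind that the values in the description are offsets into the video, so the resulting Date treats them as clock times on that day.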

var text = document.getElementsByTagName('p')[0].textContent;

function findTimestamps(regex, target) {
  var result = target.match(regex) || []; // match() returns null when nothing matches
  for (var i = 0; i < result.length; i++) {
    result[i] = { 
      time: result[i].slice(1, result[i].indexOf(' ')),
      msg: result[i].slice(result[i].indexOf(' ') + 1)
    };
  }
  return result;
}
function outputTimestamps(target, array) {
  var output = '';
  for (var i = 0; i < array.length; i++) {
    output += '<p><span>' + array[i].time + '</span>' + array[i].msg + '</p>';
  }
  target.innerHTML = output;
}

var r = findTimestamps(/\n\d.*(?=\n)/g, text);
outputTimestamps(document.getElementsByTagName('div')[0], r);
body>p { display: none; }
div:last-child { white-space: pre; }
span { margin-right: 20px; }
<p>Information Technology- Lecture #1
June 4, 2015
Professor Vasarhelyi
Please visit our website at http://raw.rutgers.edu
Time Stamps:
00:00:28 What is ASEC?
00:02:59 Continuous Monitoring & Continuous Accounting
00:03:43 Assurance
00:07:25 Predictive v. Preventive (Traditional Audit)
00:10:36 Audit Data Standard (ADS)
00:16:37 XBRL and XML
00:20:13 How is technology changing our brains?
00:21:36 Singularity: Artificial Intelligence vs. Human Intelligence
00:37:57 Big Data
00:40:39 NSA Snooping
00:47:59 Internet Trends
00:59:58 E-Education: What will change?
01:08:42 What do you need to know in the age of Google?
01:13:45 Delivery, Assessment, and Granting
01:17:00 Automatic Student Learning Management System
01:20:49 A Degree’s Role in Society
01:23:02 Summary
01:28:52 Primary Priorities for Maintaining Relevance 
01:30:01 GAAP
Summary:
In this lecture, Professor Vasarhelyi introduces what the course will talk about in future sessions while reviewing key and basic concepts with the class.  He also discusses how the Internet changes the way that we think and whether or not robots will soon replace humans in the work force.
Please subsc</p>
<div></div>
<div></div>
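The snippet above puts only the time inside the span, while the question asks for the whole line to be wrapped. A minimal sketch of that variant, assuming every timestamp line starts with HH:MM:SS followed by a space; wrapTimestampLines and the sample text are hypothetical, and the pattern is a swapped-in alternative that uses the m flag rather than the regex from the answer:

```javascript
// Hypothetical helper: return one "<span>...</span>" string per timestamp line.
function wrapTimestampLines(text) {
  // With the m flag, ^ and $ match at line boundaries, so timestamps on the
  // first or last line are found too. match() returns null on no match.
  var lines = text.match(/^\d{2}:\d{2}:\d{2} .*$/gm) || [];
  return lines.map(function (line) {
    return '<span>' + line.trim() + '</span>'; // trim() drops trailing spaces
  });
}

// Shortened sample of the description text from the question.
var sample = [
  'Time Stamps:',
  '',
  '00:00:28 What is ASEC?',
  '01:28:52 Primary Priorities for Maintaining Relevance ',
  '01:30:01 GAAP',
  'Summary:'
].join('\n');

var spans = wrapTimestampLines(sample);
console.log(spans.join('\n'));
```

If the result is assigned to innerHTML, characters such as & and < in the descriptions should be escaped first; the sketch leaves the text untouched to mirror the desired output in the question.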

Answer 1: (score: 0)

How about some pseudocode along these lines:

var lines = []; // fill in: your text as an array of strings, one timestamp line per element
var events = [];
for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    var timestamp = line.split(" ")[0]; // everything before the first space
    var description = line.substring(timestamp.length + 1); // everything after the first space
    // collect the two parts as one object per line
    var entry = {
        "timestamp": timestamp,
        "description": description
    };
    events.push(entry);
}

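This should fill the events array with objects that hold the timestamp as a string (you said you know how to convert a string to a number, so I'll let you take it from there) and the description as another string. Once you have that array, it should be easy to generate a bulleted list or whatever other HTML you want to use to display it; just write another for loop to generate the HTML markup. Is that enough to solve your problem?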
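The two follow-up steps described above could look roughly like this. It is a sketch under assumptions: toSeconds and toListHtml are hypothetical helpers, the data-seconds attribute is just one way to keep the numeric value, and the timestamps are assumed to always have the HH:MM:SS form.

```javascript
// Hypothetical helper: convert "HH:MM:SS" into a number of seconds.
function toSeconds(timestamp) {
  var parts = timestamp.split(':').map(Number);
  return parts[0] * 3600 + parts[1] * 60 + parts[2];
}

// Hypothetical helper: build a bulleted list from an events array of
// { timestamp, description } objects, as produced by the loop above.
function toListHtml(events) {
  var html = '<ul>';
  for (var i = 0; i < events.length; i++) {
    html += '<li data-seconds="' + toSeconds(events[i].timestamp) + '">' +
      events[i].timestamp + ' ' + events[i].description + '</li>';
  }
  return html + '</ul>';
}

var sampleEvents = [
  { timestamp: '00:00:28', description: 'What is ASEC?' },
  { timestamp: '01:30:01', description: 'GAAP' }
];
console.log(toListHtml(sampleEvents));
```

The descriptions are concatenated as-is here; escape them before inserting the markup into a page if they can contain HTML special characters.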