Regex to capture transcript speaker names before a colon

Time: 2017-10-06 10:06:26

Tags: regex

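From a transcript, I want to capture the names of all speakers. A target name starts at the beginning of a line and ends with ": " (i.e. a colon and a space).

Optionally, for finer control, it is safe to assume that the first colon is followed by two spaces.

Sample text: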

Julian Z.:          What's really exciting is the opportunity to be more intelligent about how you approach trying to reach your consumer. In a world where digital and the use of digital has exploded, to be able to have one-on-one conversations in the digital world, and to be able to eventually translate that into the TV space, whether that be addressable or data-driven, is really fantastic. Because at the end of the day, you want your brand, in our case, our networks, to be able to have a relationship with the consumer. Data is a proxy to allow for that to occur.
            From an advertiser perspective, obviously now the ability to go to the broadcast networks and have a data-driven buy has absolutely blown up and proliferated. That's with us. That's with some of our competitors. Obviously, we think we're the best at it, but neither here nor there. I think it's a really wonderful foundational approach for advertisers to take. I think it's a great advancement in the market.
                As a spender of money, and as somebody who is trying to get people to engage with our brands, the ability to use data to really have, again, these really one-on-one, unique conversations, and to be able to deliver creative content that's relevant for individual consumers, that's driven by what we know about the consumer, now, ultimately, where we can reach them effectively and in environments where we know they're engaged, is really a great, tremendous advancement. You'll see by our ratings numbers, which are on the upswing, that approach has really had a direct impact on what our linear ratings have resulted in.

Speaker 2:          Great. Tell us a little bit about Viacom. It's a lot of fans, a lot of passion in people. How do you define the audience in broad strokes? How do they respond to advertising and what are some of the concerns that consumers have around ads?

Julian Z.:          Well, I think, again, when you're talking about how we're reaching fans, it is using intelligence, and information, and data, not only to profile who our fans are, but ultimately where they're best reached. Our job is to deliver great, compelling content, which we believe we're really, really good at. 
                In order to do that, there's the linear side of the equation, but of course we want to make sure that we're reaching our fans in digital as well, and that there's a 360 kind of fan experience. We believe holistically that our fans are really the base of what we're trying to do. We're trying to please and create value for our fans. The more we engage with them, and the more we know about them, the better we're able to deliver customized content that fits their need. 
                Ultimately, as a content creator, what's more exciting than to delivery really great content to people that they really, really engage with and they build relationships with? That's all you can really hope for is, somebody that creates content, is to be able to develop compelling content and content that your audience really wants to engage with.

Speaker 2:          When you look at targeting, is that a cross-platform? Where does that targeting happen?

Julian Z.:          It absolutely is cross-platform. Of course, there is natural addressability in the digital market, because it is much more of a one-to-one. But now you see a lot of the MVPDs have obviously opened up addressable inventory. A lot of the MVPDs now have matured their addressable footprint, which allows you now to have a digital-like, not exactly the same obviously, but a digital-like experience in the linear space, to deliver content to the consumer or advertising to the consumer when it's relevant and when it's going to have the most impact for your message. 
                Ultimately, it's absolutely cross-platform because addressability is all about having that conversation, having that direct one-to-one with your audience. Our partners on the MVPD side have really matured over the last several years as of regard to addressable, and now you can have that 360 experience of having a conversation in linear and in digital that really is addressable. 

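The sample strings to capture are: Julian Z. and Speaker 2. Names vary from transcript to transcript. I need all (multiple) names. As you can see, a name may contain a mix of upper- and lower-case letters, punctuation, and digits.

I would also like to remove repeated names, but I believe I should set that aside for now and keep this question focused on capturing.

I have tried a lot over the last day or two.

For example, ^[^:]+\s*/g comes close, but it captures only the first Julian Z., whereas I want every name. I am currently out of ideas and need to learn how to do this.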

2 answers:

Answer 0: (score: 1)

You can use this regex, which is based on a negated character class:

/^\w[^:\n]*/mg

RegEx breakup:

  • ^\w: matches a word character at the start of a line
  • [^:\n]*: matches zero or more characters that are neither a colon nor a newline.

Code:

var names = inputData.transcript.match(/^\w[^:\n]*/mg) || [];
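
For illustration, here is a minimal self-contained sketch of the same call. The transcript string is a shortened, hypothetical stand-in for the sample text, and inputData.transcript is assumed to hold a string like it. The Set step that drops repeated names is an optional addition, not part of the regex itself.

```javascript
// Shortened stand-in for the sample transcript (hypothetical data).
const transcript = [
  "Julian Z.:          What's really exciting is the opportunity.",
  "                From an advertiser perspective, it has blown up.",
  "",
  "Speaker 2:          Great. Tell us a little bit about Viacom.",
  "",
  "Julian Z.:          Well, I think it is using intelligence.",
].join("\n");

// m: ^ matches at the start of every line; g: return every match.
// match() returns null when nothing matches, hence the || [] fallback.
const names = transcript.match(/^\w[^:\n]*/mg) || [];

// Optional: drop repeated names while keeping first-seen order.
const uniqueNames = [...new Set(names)];

console.log(names);
console.log(uniqueNames);
```

Indented continuation lines are skipped because they do not start with a word character. Note that the pattern does not itself require a colon, so a line that starts with a word character but contains no colon would be matched in full.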

Answer 1: (score: 1)

This regex matches any characters up to the first colon:

/^.*?(?=:)/gm

https://regex101.com/r/3uyXMM/3

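^: start matching at the beginning of a line

.: matches any character except a newline

*?: lazy (non-greedy) quantifier, so the match stops at the first colon (see the next item)

(?=:): positive lookahead, meaning the next character must be a colon, but the colon is not included in the match

g: global flag, so matching does not stop after the first match and all matches are returned

m: multiline flag, so ^ matches at the start of every line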
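
As a rough sketch of what the two flags do, the snippet below runs the same pattern with and without them. The text string is a shortened, hypothetical stand-in for the sample transcript, and the variable names are illustrative only.

```javascript
// Shortened stand-in for the sample transcript (hypothetical data).
const text = [
  "Julian Z.:  What's really exciting is the opportunity.",
  "    From an advertiser perspective, it has blown up.",
  "Speaker 2:  Great. Tell us a little bit about Viacom.",
].join("\n");

// With g and m: one match for every line that contains a colon.
const all = text.match(/^.*?(?=:)/gm) || [];

// Without m, ^ anchors only at the very start of the input,
// so at most one name can match even with g.
const noMultiline = text.match(/^.*?(?=:)/g) || [];

// Without g, match() stops at the first match.
const first = text.match(/^.*?(?=:)/m);

// Caveat: an indented continuation line that contains a colon also matches.
const indented = "    One thing: data matters.".match(/^.*?(?=:)/gm) || [];

console.log(all, noMultiline, first[0], indented);
```

The last case shows a limitation: because the pattern accepts any characters before the colon, a continuation line that happens to contain a colon yields a match that is not a speaker name. Requiring a non-whitespace first character would be one way to tighten it.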