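Finding dialogue in a paragraph with JavaScript

Time: 2018-04-19 20:49:43

Tags: javascript arrays split nlp

How can I use JavaScript to extract the dialogue from a paragraph (the sentences between quotation marks) and store the results in an array?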

var myParagraph = `
“Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?”
“Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all.”
“I know nothing about him,” I murmur, trying and failing to suppress my rising panic. “The questions will see you through. Go. It’s a long drive. I don’t want you to be late.” “Okay, I’m going. Get back to bed. I made you some soup to heat up later.” I stare at her fondly. Only for you, Kate, would I do this.`;

How can I split myParagraph to return an array like this:

paragraphArray = ["Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?",
"Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all.",
"I know nothing about him,",
"The questions will see you through. Go. It’s a long drive. I don’t want you to be late.",
"Okay, I’m going. Get back to bed. I made you some soup to heat up later."]

Thanks.

3 answers:

Answer 0 (score: 1)

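// Assumes the text starts with “, ends with ”, and the quoted passages are directly adjacent (”“)
paragraphArray = myParagraph.trim().slice(1, -1).split("”“");

I think this works, provided there is no narration or line break between the quoted passages.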

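Answer 1 (score: 0)

A regular expression is a good fit here. As mentioned in the comments, make sure the quotes in your string are regular " quotes, or change the regex to use the left and right curly quotes. So:

/"(.*?)"/ for regular quotes, or:

/“(.*?)”/ for directional (curly) quotes.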

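The regex "(.*?)" simply says: capture everything between two quote characters, using a non-greedy (lazy) search. Finally, the g flag is added to the regex so that it finds all matches, not just the first one.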
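A quick way to see why the non-greedy ? matters is to compare the lazy and greedy forms on a short, made-up string (not taken from the question):

```javascript
// Made-up sample text with two quoted passages and narration between them.
const sample = '"Hi," she said. "Bye."';

// Lazy: .*? stops at the first closing quote, so each passage is its own match.
const lazy = sample.match(/"(.*?)"/g);
console.log(lazy); // lazy holds ['"Hi,"', '"Bye."']

// Greedy: .* runs on to the last quote, swallowing the narration in between.
const greedy = sample.match(/"(.*)"/g);
console.log(greedy); // greedy holds ['"Hi," she said. "Bye."']
```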

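The string method .match takes a regex and returns an array of matches. The exact shape of that array depends on whether the regex has the g flag: without it, you get only the first match together with its capture groups. Since we use the flag, it returns an array of every full match (including the enclosing quotes), so you may want to strip the quotes from each result.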
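A small sketch of that difference on a made-up string, showing .match with and without the g flag, plus the null it returns when nothing matches (worth guarding against before calling .map):

```javascript
const text = '"One." Narration. "Two."';

// Without g: the first match only, with capture groups (plus index/input properties).
const first = text.match(/"(.*?)"/);
console.log(first[0]); // "One."  (full match, quotes included)
console.log(first[1]); // One.    (capture group 1, quotes excluded)

// With g: every full match, but capture groups are not reported.
const all = text.match(/"(.*?)"/g);
console.log(all); // all holds ['"One."', '"Two."']

// With g and no match at all, match returns null rather than an empty array.
const none = "no quotes here".match(/"(.*?)"/g);
console.log(none); // null
```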

Here is a working example:

var myParagraph = `
"Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?"
"Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all."
"I know nothing about him," I murmur, trying and failing to suppress my rising panic. "The questions will see you through. Go. It’s a long drive. I don’t want you to be late." "Okay, I’m going. Get back to bed. I made you some soup to heat up later." I stare at her fondly. Only for you, Kate, would I do this.`

const rgx = /"(.*?)"/g;

const dialogue = (myParagraph.match(rgx) || []) // Match using our regex; match returns null when nothing is found
    .map(result => result.replace(/"/g, "")); // Remove the quotes from each result; drop this line to keep the enclosing quotes

console.log(dialogue)
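The question's own text uses curly quotes (and line breaks, so it needs a template literal). As a sketch, here is the same match/map approach with the directional regex applied to it; slice(1, -1) is a minor variation on the answer's replace call that drops exactly the opening and closing quote of each match:

```javascript
// The question's paragraph, curly quotes kept as-is.
const questionText = `
“Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?”
“Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all.”
“I know nothing about him,” I murmur, trying and failing to suppress my rising panic. “The questions will see you through. Go. It’s a long drive. I don’t want you to be late.” “Okay, I’m going. Get back to bed. I made you some soup to heat up later.” I stare at her fondly. Only for you, Kate, would I do this.`;

// Directional (curly) quotes, lazy and global: one match per quoted passage.
const curlyRgx = /“(.*?)”/g;

// match returns null when nothing matches, so fall back to an empty array.
const paragraphArray = (questionText.match(curlyRgx) || [])
  .map((result) => result.slice(1, -1)); // drop the enclosing “ and ”

console.log(paragraphArray);
```

Because . does not match line breaks, this assumes no quoted passage spans a line break, which holds for the question's text.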

Answer 2 (score: 0)

If none of the quoted passages in your paragraph contain newline characters, you can do this:

var myParagraph = ' “Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?” “Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all.” “I know nothing about him,” I murmur, trying and failing to suppress my rising panic. “The questions will see you through. Go. It’s a long drive. I don’t want you to be late.” “Okay, I’m going. Get back to bed. I made you some soup to heat up later.” I stare at her fondly. Only for you, Kate, would I do this.';

// matches everything between “ and ” (quotes included), except newline characters
// '?' makes the search non-greedy, so it stops at the first closing ”
const regex = /“[^\n]*?”/g;
const result = [];
let match;

// while there is anything to capture, push it to result
while (match = regex.exec(myParagraph)) {
  result.push(match[0]);
}

console.log(result);
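The while loop above relies on the g flag: each exec call resumes searching at regex.lastIndex, and once no match remains, exec returns null (and resets lastIndex to 0), which ends the loop. Without g, exec would return the first match every time and the loop would never terminate. A small trace on a made-up string:

```javascript
// Made-up two-quote string for tracing; “ and ” are single UTF-16 code units.
const trace = "“A” and “B”";
const quoteRe = /“([^\n]*?)”/g;

const first = quoteRe.exec(trace);
console.log(first[0], quoteRe.lastIndex); // “A” 3

const second = quoteRe.exec(trace);
console.log(second[0], quoteRe.lastIndex); // “B” 11

// No match left: exec returns null and lastIndex is reset to 0.
const third = quoteRe.exec(trace);
console.log(third, quoteRe.lastIndex); // null 0
```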

Optionally, if you want to remove the quotes as in the question, you can add a capture group and push its contents to the result instead:

var myParagraph = ' “Of course I’ll go Kate. You should get back to bed. Would you like some Nyquil or Tylenol?” “Nyquil, please. Here are the questions and my mini-disc recorder. Just press record here. Make notes, I’ll transcribe it all.” “I know nothing about him,” I murmur, trying and failing to suppress my rising panic. “The questions will see you through. Go. It’s a long drive. I don’t want you to be late.” “Okay, I’m going. Get back to bed. I made you some soup to heat up later.” I stare at her fondly. Only for you, Kate, would I do this.';

// here the parentheses ( ) define a capture group
const regex = /“([^\n]*?)”/g;
const result = [];
let match;
while (match = regex.exec(myParagraph)) {
  // note the change: 0 => 1
  result.push(match[1]);
}

console.log(result);
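On engines that support String.prototype.matchAll (ES2020 and later, so newer than this answer), the exec loop can be replaced by a single expression. A sketch using the same regex on a made-up sample:

```javascript
// Short made-up sample; the regex is the capture-group version from this answer.
const sampleText = "“Hi,” she said. “Bye.”";
const dialogueRe = /“([^\n]*?)”/g;

// matchAll requires the g flag and returns an iterator of match objects;
// Array.from's mapping function keeps only capture group 1 (quotes removed).
const lines = Array.from(sampleText.matchAll(dialogueRe), (m) => m[1]);

console.log(lines); // [ 'Hi,', 'Bye.' ]
```

matchAll throws a TypeError if the regex lacks the g flag, which makes the "forgot g" mistake fail loudly instead of looping forever.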