I have multiple files, each with different titles, and I want to extract the title names from every file. Here is an example of one file.
The titles I expect to extract are the text enclosed in <NAME> ... </NAME> in the contents below:
[1] "<START" "ID=\"CMP-001\"" "NO=\"1\">"
[4] "<NAME>Plasma-derived" "vaccine" "(PDV)"
[7] "versus" "placebo" "by"
[10] "intramuscular" "route</NAME>" "<DIC"
[13] "CHI2=\"3.6385\"" "CI_END=\"0.6042\"" "CI_START=\"0.3425\""
[16] "CI_STUDY=\"95\"" "CI_TOTAL=\"95\"" "DF=\"3.0\""
[19] "TOTAL_1=\"0.6648\"" "TOTAL_2=\"0.50487622\"" "BLE=\"YES\""
.
.
.
[789] "TOTAL_2=\"39\"" "WEIGHT=\"300.0\"" "Z=\"1.5443\">"
[792] "<NAME>Local" "adverse" "events"
[795] "after" "each" "injection"
[798] "of" "vaccine</NAME>" "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>"
[801] "</GROUP_LABEL_2>" "<GRAPH_LABEL_1>" "PDV</GRAPH_LABEL_1>"
Note that the length of the title differs from file to file.
Answer 0 (score: 0)
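Here is a solution using stringr. It first collapses the vector into one long string, then captures all words/characters that are not a newline (\n) between each pair of "<NAME>" and "</NAME>". In future, if you provide a reproducible example (for instance with dput()), people will be able to help you more easily. Hope this helps!
Note: if you only need the first title, you can use str_match() instead of str_match_all().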
library(stringr)
str_match_all(paste0(string, collapse = " "), "<NAME>(.*?)</NAME>")[[1]][,2]
[1] "Plasma-derived vaccine (PDV) versus placebo by intramuscular route"
[2] "Local adverse events after each injection of vaccine"
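To illustrate the note about str_match() versus str_match_all(), and why the pattern uses the lazy quantifier (.*?), here is a minimal sketch. The vector `tokens` is made up for this example and is not part of the original data.

```r
library(stringr)

# Small illustrative vector (invented for this sketch), same shape as the data
tokens <- c("<NAME>First", "title</NAME>", "<DIC", "X=\"1\">",
            "<NAME>Second", "title</NAME>")

# Collapse the tokens into one long string, separated by spaces
collapsed <- paste0(tokens, collapse = " ")

# str_match() returns only the first match; column 2 is the capture group
first_title <- str_match(collapsed, "<NAME>(.*?)</NAME>")[, 2]

# str_match_all() returns a list with one matrix per input string
all_titles <- str_match_all(collapsed, "<NAME>(.*?)</NAME>")[[1]][, 2]

# A greedy (.*) would run from the first <NAME> to the LAST </NAME>,
# swallowing everything in between
greedy <- str_match(collapsed, "<NAME>(.*)</NAME>")[, 2]

print(first_title)
print(all_titles)
print(greedy)
```

With the lazy quantifier each title is captured separately; with the greedy one the two titles and the tags between them come back as a single string.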
Data:
string <- c("<START", "ID=\"CMP-001\"", "NO=\"1\">", "<NAME>Plasma-derived", "vaccine", "(PDV)", "versus", "placebo", "by", "intramuscular", "route</NAME>", "<DIC", "CHI2=\"3.6385\"", "CI_END=\"0.6042\"", "CI_START=\"0.3425\"", "CI_STUDY=\"95\"", "CI_TOTAL=\"95\"", "DF=\"3.0\"", "TOTAL_1=\"0.6648\"", "TOTAL_2=\"0.50487622\"", "BLE=\"YES\"",
"TOTAL_2=\"39\"", "WEIGHT=\"300.0\"", "Z=\"1.5443\">", "<NAME>Local", "adverse", "events", "after", "each", "injection", "of", "vaccine</NAME>", "<GROUP_LABEL_1>PDV</GROUP_LABEL_1>", "</GROUP_LABEL_2>", "<GRAPH_LABEL_1>", "PDV</GRAPH_LABEL_1>")
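Since the question is about several files, the same approach could be wrapped in a function and applied to each file in turn. This is only a sketch under assumptions: the helper name `extract_titles`, the use of scan() to read each file as whitespace-separated tokens, and the temporary files are all hypothetical and not taken from the original post.

```r
library(stringr)

# Hypothetical helper: read one file as whitespace-separated tokens and
# return every title found between <NAME> and </NAME>
extract_titles <- function(path) {
  # quote = "" keeps the double quotes inside tokens such as ID="CMP-001"
  tokens <- scan(path, what = "character", quote = "", quiet = TRUE)
  collapsed <- paste(tokens, collapse = " ")
  str_match_all(collapsed, "<NAME>(.*?)</NAME>")[[1]][, 2]
}

# Two throwaway files so the sketch runs on its own (contents are made up)
dir <- tempfile("titles_")
dir.create(dir)
writeLines(c("<START ID=\"CMP-001\" NO=\"1\">",
             "<NAME>Plasma-derived vaccine (PDV)",
             "versus placebo</NAME>"),
           file.path(dir, "a.txt"))
writeLines("<NAME>Local adverse events</NAME> <GROUP_LABEL_1>PDV</GROUP_LABEL_1>",
           file.path(dir, "b.txt"))

# Apply the helper to every file and name the results by file name
files <- list.files(dir, full.names = TRUE)
titles <- lapply(files, extract_titles)
names(titles) <- basename(files)

print(titles)
```

Because the titles are returned as a list with one element per file, titles of different lengths and files with different numbers of titles are handled without extra work.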