I have a dataset that looks like a thousand rows of the following:
dat = c("Speaker 1: ONE TWO THREE | Speaker 2: FOUR FIVE SIX SEVEN | Speaker 1: EIGHT NINE TEN | Speaker 2: ELEVEN* TWELVE THIRTEEN | Speaker 1: FOURTEEN FIFTEEN","Speaker 1: ONE TWO")
I am trying to make it look like the following:
dat = tolower(dat) #lowercase
dat = gsub("\\*","",dat) #strip asterisks
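That is, I want to remove everything said by Speaker 1, strip the asterisks, convert what remains to sentence case, and add a period at the end of each statement.
Any help would be appreciated, especially if a solution already exists here and I simply couldn't find it.
Answer 0: (score: 1)
Using base R you can do: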
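# First pass: capture each Speaker 2 turn, lowercase it with \\L and append ". ";
# the second alternative consumes text that does not belong to Speaker 2
a = gsub(".*?2:\\s*([^|]*)\\b|(?:(?!Speaker 2).)*","\\L\\1. ", dat, perl = TRUE)
# Second pass: trim leftover trailing non-word characters, then strip asterisks
b = gsub("\\*", "", sub("(?|(?<=^)|(?<=\\W))\\W*$", '', a, perl = TRUE))
# Turn empty strings into NA
`is.na<-`(b,nchar(b)==0)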
[1] "four five six seven. eleven twelve thirteen."
[2] NA
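The two less common idioms in this answer may be easier to follow in isolation. The sketch below is only an illustration on made-up inputs (the vectors `shout` and `x` are not from the original post): with `perl = TRUE`, `\\L` in the replacement lowercases whatever follows it, and calling the replacement function `is.na<-` directly returns a copy with the selected elements set to NA.

```r
# Illustration only; `shout` and `x` are made-up inputs, not from the question.

# With perl = TRUE, \\L lowercases the rest of the replacement text.
shout <- "FOUR FIVE"
lowered <- gsub("(\\w+)", "\\L\\1", shout, perl = TRUE)
print(lowered)

# `is.na<-`(x, i) returns x with the elements selected by i set to NA.
x <- c("abc", "")
marked <- `is.na<-`(x, nchar(x) == 0)
print(marked)
```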
Answer 1: (score: 0)
Since you need to perform many operations on the same object and also need the str_trim function, it is most convenient to use the tidyverse:
library(tidyverse)
dat = tolower(dat) #lowercase
dat = gsub("\\*","",dat) #strip asterisks
# split every string in the vector on the occurrences of "|" and transform the pieces
res2 <- strsplit(dat, "\\|") %>%
lapply(function(elt) str_trim(elt[!grepl("speaker 1", elt)])) %>% # drop Speaker 1 turns
lapply(gsub, pattern = "speaker +[[:digit:]] *: *", replacement ="") %>% # remove the speaker label
lapply(function(elt) if (length(elt)) paste0(elt, ".")) %>% # end each statement with a period
lapply(str_trim) %>%
lapply(paste0, collapse = " ") # join the statements of each row
res2[nchar(res2) == 0] <- NA_character_ # set empty strings NA
resvec <- unlist(res2) #turn the result into a vector again
resvec
[1] "four five six seven. eleven twelve thirteen." NA
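Both answers above return lowercase text, whereas the question asks for sentence case. A minimal base R sketch of that variant follows. It assumes every turn starts with a literal `Speaker 1:` or `Speaker 2:` label and that turns are separated by `|`; the function name `clean_speaker2` is hypothetical and not part of either answer.

```r
# Hypothetical helper; assumes "Speaker N:" labels and "|" separators.
clean_speaker2 <- function(x) {
  vapply(strsplit(x, "|", fixed = TRUE), function(turns) {
    turns <- trimws(turns)
    # keep only the turns spoken by Speaker 2
    keep <- turns[grepl("^Speaker 2:", turns)]
    if (length(keep) == 0) return(NA_character_)
    # drop the speaker label and any asterisks, then lowercase
    keep <- gsub("*", "", sub("^Speaker 2:\\s*", "", keep), fixed = TRUE)
    keep <- trimws(tolower(keep))
    # sentence case: uppercase the first letter only, then add a period
    keep <- paste0(toupper(substring(keep, 1, 1)), substring(keep, 2), ".")
    paste(keep, collapse = " ")
  }, character(1))
}

dat <- c("Speaker 1: ONE TWO THREE | Speaker 2: FOUR FIVE SIX SEVEN | Speaker 1: EIGHT NINE TEN | Speaker 2: ELEVEN* TWELVE THIRTEEN | Speaker 1: FOURTEEN FIFTEEN",
         "Speaker 1: ONE TWO")
res <- clean_speaker2(dat)
print(res)
```

Rows without any Speaker 2 turn come back as NA, matching the behaviour of the two answers.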