Question

I have two data frames.

The first is called sentence:

structure(list(Text = c("This is a pen", "this is a sword", "pen is mightier than a sword"
)), .Names = "Text", row.names = c(NA, -3L), class = "data.frame")

It looks like this:

                          Text
1                This is a pen
2              this is a sword
3 pen is mightier than a sword

The second is called words:

structure(list(wordvec = c("pen", "sword"), value = c(1, 2)), .Names = c("wordvec", 
"value"), row.names = c(NA, -2L), class = "data.frame")

It looks like this:

  wordvec value
1     pen     1
2   sword     2

I need to search each sentence for the words listed in wordvec and, where they occur, return the sum of their values.

The desired output is:

                          Text   Value
1                This is a pen      1
2              this is a sword      2
3 pen is mightier than a sword      3

I first tried to extract the words in sentence$Text that match words$wordvec and collect them into a vector. That part worked.

library(stringi)

sentence$words <- sapply(stri_extract_all(sentence[[1]],regex='(#?)\\w+'),function(x) paste(x[x %in% words[[1]]],collapse=','))

As the next step, I tried to sum the values of those words and create a vector sentence$value. I tried the following code:

sentence$value <- sum(words$value)[match(sentence$words, words$wordvec)]

Answer 1

We can paste 'wordvec' into a single string, then use str_extract_all to extract the words from the 'Text' column that match that pattern. match against 'wordvec' gives a vector of positions, which we use to get the corresponding 'value' from 'words', and then we take the sum.

library(stringr)
sapply(str_extract_all(sentence$Text, paste0('\\b(', paste(words$wordvec, collapse='|'), ')\\b')),
       function(x) sum(words$value[match(x, words$wordvec)]))
#[1] 1 2 3
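To make the intermediate steps easier to follow, here is a minimal base-R sketch of the same idea. It swaps str_extract_all for regmatches/gregexpr so it runs without stringr; the names pattern, hits and totals are only illustrative and are not part of the original answer.

```r
# Data from the question
sentence <- data.frame(Text = c("This is a pen", "this is a sword",
                                "pen is mightier than a sword"),
                       stringsAsFactors = FALSE)
words <- data.frame(wordvec = c("pen", "sword"), value = c(1, 2),
                    stringsAsFactors = FALSE)

# Step 1: one alternation pattern with word boundaries, i.e. \b(pen|sword)\b
pattern <- paste0("\\b(", paste(words$wordvec, collapse = "|"), ")\\b")

# Step 2: every match per sentence, as a list of character vectors
hits <- regmatches(sentence$Text, gregexpr(pattern, sentence$Text, perl = TRUE))

# Step 3: look up each hit's value and sum within each sentence
totals <- vapply(hits,
                 function(x) sum(words$value[match(x, words$wordvec)]),
                 numeric(1))
totals
```

A sentence with no matching word yields an empty vector in step 2, so its total is 0 rather than NA.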

Another option is to use strsplit after converting the 'sentence' data.frame to a data.table with setDT(sentence, ..): match the vector of split words against 'wordvec', get the corresponding 'value', and take the sum.

library(data.table)
setDT(sentence, keep.rownames=TRUE)[, sum(words$value[match(strsplit(Text, '\\s')[[1]], words$wordvec, nomatch=0)]), by = rn]$V1
#[1] 1 2 3
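If data.table is not available, the split-and-match idea can also be sketched in base R alone. This is an illustrative variant under that assumption, not the answer's own code; tokens and totals_split are made-up names.

```r
# Data from the question
sentence <- data.frame(Text = c("This is a pen", "this is a sword",
                                "pen is mightier than a sword"),
                       stringsAsFactors = FALSE)
words <- data.frame(wordvec = c("pen", "sword"), value = c(1, 2),
                    stringsAsFactors = FALSE)

# Split every sentence on whitespace: a list with one token vector per row
tokens <- strsplit(sentence$Text, "\\s+")

# nomatch = 0 makes unmatched tokens index nothing, so they add 0 to the sum
totals_split <- vapply(tokens,
                       function(tok) sum(words$value[match(tok, words$wordvec,
                                                           nomatch = 0)]),
                       numeric(1))
totals_split
```

Because the tokens are compared exactly, this is case-sensitive and a token such as "pen." with trailing punctuation would not match "pen".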


Answer 2

Here is another simple solution using a for loop. Performance may be an issue, though. Your data frames:

sentence<-structure(list(Text = c("This is a pen", "this is a sword", "pen is mightier than a sword"
)), .Names = "Text", row.names = c(NA, -3L), class = "data.frame")

words<-structure(list(wordvec = c("pen", "sword"), value = c(1, 2)), .Names = c("wordvec", 
"value"), row.names = c(NA, -2L), class = "data.frame")

Create an empty data frame with one row per sentence to hold the counts of each word in wordvec.

a<-data.frame(matrix(0, ncol=1, nrow=nrow(sentence)))

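Now use a for loop to go through each word in words and find it in the sentences with str_count from stringr. With cbind you can store how many times each word occurs (multiplied by its value) in the data frame, here a, for future reference.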

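library(stringr)  # provides str_count
# append each word's per-sentence count, weighted by its value, as a new column of a
for (i in seq_len(nrow(words)))
  a <- cbind(a, data.frame(count = str_count(sentence$Text, words$wordvec[i])) * words$value[i])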

Now just add up each row with rowSums:

    data.frame(Text=sentence$Text,Value=rowSums(a))

You will get:

                          Text Value
1                This is a pen     1
2              this is a sword     2
3 pen is mightier than a sword     3

Give it a try :)
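Since performance was mentioned as a possible concern, one way to avoid growing a data frame with cbind inside the loop is to build a count matrix once and multiply it by the values. The following is only a sketch of that idea in base R, with made-up names (count_word, counts, totals_mat), and it has not been benchmarked against the loop.

```r
# Data from the question
sentence <- data.frame(Text = c("This is a pen", "this is a sword",
                                "pen is mightier than a sword"),
                       stringsAsFactors = FALSE)
words <- data.frame(wordvec = c("pen", "sword"), value = c(1, 2),
                    stringsAsFactors = FALSE)

# Count literal occurrences of one word in every sentence
count_word <- function(w) {
  lengths(regmatches(sentence$Text, gregexpr(w, sentence$Text, fixed = TRUE)))
}

# Rows are sentences, columns are words
counts <- vapply(words$wordvec, count_word, integer(nrow(sentence)))

# Weighted row sums: counts %*% value gives one total per sentence
totals_mat <- as.numeric(counts %*% words$value)
data.frame(Text = sentence$Text, Value = totals_mat)
```

Like the loop, this counts substrings, so "pen" would also be counted inside a longer word such as "pencil".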

Match text vectors from two data frames and return the sum of a third vector

2 answers: