I am trying to learn how to classify sentences in R.
I have a text file containing sentences in the following format:
dataset$text = When Anna left Inspector Aziz , she was much happier
dataset$label = happy
I intend to label the sentences as follows:
rl <- readLines('sentences.txt')
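I want to extract the sentences and label them with their emotions. How can I do this? I know I should use capture groups in a regular expression, but I don't know how to do that in R. I'm new to this and still learning.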
Answer 0 (score: 0)
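As posted, this is malformed XML: \h is parsed as an escape character unless you add an extra backslash to escape the backslash itself. The snippet also lacks a single enclosing root element (added below with paste). If your actual data is well-formed, you can parse it with the xml2 or XML package. I like purrr::map_df for iterating over the nodes and coercing the results to a data.frame, but you can do the same in base R if you prefer.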
library(xml2)
library(purrr)
'<happy>
This did the trick : the boys now have a more distant friendship and David is much happier .
</happy>
<happy>
When Anna left Inspector Aziz , she was much happier .
</happy>' %>%
    paste('<sent>', ., '</sent>') %>%    # add enclosing tags
    read_xml() %>%
    xml_find_all('//text()/parent::*') %>%    # select nodes that are parents of text
    map_df(~list(text = xml_text(.x, trim = TRUE),
                 emotion = xml_name(.x)))
## # A tibble: 2 × 2
##   text                                                                                         emotion
##   <chr>                                                                                        <chr>
## 1 This did the trick : the boys now have a more distant friendship and David is much happier . happy
## 2 When Anna left Inspector Aziz , she was much happier .                                       happy
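
If you would rather stay in base R and use the regex capture groups the question asks about, here is a minimal sketch. It assumes the file really contains closing tags written as <\happy> (inferred from the remark about \h, not shown in the post), and the names raw, txt, pattern and dataset are my own. Treat it as one possible approach, not a replacement for a proper XML parser.

```r
# Assumed input: the two sentences from the answer, with the malformed
# closing tag <\happy> (hypothetical; the original file is not shown).
raw <- c(
  "<happy>",
  "This did the trick : the boys now have a more distant friendship and David is much happier .",
  "<\\happy>",
  "<happy>",
  "When Anna left Inspector Aziz , she was much happier .",
  "<\\happy>"
)

# Collapse to a single string so one record can span several lines.
txt <- paste(raw, collapse = "\n")

# Normalise "<\tag>" to "</tag>"; fixed = TRUE treats the pattern literally.
txt <- gsub("<\\", "</", txt, fixed = TRUE)

# Group 1 captures the tag name, group 2 the sentence. The backreference
# \\1 requires the closing tag to match the opening one, and (?s) lets
# "." match newlines.
pattern <- "(?s)<([A-Za-z]+)>\\s*(.*?)\\s*</\\1>"

# gregexpr() locates every record; regmatches() extracts the matched text.
matches <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]

# Reuse the same pattern with sub() to pull out each capture group.
dataset <- data.frame(
  text  = sub(pattern, "\\2", matches, perl = TRUE),
  label = sub(pattern, "\\1", matches, perl = TRUE),
  stringsAsFactors = FALSE
)

print(dataset)
```

With a real file, replace the hard-coded raw vector with readLines('sentences.txt'). The backreference only pairs a closing tag with an identical opening tag, so nested or mismatched tags are skipped; for anything beyond flat records, the xml2 approach above is the safer choice.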