Grouping by regular expression in dplyr

Time: 2017-01-19 10:35:56

Tags: r

I have some data like this:

                          X2   Prop
 eosinophilicoesophagitisscop  0.7
                       furrow  7
                       oedema 16
                      oedemat  1
        oesophagealtrachealis  0
                   oesophagit 25
           oesophagitisbiopsi  0.2
       oesophagitisendoscopic  0
             oesophagitiseros  0
            oesophagitisgastr  0
                        plaqu 16
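To make the example reproducible, the printed table can be entered as a data frame. The answers below call it `df1`, while the question's own attempt calls it `foo`, so this sketch defines both names. The values are copied from the table above; `stringsAsFactors = FALSE` is an added assumption that keeps `X2` as character on older R versions.

```r
# Sample data from the question, entered by hand
df1 <- data.frame(
  X2 = c("eosinophilicoesophagitisscop", "furrow", "oedema", "oedemat",
         "oesophagealtrachealis", "oesophagit", "oesophagitisbiopsi",
         "oesophagitisendoscopic", "oesophagitiseros", "oesophagitisgastr",
         "plaqu"),
  Prop = c(0.7, 7, 16, 1, 0, 25, 0.2, 0, 0, 0, 16),
  stringsAsFactors = FALSE
)

# Same data under the name used in the question's attempt
foo <- df1
```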

I want to group the words that match any of these:

 myNotableWords<-c("oesophagit","oedema","furrow","plaq")

so that I end up with:

oesophagit  25.9
furrow      7
oedema      17
plaq        16

I tried the following:

library(dplyr)
mywords<-foo %>%
    group_by(foo[grepl(paste(myNotableWords, collapse='|'), X2,perl=TRUE),]) %>%
    summarise(n=n())

But I get the error:

 Error: wrong result size (3), expected 11 or 1
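The "expected 11 or 1" part of the error says that `group_by()` wants an expression giving one value per row (11 here) or a single value. `grepl()` itself does return one logical per row, but indexing the data frame with it returns the matching rows as a new, smaller data frame, which is not a per-row grouping key. A small base-R sketch of the two lengths, using the question's data:

```r
df1 <- data.frame(
  X2 = c("eosinophilicoesophagitisscop", "furrow", "oedema", "oedemat",
         "oesophagealtrachealis", "oesophagit", "oesophagitisbiopsi",
         "oesophagitisendoscopic", "oesophagitiseros", "oesophagitisgastr",
         "plaqu"),
  Prop = c(0.7, 7, 16, 1, 0, 25, 0.2, 0, 0, 0, 16),
  stringsAsFactors = FALSE
)
myNotableWords <- c("oesophagit", "oedema", "furrow", "plaq")
pattern <- paste(myNotableWords, collapse = "|")

# One TRUE/FALSE per row: the right length for a grouping variable
hits <- grepl(pattern, df1$X2, perl = TRUE)
length(hits)   # 11

# Subsetting returns only the matching rows, as a data frame
matched <- df1[hits, ]
nrow(matched)  # 10 ("oesophagealtrachealis" has no match)
```

A usable key has to name *which* word matched in each row, which is what the answers below compute.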

2 Answers:

Answer 0 (score: 3)

We can loop over the notable words, `grep` each one in `X2`, and take the `sum` of the matching rows of the 'Prop' column:

v1 <- sapply(myNotableWords, function(x) sum(df1$Prop[grep(x, df1$X2)]))
data.frame(words = names(v1), val = as.vector(v1))
#      words  val
#1 oesophagit 25.9
#2     oedema 17.0
#3     furrow  7.0
#4       plaq 16.0
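A variant of the same loop, sketched under the assumption that the notable words are plain strings with no regex metacharacters: `grepl(..., fixed = TRUE)` matches them literally, and `vapply()` guarantees exactly one number per word. As with `sapply()`, a row containing two notable words would be counted under both.

```r
df1 <- data.frame(
  X2 = c("eosinophilicoesophagitisscop", "furrow", "oedema", "oedemat",
         "oesophagealtrachealis", "oesophagit", "oesophagitisbiopsi",
         "oesophagitisendoscopic", "oesophagitiseros", "oesophagitisgastr",
         "plaqu"),
  Prop = c(0.7, 7, 16, 1, 0, 25, 0.2, 0, 0, 0, 16),
  stringsAsFactors = FALSE
)
myNotableWords <- c("oesophagit", "oedema", "furrow", "plaq")

# For each word, sum Prop over every row whose X2 contains it literally
v1 <- vapply(myNotableWords,
             function(w) sum(df1$Prop[grepl(w, df1$X2, fixed = TRUE)]),
             numeric(1))

data.frame(words = names(v1), val = unname(v1))
```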

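Using dplyr

This can also be done by extracting the matching word from each row with `str_extract`, grouping by it, and summing `Prop`. Rows with no matching word get an `NA` group, which `na.omit()` drops: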
library(stringr)
library(dplyr)
df1 %>%
   group_by(grp = str_extract(X2, paste(myNotableWords, collapse="|"))) %>% 
   summarise(Prop = sum(Prop)) %>%
   na.omit()
# A tibble: 4 × 2
#        grp  Prop
#       <chr> <dbl>
#1     furrow   7.0
#2     oedema  17.0
#3 oesophagit  25.9
#4       plaq  16.0
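For comparison, a base-R sketch of the same "extract, then group" idea without stringr or dplyr. This swaps in `regexpr()` + `regmatches()` for `str_extract()` and `tapply()` for `group_by()`/`summarise()`. Like `str_extract()`, `regexpr()` keeps only the first match in each string, so every row lands in exactly one group.

```r
df1 <- data.frame(
  X2 = c("eosinophilicoesophagitisscop", "furrow", "oedema", "oedemat",
         "oesophagealtrachealis", "oesophagit", "oesophagitisbiopsi",
         "oesophagitisendoscopic", "oesophagitiseros", "oesophagitisgastr",
         "plaqu"),
  Prop = c(0.7, 7, 16, 1, 0, 25, 0.2, 0, 0, 0, 16),
  stringsAsFactors = FALSE
)
myNotableWords <- c("oesophagit", "oedema", "furrow", "plaq")
pattern <- paste(myNotableWords, collapse = "|")

# regexpr() gives the first match position per row (-1 when there is none);
# regmatches() returns the matched text for the rows that did match
m <- regexpr(pattern, df1$X2)
grp <- rep(NA_character_, nrow(df1))
grp[m != -1] <- regmatches(df1$X2, m)

# Drop unmatched rows (the role of na.omit()) and sum Prop per word
keep <- !is.na(grp)
totals <- tapply(df1$Prop[keep], grp[keep], sum)
totals
```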

A similar option with data.table:
library(data.table)
na.omit(setDT(df1)[, .(Prop = sum(Prop)), 
        .(grp = str_extract(X2, paste(myNotableWords, collapse="|")))])

Answer 1 (score: 2)

A `purrr` solution that uses the notable words as the starting data and works with a nested data frame. No grouping is needed.