I have a column in my data frame that looks like this:
Col1
----------------------------------------------------------------------------
Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control
How can I count the occurrences of each comma-separated string? In other words, the result I am after looks like this:
Affiliation Freq
------------------------------------------
Center for Animal Control 3
Division of Hypertension 2
Department of Medicine 1
Department of Surgery 1
Division of Primary Care 1
Department of Internal Medicine 1
Can anyone help me with this?
Answer 0: (score: 1)
Here is one approach. It also replaces '\n' with a comma, because the text contains some newlines.
df <- data.frame(col1 = rep("Center for Animal Control, Division of Hypertension, Department of Medicine, Department of Surgery, Division of Primary Care, Center for Animal Control, Department of Internal Medicine, Division of Hypertension, Center for Animal Control", 1), stringsAsFactors = FALSE)
df$col1 <- gsub('\\n', ', ', df$col1)
as.data.frame(table(unlist(strsplit(df$col1, ', '))))
The output (for the original data) is:
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
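The string in the data frame above contains no newline, so the gsub call changes nothing there. A minimal sketch of the case the answer describes, assuming the three affiliation lines arrive as a single string with embedded newlines (the name txt and that layout are assumptions for illustration, not taken from the question):

```r
# Assumed input: the three affiliation lines stored as ONE string with embedded newlines.
txt <- paste(
  "Center for Animal Control, Division of Hypertension, Department of Medicine",
  "Department of Surgery, Division of Primary Care, Center for Animal Control",
  "Department of Internal Medicine, Division of Hypertension, Center for Animal Control",
  sep = "\n"
)
df <- data.frame(col1 = txt, stringsAsFactors = FALSE)

# Turn each newline into ", " so that every affiliation is delimited the same way.
df$col1 <- gsub("\n", ", ", df$col1, fixed = TRUE)

# Split on ", ", flatten the resulting list, and tabulate.
counts <- as.data.frame(table(unlist(strsplit(df$col1, ", ", fixed = TRUE))))
print(counts)
```

Without the gsub step, the last affiliation of one line and the first of the next would stay glued together as a single entry containing a newline.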
Answer 1: (score: 1)
Assuming "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value in row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" is the value in row 2, and so on, and that df is the data frame:
aff_val <- trimws(unlist(strsplit(df$col1,",")))
ans <- data.frame(table(aff_val))
colnames(ans)[1] <- 'Affiliation'
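The snippet above assumes df already exists and does not show its result. A self-contained sketch under that assumption, with the data frame rebuilt from the rows in the question (the lowercase column name col1 follows this answer, and the final sort is an addition to put the most frequent affiliation first, as in the desired output):

```r
# Hypothetical reconstruction of the question's data frame: three rows, one column.
df <- data.frame(
  col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split every row on commas, flatten to one vector, and trim surrounding spaces.
aff_val <- trimws(unlist(strsplit(df$col1, ",", fixed = TRUE)))

# Tabulate; naming the table dimension yields the "Affiliation" column directly.
ans <- as.data.frame(table(Affiliation = aff_val), stringsAsFactors = FALSE)

# Sort by descending frequency and reset the row names.
ans <- ans[order(-ans$Freq), ]
rownames(ans) <- NULL
print(ans)
```

Splitting on "," and trimming afterwards tolerates inconsistent spacing around the commas, which splitting on ", " does not.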
Answer 2: (score: 1)
I use scan and trimws for this kind of text-processing task.
inp <- " Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items
Center for Animal Control Department of Internal Medicine
3 1
Department of Medicine Department of Surgery
1 1
Division of Hypertension Division of Primary Care
2 1
You can also wrap as.data.frame around that result:
> as.data.frame(table( trimws(scan(text=inp, what="", sep=","))))
Read 9 items
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
Answer 3: (score: 0)
The answer is:
library(stringr)
a<-"Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
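con <- textConnection(a)
# Read the three comma-separated fields per line as character, not factor
tbl <- read.table(con, sep = ",", stringsAsFactors = FALSE)
close(con)
# Flatten all cells into one vector and trim the leading spaces
vec <- str_trim(unlist(tbl))
as.data.frame(table(vec))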
Answer 4: (score: 0)
text = "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
library(stringi)
library(dplyr)
library(tidyr)
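# tibble() replaces the deprecated data_frame()
tibble(text = text) %>%
  mutate(line = text %>% stri_split_fixed("\n")) %>%    # one list element per line
  unnest(line) %>%
  mutate(phrase = line %>% stri_split_fixed(", ")) %>%  # one list element per affiliation
  unnest(phrase) %>%
  count(phrase)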