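I have 5 data frames and only need to analyze the first column of each. From these, I need a frequency table of the words they have in common (not necessarily common to all of the data frames; for example, a word may appear in only two or more of them).
Then I need a frequency table of the words common to all of the data frames.
I tried a for loop, but it looks very complicated. The data frames also have different dimensions, and I could not find any function that helps.
Then I tried: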
lst1 <- list(a,b,c,d,e)
newdat <- stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))[2:1]
library(dplyr)
newdat %>% group_by(val) %>% filter(uniqueN(ind) > 1) %>% count(val)
But this gives me an error:
> stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))
Error in stack.default(setNames(lapply(lst1, "[", 1), seq_along(lst1))):
at least one vector element is required
Thanks
Answer 0: (score: 0)
Here is my solution using purrr and dplyr:
library(purrr)
library(dplyr)
lst1 <- list(mtcars=mtcars, iris=iris, chick=chickwts, cars=cars, airqual=airquality)
lst1 %>%
map_dfr(select, value=1, .id="df") %>% # select the first column of every data frame and name it "value"
group_by(value) %>%
summarise(freq=n(), # frequency over all data frames
n_df=n_distinct(df), # number of data frames in which this value occurs
dfs = paste(unique(df), collapse=",")) %>% # names of those data frames
filter(n_df > 1) %>% # keep values that occur in more than one data frame
filter(n_df == 5) # optional: keep only values that occur in all 5 data frames
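As for the error in the question: `lapply(lst1, "[", 1)` returns one-column data frames rather than vectors, and `stack()` only accepts vector elements, so it finds nothing to stack. A minimal base R sketch of the same idea follows, assuming the first column of each data frame holds words. The data frames `df_a`, `df_b`, `df_c` and their contents are hypothetical stand-ins for the asker's `a` to `e`. Note also that `stack()` names its columns `values` and `ind`, and that `uniqueN()` comes from data.table, not dplyr (the dplyr counterpart is `n_distinct()`).

```r
# Hypothetical toy data: data frames of different sizes whose first column
# holds words (stand-ins for the asker's a, b, c, d, e).
df_a <- data.frame(word = c("apple", "pear", "apple"), n = 1:3)
df_b <- data.frame(word = c("apple", "plum"), flag = c(TRUE, FALSE))
df_c <- data.frame(word = c("pear", "apple", "fig", "pear"))
lst1 <- list(df_a, df_b, df_c)

# "[[" extracts the first column as a vector; "[" would return a one-column
# data frame, which stack() does not treat as a vector element.
# as.character() also covers the case where the column is a factor.
cols <- lapply(lst1, function(d) as.character(d[[1]]))
newdat <- stack(setNames(cols, seq_along(cols)))  # columns: values, ind

# Cross-tabulate word by data frame, then summarise each word.
tab  <- table(newdat$values, newdat$ind)
freq <- rowSums(tab)      # total occurrences over all data frames
n_df <- rowSums(tab > 0)  # number of data frames containing the word

shared <- freq[n_df > 1]              # words in two or more data frames
in_all <- freq[n_df == length(lst1)]  # words in every data frame

print(shared)
print(in_all)
```

With real data, replace the toy list with `list(a, b, c, d, e)`; the comparison against `length(lst1)` then plays the role of `n_df == 5` in the dplyr version.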