I need to create a corpus from a huge dataframe (or any Python equivalent of the R data.frame) by splitting it into as many dataframes as there are usernames.
For example, I start from a dataframe like this:
username search_term
name_1 "some_text_1"
name_1 "some_text_2"
name_2 "some_text_3"
name_2 "some_text_4"
name_3 "some_text_5"
name_3 "some_text_6"
name_3 "some_text_1"
[...]
name_n "some_text_n-1"
And I want to obtain:
data frame 1
username search_term
name_1 "some_text_1"
name_1 "some_text_2"
data frame 2
username search_term
name_2 "some_text_3"
name_2 "some_text_4"
And so on...
I already asked this question for R, but now I have realised that using Python's NLTK could be an advantage for me. I found out that in R I can create a virtual corpus. Is the same possible in Python, or is there another way to solve this problem in Python?
To see how I solved this problem in R, see:
Split a huge dataframe in many smaller dataframes to create a corpus in r
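
For reference, a minimal sketch of the split I am after, assuming pandas as the Python counterpart of the R data.frame (pandas and the dict-based collection below are my own illustration, not part of the original question):

import pandas as pd

# Toy data mirroring the example above
df = pd.DataFrame({
    "username": ["name_1", "name_1", "name_2", "name_2", "name_3", "name_3"],
    "search_term": ["some_text_1", "some_text_2", "some_text_3",
                    "some_text_4", "some_text_5", "some_text_6"],
})

# groupby yields one (username, sub-dataframe) pair per user;
# collecting them in a dict avoids creating n separate variables
frames = {username: group for username, group in df.groupby("username")}

print(frames["name_1"])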
Answer 0 (score: -1)
Here is a solution in R. I created a similar data.frame, df:
# Toy data: 6 groups of 2 rows each, mirroring the username/search_term layout
df <- data.frame(group = rep(1:6, each = 2), value = 1:12)
Below are the group indices and the names of the future small data.frames:
idx <- unique(df$group)   # one entry per group
nms <- paste0('df', idx)  # names for the small data.frames: df1, df2, ...
Next, in a for loop, I create these small data.frames:
for(i in idx){
  # Subset the rows belonging to group i...
  df_tmp <- df[df$group == i, ]
  # ...and assign the subset to the variable named in nms[i] (df1, df2, ...)
  do.call('<-', list(nms[i], df_tmp))
}
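
Since the question asks about Python and NLTK rather than R, here is a hedged sketch of the same idea in Python: it builds one NLTK Text object per user instead of one variable per group. This is my own illustration, not part of the answer above, and it assumes pandas and NLTK are installed and that the Punkt tokenizer models have been downloaded (nltk.download('punkt')):

import nltk
import pandas as pd

# Toy data mirroring the question's layout
df = pd.DataFrame({
    "username": ["name_1", "name_1", "name_2", "name_2"],
    "search_term": ["some_text_1", "some_text_2", "some_text_3", "some_text_4"],
})

corpus = {}
for username, group in df.groupby("username"):
    # Join each user's search terms into one document, tokenize it,
    # and wrap the tokens in an nltk.Text for corpus-style queries
    document = " ".join(group["search_term"])
    corpus[username] = nltk.Text(nltk.word_tokenize(document))

print(corpus["name_1"].tokens)

Keeping the per-user frames or texts in a dict mirrors R's named list and scales better than generating one variable per username.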