Given a data frame like this:
df <- data.frame(id = c(1,2,3,4,5), keywords = c("google, yahoo, air, cookie", "cookie, air", "air, cookie", "google", "yahoo, google"))
how can I derive this table of binary indicators:
df_binary_exist <- data.frame(id = c(1,2,3,4,5), google = c(1,0,0,1,1), yahoo = c(1,0,0,0,1), air = c(1,1,1,0,0), cookie = c(1,1,1,0,0))
df_binary_exist
  id google yahoo air cookie
1  1      1     1   1      1
2  2      0     0   1      1
3  3      0     0   1      1
4  4      1     0   0      0
5  5      1     1   0      0
and then find the most frequent pairs of keywords from it?
df_frequency <- data.frame(couple = c("yahoo-google", "cookie-air"), freq = c(2,3))
df_frequency
        couple freq
1 yahoo-google    2
2   cookie-air    3
Answer 0 (score: 2)
The first part can be done with `separate_rows`, `count`, and `spread`:
library(dplyr)
library(tidyr)
df1 <- df %>% separate_rows(keywords)
df1 %>%
  dplyr::count(id, keywords) %>%
  spread(keywords, n, fill = 0)
#     id   air cookie google yahoo
#  <dbl> <dbl>  <dbl>  <dbl> <dbl>
#1     1     1      1      1     1
#2     2     1      1      0     0
#3     3     1      1      0     0
#4     4     0      0      1     0
#5     5     0      0      1     1
For the second part I used base R: first `split` the `keywords` by `id`, take the combinations of every two elements, and then `paste` each pair together.
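The split, combine, and paste steps described above can be sketched in base R as follows. This is a minimal illustration and not necessarily the answerer's original code: the use of `combn()` and `table()`, and the names `kw`, `pairs`, and `freq`, are assumptions made for the example.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  keywords = c("google, yahoo, air, cookie", "cookie, air", "air, cookie",
               "google", "yahoo, google"),
  stringsAsFactors = FALSE
)

# Split each row's keywords into a character vector, one list element per id
kw <- setNames(strsplit(df$keywords, ", ", fixed = TRUE), df$id)

# For every id with at least two keywords, build all unordered pairs.
# Sorting first makes "air-cookie" and "cookie-air" the same label.
pairs <- unlist(lapply(kw, function(k) {
  k <- sort(unique(k))
  if (length(k) < 2) return(character(0))
  combn(k, 2, FUN = paste, collapse = "-")
}))

# Count how often each pair occurs, most frequent first
freq <- sort(table(pairs), decreasing = TRUE)
df_frequency <- data.frame(couple = names(freq), freq = as.vector(freq),
                           stringsAsFactors = FALSE)
df_frequency
```

Rows with a single keyword (such as id 4) contribute no pairs, so they are skipped rather than passed to `combn()`.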
Answer 1 (score: 2)
One `tidyverse` possibility could be:
library(dplyr)
library(tidyr)
df %>%
  mutate(keywords = strsplit(keywords, ", ", fixed = TRUE)) %>%
  unnest(keywords) %>%
  # self-join on id so every keyword meets every other keyword of the same id
  full_join(df %>%
              mutate(keywords = strsplit(keywords, ", ", fixed = TRUE)) %>%
              unnest(keywords), by = c("id" = "id")) %>%
  filter(keywords.x != keywords.y) %>%
  count(keywords.x, keywords.y) %>%
  transmute(keywords = paste(pmax(keywords.x, keywords.y), pmin(keywords.x, keywords.y), sep = "-"),
            n) %>%
  distinct(keywords, .keep_all = TRUE)
  keywords          n
  <chr>         <int>
1 cookie-air        3
2 google-air        1
3 yahoo-air         1
4 google-cookie     1
5 yahoo-cookie      1
6 yahoo-google      2
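It first splits the `keywords` column on `, ` and then performs a full join of the result with itself on `id`. Second, it filters out the rows where both values of a pair are the same keyword. Third, it counts how often each pair occurs. Finally, it builds an order-independent label for each pair and keeps only the distinct rows based on that label.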
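The last step relies on `pmax()` and `pmin()` comparing character vectors element-wise, so both orderings of a pair collapse to one label. A small sketch of that idea, using made-up vectors `x` and `y` that stand in for the `keywords.x` and `keywords.y` columns:

```r
# Two orderings of the same pairs, as a self-join would produce them
x <- c("air", "cookie", "google", "yahoo")
y <- c("cookie", "air", "yahoo", "google")

# pmax()/pmin() pick the alphabetically larger/smaller string per position,
# so (air, cookie) and (cookie, air) both become "cookie-air"
label <- paste(pmax(x, y), pmin(x, y), sep = "-")
label

# Keeping distinct labels leaves one row per unordered pair
unique(label)
```

Because each ordered pair has the same count as its mirror image, dropping the duplicate label loses no information.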
Or the same with `separate_rows()`:
df %>%
  separate_rows(keywords) %>%
  full_join(df %>%
              separate_rows(keywords), by = c("id" = "id")) %>%
  filter(keywords.x != keywords.y) %>%
  count(keywords.x, keywords.y) %>%
  transmute(keywords = paste(pmax(keywords.x, keywords.y), pmin(keywords.x, keywords.y), sep = "-"),
            n) %>%
  distinct(keywords, .keep_all = TRUE)
Answer 2 (score: 1)
We can do this easily with `mtabulate`:
library(qdapTools)
cbind(df[1], mtabulate(strsplit(as.character(df$keywords), ", ")))
#  id air cookie google yahoo
#1  1   1      1      1     1
#2  2   1      1      0     0
#3  3   1      1      0     0
#4  4   0      0      1     0
#5  5   0      0      1     1
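A 0/1 table like the one above can also feed the pair counting directly. The sketch below is an assumption-based illustration in base R, not part of this answer: it builds the indicator matrix by hand (the names `terms`, `m`, `co`, and `pair_counts` are made up here) and uses `crossprod()` to count co-occurrences.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  keywords = c("google, yahoo, air, cookie", "cookie, air", "air, cookie",
               "google", "yahoo, google"),
  stringsAsFactors = FALSE
)

kw <- strsplit(df$keywords, ", ", fixed = TRUE)
terms <- sort(unique(unlist(kw)))

# Binary id-by-keyword matrix: 1 if the keyword occurs for that id
m <- t(sapply(kw, function(k) as.integer(terms %in% k)))
colnames(m) <- terms

# crossprod(m) is t(m) %*% m: entry [i, j] counts ids containing both keywords
co <- crossprod(m)

# Keep each unordered pair once (upper triangle, excluding the diagonal)
idx <- which(upper.tri(co), arr.ind = TRUE)
pair_counts <- data.frame(
  couple = paste(rownames(co)[idx[, "row"]], colnames(co)[idx[, "col"]], sep = "-"),
  freq = co[idx],
  stringsAsFactors = FALSE
)

# Most frequent pairs first
pair_counts <- pair_counts[order(-pair_counts$freq), ]
pair_counts
```

The diagonal of `co` holds the per-keyword totals, which is why only the upper triangle is used for pairs.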