Question

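I have two data frames: one holding my data (data) and one holding a lookup table (lookup). data includes a column named claims whose cells contain one or more codes identifying the types of legal claims raised in a particular case (each row represents one case). Multiple claim types are separated by semicolons.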

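The lookup data frame has three columns: code, category, and so_category. The code column lists every unique claim code used in the claims column of data. category contains the category I assigned to that claim, and so_category gives the higher-level category that the particular category falls under.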

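What I would like to do is add columns to data for each category and so_category, set to 1 or 0 depending on whether claims contains a code that corresponds to that category or so_category, respectively.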

Here is an example of my data frames:

claims

category

So what I would like to generate programmatically is:

so_category

I am still pretty new to R and at a loss as to how to do this, so any guidance would be greatly appreciated!

Answer 1

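In base R, we can first collect all the unique so_category values that need to be matched (all_category). Then split claims on ;, match each piece against code in lookup to get the corresponding so_category, and assign a 1/0 value based on the presence/absence of each category in all_category.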

all_category <- unique(lookup$so_category)

data[all_category] <- t(sapply(strsplit(data$claims, ";"), function(x)
          as.integer(all_category %in% lookup$so_category[match(x, lookup$code)])))

data
#  Case                     claims f_statute st_statute common_law
#1    1              wiretap;fdcpa         1          0          0
#2    2              ca_ucl;comlaw         0          1          1
#3    3 tort;comlaw;wiretap;ca_ucl         1          1          1
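To make the one-liner above easier to follow, here is a minimal sketch that runs the same three steps (split, match, test membership) on a single case. The intermediate names codes, found, and flags are introduced here for illustration only and are not part of the original answer.

```r
# Same example data as in the Data section of this answer.
data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

all_category <- unique(lookup$so_category)

# Step 1: split the claims of case 2 into individual codes.
codes <- strsplit(data$claims[2], ";")[[1]]
codes
# "ca_ucl" "comlaw"

# Step 2: find the so_category of each code via its position in lookup$code.
found <- lookup$so_category[match(codes, lookup$code)]
found
# "st_statute" "common_law"

# Step 3: flag every known so_category as present (1) or absent (0).
flags <- as.integer(all_category %in% found)
names(flags) <- all_category
flags
#  f_statute st_statute common_law
#          0          1          1
```

sapply() repeats steps 2 and 3 for every case and returns the flags as columns, which is why the result is transposed with t() before being assigned to data.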

Data

data <- structure(list(Case = 1:3, claims = c("wiretap;fdcpa", 
"ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl")), 
row.names = c(NA, -3L), class = "data.frame")

lookup <- structure(list(code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", 
"tort"), category = c("f_wiretap", "f_con_prot", "st_con_prot", 
"com_law", "com_law"), so_category = c("f_statute", "f_statute", 
"st_statute", "common_law", "common_law")), row.names = c(NA, 
-5L), class = "data.frame")
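The question asks for indicator columns at both the category and the so_category level. One way to get both is to wrap the approach above in a small function and call it once per level. This is only a sketch: add_flags and its level argument are hypothetical names, and it assumes the values of category and so_category do not share names (otherwise the second call would overwrite columns from the first).

```r
data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add one 0/1 column per unique value of lookup[[level]].
add_flags <- function(data, lookup, level) {
  all_values <- unique(lookup[[level]])
  rows <- lapply(strsplit(data$claims, ";"), function(x) {
    # Values of the chosen level that occur among this case's codes.
    found <- lookup[[level]][match(x, lookup$code)]
    as.integer(all_values %in% found)
  })
  # rbind keeps a cases-by-values matrix even with one case or one value.
  data[all_values] <- do.call(rbind, rows)
  data
}

result <- add_flags(data, lookup, "category")
result <- add_flags(result, lookup, "so_category")
result
```

Codes that do not appear in lookup$code produce NA from match() and therefore set no flag.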

Answer 2

Here is an option with tidyverse, where we split the 'claims' column on the ; delimiter with separate_rows, then join (left_join) with the 'lookup' dataset. After getting the distinct rows, we spread the result to 'wide' format and join the output with the original dataset.

library(tidyverse)
data %>%
   separate_rows(claims, sep=";") %>%
   left_join(lookup, by = c("claims" = "code")) %>%
   select(-claims, -category) %>%
   distinct(Case, so_category) %>%
   mutate(val = 1) %>%
   spread(so_category, val, fill = 0) %>%
   right_join(data) %>%
   select(names(data), everything())
#  Case                     claims common_law f_statute st_statute
#1    1              wiretap;fdcpa          0         1          0
#2    2              ca_ucl;comlaw          1         0          1
#3    3 tort;comlaw;wiretap;ca_ucl          1         1          1
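In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following sketch shows the same long-then-wide idea with pivot_wider(); it assumes dplyr and a tidyr version recent enough to accept a scalar values_fill (1.1.0 or later), and the names flags and result are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  Case = 1:3,
  claims = c("wiretap;fdcpa", "ca_ucl;comlaw", "tort;comlaw;wiretap;ca_ucl"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  code = c("wiretap", "fdcpa", "ca_ucl", "comlaw", "tort"),
  category = c("f_wiretap", "f_con_prot", "st_con_prot", "com_law", "com_law"),
  so_category = c("f_statute", "f_statute", "st_statute", "common_law", "common_law"),
  stringsAsFactors = FALSE
)

flags <- data %>%
  separate_rows(claims, sep = ";") %>%           # one row per case and code
  left_join(lookup, by = c("claims" = "code")) %>%
  distinct(Case, so_category) %>%                # one row per case and so_category
  mutate(val = 1L) %>%
  pivot_wider(names_from = so_category,
              values_from = val,
              values_fill = 0L)                  # absent combinations become 0

# Attach the indicator columns to the original data, keeping the claims column.
result <- left_join(data, flags, by = "Case")
result
```

Unlike spread(), pivot_wider() orders the new columns by first appearance rather than alphabetically, so the column order may differ from the output shown above.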


How to populate columns in R based on matching parts of one column against values in another data frame
