Using R

Time: 2018-08-13 01:56:43

Tags: r parsing

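I am importing data into R from another source (i.e., I cannot easily change the incoming format/values).

One of the variables contains one or more of the following possible values:

  • Mother (biological mother, foster mother, step mother, etc.)
  • Father (biological father, foster father, step father, etc.)
  • Grandparent(s) (biological, foster, step, etc.)
  • Brother(s) older than 18
  • Sister(s) older than 18
  • Other adults (aunts, uncles, etc.)

All of them are in the same "cell", so the possible data looks like this:

Example input data frame (df)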

df <- read.table(text =
"row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = T)

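In R, how can I efficiently create rules to parse these responses into separate columns, one column per type of family member, so that the output looks like this:

Example output data frame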

mother <- c(1,0,1,1)
father <- c(1,0,0,1)
adult.brother <- c(1,0,0,0)
adult.sister <- c(1,0,1,0)
grandparent <- c(1,0,0,0)
other.adult <- c(1,0,0,0)
output.df <- cbind(mother, father, adult.brother, adult.sister, grandparent, other.adult)
colnames(output.df) <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
output.df

     Mother Father Brother Sister Grandparent Other adult
[1,]      1      1       1      1           1           1
[2,]      0      0       0      0           0           0
[3,]      1      0       0      1           0           0
[4,]      1      1       0      0           0           0

TIA

3 answers:

Answer 0 (score: 1)

Here is a tidyverse option to get you started:

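library(tidyverse)
# Patterns to look for; naming the list gives the output columns their names
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
names(rel) <- unlist(rel)
# Case-insensitive match per pattern; the unary + turns TRUE/FALSE into 1/0
bind_cols(df[, 1, drop = FALSE], map(rel, ~ +str_detect(tolower(df[, 2]), tolower(.x))))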
#  row Mother Father Brother Sister Grandparent Other adult
#1   1      1      1       1      1           1           1
#2   2      0      0       0      0           0           0
#3   3      1      0       0      1           0           0
#4   4      1      1       0      0           0           0

Sample data

df <- read.table(text =
    "row lives.with.whom
  1  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
  2  ''
  3  'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
  4  'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = T)
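
For readers who would rather avoid the tidyverse dependency, the same idea can be sketched in base R. This is only a sketch under assumptions: the `rules` vector and its literal patterns are illustrative and not part of the answer above, and the sample data is rebuilt with `data.frame()` rather than `read.table()`. One difference to be aware of is that `grepl()` returns `FALSE` for `NA` input, while `stringr::str_detect()` returns `NA`, so missing cells would become 0 here.

```r
# Sample data rebuilt from the question (kept as character, not factor)
mother  <- "Mother (biological mother, foster mother, step mother, etc.)"
father  <- "Father (biological father, foster father, step father, etc.)"
grandp  <- "Grandparent(s) (biological, foster, step, etc.)"
brother <- "Brother(s) older than 18"
sister  <- "Sister(s) older than 18"
other   <- "Other adults (aunts, uncles, etc.)"

df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    paste(mother, father, grandp, brother, sister, other, sep = ", "),
    "",
    paste(mother, sister, sep = ", "),
    paste(mother, father, sep = ", ")
  ),
  stringsAsFactors = FALSE
)

# Hypothetical rule table: output column name -> literal text to look for
rules <- c(
  Mother        = "Mother (",
  Father        = "Father (",
  Brother       = "Brother(s)",
  Sister        = "Sister(s)",
  Grandparent   = "Grandparent(s)",
  "Other adult" = "Other adults"
)

# One logical column per rule; fixed = TRUE treats "(" as a literal character
flags <- sapply(rules, grepl, x = df$lives.with.whom, fixed = TRUE)

# The unary + turns TRUE/FALSE into 1/0; check.names = FALSE keeps "Other adult"
result <- data.frame(row = df$row, +flags, check.names = FALSE)
result
```

Keeping the rules in a named vector means a new response option only needs one extra entry, and matching on the literal option text is case-sensitive, which is stricter than the lowercased substring match above.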

Answer 1 (score: 1)

Try this:

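rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# Add one 0/1 indicator column per relation, named after the relation.
# Base R only: as.integer(grepl()) replaces if_else(), which needs dplyr,
# and df[[r]] names the column directly instead of renaming df$i afterwards.
for (r in rel) {
  df[[r]] <- as.integer(grepl(r, df$lives.with.whom, fixed = TRUE))
}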

Answer 2 (score: 0)

Hey, welcome to Stack Overflow! Here are some links on how to ask better questions on Stack Overflow, so that people can help you more easily (going forward).

  1. how-to-make-a-great-r-reproducible-example
  2. How to create a Minimal, Complete, and Verifiable example
  3. What types of questions should I avoid asking?

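Coming to your question, I made some assumptions and tried to solve it. As Maurits mentioned, you need to provide a reproducible example so that someone can give a specific answer; until then, this is the best answer I can come up with.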

library(tidyr)
library(dplyr)
# create nested lists with names of mothers and fathers for two people
mother <- list(list("bio_1","step_1","foster_1"), list("bio_2", "stp_2", "foster_2"))
father <- list(list("bio_1", "foster_1", "other_1"), list("bio_2", "stp_2", "foster_2"))

# convert to a tibble (tibble() replaces the deprecated data_frame())
test_object <- tibble(person = c(1,2), mother, father)

# print 
test_object

# A tibble: 2 x 3
  person mother     father    
   <dbl> <list>     <list>    
1      1 <list [3]> <list [3]>
2      2 <list [3]> <list [3]>

# first unnest the lists to get to the inner list
# then convert from wide to long form
# then unnest again to get the actual data in long format
test_object %>%
  unnest(.) %>%
  gather(data = ., key = relationship, value = name, -person) %>%
  unnest() -> test_object

test_object
# A tibble: 12 x 3
   person relationship name    
    <dbl> <chr>        <chr>   
 1      1 mother       bio_1   
 2      1 mother       step_1  
 3      1 mother       foster_1
 4      2 mother       bio_2   
 5      2 mother       stp_2   
 6      2 mother       foster_2
 7      1 father       bio_1   
 8      1 father       foster_1
 9      1 father       other_1 
10      2 father       bio_2   
11      2 father       stp_2   
12      2 father       foster_2  

Here are links to tidyverse and data.table, which contain many packages and functions for solving most data warehousing/wrangling problems.
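
The long-format idea in this answer can also be applied to the data from the question itself. The following base R sketch is one possible way to do that, not the answerer's code: it splits each cell on the commas that sit outside parentheses, derives a short label from each piece, and cross-tabulates rows against labels. The splitting regex and the names `parts`, `long`, and `wide` are assumptions made for illustration, and the pattern relies on parentheses never being nested.

```r
# Sample data rebuilt from the question (kept as character, not factor)
mother  <- "Mother (biological mother, foster mother, step mother, etc.)"
father  <- "Father (biological father, foster father, step father, etc.)"
grandp  <- "Grandparent(s) (biological, foster, step, etc.)"
brother <- "Brother(s) older than 18"
sister  <- "Sister(s) older than 18"
other   <- "Other adults (aunts, uncles, etc.)"

df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    paste(mother, father, grandp, brother, sister, other, sep = ", "),
    "",
    paste(mother, sister, sep = ", "),
    paste(mother, father, sep = ", ")
  ),
  stringsAsFactors = FALSE
)

# Split only on commas that are not followed by a ")" before the next "(",
# i.e. commas outside "( ... )"
parts <- strsplit(df$lives.with.whom, ",\\s*(?![^()]*\\))", perl = TRUE)

# Long format: one line per (row, answer) pair; empty cells contribute nothing
long <- data.frame(
  row    = rep(df$row, lengths(parts)),
  answer = unlist(parts),
  stringsAsFactors = FALSE
)

# Short label: drop everything from the first "(" onwards,
# e.g. "Brother(s) older than 18" becomes "Brother"
long$relation <- sub("\\s*\\(.*$", "", long$answer)

# Wide 0/1 table; the factor levels keep rows that had no answers at all
wide <- as.data.frame.matrix(
  table(factor(long$row, levels = df$row), long$relation)
)
wide
```

Because the labels come from the data rather than from a fixed list, any new response option would show up as its own column without changing the code. The columns come out in alphabetical order, so they would need reordering to match the layout requested in the question.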