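Joining each data frame column to a separate lookup table in R

Time: 2018-07-08 02:10:09

Tags: r categorical-data survey

My survey results cover about 90 columns and more than 5K rows. The raw data is entered as codes (for example, 1 for "Yes" and 2 for "No"). Each column has a different number of factor levels: for example, language spoken at home, income level, and so on. How can I replace the raw codes with the actual answers across the whole table?

Here is the structure of the raw data: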

rawsurveydf <- data.frame(Q1_tenant = sample(c(1,2,9), 20, replace=TRUE),
    Q2_income = sample(c(1:9), 20, replace=TRUE), 
    Q3_satisfaction = sample(c(1:4,9), 20, replace=TRUE) )

The translation for each code:

Tenantcodes <- data.frame(code=c(1,2,9), Q1_tenant=c("Yes", "No", "Refusal"))
incomecodes <- data.frame(code=c(1:9), Q2_income=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes <- data.frame(code=c(1:4,9), Q3_satisfaction=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

3 Answers:

Answer 0 (score: 1)

You can use factors:

out <- rawsurveydf
out[] <- Map(function(x,y) factor(x,y$code,y[[2]]),
             rawsurveydf,
             list(Tenantcodes,incomecodes,houssatiscodes))

# out
# Q1_tenant Q2_income   Q3_satisfaction
# 1        Yes     60000 Strongly disagree
# 2         No     90000    Strongly agree
# 3         No     50000             Agree
# 4        Yes     80000           Refusal
# 5    Refusal     70000           Refusal
# 6         No    110000             Agree
# 7        Yes     60000    Strongly agree
# 8         No     40000          Disagree
# 9        Yes    110000 Strongly disagree
# 10       Yes    110000 Strongly disagree
# 11   Refusal     1e+05          Disagree
# 12       Yes     70000    Strongly agree
# 13   Refusal     60000 Strongly disagree
# 14       Yes     40000             Agree
# 15        No     1e+05           Refusal
# 16       Yes     90000           Refusal
# 17        No    110000    Strongly agree
# 18       Yes    110000 Strongly disagree
# 19        No     1e+05           Refusal
# 20        No     90000           Refusal

If you want character columns rather than factor columns, use as.character(factor(x,y$code,y[[2]])) instead of factor(x,y$code,y[[2]]).
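With around 90 columns, pairing columns and lookup tables by position can become fragile. A minimal sketch of the same factor idea, assuming the lookup tables are kept in a named list (the name `codebooks` and the column names `code` and `label` are hypothetical, not from the question) whose names match the survey columns:

```r
# Small fixed example so the result is reproducible
rawsurveydf <- data.frame(Q1_tenant = c(1, 2, 9, 1),
                          Q3_satisfaction = c(4, 1, 9, 3))

# Hypothetical named list: one lookup table per survey column
codebooks <- list(
  Q1_tenant = data.frame(code = c(1, 2, 9),
                         label = c("Yes", "No", "Refusal")),
  Q3_satisfaction = data.frame(code = c(1:4, 9),
                               label = c("Strongly disagree", "Disagree",
                                         "Agree", "Strongly agree", "Refusal"))
)

out <- rawsurveydf
# Recode only the columns that have a codebook, matched by name
for (col in intersect(names(out), names(codebooks))) {
  cb <- codebooks[[col]]
  out[[col]] <- factor(out[[col]],
                       levels = cb$code,
                       labels = as.character(cb$label))
}
```

Any code that does not appear in its codebook becomes NA, which can help spot data-entry errors.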

Answer 1 (score: 0)

Using the data and code from your (original) example:

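# Rename the lookup columns so that the key matches the survey column
# and the answer column gets its own name.
tenant_lookup <- setNames(Tenantcodes, c("Q1_tenant", "Q1_tenant_answer"))
# all.x = TRUE keeps survey rows whose code has no match (answer is NA).
# Note that merge() sorts the result by the key column.
rawsurveydf <- merge(rawsurveydf,
                     tenant_lookup,
                     by = "Q1_tenant",
                     all.x = TRUE)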

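When you need to map values onto more than 2 levels, you can use other approaches, such as assignment by index, gsub, and so on.

If all columns share the same transformation, turn it into a function and apply that function with apply or sapply. If every column has its own custom mapping, then you obviously need to supply that logic for each column.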
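As an illustration of the index-based approach, here is a minimal sketch using a named character vector as the lookup. The helper name `recode_by_index` and the example columns `Q_a` and `Q_b` are hypothetical:

```r
# Named character vector: names are the raw codes, values are the answers
tenant_lookup <- c("1" = "Yes", "2" = "No", "9" = "Refusal")

# Hypothetical helper: translate codes by indexing the lookup by name
recode_by_index <- function(x, lookup) unname(lookup[as.character(x)])

codes <- c(1, 2, 9, 2)
answers <- recode_by_index(codes, tenant_lookup)

# When every column shares the same mapping, apply the helper to all columns
yesno <- data.frame(Q_a = c(1, 2), Q_b = c(2, 2))
yesno[] <- lapply(yesno, recode_by_index, lookup = c("1" = "Yes", "2" = "No"))
```

Indexing by name handles any number of levels with a single lookup, and a code missing from the lookup comes back as NA.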

Answer 2 (score: 0)

Another option you could try is:

library(tidyverse)

rawsurveydf <- rawsurveydf %>%
  left_join(y = Tenantcodes, by = c("Q1_tenant" = "code"), suffix = c("", ".answer")) %>%
  left_join(y = incomecodes, by = c("Q2_income" = "code"), suffix = c("", ".answer")) %>%
  left_join(y = houssatiscodes, by = c("Q3_satisfaction" = "code"), suffix = c("", ".answer"))

This gives you the result you want while keeping the raw data.

If you want to set this up to run on all 90 columns, I suggest setting up the "keys" in a way that lets you do all the joins at once. This can be achieved by giving the column you currently call "code" the same name as the column you are merging on, and naming the answer column differently. Perhaps like:

Tenantcodes<-data.frame(Q1_tenant=c(1,2,9), Q1_tenant_answer=c("Yes", "No", "Refusal"))
incomecodes<-data.frame(Q2_income=c(1:9), Q2_income_answer=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes<-data.frame(Q3_satisfaction=c(1:4,9), Q3_satisfaction_answer=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

With the keys set up like this, all the joins can be handled in one go, because each lookup table now shares exactly one column name with the survey data.
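One way such an all-at-once join might look, as a sketch rather than the answer's own code: it assumes lookup tables keyed by the question name, as in the definitions above, and uses purrr's reduce() to chain the joins. The small fixed `rawsurveydf` here is made up so that the result is reproducible.

```r
library(dplyr)
library(purrr)

# Small fixed survey data (illustrative values only)
rawsurveydf <- data.frame(Q1_tenant = c(1, 2, 9),
                          Q2_income = c(1L, 9L, 4L),
                          Q3_satisfaction = c(4, 1, 9))

# Lookup tables whose key column carries the question's name
Tenantcodes <- data.frame(Q1_tenant = c(1, 2, 9),
                          Q1_tenant_answer = c("Yes", "No", "Refusal"))
incomecodes <- data.frame(Q2_income = 1:9,
                          Q2_income_answer = seq(30000, 110000, by = 10^4))
houssatiscodes <- data.frame(Q3_satisfaction = c(1:4, 9),
                             Q3_satisfaction_answer = c("Strongly disagree", "Disagree",
                                                        "Agree", "Strongly agree", "Refusal"))

# left_join() without `by` joins on the columns both tables share,
# which here is exactly one key column per lookup table.
joined <- reduce(list(Tenantcodes, incomecodes, houssatiscodes),
                 left_join,
                 .init = rawsurveydf)
```

Each join prints a message naming the column it matched on, which is a quick way to confirm that every lookup table attached to the intended question.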