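My survey results span about 90 columns and more than 5K rows. The raw data is keyed in as codes (e.g. 1 for "Yes", 2 for "No"). Each column has a different number of factor levels: for example, language spoken at home, income level, and so on. How can I replace the raw codes with the actual answers across the whole table?
Here is the structure of the raw data: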
rawsurveydf <- data.frame(Q1_tenant = sample(c(1,2,9), 20, replace=TRUE),
Q2_income = sample(c(1:9), 20, replace=TRUE),
Q3_satisfaction = sample(c(1:4,9), 20, replace=TRUE) )
The translation for each code:
Tenantcodes <- data.frame(code=c(1,2,9), Q1_tenant=c("Yes", "No", "Refusal"))
incomecodes <- data.frame(code=c(1:9), Q2_income=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes <- data.frame(code=c(1:4,9), Q3_satisfaction=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))
Answer 0 (score: 1)
You can use factors, as follows:
out <- rawsurveydf
out[] <- Map(function(x,y) factor(x,y$code,y[[2]]),
rawsurveydf,
list(Tenantcodes,incomecodes,houssatiscodes))
# out
# Q1_tenant Q2_income Q3_satisfaction
# 1 Yes 60000 Strongly disagree
# 2 No 90000 Strongly agree
# 3 No 50000 Agree
# 4 Yes 80000 Refusal
# 5 Refusal 70000 Refusal
# 6 No 110000 Agree
# 7 Yes 60000 Strongly agree
# 8 No 40000 Disagree
# 9 Yes 110000 Strongly disagree
# 10 Yes 110000 Strongly disagree
# 11 Refusal 1e+05 Disagree
# 12 Yes 70000 Strongly agree
# 13 Refusal 60000 Strongly disagree
# 14 Yes 40000 Agree
# 15 No 1e+05 Refusal
# 16 Yes 90000 Refusal
# 17 No 110000 Strongly agree
# 18 Yes 110000 Strongly disagree
# 19 No 1e+05 Refusal
# 20 No 90000 Refusal
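If you want character columns rather than factor columns, use as.character(factor(x,y$code,y[[2]])) instead of factor(x,y$code,y[[2]]). Factor labels are always stored as text, so a numeric answer such as Q2_income has to be converted back with as.numeric(as.character(...)) if you need numbers.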
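Map pairs columns with code tables by position, which is easy to get wrong with 90 columns. A minimal sketch of the same factor approach driven by a named list follows; the codebook object and the seed are assumptions made for illustration, not part of the answer above. Only columns named in the list are recoded.

```r
set.seed(1)  # only to make the illustrative sample reproducible
rawsurveydf <- data.frame(
  Q1_tenant = sample(c(1, 2, 9), 20, replace = TRUE),
  Q2_income = sample(1:9, 20, replace = TRUE),
  Q3_satisfaction = sample(c(1:4, 9), 20, replace = TRUE)
)
Tenantcodes <- data.frame(code = c(1, 2, 9),
                          Q1_tenant = c("Yes", "No", "Refusal"))
houssatiscodes <- data.frame(code = c(1:4, 9),
                             Q3_satisfaction = c("Strongly disagree", "Disagree",
                                                 "Agree", "Strongly agree", "Refusal"))

# Hypothetical codebook: each list name must equal the column it recodes
codebook <- list(Q1_tenant = Tenantcodes, Q3_satisfaction = houssatiscodes)

out <- rawsurveydf
out[names(codebook)] <- Map(
  function(x, y) factor(x, levels = y$code, labels = y[[2]]),
  rawsurveydf[names(codebook)],  # columns selected by name, not position
  codebook
)
str(out)
```

Columns without a codebook entry (here Q2_income) are left as raw codes.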
Answer 1 (score: 0)
Using the data and code from the (original) example:
Tenantcodes$Q1_tenant <- ifelse(Tenantcodes$Q1_tenant=="Yes",2,1)
rawsurveydf <- merge(rawsurveydf$Q1_tenant,
Tenantcodes,
by="Q1_tenant",
all.x=T)
When you need to map values to more than 2 levels, you can use other approaches, such as assignment with indexing, gsub, etc.
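One common form of indexed lookup is a named character vector. A minimal sketch, assuming the tenant codes from the question; tenant_lookup and codes are illustrative names:

```r
# Names are the raw codes (as text), values are the answers
tenant_lookup <- c("1" = "Yes", "2" = "No", "9" = "Refusal")

codes <- c(1, 2, 9, 2, 1)

# Index by name: convert the numeric codes to character first,
# otherwise R would index by position instead of by code
answers <- unname(tenant_lookup[as.character(codes)])
print(answers)
# [1] "Yes"     "No"      "Refusal" "No"      "Yes"
```

A code that is missing from the lookup yields NA, which makes unmapped values easy to spot.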
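If all columns share the same transformation, turn it into a function and then apply or sapply that function. If each column has its own custom mapping, then you obviously need to supply that logic for each column.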
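For the shared-mapping case, a minimal sketch follows. It uses lapply rather than apply or sapply, because those two may simplify the result to a matrix and lose the factor type; yesno and df are hypothetical names.

```r
# One recoding function shared by every column
yesno <- function(x) {
  factor(x, levels = c(1, 2, 9), labels = c("Yes", "No", "Refusal"))
}

df <- data.frame(Q1 = c(1, 2, 9), Q2 = c(2, 2, 1))

# df[] <- keeps the data.frame structure while replacing each column
df[] <- lapply(df, yesno)
str(df)
```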
Answer 2 (score: 0)
Another option you could try is:
library(tidyverse)
rawsurveydf <- rawsurveydf %>%
left_join(y = Tenantcodes, by = c("Q1_tenant" = "code"), suffix = c("", ".answer")) %>%
left_join(y = incomecodes, by = c("Q2_income" = "code"), suffix = c("", ".answer")) %>%
left_join(y = houssatiscodes, by = c("Q3_satisfaction" = "code"), suffix = c("", ".answer"))
This gives you the desired result while keeping the original data.
If you want to set this up to run on all 90 columns, I suggest setting up your "keys" in a way that lets all the joins be done at once. You can do that by giving the column you call code the same name as the column you are merging on, and naming the answer column differently. Maybe like:
Tenantcodes<-data.frame(Q1_tenant=c(1,2,9), Q1_tenant_answer=c("Yes", "No", "Refusal"))
incomecodes<-data.frame(Q2_income=c(1:9), Q2_income_answer=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes<-data.frame(Q3_satisfaction=c(1:4,9), Q3_satisfaction_answer=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))
Like this, we can handle all the joins in one go.
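A minimal sketch of that final step, assuming code tables renamed as above. It swaps in base R merge() with Reduce() instead of dplyr so that it runs without extra packages; the answer's author may well have chained left_join calls instead. The seed and the names codebooks and joined are illustrative.

```r
set.seed(42)  # only to make the illustrative sample reproducible
rawsurveydf <- data.frame(
  Q1_tenant = sample(c(1, 2, 9), 20, replace = TRUE),
  Q2_income = sample(1:9, 20, replace = TRUE),
  Q3_satisfaction = sample(c(1:4, 9), 20, replace = TRUE)
)
Tenantcodes <- data.frame(Q1_tenant = c(1, 2, 9),
                          Q1_tenant_answer = c("Yes", "No", "Refusal"))
incomecodes <- data.frame(Q2_income = 1:9,
                          Q2_income_answer = seq(30000, 110000, by = 10^4))
houssatiscodes <- data.frame(Q3_satisfaction = c(1:4, 9),
                             Q3_satisfaction_answer = c("Strongly disagree", "Disagree",
                                                        "Agree", "Strongly agree", "Refusal"))

codebooks <- list(Tenantcodes, incomecodes, houssatiscodes)

# merge() joins on the single column name each pair of tables shares;
# all.x = TRUE keeps every survey row, i.e. a left join.
joined <- Reduce(function(x, y) merge(x, y, all.x = TRUE),
                 codebooks, rawsurveydf)
str(joined)
```

Note that merge() sorts rows by the join key and moves key columns to the front, so the row order differs from rawsurveydf; add an id column beforehand if the original order matters.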