Suppose my data frame looks like this:
df1=structure(list(Name = structure(1:6, .Label = c("N1", "N2", "N3",
"N4", "N5", "N6", "N7"), class = "factor"), sector = structure(c(4L,
4L, 4L, 3L, 3L, 2L), .Label = c("other stuff", "Private for-profit, 4-year or above",
"Private not-for-profit, 4-year or above", "Public, 4-year or above"
), class = "factor"), flagship = c(1, 0, 0, 0, 0, 0)), .Names = c("Name",
"sector", "flagship"), row.names = c(NA, 6L), class = "data.frame")
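I want to create a new factor variable "Sector". I can do it with many lines of code, but I am sure there is a more efficient way.
This is what I am doing right now: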
df1$PublicFlag=0
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1
df1$Public=0
df1$Public[df1$sector=="Public, 4-year or above" & df1$flagship==0]=1
df1$PrivateNP=0
df1$PrivateNP[df1$sector=="Private not-for-profit"]=1
df1$Private4P=0
df1$Private4P[df1$sector=="Private for-profit, 4-year or above"]=1
library(reshape)
df2 = melt(df1, id=c("Name", "sector", "flagship"))
df2 = df2[df2$value==1,c("Name", "sector", "flagship", "variable")]
library(plyr)
df2 = rename(df2, c("variable"="Sector"))
Thanks for your help!
Answer 0: (score: 3)
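This is an old post, but I keep stumbling upon it, which is why I want to provide an up-to-date answer. Version 0.5.0 of dplyr introduced a number of useful vector functions that solve this problem.
Use case_when() to avoid ifelse-nesting (and thereby keep many kittens alive):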
library(dplyr)

df1 %>%
  mutate(
    Sector = case_when(
      sector == "Public, 4-year or above" & flagship == 1 ~ "PublicFlag",
      sector == "Public, 4-year or above" & flagship == 0 ~ "Public",
      # Note: the level in df1 is "Private not-for-profit, 4-year or above",
      # so this condition matches no rows and they end up as NA
      sector == "Private not-for-profit" ~ "PrivateNP",
      sector == "Private for-profit, 4-year or above" ~ "Private4P"
    ),
    Sector = factor(Sector, levels = c("Public", "PublicFlag", "PrivateNP", "Private4P"))
  )
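If the not-for-profit rows should get a label instead of NA, one option is to compare against the full level string and add a catch-all branch. This is only a minimal sketch: the data frame toy is a made-up stand-in for df1, and the "Other" label is an assumption, not something from the question.

```r
library(dplyr)

# Hypothetical stand-in for df1, using the same column names as the question
toy <- data.frame(
  sector = c("Public, 4-year or above",
             "Public, 4-year or above",
             "Private not-for-profit, 4-year or above",
             "Private for-profit, 4-year or above"),
  flagship = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(Sector = case_when(
    sector == "Public, 4-year or above" & flagship == 1 ~ "PublicFlag",
    sector == "Public, 4-year or above" & flagship == 0 ~ "Public",
    # full level string, so these rows are matched
    sector == "Private not-for-profit, 4-year or above" ~ "PrivateNP",
    sector == "Private for-profit, 4-year or above" ~ "Private4P",
    # catch-all for anything not listed above ("Other" is an assumed label)
    TRUE ~ "Other"
  ))

toy$Sector
```

Conditions are evaluated in order, so the catch-all has to come last.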
Use recode_factor() to generate a factor from a character (or numeric) variable:
df1 %>%
  mutate(Sector = recode_factor(sector,
    "Public, 4-year or above" = "Public",
    "Private not-for-profit" = "PrivateNP",
    "Private for-profit, 4-year or above" = "Private4P"))
Answer 1: (score: 2)
Try:
df1$Sector <- with(df1, c("Private4P", NA, "Public",
"PublicFlag")[as.numeric(factor(1+2*as.numeric(sector)+4*flagship))])
subset(df1, !is.na(Sector))
# Name sector flagship Sector
#1 N1 Public, 4-year or above 1 PublicFlag
#2 N2 Public, 4-year or above 0 Public
#3 N3 Public, 4-year or above 0 Public
#6 N6 Private for-profit, 4-year or above 0 Private4P
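To see why the one-liner above works, it may help to unpack the arithmetic step by step. The sketch below rebuilds the two columns by hand (the vectors are copied from df1 in the question; the intermediate names code, idx and labels are mine) and shows each stage.

```r
# Columns rebuilt from df1 in the question
sector <- factor(
  c(4, 4, 4, 3, 3, 2),
  levels = 1:4,
  labels = c("other stuff",
             "Private for-profit, 4-year or above",
             "Private not-for-profit, 4-year or above",
             "Public, 4-year or above")
)
flagship <- c(1, 0, 0, 0, 0, 0)

# Step 1: combine sector and flagship into a single numeric code
code <- 1 + 2 * as.numeric(sector) + 4 * flagship
code
# 13 9 9 7 7 5

# Step 2: factor() ranks the observed codes (5 < 7 < 9 < 13) as 1..4
idx <- as.numeric(factor(code))
idx
# 4 3 3 2 2 1

# Step 3: use the rank as an index into the label vector
labels <- c("Private4P", NA, "Public", "PublicFlag")
result <- labels[idx]
result
# "PublicFlag" "Public" "Public" NA NA "Private4P"
```

Note that step 2 ranks only the codes that actually occur, so the label vector has to be adjusted if the data contains other sector/flagship combinations.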
Answer 2: (score: 1)
You don't even need dplyr:
df1$Sector <- factor(ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==1, "PublicFlag",
ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==0, "Public",
ifelse(df1$sector=="Private not-for-profit", "PrivateNP",
ifelse(df1$sector=="Private for-profit, 4-year or above", "Private4P", NA)))))
df1
## Name sector flagship Sector
## 1 N1 Public, 4-year or above 1 PublicFlag
## 2 N2 Public, 4-year or above 0 Public
## 3 N3 Public, 4-year or above 0 Public
## 4 N4 Private not-for-profit, 4-year or above 0 <NA>
## 5 N5 Private not-for-profit, 4-year or above 0 <NA>
## 6 N6 Private for-profit, 4-year or above 0 Private4P
If needed, you can replace NA with a final catch-all factor level.
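One way to do that replacement in base R is to add the extra level first and then assign it to the NA entries. A minimal sketch, using a short made-up factor and an assumed level name "Other":

```r
# Made-up factor with one missing value, standing in for df1$Sector
Sector <- factor(c("PublicFlag", "Public", NA, "Private4P"))

# The new level must exist before it can be assigned
# ("Other" is an assumed name, not from the original post)
levels(Sector) <- c(levels(Sector), "Other")
Sector[is.na(Sector)] <- "Other"

Sector
```

Assigning "Other" without adding the level first would produce a warning and leave the entries as NA.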
Answer 3: (score: 0)
The selected answer did not work for the particular problem I was dealing with, because I had assigned numeric values in case_when() and was trying to give them character levels. I want to add what I did to solve my particular problem, in case someone finds it useful in the future.
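The code from this answer is not preserved here. As a rough sketch of the kind of approach described (numeric codes produced by case_when(), then character labels attached with factor()), something like the following might work. The data frame toy, the codes 1 to 3 and the label names are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical stand-in for df1
toy <- data.frame(
  sector = c("Public, 4-year or above",
             "Public, 4-year or above",
             "Private not-for-profit, 4-year or above",
             "Private for-profit, 4-year or above"),
  flagship = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(
    # case_when() returns numeric codes here (assumed values 1, 2, 3);
    # rows that match no condition become NA
    code = case_when(
      sector == "Public, 4-year or above" & flagship == 1 ~ 1,
      sector == "Public, 4-year or above" & flagship == 0 ~ 2,
      sector == "Private for-profit, 4-year or above" ~ 3
    ),
    # attach character labels to the numeric codes
    Sector = factor(code, levels = c(1, 2, 3),
                    labels = c("PublicFlag", "Public", "Private4P"))
  )

toy$Sector
```

Passing levels and labels together to factor() keeps the numeric codes and the character labels paired explicitly.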