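I have a data frame (df) listing the directors (DirectorID) of two different companies (CompanyID) for two years (2006 and 2007), along with each director's gender (M or F).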
df <-
CompanyID Name Country ISIN Director_2006 Gender_2006 Director_2007 Gender_2007
25830 BANKxxx Austria AT000504 11734844255 M 11734844255 M
25830 BANKxxx Austria AT000504 187836811559 F 5524344997 F
25830 BANKxxx Austria AT000504 5524344997 F 5524354997 M
25830 BANKxxx Austria AT000504 5524354997 M 5742347684 M
25830 BANKxxx Austria AT000504 6613115791 M 40160443378 M
12339 BANKyyy Belgium AT034003 5524344997 M 5524344997 M
12339 BANKyyy Belgium AT034003 5524354997 M 5524354997 M
I would like to add five more columns after each gender column, i.e. "Gender_2006" and "Gender_2007", with the following information:
df_final is my expected final output.
df_final <-
CompanyID Name Country ISIN Director_2006 Gender_2006 F2006 M2006 Findex2006 Fperce2006 Blauindex2006 Director_2007 Gender_2007 F2007 M2007 Findex2007 Fperce2007 Blauindex2007
25830 BANKxxx Austria AT000504 11734844255 M 2 3 1 0.4 0.25 11734844255 M 1 4 1 0.25 0.07
25830 BANKxxx Austria AT000504 187836811559 F NA NA NA NA NA 5524344997 F NA NA NA NA NA
25830 BANKxxx Austria AT000504 5524344997 F NA NA NA NA NA 5524354997 M NA NA NA NA NA
25830 BANKxxx Austria AT000504 5524354997 M NA NA NA NA NA 5742347684 M NA NA NA NA NA
25830 BANKxxx Austria AT000504 6613115791 M NA NA NA NA NA 40160443378 M NA NA NA NA NA
12339 BANKyyy Belgium AT034003 5524344997 M 0 2 0 0 0 5524344997 M 0 2 0 0 0
12339 BANKyyy Belgium AT034003 5524354997 M NA NA NA NA NA 5524354997 M NA NA NA NA NA
Could someone please tell me how to do this? Thanks.
My data
df <- read.table(text =
"CompanyID Name Country ISIN Director_2006 Gender_2006 Director_2007 Gender_2007
25830 BANKxxx Austria AT000504 11734844255 M 11734844255 M
25830 BANKxxx Austria AT000504 187836811559 F 5524344997 F
25830 BANKxxx Austria AT000504 5524344997 F 5524354997 M
25830 BANKxxx Austria AT000504 5524354997 M 5742347684 M
25830 BANKxxx Austria AT000504 6613115791 M 40160443378 M
12339 BANKyyy Belgium AT034003 5524344997 M 5524344997 M
12339 BANKyyy Belgium AT034003 5524354997 M 5524354997 M",
header = T, stringsAsFactors = F)
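Answer 0 (score: 1)
This uses dplyr. What follows in the group_by clause says what you are grouping by, in this case CompanyID. mutate creates new columns based on the conditions you specify. select just changes the column order.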
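library(dplyr)
df %>% group_by(CompanyID) %>%                       # one group per company
  mutate(F2006 = sum(Gender_2006 == "F", na.rm = TRUE),   # number of female directors
         M2006 = sum(Gender_2006 == "M", na.rm = TRUE),   # number of male directors
         Findex2006 = as.integer(sum(Gender_2006 == "F", na.rm = TRUE) > 0),  # 1 if any female director
         Fperce2006 = F2006 / (F2006 + M2006),            # share of female directors
         F2007 = sum(Gender_2007 == "F", na.rm = TRUE),
         M2007 = sum(Gender_2007 == "M", na.rm = TRUE),
         Findex2007 = as.integer(sum(Gender_2007 == "F", na.rm = TRUE) > 0),
         Fperce2007 = F2007 / (F2007 + M2007)) %>%
  # reorder: columns without a year first, then the 2006 columns, then the 2007 columns
  select(-matches("2006|2007"), matches("2006"), matches("2007"))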
# A tibble: 8 x 16
# Groups: CompanyID [2]
# CompanyID Name Country ISIN Director_2006 Gender_2006 F2006 M2006 Findex2006 Fperce2006 Director_2007 Gender_2007
# <int> <fct> <fct> <fct> <dbl> <fct> <int> <int> <int> <dbl> <dbl> <fct>
# 1 25830 BANKxxx Austria AT000504 11734844255 M 2 3 1 0.400 11734844255 M
# 2 25830 BANKxxx Austria AT000504 187836811559 F 2 3 1 0.400 5524344997 F
# 3 25830 BANKxxx Austria AT000504 5524344997 F 2 3 1 0.400 5524354997 M
# 4 25830 BANKxxx Austria AT000504 5524354997 M 2 3 1 0.400 5742347684 M
# 5 25830 BANKxxx Austria AT000504 6613115791 M 2 3 1 0.400 40160443378 M
# 6 12339 BANKyyy Belgium AT034003 5524344997 M 0 2 0 0 5524344997 M
# 7 12339 BANKyyy Belgium AT034003 5524354997 M 0 2 0 0 5524354997 M
# 8 12339 BANKyyy Belgium AT034003 NA <NA> 0 2 0 0 NA <NA>
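The code above does not produce the Blauindex columns that the question asks for. Below is a minimal sketch, assuming the standard Blau index definition, 1 - sum(p_i^2) over the gender shares p_i within a company. The question does not state its formula, and `blau_index` is a hypothetical helper name, not part of the original answer. Under this definition a board of 2 F and 3 M gives 0.48, which differs from the 0.25 in the question's expected output, so the asker may have a different formula in mind.

```r
# Hypothetical helper: Blau index 1 - sum(p_i^2) over the category shares.
# Assumption: standard definition; the question does not give its own formula.
blau_index <- function(gender) {
  gender <- gender[!is.na(gender)]          # ignore missing genders
  if (length(gender) == 0) return(NA_real_)
  p <- table(gender) / length(gender)       # share of each gender in the group
  1 - sum(p^2)
}

# Gender_2006 and CompanyID values taken from the question's data
gender_2006 <- c("M", "F", "F", "M", "M", "M", "M")
company     <- c(rep(25830, 5), rep(12339, 2))

# One value per company; the names of the result are the company IDs
blau_2006 <- tapply(gender_2006, company, blau_index)
print(blau_2006)
```

Inside the grouped mutate, the same helper could be used as `Blauindex2006 = blau_index(Gender_2006)`, since a single value per group is recycled across that group's rows.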
If you want NA in every row except the first row of each company, you can change the mutate to:
F2006 = ifelse(row_number()==1,sum(Gender_2006 == "F", na.rm = T),NA),
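To show the first-row-only variant end to end, here is a sketch for the 2006 columns only, assuming dplyr 1.0 or later (for `across`). `first_only` is a hypothetical helper introduced for illustration. It computes the group summaries first and blanks them in a second step, so that Fperce2006 is still derived from the full counts.

```r
library(dplyr)

# Subset of the question's data: company ID and 2006 gender only
df <- read.table(text = "CompanyID Gender_2006
25830 M
25830 F
25830 F
25830 M
25830 M
12339 M
12339 M", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: keep the first element, set the rest to NA
first_only <- function(x) replace(x, seq_along(x) != 1, NA)

res <- df %>%
  group_by(CompanyID) %>%
  mutate(F2006 = sum(Gender_2006 == "F", na.rm = TRUE),
         M2006 = sum(Gender_2006 == "M", na.rm = TRUE),
         Findex2006 = as.integer(F2006 > 0),
         Fperce2006 = F2006 / (F2006 + M2006)) %>%
  # blank the summaries on every row except the first of each company
  mutate(across(c(F2006, M2006, Findex2006, Fperce2006), first_only)) %>%
  ungroup()

print(res)
```

The same pattern extends to the 2007 columns by adding them to both mutate calls.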