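I am trying to figure out how to generate a new column in R that indicates whether a politician "i" stays in the same party or defects in a given legislature "l". The politicians and parties are identified by index columns. Here is an example of what my data originally looks like: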
## example of data
names <- c("Jesus Martinez", "Anrita blabla", "Paco Pico", "Reiner Steingress", "Jesus Martinez Porras")
Parti.affiliation <- c("Winner","Winner","Winner", "Loser", NA)#NA, "New party", "Loser", "Winner", NA
Legislature <- c(rep(1, 5), rep(2,5), rep(3,5), rep(4,5), rep(5,5), rep(6,5))
selection <- c(rep("majority", 15), rep("PR", 15))
sex<- c("Male", "Female", "Male", "Female", "Male")
Election<- c(rep(1955, 5), rep(1960, 5), rep(1965, 5), rep(1970,5), rep(1975,5), rep(1980,5))
d <- data.frame(
  names = factor(rep(names, 6)),
  party.affiliation = c(rep(Parti.affiliation, 5), NA, "New party", "Loser", "Winner", NA),
  legislature = Legislature,
  selection = selection,
  gender = rep(sex, 6),
  Election.date = Election
)
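## generating ids for politician and party.affiliation
library(dplyr) # provides arrange()
d$id_pers <- paste(d$names, sep = "")
d <- arrange(d, id_pers)
d <- transform(d, id_pers = as.numeric(factor(id_pers)))
# factor() first, so this also works when party.affiliation is stored as character (the default since R 4.0)
d$party.affiliation1 <- as.numeric(factor(d$party.affiliation))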
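The expected result should show the following: if a politician (identified by the column "id_pers") changes their value in the column "party.affiliation1", the value 1 is assigned in a new column called "switch", otherwise 0. The same procedure should be applied to every politician in the dataset, so the expected result should look like this: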
d["switch"] <- c(1, rep(0, 4), NA, rep(0, 6), rep(NA, 6), 1, rep(0, 5), rep(0, 5), 1) # 0 = remains in the same party / 1 = switches party affiliation
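For example, you can see in this data frame that the first politician, "Anrita blabla", was a candidate for the same party from the 1st to the 5th legislature. However, "Anrita" changed her party affiliation in the 6th legislature, so she became a candidate for party '2'. The new column "switch" should therefore contain the value '1' to reflect that Anrita changed her party affiliation, and '0' to show that she did not change it during the first 5 legislatures.
I have tried several approaches (e.g. loops). This strategy looked like the simplest one, but it does not work :(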
## add a new column based on raw values
ind <- c(FALSE, party.affiliation1[-1L]!= party.affiliation1[-length(party.affiliation1)] & party.affiliation1!= 'Null')
d <- d %>% group_by(id_pers) %>% mutate(this = ifelse(ind, 1, 0))
I hope this explanation is clear. Thanks in advance!!!
Answer 0 (score: 1)
I think you can do:
library(tidyverse)
d %>%
  group_by(id_pers) %>%
  mutate(switch = as.numeric(party.affiliation1 - lag(party.affiliation1) != 0))
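The first entry for each politician will be NA, since we have no information on whether their previous party affiliation (if any) was different.
Edit: we can use the default= argument of lag() together with a nested ifelse() to distinguish the first value.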
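The code for this edit is not preserved here, so the following is only a minimal sketch of the idea and not necessarily the original code. It assumes a hypothetical toy data frame `toy`, a sentinel of -1 that never occurs as a party code, a temporary helper column `prev`, and 99 as the marker for a politician's first entry.

```r
library(dplyr)

# Hypothetical toy data: two politicians, three legislatures each
toy <- data.frame(
  id_pers = c(1, 1, 1, 2, 2, 2),
  party.affiliation1 = c(3, 3, 2, 1, 1, 1)
)

toy <- toy %>%
  group_by(id_pers) %>%
  mutate(
    # -1 is an assumed sentinel marking "no previous row for this politician"
    prev = lag(party.affiliation1, default = -1),
    # outer ifelse() flags the first entry, inner ifelse() flags a change
    switch = ifelse(prev == -1, 99,
                    ifelse(party.affiliation1 != prev, 1, 0))
  ) %>%
  ungroup() %>%
  select(-prev)

print(toy)
```

Rows where the current or previous affiliation is NA would still come out as NA in `switch`, because the comparison itself returns NA.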
Answer 1 (score: 1)
Another approach, using data.table:
library(data.table)
# Convert to data.table
d <- as.data.table(d)
# Order by election date
d <- d[order(Election.date)]
# Get the previous affiliation, for each id_pers
d[, previous_party_affiliation := shift(party.affiliation), by = id_pers]
# If the current affiliation is different from the previous one, set to 1
d[, switch := ifelse(party.affiliation != previous_party_affiliation, 1, 0)]
# Remove the column
d[, previous_party_affiliation := NULL]
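As Haboryme pointed out, the first entry for each person will be NA, because there is no information about the previous election. The result looks like this: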
           names party.affiliation legislature selection gender Election.date id_pers party.affiliation1 switch
1: Anrita blabla            Winner           1  majority Female          1955       1                 NA     NA
2: Anrita blabla            Winner           2  majority Female          1960       1                 NA      0
3: Anrita blabla            Winner           3  majority Female          1965       1                 NA      0
4: Anrita blabla            Winner           4        PR Female          1970       1                 NA      0
5: Anrita blabla            Winner           5        PR Female          1975       1                 NA      0
6: Anrita blabla         New party           6        PR Female          1980       1                 NA      1
(...)
EDITED
To identify the first entry of a political affiliation and assign it the value 99, you can use this modified version:
# Note the "fill" parameter passed to the function shift
d[, previous_party_affiliation := shift(party.affiliation, fill = "First"), by = id_pers]
# Set 99 to the first occurrence
d[, switch := ifelse(party.affiliation != previous_party_affiliation, ifelse(previous_party_affiliation == "First", 99, 1), 0)]
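To check how this behaves on a small case, here is a self-contained sketch on a hypothetical toy table `toy` (not the data from the question). Note that it swaps the nested ifelse() for data.table's fcase(), which is my substitution rather than part of the answer above, and it assumes that a missing affiliation should give NA instead of 0 or 1. The helper column `prev` is also just an illustrative name.

```r
library(data.table)

# Hypothetical toy data: politician 2 has a missing affiliation in 1960
toy <- data.table(
  id_pers = c(1, 1, 1, 2, 2),
  Election.date = c(1955, 1960, 1965, 1955, 1960),
  party.affiliation = c("Winner", "Winner", "New party", "Loser", NA)
)

# Sort so that shift() really looks at the previous election of each person
setorder(toy, id_pers, Election.date)

# Previous affiliation per politician; "First" marks a politician's first row
toy[, prev := shift(party.affiliation, fill = "First"), by = id_pers]

# fcase() evaluates the conditions in order; an NA condition counts as FALSE
toy[, switch := fcase(
  prev == "First", 99,
  is.na(party.affiliation) | is.na(prev), NA_real_,
  party.affiliation != prev, 1,
  default = 0
)]

# Drop the helper column
toy[, prev := NULL]
print(toy)
```

The explicit NA branch is a design choice: it keeps "unknown affiliation" separate from "stayed in the same party", which seems closer to the NA entries in the expected output in the question.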