Q: How to ... in R

Time: 2017-01-14 10:15:25

Tags: r loops group-by dataset

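I am trying to figure out how to generate a new column in R that indicates whether a politician "i" stays in the same party or defects in a given legislature "l". Politicians and parties are identified by indexes. Here is an example of what my data looks like originally: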

## example of data

names <- c("Jesus Martinez", "Anrita blabla", "Paco Pico", "Reiner Steingress", "Jesus Martinez Porras")
Parti.affiliation <- c("Winner","Winner","Winner", "Loser", NA)#NA, "New party", "Loser", "Winner", NA
Legislature <- c(rep(1, 5), rep(2,5), rep(3,5), rep(4,5), rep(5,5), rep(6,5))
selection <- c(rep("majority", 15), rep("PR", 15))
sex<- c("Male", "Female", "Male", "Female", "Male")
Election<- c(rep(1955, 5), rep(1960, 5), rep(1965, 5), rep(1970,5), rep(1975,5), rep(1980,5))

d<- data.frame(names =factor(rep(names, 6)), party.affiliation = c(rep(Parti.affiliation,5), NA, "New party", "Loser", "Winner", NA), legislature = Legislature, selection = selection, gender =rep(sex, 6), Election.date = Election)

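## generating ids for politician and party.affiliation

library(dplyr)  # provides arrange()

d$id_pers<- paste(d$names, sep="")
d <- arrange(d, id_pers)
d <- transform(d, id_pers = as.numeric(factor(id_pers)))
# factor() first: as.numeric() on a character column would return only NA
d$party.affiliation1<- as.numeric(factor(d$party.affiliation))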

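The expected result should show the following: if a politician (identified by the column "id_pers") changes their value in the column "party.affiliation1", the value 1 is assigned in a new column called "switch"; otherwise the value is 0. The same procedure should be applied to every politician in the dataset, so the expected result should look like this: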

d["switch"]<- c(1, rep(0,4), NA, rep(0,6), rep(NA, 6),1, rep(0,5), rep (0,5),1) # 0= remains in the same party / 1= switch party affiliation.

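For example, you can see in this data frame that the first politician, named "Anrita blabla", is a candidate of the same party from the 1st through the 5th legislature. However, we can observe that "Anrita" changes her party affiliation in the 6th legislature, so she becomes a candidate of party '2'. The new column "switch" should therefore contain the value '1' to reflect that Anrita changed her party affiliation, and '0' to show that "Anrita" did not change her party affiliation during the first 5 legislatures.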

I have tried several approaches (loops, for example). I found this strategy to be the simplest, but it does not work :(

## add a new column based on raw values 
  ind <- c(FALSE, party.affiliation1[-1L]!= party.affiliation1[-length(party.affiliation1)] & party.affiliation1!= 'Null')
  d <- d %>% group_by(id_pers) %>% mutate(this = ifelse(ind, 1, 0)) 

I hope this explanation is clear. Thanks in advance!!!

2 Answers:

Answer 0: (score: 1)

I think you can do:

library(tidyverse)
d%>%
  group_by(id_pers)%>%
  mutate(switch=as.numeric((party.affiliation1-lag(party.affiliation1)!=0)))

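The first entry for each politician will be NA, because we have no information on whether their previous party affiliation (if any) was different.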
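To see where that NA comes from, here is a small illustration on a single politician. The vector `codes` is a hypothetical set of numeric party codes, not taken from the question's data, and `dplyr::lag()` is called explicitly so that it is not confused with `stats::lag()`.

```r
library(dplyr)

# Hypothetical numeric party codes for one politician over three legislatures
codes <- c(3, 3, 2)

# lag() shifts the vector by one position and pads the front with NA
previous <- dplyr::lag(codes)                    # NA 3 3

# Comparing against NA yields NA, so the first flag cannot be 0 or 1
switch_flag <- as.numeric(codes - previous != 0) # NA 0 1
```

The first element has no predecessor, so the comparison is undefined there rather than "no switch".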

Edit: we use the default= argument of lag(), nested inside ifelse(), to distinguish the first value.
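A minimal sketch of what this could look like, under stated assumptions: the toy data frame `toy`, the sentinel string "First", and the marker value 99 are all illustrative choices (99 mirrors the value used in the data.table answer), not the answerer's exact code.

```r
library(dplyr)

# Toy data (hypothetical): two politicians, three legislatures each
toy <- data.frame(
  id_pers = c(1, 1, 1, 2, 2, 2),
  legislature = c(1, 2, 3, 1, 2, 3),
  party = c("Winner", "Winner", "New party", "Loser", "Loser", "Loser"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  arrange(id_pers, legislature) %>%
  group_by(id_pers) %>%
  mutate(
    # "First" is an assumed sentinel that must never occur as a real party name
    previous = dplyr::lag(party, default = "First"),
    # 99 = first observation, 1 = switched party, 0 = stayed
    switch = ifelse(previous == "First", 99,
                    ifelse(party != previous, 1, 0))
  ) %>%
  ungroup()
```

Comparing the party labels directly, rather than subtracting numeric codes, keeps the sketch independent of how the parties were encoded. Rows with a missing affiliation still produce NA in the comparison.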

Answer 1: (score: 1)

Another approach, using data.table:

library(data.table)

# Convert to data.table
d <- as.data.table(d)

# Order by election date
d <- d[order(Election.date)]

# Get the previous affiliation, for each id_pers
d[, previous_party_affiliation := shift(party.affiliation), by = id_pers]

# If the current affiliation is different from the previous one, set to 1
d[, switch := ifelse(party.affiliation != previous_party_affiliation, 1, 0)] 

# Remove the column
d[, previous_party_affiliation := NULL]

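As Haboryme pointed out, the first entry for each person will be NA, since there is no information about the previous election. The result looks like this: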

                    names party.affiliation legislature selection gender Election.date id_pers party.affiliation1 switch
 1:         Anrita blabla            Winner           1  majority Female          1955       1                 NA     NA
 2:         Anrita blabla            Winner           2  majority Female          1960       1                 NA      0
 3:         Anrita blabla            Winner           3  majority Female          1965       1                 NA      0
 4:         Anrita blabla            Winner           4        PR Female          1970       1                 NA      0
 5:         Anrita blabla            Winner           5        PR Female          1975       1                 NA      0
 6:         Anrita blabla         New party           6        PR Female          1980       1                 NA      1

(...)

EDITED

To identify the first entry for each politician and assign it the value 99, you can use this modified version:

# Note the "fill" parameter passed to the function shift
d[, previous_party_affiliation := shift(party.affiliation, fill = "First"), by = id_pers]

# Set 99 to the first occurrence
d[, switch := ifelse(party.affiliation != previous_party_affiliation, ifelse(previous_party_affiliation == "First", 99, 1), 0)]
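As a quick check of how this encoding behaves, the same two statements can be run on a toy table. The table `toy` and its values are hypothetical and chosen only to cover the three cases (first entry, stay, switch) plus a missing affiliation.

```r
library(data.table)

# Toy table (hypothetical values, for illustration only)
toy <- data.table(
  id_pers = c(1, 1, 1, 2, 2),
  Election.date = c(1955, 1960, 1965, 1955, 1960),
  party.affiliation = c("Winner", "Winner", "New party", "Loser", NA)
)

# Sort within each politician so that shift() sees elections in order
setorder(toy, id_pers, Election.date)

# Previous affiliation per politician; "First" marks the first observation
toy[, previous := shift(party.affiliation, fill = "First"), by = id_pers]

# 99 = first observation, 1 = switched, 0 = stayed, NA = affiliation missing
toy[, switch := ifelse(party.affiliation != previous,
                       ifelse(previous == "First", 99, 1), 0)]

toy$switch  # 99 0 1 99 NA
```

Note that a row whose current affiliation is NA still ends up as NA, because the comparison with the previous value is itself NA; those rows would need separate handling if a 0/1 value is required.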