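I'm trying to use the `Synth` package in R to explore the effects of certain coups on economic growth in the countries where they occurred, but I'm stumped by an error I can't make sense of. When I try to run `dataprep()`, I get the following: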
Error in dataprep(foo = World, predictors = c("rgdpe.pc", "population.ln", :
unit.variable not found as numeric variable in foo.
This is baffling, because my data frame `World` does contain a numeric ID called `idno`, which is what I specify in the call to `dataprep()`.
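Here is the script I'm using. It ingests the necessary data from a .csv file on GitHub. The last step, the call to `dataprep()`, is where the error occurs. I'd appreciate help figuring out why this error occurs and how to avoid it, so that I can move on to the `synth()` part.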
library(dplyr)
library(Synth)
# DATA INGESTION AND TRANSFORMATION
World <- read.csv("https://raw.githubusercontent.com/ulfelder/coups-and-growth/master/data.raw.csv", stringsAsFactors=FALSE)
World$rgdpe.pc = World$rgdpe/World$pop # create per capita version of GDP (PPP)
World$idno = as.numeric(as.factor(World$country)) # create numeric country id
World$population.ln = log(World$population/1000) # population size in 1000s, logged
World$trade.ln = log(World$trade) # trade as % of GDP, logged
World$civtot.ln = log1p(World$civtot) # civil conflict scale, +1 and logged
World$durable.ln = log1p(World$durable) # political stability, +1 and logged
World$polscore = with(World, ifelse(polity >= -10, polity, NA)) # create version of Polity score that's missing for -66, -77, and -88
World <- World %>% # create clocks counting years since last coup (attempt) or 1950, whichever is most recent
  arrange(countrycode, year) %>%
  mutate(cpt.succ.d = ifelse(cpt.succ.n > 0, 1, 0),
         cpt.any.d = ifelse(cpt.succ.n > 0 | cpt.fail.n > 0, 1, 0)) %>%
  group_by(countrycode, idx = cumsum(cpt.succ.d == 1L)) %>%
  mutate(cpt.succ.clock = row_number()) %>%
  ungroup() %>%
  select(-idx) %>%
  group_by(countrycode, idx = cumsum(cpt.any.d == 1L)) %>%
  mutate(cpt.any.clock = row_number()) %>%
  ungroup() %>%
  select(-idx) %>%
  mutate(cpt.succ.clock.ln = log1p(cpt.succ.clock), # include +1 log versions
         cpt.any.clock.ln = log1p(cpt.any.clock))
# THAILAND 2006
THI.coup.year = 2006
THI.years = seq(THI.coup.year - 5, THI.coup.year + 5)
# Get names of countries that had no coup attempts during window analysis will cover. If you wanted to restrict the comparison to a
# specific region or in any other categorical way, this would be the place to do that as well.
THI.controls <- World %>%
  filter(year >= min(THI.years) & year <= max(THI.years)) %>% # filter to desired years
  group_by(idno) %>% # organize by country
  summarise(coup.ever = sum(cpt.any.d)) %>% # get counts by country of years with coup attempts during that period
  filter(coup.ever == 0) %>% # keep only the ones with 0 counts
  select(idno) # cut down to country ID numbers
THI.controls = unlist(THI.controls) # convert that data frame to a vector
names(THI.controls) = NULL # strip the vector of names
THI.synth.dat <- dataprep(
  foo = World,
  predictors = c("rgdpe.pc", "population.ln", "trade.ln", "fcf", "govfce", "energy.gni", "polscore", "durable.ln", "cpt.any.clock.ln", "civtot.ln"),
  predictors.op = "mean",
  time.predictors.prior = seq(from = min(THI.years), to = THI.coup.year - 1),
  dependent = "rgdpe.pc",
  unit.variable = "idno",
  unit.names.variable = "country",
  time.variable = "year",
  treatment.identifier = unique(World$idno[World$country == "Thailand"]),
  controls.identifier = THI.controls,
  time.optimize.ssr = seq(from = THI.coup.year, to = max(THI.years)),
  time.plot = THI.years
)
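A side note on the control-selection step in the script: the `unlist()` and `names(...) = NULL` lines that turn `THI.controls` into a plain vector can be collapsed with `dplyr::pull()`, which extracts one column as an unnamed vector. A minimal sketch on an invented toy panel (the `toy` data frame and its values are hypothetical; only the column names come from the script):

```r
library(dplyr)

# Hypothetical toy panel: three countries, two years, one coup attempt in country 2
toy <- data.frame(
  idno      = c(1, 1, 2, 2, 3, 3),
  year      = rep(2005:2006, 3),
  cpt.any.d = c(0, 0, 0, 1, 0, 0)
)

controls <- toy %>%
  group_by(idno) %>%                     # one group per country
  summarise(coup.ever = sum(cpt.any.d)) %>%
  filter(coup.ever == 0) %>%             # countries with no coup attempts
  pull(idno)                             # plain, unnamed numeric vector

controls  # 1 3
```

This yields the same kind of vector that `controls.identifier` receives after `unlist()` plus `names(THI.controls) = NULL`, in one step.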
Answer 0 (score: 3)
This is too long for a comment.
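Your dplyr statement:
World <- World %>% ...
converts `World` from a `data.frame` to a `tbl_df` object (see the dplyr documentation). Unfortunately, this causes `mode(World[,"idno"])` to return `list` instead of `numeric`, because `[ , "idno"]` on a `tbl_df` returns a one-column `tbl_df` rather than a vector. As a result, the check that `unit.variable` is numeric fails.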
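The difference is easy to see on a two-row toy data frame. The `dat`/`tb` names and their values are made up for illustration; only the `idno` column name comes from the question:

```r
library(dplyr)

# Hypothetical stand-in for World
dat <- data.frame(country = c("A", "B"), idno = c(1, 2), stringsAsFactors = FALSE)
tb  <- as_tibble(dat)

mode(dat[, "idno"])                # "numeric": [ , j] on a data.frame drops to a vector
mode(tb[, "idno"])                 # "list": [ , j] on a tibble keeps a one-column tibble
mode(tb[["idno"]])                 # "numeric": [[ ]] always extracts the column itself
mode(as.data.frame(tb)[, "idno"])  # "numeric" again once converted back
```

So any check written as `mode(foo[, "idno"])` passes for a data frame and fails for a tibble, even though the column itself is numeric in both.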
You can fix this by running `World <- as.data.frame(World)` before calling `dataprep(...)`.
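An equivalent option is to end the pipeline itself with `as.data.frame()`, so that `World` never stays a tibble. A sketch on a hypothetical toy `World` (invented values). In current dplyr versions it is `group_by()`/`ungroup()` that hand back a tibble, so that step is included to mirror the question's pipeline:

```r
library(dplyr)

# Hypothetical toy stand-in for World
World <- data.frame(country = c("A", "A", "B"), year = c(2000, 2001, 2000),
                    stringsAsFactors = FALSE)

World <- World %>%
  arrange(country, year) %>%
  mutate(idno = as.numeric(as.factor(country))) %>%
  group_by(country) %>%
  mutate(obs = row_number()) %>%  # grouped step, like the coup clocks
  ungroup() %>%                   # returns a tibble at this point
  as.data.frame()                 # back to a plain data.frame for Synth

class(World)           # "data.frame"
mode(World[, "idno"])  # "numeric"
```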
Unfortunately (again), you then get a different error, which may be due to the logic of your dplyr statements.
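To check what the coup-clock step actually computes, you can run it on a tiny invented panel. The countries, years, and coup values below are hypothetical; the pipeline is the question's successful-coup clock, unchanged:

```r
library(dplyr)

# Hypothetical panel: two countries, one successful coup each
toy <- data.frame(
  countrycode = c(rep("AAA", 4), rep("BBB", 3)),
  year        = c(2000:2003, 2000:2002),
  cpt.succ.n  = c(0, 1, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

clocked <- toy %>%
  arrange(countrycode, year) %>%
  mutate(cpt.succ.d = ifelse(cpt.succ.n > 0, 1, 0)) %>%
  # idx is a running coup count over the whole data set, not per country;
  # grouping on countrycode as well keeps countries apart
  group_by(countrycode, idx = cumsum(cpt.succ.d == 1L)) %>%
  mutate(cpt.succ.clock = row_number()) %>%
  ungroup()

clocked$cpt.succ.clock  # 1 1 2 3 1 2 1
```

On this toy data the clock equals 1 in a coup year and in each country's first observed year, then increases by one per row. Because it counts rows rather than calendar years, gaps in a country's series would not show up in it.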