Question

I have panel data for the following countries/regions: https://docs.google.com/spreadsheets/d/1ZB5po_f9srk-u8OGTA6O5P23XtbZCUGyjrLU6I2glrg/edit?usp=sharing

Basically:

country  year   x   y    z 
a        1991  ##   ##  ##
b        1991  ##   ##  ##
c        1991  ##   ##  ##
d        1991  ##   ##  ##
a        1992  ##   ##  ##
b        1992  ##   ##  ##

I want to create a new variable that takes country c's value of x and repeats it for every observation in the same year. Ideally:

country  year   x   y    z  new
a        1991  ##   ##  ##  1
b        1991  ##   ##  ##  1
c        1991  1    ##  ##  1
d        1991  ##   ##  ##  1
a        1992  ##   ##  ##  2
b        1992  ##   ##  ##  2
c        1992  2    ##  ##  2

I have been using mutate to create variables, and currently I have something like this:

df <- df %>%
  mutate(new = country %in% ifelse("c", x, )

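But I can't find the right syntax. Any suggestions would be much appreciated. I initially tried creating a new data frame and using left_join; however, it created many new observations. If that approach can be made to work, I would be interested in it as well.

Thanks!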

Update:

I was able to work around the problem with the following:

Panel <- Panel %>%
  mutate(China_NGDP_bnYuan1 = ifelse(Country == "China", Nominal_gdp, 0)) %>%
  group_by(Year) %>%
  mutate(China_NGDP_bnYuan = sum(China_NGDP_bnYuan1, na.rm = TRUE)) %>%
  ungroup()

However, there may be a cleaner way to achieve the same result.
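One possibly cleaner variant, as a sketch only: group by year and index the value column directly by country, which avoids the helper column. The toy data below is made up for illustration; only the column names come from the update above. Note one difference from the workaround: if China has no row in a given year, this version returns NA rather than 0.

```r
library(dplyr)

# Toy panel; values are invented, column names mirror the update above.
Panel <- tibble(
  Country     = c("China", "India", "China", "India"),
  Year        = c(1991, 1991, 1992, 1992),
  Nominal_gdp = c(10, 3, 12, 4)
)

Panel <- Panel %>%
  group_by(Year) %>%
  # Pick China's value within each year; [1] gives NA if China is absent that year.
  mutate(China_NGDP_bnYuan = Nominal_gdp[Country == "China"][1]) %>%
  ungroup()
```

The `[1]` also guards against a year that accidentally contains more than one China row, in which case only the first value is used.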

Second update: it looks like I can also get the desired result by using a join.

First, create a new df containing only the values for country c:

c_x <- df %>%
  filter(Country == "c")
c_x <- c_x %>% select(Year, x)

Then use left_join:

library(dplyr)  # left_join() comes from dplyr; tidyverse would load it too

newdf <- left_join(df, c_x, by = "Year")
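A small self-contained sketch of the join approach on made-up data follows. Two details are assumptions on my part rather than something stated in the question: the lookup column is renamed to `new` so the join does not produce `x.x` and `x.y`, and a row-count check is added. A left join returns extra rows whenever the lookup table has more than one row per key, which is one possible reason the earlier attempt created many new observations.

```r
library(dplyr)

# Toy data; values are invented for illustration.
df <- tibble(
  Country = c("a", "b", "c", "a", "b", "c"),
  Year    = c(1991, 1991, 1991, 1992, 1992, 1992),
  x       = c(5, 6, 1, 7, 8, 2)
)

# One row per year holding country c's value, renamed to avoid a name clash.
c_x <- df %>%
  filter(Country == "c") %>%
  select(Year, new = x)

newdf <- left_join(df, c_x, by = "Year")

# Sanity check: the join should not add rows if c_x has one row per Year.
stopifnot(nrow(newdf) == nrow(df))
```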

Answer 1

We can use fill. If the ## entries in column x are NA, group by 'year' and fill, with .direction specified as "updown":

library(dplyr)
library(tidyr)
df %>%
  mutate(new = x) %>%
  group_by(year) %>%
  fill(new, .direction = "updown")
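To see the fill approach run end to end, here is a sketch on toy data shaped like the question's example. It assumes, as the answer does, that x is NA for every country except c; the final `ungroup()` is my addition so that later steps are not grouped by accident.

```r
library(dplyr)
library(tidyr)

# Toy data: x is known only for country c (an assumption of this approach).
df <- tibble(
  country = c("a", "b", "c", "d", "a", "b", "c"),
  year    = c(1991, 1991, 1991, 1991, 1992, 1992, 1992),
  x       = c(NA, NA, 1, NA, NA, NA, 2)
)

result <- df %>%
  mutate(new = x) %>%
  group_by(year) %>%
  # Within each year, spread the single non-missing value up and then down.
  fill(new, .direction = "updown") %>%
  ungroup()
```

If other countries also have non-missing x values, fill will not overwrite them, so the grouped-indexing or join approaches are safer in that case.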

Answer 2

You can arrange the data by year and country, then increment a counter each time the first country value in the data occurs.

library(dplyr)

Panel %>%
  arrange(year, country) %>%
  mutate(new = cumsum(country == first(country))) -> Panel
Panel
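A runnable sketch of this counter on made-up rows is shown below. Two caveats, stated as my reading rather than the answerer's: the result is a running index of the years, not country c's actual x value (it matches the question's example only because x happens to be 1 and 2 there), and it relies on the alphabetically first country appearing in every year.

```r
library(dplyr)

# Toy data in scrambled order; values are invented for illustration.
Panel <- tibble(
  country = c("b", "a", "c", "d", "b", "a", "c"),
  year    = c(1991, 1991, 1991, 1991, 1992, 1992, 1992)
)

Panel <- Panel %>%
  arrange(year, country) %>%
  # After sorting, first(country) is "a"; the counter steps up at each "a" row.
  mutate(new = cumsum(country == first(country)))
```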

R: Create a new variable equal to a specific observation of another variable (for a given year)
