Question

I have a data set, which I call sam.data:

dput(sam.data)
structure(list(idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 
81L, 68L, 72L), n1 = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 
7L, 7L), x = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219, 
9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)), .Names = c("idn", 
"n1", "x"), row.names = c(NA, 12L), class = "data.frame")

sam.data

    idn n1     x
1    1  1  9.95228
2    2  2 11.41860
3    3  3 10.37350
4    4  4 10.54530
5    5  5 10.73640
6    6  6  9.85219
7   66  6  9.73307
8   62  6  9.86304
9    7  7  9.74097
10  81  7  9.57359
11  68  7  9.70899
12  72  7  9.75185

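For rows where idn is not equal to n1, I want to create a new variable y that takes the value of x from the row corresponding to n1 (the row whose idn equals that n1); otherwise I want y to be set to missing.

The expected output should look like this: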

   idn n1        x        y
1    1  1  9.95228  
2    2  2 11.41860 
3    3  3 10.37350 
4    4  4 10.54530 
5    5  5 10.73640 
6    6  6  9.85219 
7   66  6  9.73307  9.85219
8   62  6  9.86304  9.85219
9    7  7  9.74097 
10  81  7  9.57359  9.74097
11  68  7  9.70899  9.74097
12  72  7  9.75185  9.74097

I was able to produce a close solution in R:

library(plyr)
sam.data2<-ddply(sam.data,.(n1),transform, y=x[which.min(idn)])
sam.data2
   idn n1        x        y
1    1  1  9.95228  9.95228
2    2  2 11.41860 11.41860
3    3  3 10.37350 10.37350
4    4  4 10.54530 10.54530
5    5  5 10.73640 10.73640
6    6  6  9.85219  9.85219
7   66  6  9.73307  9.85219
8   62  6  9.86304  9.85219
9    7  7  9.74097  9.74097
10  81  7  9.57359  9.74097
11  68  7  9.70899  9.74097
12  72  7  9.75185  9.74097

However, I would prefer a more elegant solution.

I am also looking for a Stata solution.

Answer 1

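I'm not sure exactly what you're after, but working from your own output, you can make it look like the desired output by finding the rows where x equals y and replacing y with "" there: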

sam.data2$y[sam.data2$x == sam.data2$y] <- ""
sam.data2

## > sam.data2
##    idn n1        x       y
## 1    1  1  9.95228        
## 2    2  2 11.41860        
## 3    3  3 10.37350        
## 4    4  4 10.54530        
## 5    5  5 10.73640        
## 6    6  6  9.85219        
## 7   66  6  9.73307 9.85219
## 8   62  6  9.86304 9.85219
## 9    7  7  9.74097        
## 10  81  7  9.57359 9.74097
## 11  68  7  9.70899 9.74097
## 12  72  7  9.75185 9.74097

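There are a few ways to do this, and which one to choose depends on how you intend to use the result. If it is purely for aesthetics, the above is simple, but note that the column is now character rather than numeric.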
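If the column needs to stay numeric for later calculations, one possible variation is to assign NA instead of "". The sketch below is only an illustration: it rebuilds the question's data, uses a base R stand-in for the ddply step (the helper name first_x is made up here), and tests idn == n1 rather than x == y, which is an assumption about what the blank rows are meant to be.

```r
# Rebuild the question's data so the example is self-contained
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

# Base R stand-in for the ddply step: x at the smallest idn within each n1 group
# (first_x is a hypothetical helper name, not from the original answer)
first_x <- sapply(split(sam.data, sam.data$n1),
                  function(g) g$x[which.min(g$idn)])
sam.data2 <- sam.data
sam.data2$y <- unname(first_x[as.character(sam.data2$n1)])

# Use NA rather than "" on the rows where idn equals n1, so y stays numeric
sam.data2$y[sam.data2$idn == sam.data2$n1] <- NA

is.numeric(sam.data2$y)
sam.data2
```

With NA the column can still be passed to functions such as mean(..., na.rm = TRUE), which would not work on the character version.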

Answer 2

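Another option, using by from the base package.

dat <- sam.data   # dat is the data frame from the question
# by() returns its results group by group, so this assumes dat is sorted by n1
dat$y <- unlist(by(dat, dat$n1, FUN = function(g) {
  # NA where idn equals n1, otherwise x from the row with the smallest idn
  ifelse(g$idn == g$n1,
         NA,
         g$x[which.min(g$idn)])
}))

Note that the result differs slightly from the desired output, because I use NA (numeric) rather than "", which is a string.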

  idn n1        x       y
1    1  1  9.95228      NA
2    2  2 11.41860      NA
3    3  3 10.37350      NA
4    4  4 10.54530      NA
5    5  5 10.73640      NA
6    6  6  9.85219      NA
7   66  6  9.73307 9.85219
8   62  6  9.86304 9.85219
9    7  7  9.74097      NA
10  81  7  9.57359 9.74097
11  68  7  9.70899 9.74097
12  72  7  9.75185 9.74097
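A related base R sketch, offered only as an alternative and not as part of the answer above, looks up the row whose idn equals each row's n1 with match(). It assumes idn values are unique (match() returns the first hit) and does not depend on the rows being sorted by n1. The name lookup is made up for illustration.

```r
# Rebuild the question's data so the example is self-contained
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

# For each row, find the position of the row whose idn equals this row's n1
# and take its x (lookup is a hypothetical name; assumes idn is unique)
lookup <- sam.data$x[match(sam.data$n1, sam.data$idn)]

# Keep the looked-up value only where idn differs from n1; NA otherwise
sam.data$y <- ifelse(sam.data$idn != sam.data$n1, lookup, NA)
sam.data
```

If some n1 has no row with a matching idn, match() gives NA for it, so y would be missing there as well.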

Answer 3

A Stata solution:

capture net install xfill, from(http://www.sealedenvelope.com/)
bys n1: gen y2=x/(idn==n1) 
xfill y2, i(n1) 
replace y2=. if n1==idn

Answer 4

The Stata code could be just

sort n1, stable
by n1: gen y2 = x[1] if idn != n1

(This is a revised suggestion.)
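For readers following along in R, the same by-group idea (take the first x within each n1 group, then keep it only where idn differs from n1) might be sketched with ave(). This is an illustrative analogue rather than a translation endorsed by the answer. Like x[1] in the Stata code, it assumes the row with idn equal to n1 comes first within its group. The name first_in_group is made up.

```r
# Rebuild the question's data so the example is self-contained
sam.data <- data.frame(
  idn = c(1L, 2L, 3L, 4L, 5L, 6L, 66L, 62L, 7L, 81L, 68L, 72L),
  n1  = c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L),
  x   = c(9.95228, 11.4186, 10.3735, 10.5453, 10.7364, 9.85219,
          9.73307, 9.86304, 9.74097, 9.57359, 9.70899, 9.75185)
)

# First x within each n1 group, in the current row order
# (first_in_group is a hypothetical name)
first_in_group <- ave(sam.data$x, sam.data$n1, FUN = function(v) v[1])

# Keep it only where idn differs from n1; NA otherwise
sam.data$y2 <- ifelse(sam.data$idn != sam.data$n1, first_in_group, NA)
sam.data
```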

Answer 5

@Nick's Stata solution can actually be done in one line using bysort:

clear

input idn n1 x y
1  1  9.95228  9.95228
2  2 11.41860 11.41860
3  3 10.37350 10.37350
4  4 10.54530 10.54530
5  5 10.73640 10.73640
6  6  9.85219  9.85219
66  6  9.73307  9.85219
62  6  9.86304  9.85219
7  7  9.74097  9.74097
81  7  9.57359  9.74097
68  7  9.70899  9.74097
72  7  9.75185  9.74097
end

bysort n1: gen y2 = x[1] if idn != n1

list

     +----------------------------------------+
     | idn   n1         x         y        y2 |
     |----------------------------------------|
  1. |   1    1   9.95228   9.95228         . |
  2. |   2    2   11.4186   11.4186         . |
  3. |   3    3   10.3735   10.3735         . |
  4. |   4    4   10.5453   10.5453         . |
  5. |   5    5   10.7364   10.7364         . |
     |----------------------------------------|
  6. |   6    6   9.85219   9.85219         . |
  7. |  66    6   9.73307   9.85219   9.85219 |
  8. |  62    6   9.86304   9.85219   9.85219 |
  9. |   7    7   9.74097   9.74097         . |
 10. |  81    7   9.57359   9.74097   9.74097 |
     |----------------------------------------|
 11. |  68    7   9.70899   9.74097   9.74097 |
 12. |  72    7   9.75185   9.74097   9.74097 |
     +----------------------------------------+
