How do I add a variable to a data frame using another variable as the index?

Time: 2016-05-16 18:43:31

Tags: r

Data

I have a data frame "veh" with a variable "Tim":

> dput(veh$Tim)
c(169.7, 169.8, 169.9, 170, 170.1, 170.2, 170.3, 170.4, 170.5, 
170.6, 170.7, 170.8, 170.9, 171, 171.1, 171.2, 171.3, 171.4, 
171.5, 171.6, 171.7, 171.8, 171.9, 172, 172.1, 172.2, 172.3, 
172.4, 172.5, 172.6, 172.7, 172.8, 172.9, 173, 173.1, 173.2, 
173.3, 173.4, 173.5, 173.6, 173.7, 173.8, 173.9, 174, 174.1, 
174.2, 174.3, 174.4, 174.5, 174.6, 174.7, 174.8, 174.9, 175, 
175.1, 175.2, 175.3, 175.4, 175.5, 175.6, 175.7, 175.8, 175.9, 
176, 176.1, 176.2, 176.3, 176.4, 176.5, 176.6, 176.7, 176.8, 
176.9, 177, 177.1, 177.2, 177.3, 177.4, 177.5, 177.6, 177.7, 
177.8, 177.9, 178, 178.1, 178.2, 178.3, 178.4, 178.5, 178.6, 
178.7, 178.8, 178.9, 179, 179.1, 179.2, 179.3, 179.4, 179.5, 
179.6, 179.7, 179.8, 179.9, 180, 180.1, 180.2, 180.3, 180.4, 
180.5, 180.6, 180.7, 180.8, 180.9, 181, 181.1, 181.2, 181.3, 
181.4, 181.5, 181.6, 181.7, 181.8, 181.9, 182, 182.1, 182.2, 
182.3, 182.4, 182.5, 182.6, 182.7, 182.8, 182.9, 183, 183.1, 
183.2, 183.3, 183.4, 183.5, 183.6, 183.7, 183.8, 183.9, 184, 
184.1, 184.2, 184.3, 184.4, 184.5, 184.6, 184.7, 184.8, 184.9, 
185, 185.1, 185.2)

I also have a vector, "slopezz":

> slopezz
 [1] -2.1920  0.7034  0.6113 -1.2540  0.7513  2.3250  0.0791 -0.9713  1.1010  1.9490
[11] -1.4290  2.2500  0.8775

and another single-column data frame, "x":

> x
            psi
psi1.Tim  171.4
psi2.Tim  171.8
psi3.Tim  175.1
psi4.Tim  175.7
psi5.Tim  176.3
psi6.Tim  177.8
psi7.Tim  178.7
psi8.Tim  180.1
psi9.Tim  181.5
psi10.Tim 182.4
psi11.Tim 183.8
psi12.Tim 184.8

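Goal

"slopezz" has 13 values and x$psi has 12. In the data frame "veh", I want to add a new column "slope" that holds the values from "slopezz", with x$psi supplying the breakpoints that decide which value goes where.

Example

The first value in "slopezz" is -2.1920, and the first value in x$psi is 171.4. The values in x$psi correspond to values of veh$Tim. So between 169.7 (the first value in veh$Tim) and 171.4, the "slope" variable should contain the first value, -2.1920. Then, between 171.4 and 171.8, it should contain the second slope value, 0.7034. And so on.

What I have tried

I can create the new column successfully with ifelse, typing in the values of x$psi and "slopezz" by hand.

## Example: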

library(dplyr)
veh <- veh %>% 
  mutate(slope = ifelse(Tim<=171.4,slopezz[1], 
                           ifelse(Tim>171.4 & Tim<=171.8, slopezz[2], ....

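The code is very long, so I have not included all of it here.

Is there a better way, one where I don't have to manually enter the x$psi values, which are taken from Tim?

4 Answers:

Answer 0: (score: 3)

You had the right idea using dput() on veh$Tim; it would have helped if you had done the same for slopezz and x.

Here is a two-line solution (where ix is a temporary index variable):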

ix <- sapply(veh$Tim, function(z) which.max(z <= c(x$psi, Inf)))
veh$slope <- slopezz[ix]

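You are a little vague about which slopezz value applies when veh$Tim is exactly equal to a breakpoint such as 171.4. The code above treats each interval as closed on the right, so 171.4 gets the first slope.

Answer 1: (score: 2)

You need a join and tidyr::fill: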
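To see how the boundary case is handled, here is a minimal sketch on a cut-down version of the data. The short vectors `tim`, `psi` and `slopes3` are made up for illustration and are not the asker's objects. It also compares the result against base R's `findInterval()` with `left.open = TRUE`, which is a swapped-in vectorised alternative rather than part of the answer above, and which assumes R 3.3.0 or later.

```r
# Hypothetical miniature data: 2 breakpoints, hence 3 slope values
tim     <- c(169.7, 171.4, 171.5, 171.8, 171.9)
psi     <- c(171.4, 171.8)
slopes3 <- c(-2.1920, 0.7034, 0.6113)

# Same idea as the two-liner: index of the first breakpoint that is >= z
ix <- sapply(tim, function(z) which.max(z <= c(psi, Inf)))

# Vectorised alternative: intervals are (psi[i], psi[i + 1]], so a value
# sitting exactly on a breakpoint falls in the interval to its left
ix2 <- findInterval(tim, psi, left.open = TRUE) + 1L

slope <- slopes3[ix]
print(data.frame(Tim = tim, ix = ix, ix2 = ix2, slope = slope))
```

Both index vectors should agree, with 171.4 and 171.8 each assigned to the interval that ends at them. Note that this relies on the breakpoints being sorted in increasing order.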

library(dplyr)
library(tidyr)

x %>% mutate(slopezz = slopezz[1:n()]) %>% 
    right_join(veh, by = c('psi' = 'Tim')) %>% 
    fill(slopezz, .direction = 'up')
#     psi slopezz
# 1 169.7 -2.1920
# 2 169.8 -2.1920
# 3 169.9 -2.1920
# 4 170.0 -2.1920
# 5 170.1 -2.1920
# 6 170.2 -2.1920
# .   ...     ...

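Note that this leaves the last four values as NA, because there is no psi value after 184.8 to fill up from. If you want to fill those in downward, just add %>% fill(slopezz); they then get 2.2500, the twelfth slope, rather than the thirteenth.

Data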

x <- structure(list(psi = c(171.4, 171.8, 175.1, 175.7, 176.3, 177.8, 
               178.7, 180.1, 181.5, 182.4, 183.8, 184.8)), .Names = "psi", class = "data.frame", row.names = c(NA, -12L))

slopezz <- c(-2.192, 0.7034, 0.6113, -1.254, 0.7513, 2.325, 0.0791, -0.9713, 
             1.101, 1.949, -1.429, 2.25, 0.8775)

veh <- structure(list(Tim = c(169.7, 169.8, 169.9, 170, 170.1, 170.2, 
                 170.3, 170.4, 170.5, 170.6, 170.7, 170.8, 170.9, 171, 171.1, 
                 171.2, 171.3, 171.4, 171.5, 171.6, 171.7, 171.8, 171.9, 172, 
                 172.1, 172.2, 172.3, 172.4, 172.5, 172.6, 172.7, 172.8, 172.9,  
                 173, 173.1, 173.2, 173.3, 173.4, 173.5, 173.6, 173.7, 173.8, 
                 173.9, 174, 174.1, 174.2, 174.3, 174.4, 174.5, 174.6, 174.7, 
                 174.8, 174.9, 175, 175.1, 175.2, 175.3, 175.4, 175.5, 175.6, 
                 175.7, 175.8, 175.9, 176, 176.1, 176.2, 176.3, 176.4, 176.5, 
                 176.6, 176.7, 176.8, 176.9, 177, 177.1, 177.2, 177.3, 177.4, 
                 177.5, 177.6, 177.7, 177.8, 177.9, 178, 178.1, 178.2, 178.3, 
                 178.4, 178.5, 178.6, 178.7, 178.8, 178.9, 179, 179.1, 179.2, 
                 179.3, 179.4, 179.5, 179.6, 179.7, 179.8, 179.9, 180, 180.1, 
                 180.2, 180.3, 180.4, 180.5, 180.6, 180.7, 180.8, 180.9, 181, 
                 181.1, 181.2, 181.3, 181.4, 181.5, 181.6, 181.7, 181.8, 181.9,  
                 182, 182.1, 182.2, 182.3, 182.4, 182.5, 182.6, 182.7, 182.8, 
                 182.9, 183, 183.1, 183.2, 183.3, 183.4, 183.5, 183.6, 183.7, 
                 183.8, 183.9, 184, 184.1, 184.2, 184.3, 184.4, 184.5, 184.6, 
                 184.7, 184.8, 184.9, 185, 185.1, 185.2)), .Names = "Tim", row.names = c(NA, 
                 -156L), class = "data.frame")

Answer 2: (score: 2)

Here is a solution using base R's cut function. Data:

veh<-data.frame(Tim=c(169.7, 169.8, 169.9, 170, 170.1, 170.2, 170.3, 170.4, 170.5, 
                      170.6, 170.7, 170.8, 170.9, 171, 171.1, 171.2, 171.3, 171.4, 
                      171.5, 171.6, 171.7, 171.8, 171.9, 172, 172.1, 172.2, 172.3, 
                      172.4, 172.5, 172.6, 172.7, 172.8, 172.9, 173, 173.1, 173.2, 
                      173.3, 173.4, 173.5, 173.6, 173.7, 173.8, 173.9, 174, 174.1, 
                      174.2, 174.3, 174.4, 174.5, 174.6, 174.7, 174.8, 174.9, 175, 
                      175.1, 175.2, 175.3, 175.4, 175.5, 175.6, 175.7, 175.8, 175.9, 
                      176, 176.1, 176.2, 176.3, 176.4, 176.5, 176.6, 176.7, 176.8, 
                      176.9, 177, 177.1, 177.2, 177.3, 177.4, 177.5, 177.6, 177.7, 
                      177.8, 177.9, 178, 178.1, 178.2, 178.3, 178.4, 178.5, 178.6, 
                      178.7, 178.8, 178.9, 179, 179.1, 179.2, 179.3, 179.4, 179.5, 
                      179.6, 179.7, 179.8, 179.9, 180, 180.1, 180.2, 180.3, 180.4, 
                      180.5, 180.6, 180.7, 180.8, 180.9, 181, 181.1, 181.2, 181.3, 
                      181.4, 181.5, 181.6, 181.7, 181.8, 181.9, 182, 182.1, 182.2, 
                      182.3, 182.4, 182.5, 182.6, 182.7, 182.8, 182.9, 183, 183.1, 
                      183.2, 183.3, 183.4, 183.5, 183.6, 183.7, 183.8, 183.9, 184, 
                      184.1, 184.2, 184.3, 184.4, 184.5, 184.6, 184.7, 184.8, 184.9, 
                      185, 185.1, 185.2))
slopezz<-c(-2.1920,  0.7034,  0.6113, -1.2540,  0.7513,  2.3250,  0.0791, -0.9713,
           1.1010,  1.9490, -1.4290,  2.2500,  0.8775)
x<-c(171.4, 171.8,  175.1,  175.7,  176.3,  177.8,  178.7,  180.1,  181.5,
      182.4, 183.8, 184.8)

Now extend x so that it covers the entire range of Tim:

x<-c(0,x,200)
veh$slope<-slopezz[cut(veh$Tim, breaks=x)]

The final data frame for this example will have the Tim column and the new slope column.

Answer 3: (score: 1)

A brute-force way is
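As a small variation, the outer breaks can be set to `-Inf` and `Inf`, so no lower or upper bound has to be guessed. This is a minimal sketch on made-up miniature data (`tim`, `psi` and `slopes3` are illustrative names, not the asker's objects). `cut()` builds right-closed intervals by default, and `as.integer()` makes the step from factor level to index explicit.

```r
# Hypothetical miniature data: 2 breakpoints, hence 3 slope values
tim     <- c(169.7, 171.4, 171.5, 171.8, 171.9)
psi     <- c(171.4, 171.8)
slopes3 <- c(-2.1920, 0.7034, 0.6113)

# -Inf and Inf as outer breaks cover any value of tim
bins <- cut(tim, breaks = c(-Inf, psi, Inf))  # right = TRUE by default

# The integer codes of the factor (1, 2, 3) select the matching slope
demo <- data.frame(Tim = tim, slope = slopes3[as.integer(bins)])
print(demo)
```

With right-closed intervals, a value that sits exactly on a breakpoint is assigned to the interval ending there, which matches the `Tim<=171.4` condition in the question's ifelse attempt.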

# start with the first slope everywhere, then overwrite past each breakpoint
veh$slope = rep(slopezz[1], length(veh$Tim))
for (j in 1:12) veh$slope[ veh$Tim>x$psi[j] ] = slopezz[j+1]
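The loop ends up giving each row the slope whose position is one more than the number of breakpoints its Tim value exceeds. Read that way, the same result can be sketched without a loop, assuming the breakpoints are sorted in increasing order. The miniature vectors below are made up for illustration and are not the asker's objects.

```r
# Hypothetical miniature data: 2 breakpoints, hence 3 slope values
tim     <- c(169.7, 171.4, 171.5, 171.8, 171.9)
psi     <- c(171.4, 171.8)
slopes3 <- c(-2.1920, 0.7034, 0.6113)

# Loop version, following the brute-force approach
slope_loop <- rep(slopes3[1], length(tim))
for (j in seq_along(psi)) slope_loop[tim > psi[j]] <- slopes3[j + 1]

# Loop-free version: count how many breakpoints each value exceeds
n_exceeded <- rowSums(outer(tim, psi, ">"))
slope_vec  <- slopes3[n_exceeded + 1]

print(data.frame(Tim = tim, slope_loop = slope_loop, slope_vec = slope_vec))
```

`outer()` builds a full length(tim) by length(psi) matrix, so this trades memory for brevity; for 156 rows and 12 breakpoints that cost is negligible.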