Create a new column by combining values from different columns for each ID

Time: 2019-08-03 23:17:47

Tags: r dataframe group-by data-analysis id

I want to combine some values from different columns, and these combinations must be made for each ID in another column (Ptt).

I have tried several approaches, but none of them worked.

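I would like to create a new column, area, in the data.frame df using the combinations below. Each combination assigns a name (AR, AM or AA) in this new column area, and the combinations need to be applied for each ID (Ptt).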

A sample of my df:

df
Ptt      bat$depth       Latitude    Longitude
88734    -500           -18.0490      -38.9485
88734    -750           -19.4095      -39.4320
88734    -800           -19.8043      -40.5436
88734    -490           -20.0543      -40.9095
88734    -300           -21.4085      -41.0954
129041   -1500          -25.0954      -50.4350
129041   -2400          -26.4095      -51.0954
129041   -1200          -27.5309      -51.9053
129041   -1190          -27.7953      -52.5403 
129041   -1606          -28.0904      -51.9504
120941   -2000          -29.4985      -52.0590
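To make the sample reproducible, it can be read directly from the text above with read.table(). By default (check.names = TRUE), read.table() turns the header bat$depth into the syntactic name bat.depth, which is the name the answer below relies on. A minimal sketch, assuming the table is pasted exactly as shown:

```r
# Read the sample table as posted in the question.
# read.table() uses check.names = TRUE by default, so the header
# "bat$depth" becomes the syntactic column name "bat.depth".
df <- read.table(header = TRUE, text = "
Ptt      bat$depth       Latitude    Longitude
88734    -500           -18.0490      -38.9485
88734    -750           -19.4095      -39.4320
88734    -800           -19.8043      -40.5436
88734    -490           -20.0543      -40.9095
88734    -300           -21.4085      -41.0954
129041   -1500          -25.0954      -50.4350
129041   -2400          -26.4095      -51.0954
129041   -1200          -27.5309      -51.9053
129041   -1190          -27.7953      -52.5403
129041   -1606          -28.0904      -51.9504
120941   -2000          -29.4985      -52.0590
")

# Column names are now: Ptt, bat.depth, Latitude, Longitude
names(df)

# The depth column is therefore df$bat.depth; df$bat$depth is parsed as
# (df$bat)$depth, not as a column named "bat$depth".
head(df$bat.depth)
```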

What I expect (these values are not the real ones, just an example), along with what I tried:

x <- plyr::ddply(by(df, df["Ptt"],
                  df$area[df$bat$depth >= -500 &  df$Latitude >= -20.0000] <- "AR"
                  df$area[df$bat$depth <= -500 &  df$Latitude >= -20.0000] <- "AR"
                  df$area[df$bat$depth <= -500 &  df$Latitude <= -20.0000] <- "AM"
                  df$area[df$bat$depth >= -500 & df$Latitude <= -20.0000] <- "AM"
                  df$area[df$Latitude <= -51.0000] <- "AA"))

x <- plyr::ddply(df, ~Ptt, function(d){
  d$area <- NA
  d$area[d$bat$depth >= -500 &  d$Latitude >= -20.0000] <- "AR"
  d$area[d$bat$depth <= -500 &  d$Latitude >= -20.0000] <- "AR"
  d$area[d$bat$depth <= -500 &  d$Latitude <= -20.0000] <- "AM"
  d$area[d$bat$depth >= -500 &  d$Latitude <= -20.0000] <- "AM"
  d$area[d$Latitude <= -51.0000] <- "AA" 
})
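For reference, the ddply attempt above ends its anonymous function with an assignment instead of returning d, and d$bat$depth is parsed as (d$bat)$depth rather than as a column named bat$depth. A minimal base R sketch of the same idea, assuming the depth column is named bat.depth and using split()/lapply() in place of plyr::ddply(); assign_area() is a hypothetical helper name. Because each rule only looks at values in the same row, grouping by Ptt does not change the result; the split is kept only to mirror the attempt.

```r
# Sample data from the question (depth column named bat.depth).
df <- data.frame(
  Ptt       = c(88734, 88734, 88734, 88734, 88734,
                129041, 129041, 129041, 129041, 129041, 120941),
  bat.depth = c(-500, -750, -800, -490, -300,
                -1500, -2400, -1200, -1190, -1606, -2000),
  Latitude  = c(-18.0490, -19.4095, -19.8043, -20.0543, -21.4085,
                -25.0954, -26.4095, -27.5309, -27.7953, -28.0904, -29.4985)
)

# assign_area() is a hypothetical helper holding the question's rules.
# The assignments run top to bottom, so a later rule overwrites an
# earlier one for the same row (the "AA" rule is applied last).
assign_area <- function(d) {
  d$area <- NA_character_
  d$area[d$bat.depth >= -500 & d$Latitude >= -20] <- "AR"
  d$area[d$bat.depth <= -500 & d$Latitude >= -20] <- "AR"
  d$area[d$bat.depth <= -500 & d$Latitude <= -20] <- "AM"
  d$area[d$bat.depth >= -500 & d$Latitude <= -20] <- "AM"
  d$area[d$Latitude <= -51] <- "AA"
  d  # return the modified group
}

# Base R stand-in for plyr::ddply(df, ~Ptt, assign_area):
# split by Ptt (groups come back sorted by Ptt), apply the rules
# to each group, then stack the groups again.
x <- do.call(rbind, lapply(split(df, df$Ptt), assign_area))
rownames(x) <- NULL
x
```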

x <- dplyr::group_by(df,Ptt)%>%
df$area[df$bat$depth >= -500 &  df$Latitude >= -20.0000] <- "AR"
df$area[df$bat$depth <= -500 &  df$Latitude >= -20.0000] <- "AR"
df$area[df$bat$depth <= -500 &  df$Latitude <= -20.0000] <- "AM"
df$area[df$bat$depth >= -500 & df$Latitude <= -20.0000] <- "AM"
df$area[df$Latitude <= -51.0000] <- "AA"        

x <- df%>%
  dplyr::group_by(Ptt)%>%
df$area[df$bat$depth >= -500 &  df$Latitude >= -20.0000] <- "AR"
df$area[df$bat$depth <= -500 &  df$Latitude >= -20.0000] <- "AR"
df$area[df$bat$depth <= -500 &  df$Latitude <= -20.0000] <- "AM"
df$area[df$bat$depth >= -500 & df$Latitude <= -20.0000] <- "AM"
df$area[df$Latitude <= -51.0000] <- "AA"        

library(data.table)
x <- df[,. df$area[df$bat$depth >= -500 &  df$Latitude >= -20.0000] <- "AR"
        df$area[df$bat$depth <= -500 &  df$Latitude >= -20.0000] <- "AR"
        df$area[df$bat$depth <= -500 &  df$Latitude <= -20.0000] <- "AM"
        df$area[df$bat$depth >= -500 & df$Latitude <= -20.0000] <- "AM"
        df$area[df$Latitude <= -51.0000] <- "AA" , by = "Ptt"]
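The data.table attempt can be written with := and a row filter in i, applying the same rules in the same order so that later rules overwrite earlier ones, just like the sequential assignments. A sketch, assuming the depth column is called bat.depth; by = "Ptt" is left out because every rule only uses values from the same row:

```r
library(data.table)

dt <- data.table(
  Ptt       = c(88734, 88734, 88734, 88734, 88734,
                129041, 129041, 129041, 129041, 129041, 120941),
  bat.depth = c(-500, -750, -800, -490, -300,
                -1500, -2400, -1200, -1190, -1606, -2000),
  Latitude  = c(-18.0490, -19.4095, -19.8043, -20.0543, -21.4085,
                -25.0954, -26.4095, -27.5309, -27.7953, -28.0904, -29.4985)
)

# Start with an empty character column, then apply the rules in order;
# each call updates, by reference, only the rows selected in i.
dt[, area := NA_character_]
dt[bat.depth >= -500 & Latitude >= -20, area := "AR"]
dt[bat.depth <= -500 & Latitude >= -20, area := "AR"]
dt[bat.depth <= -500 & Latitude <= -20, area := "AM"]
dt[bat.depth >= -500 & Latitude <= -20, area := "AM"]
dt[Latitude <= -51, area := "AA"]  # applied last, so it overrides the rules above

dt
```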

Thanks!

1 answer:

Answer 0 (score: 1)

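You can use case_when() as a starting point, and modify and chain the conditions as you like. Given your conditions, make sure the resulting values in area are valid. Use names(df) or colnames(df) to check how the columns are named in your data frame.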

library(dplyr)

df %>%
  group_by(Ptt) %>%
  mutate(area = case_when(
    # case_when() uses the first condition that is TRUE, so the
    # narrower AA rule goes first; otherwise it is never reached
    Latitude <= -51.0000 ~ "AA",
    bat.depth >= -500 & Latitude >= -20.0000 ~ "AR",
    bat.depth <= -500 & Latitude >= -20.0000 ~ "AR",
    bat.depth <= -500 & Latitude <= -20.0000 ~ "AM",
    bat.depth >= -500 & Latitude <= -20.0000 ~ "AM"
  ))
# -------------------------------------------------------------------------
# A tibble: 11 x 5
# Groups:   Ptt [3]
#       Ptt bat.depth Latitude Longitude area
#     <int>     <int>    <dbl>     <dbl> <chr>
#  1  88734      -500    -18.0     -38.9 AR
#  2  88734      -750    -19.4     -39.4 AR
#  3  88734      -800    -19.8     -40.5 AR
#  4  88734      -490    -20.1     -40.9 AM
#  5  88734      -300    -21.4     -41.1 AM
#  6 129041     -1500    -25.1     -50.4 AM
#  7 129041     -2400    -26.4     -51.1 AM
#  8 129041     -1200    -27.5     -51.9 AM
#  9 129041     -1190    -27.8     -52.5 AM
# 10 129041     -1606    -28.1     -52.0 AM
# 11 120941     -2000    -29.5     -52.1 AM
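One detail worth checking when translating the question's sequential assignments into case_when(): in the question, the last matching assignment wins, while case_when() returns the value of the first condition that is TRUE. A small sketch with made-up latitudes (-18, -25 and -60, not taken from the data) shows why a narrow rule such as AA has to come before the broader Latitude rules:

```r
library(dplyr)

lat <- c(-18, -25, -60)  # illustrative values only

# "AA" listed last: -60 already matches "lat <= -20", so "AA" is never used
aa_last <- case_when(
  lat >= -20 ~ "AR",
  lat <= -20 ~ "AM",
  lat <= -51 ~ "AA"
)

# "AA" listed first: the narrower rule is checked before the broader one
aa_first <- case_when(
  lat <= -51 ~ "AA",
  lat >= -20 ~ "AR",
  lat <= -20 ~ "AM"
)

aa_last   # "AR" "AM" "AM"
aa_first  # "AR" "AM" "AA"
```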