Adding a variable to grouped data by unique combinations of variables

Time: 2019-02-19 17:01:48

Tags: r dplyr tidyr data-manipulation mutate

I have a data frame like the following:

df <- data.frame(cbind(c(2018, 2018, 2018, 2018, 2018, 2017, 2017, 2016),
        c('Ohio', 'Ohio', 'Arizona', 'Arizona', 'Nebraska', 'Nebraska', 'New Mexico', 'Idaho'),
        c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'), c(1, 2, 3, 4, 5, 6, 7, 8)))
colnames(df) <- c('Date', 'Location', 'Var1', 'Var2')

      Date   Location     Var1 Var2
      2018   Ohio         A    1
      2018   Ohio         B    2
      2018   Arizona      C    3
      2018   Arizona      D    4
      2018   Nebraska     E    5
      2017   Nebraska     F    6
      2017   New Mexico   G    7
      2016   Idaho        H    8

I am trying to introduce a new variable 'Combo' that represents each unique combination of the 'Date' and 'Location' variables, so that any rows with the same date and location have the same value of 'Combo'. I want it to look like this:

      Date   Location     Var1 Var2 Combo
      2018   Ohio         A    1    1
      2018   Ohio         B    2    1
      2018   Arizona      C    3    2
      2018   Arizona      D    4    2
      2018   Nebraska     E    5    3
      2017   Nebraska     F    6    4
      2017   New Mexico   G    7    5
      2016   Idaho        H    8    6

All rows with the same combination of date and location should share a Combo value, regardless of the other variables in the row.

I tried combinations involving mutate(), but without success. I was hoping there would be a simple solution.

Does anyone have any ideas on this? I tried looking through the documentation for distinct() for ideas, but had no luck.

Thank you very much for your help!

2 Answers:

Answer 0 (score: 1)

After grouping by 'Date' and 'Location', we can use .GRP from data.table:

library(data.table)
setDT(df)[, Combo := .GRP, .(Date, Location)]
df
#   Date   Location Var1 Var2 Combo
#1: 2018       Ohio    A    1     1
#2: 2018       Ohio    B    2     1
#3: 2018    Arizona    C    3     2
#4: 2018    Arizona    D    4     2
#5: 2018   Nebraska    E    5     3
#6: 2017   Nebraska    F    6     4
#7: 2017 New Mexico    G    7     5
#8: 2016      Idaho    H    8     6

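Or use rleid, which gives the same result here because rows with the same 'Date' and 'Location' are adjacent (rleid starts a new id every time the combination changes from the previous row):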

setDT(df)[, Combo := rleid(Date, Location)]
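To illustrate how the two approaches can diverge, here is a minimal sketch on a small made-up table (the `dt` object and the `grp`/`run` column names are hypothetical, not from the question) in which the same combination appears in non-adjacent rows:

```r
library(data.table)

# Hypothetical data: the 2018/Ohio combination appears in rows 1 and 3,
# separated by a different combination in row 2.
dt <- data.table(Date = c(2018, 2017, 2018),
                 Location = c("Ohio", "Ohio", "Ohio"))

# .GRP numbers groups by first appearance, so rows 1 and 3 share an id.
dt[, grp := .GRP, by = .(Date, Location)]

# rleid() numbers consecutive runs, so row 3 starts a new id.
dt[, run := rleid(Date, Location)]

dt
```

If the data may not be sorted so that identical combinations sit together, .GRP is probably the safer choice for this task.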

Answer 1 (score: 1)

Both

df <- mutate(df,Combo = as.integer(interaction(Date,Location,drop = TRUE)))

df <- mutate(df,Combo = as.integer(factor(paste0(Date,Location))))

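are options, although they number the combinations according to the sorted order of the factor levels rather than the order in which they appear in the data.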
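If the ids should follow order of appearance instead, one possible variant (a sketch in base R, not something from the answers above; the `key` helper and the "_" separator are my own assumptions) is to match each row's key against the unique keys:

```r
# Rebuild the example data with explicit column types.
df <- data.frame(
  Date = c(2018, 2018, 2018, 2018, 2018, 2017, 2017, 2016),
  Location = c("Ohio", "Ohio", "Arizona", "Arizona",
               "Nebraska", "Nebraska", "New Mexico", "Idaho"),
  Var1 = LETTERS[1:8],
  Var2 = 1:8,
  stringsAsFactors = FALSE
)

# Build one key per row; the separator keeps the two parts distinct.
key <- paste(df$Date, df$Location, sep = "_")

# unique() keeps first-appearance order, so match() returns the position
# of each row's key in that order.
df$Combo <- match(key, unique(key))

df
```

The same expression should also work inside mutate(), for example mutate(df, Combo = match(paste(Date, Location, sep = "_"), unique(paste(Date, Location, sep = "_")))).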