Looping over and comparing a vector and a data frame (in R)

Time: 2018-11-02 14:22:04

Tags: r dataframe vector

2 answers:

Answer 0: (score: 1)

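If I understand your question correctly, you can drop the for loop, because R is vectorized: the %in% operator checks a whole column against your list of instruments at once. Using the tidyverse, your code could look like this: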

# load tidyverse
library(tidyverse)

# set vector of instruments
instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola", "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

# create dummy train data.frame (more exactly a "tibble")
train <- tibble(mix1_instrument = c("a", "b", "Clarinet"),
                mix2_instrument = c("a", "Clarinet", "c"),
                xxx = c("Clarinet", "b", "c"))

#> train
## A tibble: 3 x 3
#mix1_instrument mix2_instrument xxx     
#<chr>           <chr>           <chr>   
#1 a               a               Clarinet
#2 b               Clarinet        b       
#3 Clarinet        c               c       


# add column "instruments" to train
train <- train %>% 
  mutate(instruments = case_when(
    mix1_instrument %in% instru ~ "1",
    mix2_instrument %in% instru ~ "1",
    TRUE ~"0"
  ))

#>     train
## A tibble: 3 x 4
# mix1_instrument mix2_instrument xxx      instruments
# <chr>           <chr>           <chr>    <chr>      
#1 a               a               Clarinet 0          
#2 b               Clarinet        b        1          
#3 Clarinet        c               c        1       
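
For comparison, the same flag can be sketched in base R without the tidyverse. This is a minimal sketch rather than part of the original answer: it reuses the dummy data above, shortens the instrument vector for brevity, and relies only on the vectorized `%in%` and `|` operators.

```r
# Shortened instrument list, for illustration only
instru <- c("Clarinet", "Trumpet", "Oboe")

# Same dummy data as above, as a plain data.frame
train <- data.frame(
  mix1_instrument = c("a", "b", "Clarinet"),
  mix2_instrument = c("a", "Clarinet", "c"),
  xxx             = c("Clarinet", "b", "c"),
  stringsAsFactors = FALSE
)

# %in% tests every element of a column at once, so no for loop is needed;
# | combines the two logical vectors element-wise
in_list <- train$mix1_instrument %in% instru | train$mix2_instrument %in% instru

# Store the flag as "1"/"0" strings, matching the case_when() version
train$instruments <- ifelse(in_list, "1", "0")

print(train$instruments)
```

As in the tidyverse version, the `xxx` column is not checked, so the first row is flagged "0" even though `xxx` contains "Clarinet".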

Answer 1: (score: 0)

If you are familiar with dplyr, you can do this with mutate.

library(dplyr)  # provides mutate() and %>%

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

mix1_instruments = c("Accordion", "Trumpet", "Violin", "Cello", "Triangle")
mix2_instruments = c("Bassoon", "Saxophone", "Flute", "French horn", "Washboard")

train = data.frame(mix1_instruments, mix2_instruments)

train <- train %>%
  mutate(instruments = (mix1_instruments %in% instru) | (mix2_instruments %in% instru))

The output is TRUE/FALSE, but these values can also be converted to 0 or 1.

train$instruments <- as.numeric(train$instruments)
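
As a small illustration of that conversion (a sketch with a made-up logical vector `flags`, not taken from the question's data): `as.numeric()` maps TRUE to 1 and FALSE to 0, and `as.integer()` does the same while producing an integer vector.

```r
# Hypothetical logical vector standing in for train$instruments
flags <- c(TRUE, FALSE, TRUE)

as_num <- as.numeric(flags)  # double vector
as_int <- as.integer(flags)  # integer vector

print(as_num)
print(as_int)
```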

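Edit: I just saw that I was scooped while writing this response (by a much better answer!), but there is still the question of scalability.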

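The following inserts new columns named <old_column_name>_instruments, each holding a logical that says whether the entry in the original column is in instru, and then unites them into a single column with a logical that says whether any column contains an entry in instru: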

library(dplyr)     # mutate_all(), mutate()
library(tidyr)     # unite()
library(magrittr)  # %<>%

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

mix1_instruments = c("Clarinet", "Flute", "Clarinet", "English Horn", "Washboard", "Saxophone", "Washboard")
mix2_instruments = c("French Horn", "French Horn", "French Horn", "Flute", "Flute", "Triangle", "Triangle")

train = data.frame(mix1_instruments, mix2_instruments)

train %<>%
  # funs() is deprecated since dplyr 0.8.0; a named list of formulas yields the same column names
  mutate_all(list(instruments = ~ . %in% instru)) %>%
  unite(col = instruments,
        ends_with('_instruments_instruments'), # optional, selects only the columns added by mutate_all in this particular dataset
        remove = TRUE) %>%
  mutate(instruments = as.numeric(grepl('TRUE', instruments)))

Output:

train
#  mix1_instruments mix2_instruments instruments
#1         Clarinet      French Horn           1
#2            Flute      French Horn           1
#3         Clarinet      French Horn           1
#4     English Horn            Flute           1
#5        Washboard            Flute           1
#6        Saxophone         Triangle           1
#7        Washboard         Triangle           0

Note: %<>% comes from magrittr and is just shorthand for the x <- x %>% ... syntax.
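
An alternative that avoids the string round trip through unite() and grepl() can be sketched in base R. This is only an illustration, under the assumption that every column to check ends in `_instruments`; the helper name `flag_any` is hypothetical, and the instrument list and data are shortened.

```r
# Shortened instrument list, for illustration only
instru <- c("Clarinet", "Flute", "Saxophone")

train <- data.frame(
  mix1_instruments = c("Clarinet", "Washboard", "Washboard"),
  mix2_instruments = c("French Horn", "Flute", "Triangle"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if any of the given columns has an entry in `lookup`
flag_any <- function(df, cols, lookup) {
  # One logical vector per column, computed with the vectorized %in%
  per_column <- lapply(df[cols], function(col) col %in% lookup)
  # Combine them element-wise with OR, whatever the number of columns
  as.numeric(Reduce(`|`, per_column))
}

# Assumption: the columns of interest share the "_instruments" suffix
cols <- grep("_instruments$", names(train), value = TRUE)
train$instruments <- flag_any(train, cols, instru)

print(train)
```

Because the per-column results are combined with `Reduce()`, the same call works unchanged if more `*_instruments` columns are added to the data frame.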

You can output a data frame with the write.x functions, for example as a CSV:

write.csv(train, "/path/to/dir/filename.csv", row.names = FALSE)
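
To try this without touching a real directory, a minimal sketch can write to a temporary file and read it back. The data frame here is made up for illustration, and `tempfile()` stands in for the placeholder path above.

```r
# Made-up data frame, for illustration only
train <- data.frame(
  mix1_instruments = c("Clarinet", "Washboard"),
  instruments      = c(1, 0),
  stringsAsFactors = FALSE
)

# Temporary path standing in for "/path/to/dir/filename.csv"
out_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row numbers out of the file
write.csv(train, out_path, row.names = FALSE)

# Read the file back to check what was written
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
print(round_trip)
```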