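How to select a string pattern with a condition inside a loop [r]

Time: 2016-12-09 09:22:47

Tags: r string dataframe apply stringr

I would like some help selecting part of a string in certain rows of an R data frame. I have mocked up some dummy data below (floyd) to illustrate.

The first row of the data frame has only one word per column (they are numbers, but I treat all numbers as characters/words), whereas rows 2 to 4 have several words. I want to pick one word from each cell in those rows, using the position given for that row by the named vector cool_floyd_position.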

# please NB need stringr installed for my solution attempt!

# some scenario data
floyd = data.frame(people = c("roger", "david", "rick", "nick"),
               spec1 = c("1", "3 5 75 101", "3 65 85", "12 2"),
               spec2 = c("45", "75 101 85 12", "45 65 8", "45 87" ),
               spec3 = c("1", "3 5 75 101", "75 98 5", "65 32"))

# tweak my data
rownames(floyd) = floyd$people
floyd$people = NULL

# ppl of interest
cool_floyd = rownames(floyd)[2:4]

# ppl string position criteria
cool_floyd_position = c(2,3,1)
names(cool_floyd_position) = c("david", "rick", "nick")

# my solution attempt
for(i in 1:length(cool_floyd))
{
select_ppl = cool_floyd[i]
string_select = cool_floyd_position[i]

floyd[row.names(floyd) == select_ppl,] = apply(floyd[row.names(floyd) == select_ppl], 1, 
                     function(x) unlist(stringr::str_split(x, " ")[string_select]))
        }

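I am trying to get my floyd data frame to look like the one below, where the second word is selected in every column for david, the third word in every column for rick, and the first word in every column for nick (roger's row must stay as it is).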

my_target_df = data.frame(people = c("roger", "david", "rick", "nick"),
                      spec1 = c("1", "5", "85", "12"),
                      spec2 = c("45", "101", "8", "45" ),
                      spec3 = c("1", "5", "5", "65"))

row.names(my_target_df) = my_target_df$people
my_target_df$people = NULL

Many thanks in advance!

3 answers:

Answer 0: (score: 3)

Here is another option using mapply, where c1 is taken to be the named position vector from the question (cool_floyd_position):

library(stringr)
# assumption: c1 is the question's named position vector
c1 <- cool_floyd_position
#convert the factor columns to character
floyd[] <- lapply(floyd, as.character)
#transpose the floyd, subset the columns, convert to data.frame
# use mapply to extract the `word` specified in the corresponding c1
#transpose and assign it back to the row in 'floyd'
floyd[names(c1),] <- t(mapply(function(x,y) word(x, y), 
        as.data.frame(t(floyd)[, names(c1)], stringsAsFactors=FALSE), c1))
floyd
#      spec1 spec2 spec3
#roger     1    45     1
#david     5   101     5
#rick     85     8     5
#nick     12    45    65
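For comparison, the same per-row lookup can be sketched in base R without stringr. This is only an illustration and not the approach from the answer above: nth_word is a hypothetical helper introduced here, and the data is rebuilt with stringsAsFactors = FALSE on the assumption that the cells should be plain character strings.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps cells as character
floyd <- data.frame(people = c("roger", "david", "rick", "nick"),
                    spec1 = c("1", "3 5 75 101", "3 65 85", "12 2"),
                    spec2 = c("45", "75 101 85 12", "45 65 8", "45 87"),
                    spec3 = c("1", "3 5 75 101", "75 98 5", "65 32"),
                    stringsAsFactors = FALSE)
rownames(floyd) <- floyd$people
floyd$people <- NULL

# Named vector: which word to keep for each person of interest
cool_floyd_position <- c(david = 2, rick = 3, nick = 1)

# Hypothetical helper: split each string on single spaces and keep the n-th piece
nth_word <- function(x, n) {
  vapply(strsplit(x, " ", fixed = TRUE), function(parts) parts[n], character(1))
}

# Overwrite only the rows named in the lookup vector; other rows stay untouched
result <- floyd
for (who in names(cool_floyd_position)) {
  cells <- unlist(floyd[who, ])
  result[who, ] <- unname(nth_word(cells, cool_floyd_position[[who]]))
}
result
```

Looping over names(cool_floyd_position) rather than over row indices means the position is always looked up by name, so the order of the rows in floyd does not matter.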

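Answer 1: (score: 2)

You can try combining sapply, to iterate over the columns of the data frame, with mapply, to extract the nth word from each column. I.e.,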

library(stringr)
# df here is the question's data with people still a column (before it is moved to row names)
df1 <- rbind(df[1,-1], sapply(df[-1,-1], function(i) mapply(word, i, cool_floyd_position)))
rownames(df1) <- df$people
df1
#      spec1 spec2 spec3
#roger     1    45     1
#david     5   101     5
#rick     85     8     5
#nick     12    45    65

The only drawback of this solution is that people ends up as row names rather than as a separate column. There are many ways to make it a column, e.g.

df1$people <- rownames(df1)
rownames(df1) <- NULL
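# parentheses added: 1:ncol(df1)-1 parses as (1:ncol(df1)) - 1, i.e. 0:3,
# which only worked because the index 0 is silently dropped
df1[c(ncol(df1), 1:(ncol(df1) - 1))]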
#  people spec1 spec2 spec3
#1  roger     1    45     1
#2  david     5   101     5
#3   rick    85     8     5
#4   nick    12    45    65
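One of those other ways, sketched in base R: calling data.frame() with an explicit row.names = NULL resets the row names to automatic ones while the old names are added as a leading column. The df1 below is rebuilt by hand from the output shown above so that the snippet stands alone, and with_people is just an illustrative name.

```r
# Stand-in for df1, copied from the printed result above
df1 <- data.frame(spec1 = c("1", "5", "85", "12"),
                  spec2 = c("45", "101", "8", "45"),
                  spec3 = c("1", "5", "5", "65"),
                  row.names = c("roger", "david", "rick", "nick"),
                  stringsAsFactors = FALSE)

# Put the row names first as a column; row.names = NULL drops the old names
with_people <- data.frame(people = rownames(df1), df1,
                          row.names = NULL, stringsAsFactors = FALSE)
with_people
```

This avoids the separate reordering step, since the new column is placed first when the data frame is built.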

Answer 2: (score: 1)

A tidyverse solution:

library(stringi) # you have this installed if you have stringr
library(tidyverse)

pick_pos <- function(who, x, lkp) {
  if (who %in% names(lkp)) {
    map_chr(x, ~stri_split_fixed(., " ")[[1]][lkp[[who]]])
  } else { 
    x
  }
}

rownames_to_column(floyd, "people") %>% 
  mutate_all(funs(as.character)) %>% # necessary since you have factors
  group_by(people) %>% 
  mutate_all(funs(pick_pos(people, ., cool_floyd_position))) %>% 
  data.frame() %>% 
  column_to_rownames("people")