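I would like some help selecting part of a string in certain rows of an R data frame. I have mocked up some dummy data below (floyd) to illustrate.
The first row of the data frame has only one word per column (the values are numbers, but I treat all numbers as characters/words), whereas rows 2 to 4 contain several words. For each row/cell, I want to select the number at the position given for that row by the named vector cool_floyd_position.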
# please NB need stringr installed for my solution attempt!
# some scenario data
floyd = data.frame(people = c("roger", "david", "rick", "nick"),
spec1 = c("1", "3 5 75 101", "3 65 85", "12 2"),
spec2 = c("45", "75 101 85 12", "45 65 8", "45 87" ),
spec3 = c("1", "3 5 75 101", "75 98 5", "65 32"))
# tweak my data
rownames(floyd) = floyd$people
floyd$people = NULL
# ppl of interest
cool_floyd = rownames(floyd)[2:4]
# ppl string position criteria
cool_floyd_position = c(2,3,1)
names(cool_floyd_position) = c("david", "rick", "nick")
# my solution attempt
for(i in 1:length(cool_floyd))
{
select_ppl = cool_floyd[i]
string_select = cool_floyd_position[i]
floyd[row.names(floyd) == select_ppl,] = apply(floyd[row.names(floyd) == select_ppl], 1,
function(x) unlist(stringr::str_split(x, " ")[string_select]))
}
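I am trying to get my floyd data frame to look like the one below, where the second word is selected in every column for david, the third word in every column for rick, and the first word in every column for nick (the roger row must stay as it is).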
my_target_df = data.frame(people = c("roger", "david", "rick", "nick"),
spec1 = c("1", "5", "85", "12"),
spec2 = c("45", "101", "8", "45" ),
spec3 = c("1", "5", "5", "65"))
row.names(my_target_df) = my_target_df$people
my_target_df$people = NULL
Many thanks in advance!
Answer 0 (score: 3)
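Here is an option using mapply together with word from stringr, where c1 is taken to be the named position vector from the question (cool_floyd_position):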
library(stringr)
#convert the factor columns to character
floyd[] <- lapply(floyd, as.character)
#transpose the floyd, subset the columns, convert to data.frame
# use mapply to extract the `word` specified in the corresponding c1
#transpose and assign it back to the row in 'floyd'
floyd[names(c1),] <- t(mapply(function(x,y) word(x, y),
as.data.frame(t(floyd)[, names(c1)], stringsAsFactors=FALSE), c1))
floyd
# spec1 spec2 spec3
#roger 1 45 1
#david 5 101 5
#rick 85 8 5
#nick 12 45 65
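The core step above, picking the n-th space-separated token for each person of interest, can also be sketched in base R without stringr. This is only a minimal illustration under stated assumptions: `nth_word` is a hypothetical helper that does not appear in the original answers, and the data frame is rebuilt with `stringsAsFactors = FALSE` so that the columns are character rather than factor.

```r
# Hypothetical helper (not from the original answers): return the n-th
# space-separated token of each string in x, using base R only.
nth_word <- function(x, n) {
  vapply(strsplit(x, " ", fixed = TRUE), function(parts) parts[n], character(1))
}

# Rebuild the question's data with character (not factor) columns.
floyd <- data.frame(
  spec1 = c("1", "3 5 75 101", "3 65 85", "12 2"),
  spec2 = c("45", "75 101 85 12", "45 65 8", "45 87"),
  spec3 = c("1", "3 5 75 101", "75 98 5", "65 32"),
  row.names = c("roger", "david", "rick", "nick"),
  stringsAsFactors = FALSE
)
cool_floyd_position <- c(david = 2, rick = 3, nick = 1)

# For each person of interest, overwrite that row with the chosen token
# from every column; rows not named in the vector (roger) are untouched.
for (who in names(cool_floyd_position)) {
  floyd[who, ] <- nth_word(unlist(floyd[who, ]), cool_floyd_position[[who]])
}
floyd
```

Looping over the names of the position vector, rather than over row indices, keeps the lookup tied to the person and not to the row order of the data frame.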
Answer 1 (score: 2)
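You can try combining sapply, to iterate over the columns of the data frame, with mapply, to extract the nth word from each column (here df appears to be the original data frame with people still kept as a column), i.e.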
library(stringr)
df1 <- rbind(df[1,-1], sapply(df[-1,-1], function(i) mapply(word, i, cool_floyd_position)))
rownames(df1) <- df$people
df1
# spec1 spec2 spec3
#roger 1 45 1
#david 5 101 5
#rick 85 8 5
#nick 12 45 65
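The only drawback of this solution is that people ends up as the rownames rather than as a column of its own. There are many ways to turn it into a column, e.g.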
df1$people <- rownames(df1)
rownames(df1) <- NULL
df1[c(ncol(df1), 1:(ncol(df1) - 1))]  # parenthesised: 1:ncol(df1)-1 evaluates to 0:(ncol(df1)-1)
# people spec1 spec2 spec3
#1 roger 1 45 1
#2 david 5 101 5
#3 rick 85 8 5
#4 nick 12 45 65
Answer 2 (score: 1)
A tidyverse solution:
library(stringi) # you have this installed if you have stringr
library(tidyverse)
pick_pos <- function(who, x, lkp) {
if (who %in% names(lkp)) {
map_chr(x, ~stri_split_fixed(., " ")[[1]][lkp[[who]]])
} else {
x
}
}
rownames_to_column(floyd, "people") %>%
mutate_all(funs(as.character)) %>% # necessary since you have factors
group_by(people) %>%
mutate_all(funs(pick_pos(people, ., cool_floyd_position))) %>%
data.frame() %>%
column_to_rownames("people")