Question

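I have a wide-format data frame in which each column represents one questionnaire item, for a specific version of the questionnaire at a specific time point (repeated-measures design).

My data looks like this: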

df <- data.frame(
  id = c(1:5),
  t1_QOL_child_Q1 = c(5, 3, 6, 2, 7),
  t1_QOL_child_Q2 = c(5, 2, 3, 7, 1),
  t1_QOL_child_Q3 = c(7, 7, 6, 2, 5),
  t1_QOL_child_joy = c(9, 9, 5, 3, 6),
  t1_QOL_teen_Q1 = c(5, 3, 6, 2, 7),
  t1_QOL_teen_Q2 = c(5, 2, 3, 7, 1),
  t1_QOL_teen_Q3 = c(7, 7, 6, 2, 5),
  t1_QOL_teen_joy = c(5, 7, 4, 7, 9),
  t1_QOL_adult_Q1 = c(5, 3, 6, 2, 7),
  t1_QOL_adult_Q2 = c(5, 2, 3, 7, 1),
  t1_QOL_adult_Q3 = c(7, 7, 6, 2, 5),
  t1_QOL_adult_joy = c(6, 5, 3, 3, 2),
  t2_QOL_child_Q1 = c(5, 3, 6, 2, 7),
  t2_QOL_child_Q2 = c(5, 2, 3, 7, 1),
  t2_QOL_child_Q3 = c(7, 7, 6, 2, 5),
  t2_QOL_child_joy = c(9, 9, 5, 3, 6),
  t2_QOL_teen_Q1 = c(5, 3, 6, 2, 7),
  t2_QOL_teen_Q2 = c(5, 2, 3, 7, 1),
  t2_QOL_teen_Q3 = c(7, 7, 6, 2, 5),
  t2_QOL_teen_joy = c(5, 7, 4, 7, 9),
  t2_QOL_adult_Q1 = c(5, 3, 6, 2, 7),
  t2_QOL_adult_Q2 = c(5, 2, 3, 7, 1),
  t2_QOL_adult_Q3 = c(7, 7, 6, 2, 5),
  t2_QOL_adult_joy = c(6, 5, 3, 3, 2)
)

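For example, the column t1_QOL_child_Q1 holds question 1 (Q1) of the child version (child) of the quality of life (QOL) questionnaire at time point 1 (t1).

I want to select only the subscale columns, which are marked by distinct suffixes. In the example data above, those would be the columns ending in "joy".

I have more than 3000 columns and many more suffixes, so writing the following would be painful: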

select(df, ends_with("joy"), ends_with(<another suffix>), ends_with(<another suffix>))

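I considered putting all the potential suffixes in a character vector and passing that vector to ends_with, but ends_with only accepts a single string, not a vector of strings.

I searched Stack Overflow and found a solution that works for a short vector of strings, like this: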

select(df, sapply(vector_of_strings, starts_with))

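However, my vector contains too many suffixes, and this produces the following error message: Error: sapply(vector_of_strings, ends_with) must resolve to integer column positions, not a list

Any help is appreciated. Thanks!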

Answer 1

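We can use a single matches call, with the alternative patterns separated by | and anchored with $ so that they match only at the end of the column name: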

df %>% 
    select(matches("(joy|Q2)$"))
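With many suffixes, the pattern does not have to be typed by hand; it can be built from a character vector. The following is a minimal sketch rather than a definitive solution: `df_small` is a cut-down stand-in for the question's data, and `suffixes` is a hypothetical vector that would hold the real subscale suffixes.

```r
library(dplyr)

# Small stand-in for the question's data frame (only a few of its columns).
df_small <- data.frame(
  id = 1:5,
  t1_QOL_child_Q1 = c(5, 3, 6, 2, 7),
  t1_QOL_child_Q2 = c(5, 2, 3, 7, 1),
  t1_QOL_child_joy = c(9, 9, 5, 3, 6),
  t2_QOL_teen_joy = c(5, 7, 4, 7, 9)
)

# Hypothetical suffix vector; replace with the actual suffixes.
suffixes <- c("joy", "Q2")

# Collapse the suffixes into one end-anchored alternation: "(joy|Q2)$"
pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")

# dplyr: keep every column whose name matches the pattern.
selected <- df_small %>% select(matches(pattern))

# Base R equivalent, without dplyr.
selected_base <- df_small[, grepl(pattern, names(df_small)), drop = FALSE]

names(selected)
```

This assumes the suffixes contain no regular-expression metacharacters (such as `.` or `(`); any that do would need escaping first. Note also that `matches` ignores case by default. In recent tidyselect versions, `ends_with` is documented to accept a character vector, so `select(df_small, ends_with(suffixes))` may work directly, depending on the installed version.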
