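I have a database that comes from a questionnaire. It contains some long, complex text strings which, for my purposes, I also need to use as variables after my analysis.
An example of the kind of data frame I am analysing: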
cnt <- as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3"))
bnk <- as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3"))
qst <- as.factor(c(" Q.1 - some long question?", " Q.1 - some long question?", " Q.1 - some long question?", "Q.27 <U+FFFD> another long question?", "Q.27 <U+FFFD> another long question?", "Q.27 <U+FFFD> another long question?"))
ans <- as.numeric(c(1, 1, 2, 1, 2, 3))
df <- data.frame(cnt, bnk, qst, ans)
names(df) <- c("Country", "Institute", "Question", "Answer")
head(df)
  Country   Institute Question                             Answer
1 Country 1 bank 1    Q.1 - some long question?            1
2 Country 2 bank 2    Q.1 - some long question?            1
3 Country 3 bank 3    Q.1 - some long question?            2
4 Country 1 bank 1    Q.27 <U+FFFD> another long question? 1
5 Country 2 bank 2    Q.27 <U+FFFD> another long question? 2
6 Country 3 bank 3    Q.27 <U+FFFD> another long question? 3
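As you can see in the variable "Question", there is a pattern regardless of what the question is: every text starts with Q.number.
For reference, there are 49 distinct questions.
There are a few things (or steps) I would like to do here: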
library(dplyr)  # mutate() comes from dplyr
df <- mutate(df, qs = c("q1", "q1", "q1", "q27", "q27", "q27"))
  Country   Institute Question                             Answer qs
1 Country 1 bank 1    Q.1 - some long question?            1      q1
2 Country 2 bank 2    Q.1 - some long question?            1      q1
3 Country 3 bank 3    Q.1 - some long question?            2      q1
4 Country 1 bank 1    Q.27 <U+FFFD> another long question? 1      q27
5 Country 2 bank 2    Q.27 <U+FFFD> another long question? 2      q27
6 Country 3 bank 3    Q.27 <U+FFFD> another long question? 3      q27
So the final data frame should look like this:
  Country   Institute Question                             Answer qs  qs_inx labels
1 Country 1 bank 1    Q.1 - some long question?            1      q1  1      some long question?
2 Country 2 bank 2    Q.1 - some long question?            1      q1  1      some long question?
3 Country 3 bank 3    Q.1 - some long question?            2      q1  1      some long question?
4 Country 1 bank 1    Q.27 <U+FFFD> another long question? 1      q2  2      another long question?
5 Country 2 bank 2    Q.27 <U+FFFD> another long question? 2      q2  2      another long question?
6 Country 3 bank 3    Q.27 <U+FFFD> another long question? 3      q2  2      another long question?
Answer 0 (score: 1)
If I understand correctly, you need two copies of df$Question, each with different labels.
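# Start from two copies of the Question factor
df$qs_inx <- df$Question
df$labels <- df$Question
# Relabel the first copy with the question number only, e.g. "q1", "q27"
levels(df$qs_inx) <- sub('[ ]*Q\\.([0-9]+).*', 'q\\1', levels(df$Question))
# Relabel the second copy with everything that follows the leading "Q."
levels(df$labels) <- sub('[ ]*Q\\.(.*)', '\\1', levels(df$Question))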
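The relabelling above leaves the question number and the separator inside the label text. If all three columns from the question (qs, qs_inx, labels) are wanted, the following is a minimal sketch of one way to derive them with base R. It rests on a few assumptions that are not stated in the question: that qs should keep the real question number (q1, q27, as in the mutate() example), that qs_inx is a running index of the distinct questions ordered by that number, and that the separator after the number is always a single token surrounded by whitespace. The helper names q_text and q_num are made up for illustration.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  Country   = factor(rep(c("Country 1", "Country 2", "Country 3"), 2)),
  Institute = factor(rep(c("bank 1", "bank 2", "bank 3"), 2)),
  Question  = factor(rep(c(" Q.1 - some long question?",
                           "Q.27 <U+FFFD> another long question?"), each = 3)),
  Answer    = c(1, 1, 2, 1, 2, 3)
)

# Work on plain character strings rather than factor levels.
q_text <- as.character(df$Question)

# Question number: the digits that follow "Q." at the start of the text.
q_num <- as.integer(sub("^\\s*Q\\.([0-9]+).*$", "\\1", q_text))

# Short code built from the real question number, e.g. "q1", "q27".
df$qs <- paste0("q", q_num)

# Running index of the distinct questions, ordered by question number
# (assumption: this is what qs_inx is meant to be).
df$qs_inx <- match(q_num, sort(unique(q_num)))

# Label: drop "Q.<number>" and the single separator token after it
# (assumption: the separator is one whitespace-delimited token).
df$labels <- sub("^\\s*Q\\.[0-9]+\\s+\\S+\\s+", "", q_text)
```

A question whose text has no separator token after the number would lose its first word with this pattern, so it is worth checking unique(df$labels) against the 49 original questions before relying on it.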