How to reorder the words in a string alphabetically

Time: 2017-11-15 10:05:49

Tags: r alphabetical

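I have a dataset containing a list of keywords (1 keyword per row).

  1. I am looking for a way to create a new column (ALPHABETICAL) based on the KEYWORD column. The values of ALPHABETICAL should be generated automatically from each keyword, with its words sorted in alphabetical order.
  2. Like this: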

     | KEYWORD            | ALPHABETICAL       |
     | house blue         | blue house         | 
     | blue house         | blue house         | 
     | my blue house      | blue house my      | 
     | this house is blue | blue house is this | 
     | sky orange         | orange sky         | 
     | orange sky         | orange sky         | 
     | the orange sky     | orange sky the     | 
    

Thanks for your help!

3 answers:

Answer 0 (score: 6)

Iterate over the rows, split each one on " " (with strsplit), sort the pieces, and collapse them back:

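# Generate data (sample() is random, so the letters shown below are just one possible draw)
df <- data.frame(KEYWORD = c(paste(sample(letters, 3), collapse = " "), 
                             paste(sample(letters, 3), collapse = " ")))
#  KEYWORD
#   z e s
#   d a u

# For each row: split on spaces, sort the pieces, paste them back together.
# apply() passes the whole row, so this relies on df having only the KEYWORD column.
df$ALPHABETICAL  <- apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),
                                                   collapse = " "))
#  KEYWORD ALPHABETICAL
#   z e s        e s z
#   d a u        a d u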
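
Because apply(df, 1, ...) hands the function every column of a row, the approach above depends on the data frame having only the KEYWORD column. The following is a minimal sketch of that caveat, not part of the original answer; the ID column and the helper name sort_row are made up for illustration.

```r
# Hypothetical data frame with an extra column besides KEYWORD
df <- data.frame(KEYWORD = c("c a b", "f e d"),
                 ID = 1:2,
                 stringsAsFactors = FALSE)

# Split on spaces, sort the pieces, paste them back together
sort_row <- function(x) paste(sort(unlist(strsplit(x, " "))), collapse = " ")

# Applied to the whole data frame, each row includes the ID value,
# so it ends up among the sorted words
leaky <- apply(df, 1, sort_row)

# Restricting apply() to the KEYWORD column keeps other columns out
safe <- apply(df["KEYWORD"], 1, sort_row)
```

Using df["KEYWORD"] rather than df$KEYWORD keeps the input a one-column data frame, which apply() still treats as two-dimensional.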

Answer 1 (score: 1)

df$ALPHABETICAL <- sapply(strsplit(df$KEYWORD, " "), function(x) paste(sort(x), collapse = " "))

df
#              KEYWORD       ALPHABETICAL
# 1         house blue         blue house
# 2         blue house         blue house
# 3      my blue house      blue house my
# 4 this house is blue blue house is this
# 5         sky orange         orange sky
# 6         orange sky         orange sky
# 7     the orange sky     orange sky the

Data

df <- data.frame(KEYWORD = c(
  'house blue',
  'blue house',
  'my blue house',
  'this house is blue',
  'sky orange',
  'orange sky',
  'the orange sky'),stringsAsFactors = FALSE)  
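
If the same transformation is needed in several places, it could be wrapped in a small function. This is only a sketch: the name alphabetize_words is hypothetical, and trimming and splitting on runs of whitespace is an assumption about messier input that the question's data does not actually contain.

```r
# Hypothetical helper: sort the words of each string alphabetically
alphabetize_words <- function(x) {
  # Drop leading/trailing whitespace, then split on one or more whitespace characters
  words <- strsplit(trimws(x), "\\s+")
  # vapply() guarantees exactly one character string per input element
  vapply(words, function(w) paste(sort(w), collapse = " "), character(1))
}

result <- alphabetize_words(c("my blue house", "  sky   orange "))
```

With a data frame like the one above, it would be used as df$ALPHABETICAL <- alphabetize_words(df$KEYWORD).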

Answer 2 (score: 0)

A solution using dplyr + stringr

library(dplyr)
library(stringr)
KEYWORDS <- c('house blue', 'blue house', 'my blue house', 'this house is blue',
              'sky orange', 'orange sky', 'the orange sky')

# Split into words, sort each word vector, paste it back, flatten the list
ALPHABETICAL <- KEYWORDS %>%
  str_split(' ') %>% lapply(sort) %>%
  lapply(paste, collapse = ' ') %>% unlist()

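The last line uses str_split() to split KEYWORDS into a list of vectors; then sort is applied to each list element; paste joins each vector back into one string; and finally unlist flattens the list into a vector.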

The result is

> cbind(KEYWORDS, ALPHABETICAL)
     KEYWORDS             ALPHABETICAL        
[1,] "house blue"         "blue house"        
[2,] "blue house"         "blue house"        
[3,] "my blue house"      "blue house my"     
[4,] "this house is blue" "blue house is this"
[5,] "sky orange"         "orange sky"        
[6,] "orange sky"         "orange sky"        
[7,] "the orange sky"     "orange sky the"
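
The pipeline in this answer can also be unrolled so that each intermediate value can be inspected. The sketch below uses base R only, with strsplit() standing in for str_split(); the variable names pieces, sorted, and joined are made up for illustration, and the input is shortened to two keywords.

```r
KEYWORDS <- c("house blue", "my blue house")

# Step 1: split each string on spaces; the result is a list of character vectors
pieces <- strsplit(KEYWORDS, " ", fixed = TRUE)

# Step 2: sort the words inside each list element
sorted <- lapply(pieces, sort)

# Step 3: paste each sorted vector back into a single string
joined <- lapply(sorted, paste, collapse = " ")

# Step 4: flatten the list of strings into a character vector
ALPHABETICAL <- unlist(joined)
```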