How do I reduce the length of a string to 10 words in R?

Time: 2014-03-13 16:27:07

Tags: r csv

I have a CSV file like this:

Identity,Keyword
23, The weather is perfect for good days football the players are healthy
45,  1 Locksmith services Locally Owned and Operated Fast response time Call Now

I want to reduce the number of words in the Keyword column to 10.

Desired output:

Identity,Keyword
23, The weather is perfect for good days football the players 
45,  1 Locksmith services Locally Owned and Operated Fast response time 

I am using this code:

keyword <- sapply(record$Keyword,function(x) gsub("^((\\w+\\W+){9}\\w+).*","\\1",x))

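The keyword for the second Identity is not reduced to 10 words. What is going wrong? Any help is appreciated.

1 Answer:

Answer 0 (score: 4)

For the benefit of others who may want to give a different answer, I have added your data in a copy/paste-friendly format...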

# The data....
df <- read.table( text = "Identity,Keyword
23, \'The weather is perfect for good days football the players are healthy\'
45,  \'1 Locksmith services Locally Owned and Operated Fast response time Call Now\'" , header = TRUE , sep = "," , stringsAsFactors = FALSE)

# Strip out leading and trailing spaces (which were a problem for me)
df$Keyword <- gsub( "^ +| +$" , "" , df$Keyword )

# Split words on spaces, and select the first 10 elements of each
ll <- lapply( strsplit( df$Keyword , " " ) , `[` , 1:10 )

# Collapse to a single 10 word string and add to the original data.frame
df$Short <- sapply( ll , paste , collapse = " " )

#  Identity                                                                     Keyword                                                              Short
#1       23       The weather is perfect for good days football the players are healthy          The weather is perfect for good days football the players
#2       45 1 Locksmith services Locally Owned and Operated Fast response time Call Now 1 Locksmith services Locally Owned and Operated Fast response time
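
Two details may be worth checking against your real data. First, the pattern in the question is anchored with `^\\w+`, so it cannot match a string that begins with a space, and `gsub` then returns the string unchanged; that would be consistent with the leading spaces mentioned above, though I cannot confirm it is the cause in your file. Second, indexing with `1:10` pads any keyword shorter than 10 words with `NA`, which `paste` turns into the text "NA". The sketch below is one way to handle both, assuming words are separated by whitespace; `first_words` and `kw` are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of the code above): keep the first n words.
first_words <- function(x, n = 10) {
  x <- trimws(x)                 # drop leading/trailing whitespace
  words <- strsplit(x, "\\s+")   # split on runs of whitespace
  # head() returns at most n words, so short strings are not padded with NA
  vapply(words, function(w) paste(head(w, n), collapse = " "), character(1))
}

# Sample keywords, including leading spaces and one short string
kw <- c(" The weather is perfect for good days football the players are healthy",
        "  1 Locksmith services Locally Owned and Operated Fast response time Call Now",
        "only three words")

short <- first_words(kw)
print(short)

# The pattern from the question: no match when the string starts with a space,
# so the input comes back unchanged; trimming first lets the pattern match.
pat <- "^((\\w+\\W+){9}\\w+).*"
untrimmed <- sub(pat, "\\1", kw[2])
trimmed   <- sub(pat, "\\1", trimws(kw[2]))
print(identical(untrimmed, kw[2]))
print(trimmed)
```

With the sample data, `df$Short <- first_words(df$Keyword)` would replace the `lapply`/`sapply` pair, and the separate `gsub` for stripping spaces is no longer needed because `trimws` does that inside the helper.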