Splitting and replacing character variables in a data frame in R

Time: 2015-04-23 19:29:02

Tags: r function loops transform strsplit

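I have a data frame containing several character variables of different lengths, and I want to convert each variable into a list in which each element holds the individual words of one string, split on spaces.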

Say my data look like this:

char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

df <- data.frame(char, char2)

# Convert factors to character
df <- lapply(df, as.character)

> df
$char
[1] "This is a string of text" "So is this"              

$char2
[1] "Text is pretty sweet"                "Bet you wish you had text like this"

Now I can use strsplit() to split a single column into words:

df <- transform(df, "char" = strsplit(df[, "char"], " "))
> df$char
[[1]]
[1] "This"   "is"     "a"      "string" "of"     "text"  

[[2]]
[1] "So"   "is"   "this"

What I would like to do is create a loop or function that lets me do this for both columns at once, something like:

for (i in colnames(df)) {
    df <- transform(df, i = strsplit(df[, i], " "))
}

However, this produces an error:

Error in data.frame(list(char = c("This is a string of text", "So is this",  : 
  arguments imply differing number of rows: 6, 8 

I have also tried:

splitter <- function(colname) {
    df <- transform(df, colname = strsplit(df[, colname], " "))
}

splitter(colnames(df))

which tells me:

Error in strsplit(df[, colname], " ") : non-character argument

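I am confused about why the transform call works for a single column but not when it is applied inside a loop or a function. Any help would be greatly appreciated!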

1 answer:

Answer 0: (score: 0)

I got the desired output without transform():

char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2)
# Convert factors to character
df <- lapply(df, as.character)

I ran

lapply(df, strsplit, split= " ")

and got

$char
$char[[1]]
[1] "This"   "is"     "a"      "string" "of"     "text"  

$char[[2]]
[1] "So"   "is"   "this"


$char2
$char2[[1]]
[1] "Text"   "is"     "pretty" "sweet" 

$char2[[2]]
[1] "Bet"  "you"  "wish" "you"  "had"  "text" "like" "this"
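
As for why the loop in the question does not behave like the single-column call: transform() takes each argument name literally, so writing i = ... creates (or replaces) a column called "i" rather than the column whose name is stored in i. The following is only a minimal sketch, assuming the columns are kept as character vectors in a data frame (via stringsAsFactors = FALSE) and that list columns are acceptable in the result. The names demo and col are illustrative and not from the original post.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# transform() uses the argument name as the column name, so this adds
# a new column literally named "i" instead of touching column "char".
i <- "char"
demo <- transform(df, i = nchar(df[[i]]))
names(demo)

# Assigning through [[ ]] uses the value of `col` to pick the column,
# so each character column is replaced by a list of word vectors.
for (col in names(df)) {
    df[[col]] <- strsplit(df[[col]], " ")
}
df$char[[2]]
```

Assigning with df[[col]] <- ... keeps the object a data frame with list columns, whereas the lapply() call above returns a plain named list. Which of the two is preferable depends on what is done with the words afterwards.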

As Alex noted, the first lapply in the code, df <- lapply(df, as.character), can be eliminated by changing df <- data.frame(char, char2) to df <- data.frame(char, char2, stringsAsFactors=FALSE).
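
Put together, that suggestion might look like the sketch below. It assumes the same example vectors as the question; the name words is illustrative. Since R 4.0.0, stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

# Keep the columns as character vectors from the start, so no
# factor-to-character conversion step is needed afterwards.
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# Split every column on single spaces; the result is a named list
# with one list of word vectors per column.
words <- lapply(df, strsplit, split = " ")
words$char2[[1]]
```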