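I have a data frame containing several character variables whose strings vary in length, and I want to convert each variable into a list in which each element holds the words of one string, split on spaces.
Say my data looks like this: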
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2)
# Convert factors to character
df <- lapply(df, as.character)
> df
$char
[1] "This is a string of text" "So is this"
$char2
[1] "Text is pretty sweet" "Bet you wish you had text like this"
Now I can use strsplit() to split a single column into words:
df <- transform(df, "char" = strsplit(df[, "char"], " "))
> df$char
[[1]]
[1] "This" "is" "a" "string" "of" "text"
[[2]]
[1] "So" "is" "this"
What I would like to do is create a loop or function that lets me do this for both columns at once, something like:
for (i in colnames(df)) {
df <- transform(df, i = strsplit(df[, i], " "))
}
However, this produces an error:
Error in data.frame(list(char = c("This is a string of text", "So is this", :
arguments imply differing number of rows: 6, 8
I also tried:
splitter <- function(colname) {
df <- transform(df, colname = strsplit(df[, colname], " "))
}
splitter(colnames(df))
which tells me:
Error in strsplit(df[, colname], " ") : non-character argument
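I'm confused about why the transform call works for a single column but not when applied inside a loop or function. Any help would be much appreciated!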
Answer 0 (score: 0)
I did this without transform. Starting from
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2)
# Convert factors to character
df <- lapply(df, as.character)
I ran
lapply(df, strsplit, split= " ")
and got
$char
$char[[1]]
[1] "This" "is" "a" "string" "of" "text"
$char[[2]]
[1] "So" "is" "this"
$char2
$char2[[1]]
[1] "Text" "is" "pretty" "sweet"
$char2[[2]]
[1] "Bet" "you" "wish" "you" "had" "text" "like" "this"
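Since the question asked for a loop, here is a minimal sketch of the loop equivalent of the lapply call, assuming df is the named list of character vectors built above. The name words is hypothetical and only used for illustration. Indexing with [[i]] evaluates the loop variable, whereas transform(df, i = ...) takes i literally as the name of a new column.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
# Same setup as above: a named list of character vectors
df <- lapply(data.frame(char, char2), as.character)

# `words` is a hypothetical name for the result
words <- vector("list", length(df))
names(words) <- names(df)
for (i in names(df)) {
  # [[i]] looks the column up by the value of i
  words[[i]] <- strsplit(df[[i]], " ")
}
```

words$char2[[2]] then holds the individual words of the second string in char2.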
As Alex said, the first lapply, df <- lapply(df, as.character), can be dropped by changing
df <- data.frame(char, char2)
to
df <- data.frame(char, char2, stringsAsFactors=FALSE)
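Put together, that suggestion might look like the sketch below. The name result is hypothetical. From R 4.0.0 onward stringsAsFactors already defaults to FALSE, so the argument only matters on older versions.

```r
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
# Keep the columns as character so no factor conversion is needed
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# One list per column; each list has one character vector per row
# (`result` is a hypothetical name)
result <- lapply(df, strsplit, split = " ")
```

The words of a given row are then available as, for example, result$char[[1]].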