Question

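I have an R data frame with several columns. One column contains comma-separated lists, as shown below, and I would like to create a new column that counts the length of each list: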

      Column
      1,2,4,7,9,0
      5,3,8,9,0
      3,4
      5.8,9,3.5
      6
      NA
      7,4,3

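Also, is there a way to access specific elements within these lists? For example, to create a new column containing only the first element of each list? Or the last element of each?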

Answer 1

One solution is to use strsplit to split the elements of the character vector and sapply to get the desired count:

df$count <- sapply(strsplit(df$Column, ","), function(x){
  if(all(is.na(x))){
    NA
  } else {
    length(x)
  }
})
df
#        Column count
# 1 1,2,4,7,9,0     6
# 2   5,3,8,9,0     5
# 3         3,4     2
# 4   5.8,9,3.5     3
# 5           6     1
# 6        <NA>    NA
# 7       7,4,3     3

If it is acceptable for NA to be counted as 1, the solution can be simpler:

df$count <- sapply(strsplit(df$Column, ","), length)

Data:

df <- data.frame(Column = c("1,2,4,7,9,0", "5,3,8,9,0", "3,4", "5.8,9,3.5", "6", NA, "7,4,3"),
                 stringsAsFactors = FALSE)
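A slightly stricter variant of the same strsplit idea, sketched here as an assumption rather than as part of the original answer: vapply() fixes the return type to one integer per row, so an unexpected result raises an error instead of silently becoming a list, and fixed = TRUE treats the comma literally rather than as a regular expression. The helper name count_items is hypothetical.

```r
# Sample data, mirroring the question's Column
df <- data.frame(
  Column = c("1,2,4,7,9,0", "5,3,8,9,0", "3,4", "5.8,9,3.5", "6", NA, "7,4,3"),
  stringsAsFactors = FALSE
)

# count_items is a hypothetical helper name used only for illustration
count_items <- function(x, sep = ",") {
  # as.character() guards against the column having been read in as a factor
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # An NA input splits into a single NA; report it as NA instead of 1
  vapply(parts, function(p) {
    if (all(is.na(p))) NA_integer_ else length(p)
  }, integer(1))
}

df$count <- count_items(df$Column)
print(df$count)
```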

Answer 2

count.fields is intended for text files, but it can be coaxed into working on a column as well:

df$Count <- count.fields(textConnection(df$Column), sep=",")
df$Count[is.na(df$Column)] <- NA

df
#       Column Count
#1 1,2,4,7,9,0     6
#2   5,3,8,9,0     5
#3         3,4     2
#4   5.8,9,3.5     3
#5           6     1
#6        <NA>    NA
#7       7,4,3     3
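If the count.fields trick is wrapped in a reusable function, a few defaults may be worth overriding. The sketch below is an assumption-based variant, not the answer's original code: it closes the text connection explicitly, sets blank.lines.skip = FALSE so an empty string cannot shift the counts out of line with the rows, and sets quote = "" and comment.char = "" so quote characters or a # inside a value are not given special meaning. The function name count_fields_col is hypothetical.

```r
df <- data.frame(
  Column = c("1,2,4,7,9,0", "5,3,8,9,0", "3,4", "5.8,9,3.5", "6", NA, "7,4,3"),
  stringsAsFactors = FALSE
)

# count_fields_col is a hypothetical helper, not a base R function
count_fields_col <- function(x, sep = ",") {
  x <- as.character(x)
  con <- textConnection(x)
  on.exit(close(con))                     # release the connection when done
  n <- count.fields(con, sep = sep,
                    quote = "",           # no quote characters
                    blank.lines.skip = FALSE,  # keep one result per element
                    comment.char = "")    # do not treat "#" as a comment
  n[is.na(x)] <- NA                       # a missing value holds no items
  n
}

df$Count <- count_fields_col(df$Column)
print(df)
```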

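More generally, you may be better off converting the column into a list, or stacking the data into long format, so that it is easier to work with: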

df$Column <- strsplit(df$Column, ",")
lengths(df$Column)
#[1] 6 5 2 3 1 1 3
sapply(df$Column, `[`, 1)
#[1] "1"   "5"   "3"   "5.8" "6"   NA    "7"  

stack(setNames(df$Column, seq_along(df$Column)))
#   values ind
#1       1   1
#2       2   1
#3       4   1
#4       7   1
#5       9   1
#6       0   1
#7       5   2
#8       3   2
#9       8   2
# etc
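The list column also covers the second part of the question. As a minimal sketch, assuming the question's sample data and the illustrative column names first, last and second, each element can be pulled out with vapply(): indexing by length(p) gives the last element, and indexing past the end of a list returns NA rather than an error.

```r
df <- data.frame(
  Column = c("1,2,4,7,9,0", "5,3,8,9,0", "3,4", "5.8,9,3.5", "6", NA, "7,4,3"),
  stringsAsFactors = FALSE
)

# Split each string into a character vector; NA stays a length-1 NA
parts <- strsplit(df$Column, ",", fixed = TRUE)

# First and last element of each list, converted to numbers
df$first <- as.numeric(vapply(parts, `[`, character(1), 1))
df$last  <- as.numeric(vapply(parts, function(p) p[length(p)], character(1)))

# Any n-th element works the same way (here the 2nd);
# lists shorter than n yield NA
df$second <- as.numeric(vapply(parts, `[`, character(1), 2))

print(df)
```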

Answer 3

Here is a slightly faster way to get the same result:

df$Count <- nchar(gsub('[^,]', '', df$Column)) + 1

This works by counting how many commas each string contains and adding 1.
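To see the comma-counting trick step by step, here is a small sketch on the question's values, assumed to be a plain character vector. It also cross-checks the result against lengths(strsplit(...)) for the non-missing entries.

```r
x <- c("1,2,4,7,9,0", "5,3,8,9,0", "3,4", "5.8,9,3.5", "6", NA, "7,4,3")

# Step 1: delete every character that is not a comma; NA stays NA
commas_only <- gsub("[^,]", "", x)
print(commas_only)

# Step 2: the number of commas is the string length, and items = commas + 1
# (1L keeps the result integer; nchar() of NA_character_ is NA)
counts <- nchar(commas_only) + 1L
print(counts)

# Cross-check against lengths(strsplit(...)) on the non-missing entries
ok <- !is.na(x)
stopifnot(identical(counts[ok], lengths(strsplit(x[ok], ","))))
```

One caveat worth checking on real data: the comma count assumes well-formed lists. An empty string still gives 1, and a trailing comma (e.g. "3,4,") adds to the comma count, whereas strsplit() drops the trailing empty piece.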

Accessing specific elements of lists in a data frame column and counting list lengths - R
