Question

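Maybe this question has already been asked, but because of the way my data is structured I couldn't find a reliable answer; hopefully the answer is simple. My polling data has a column similar to this: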

Sample
1000 RV
456 LV
678 A

I want to strip off the letters and put them in one cell and the numbers in another, so that it looks like this:

Sample    Type
1000      RV
456       LV
678       A

How can I do this simply, without going through it cell by cell?

Answer 1

There are many ways to do this.

gsub

sample <- c("123ABC", "234CBA", "999ETC")

a <- gsub("[[:digit:]]", "", sample)    # remove the digits, keep the letters
b <- gsub("[^[:digit:]]", "", sample)   # remove everything except the digits
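To get the two-column layout from the question, the two gsub results can be combined into a data.frame. A minimal sketch, assuming the column is a character vector in the question's "1000 RV" format; x, type, num and df are illustrative names:

```r
# Illustrative data in the question's format (assumed)
x <- c("1000 RV", "456 LV", "678 A")

# Remove every digit, then trim the leftover whitespace, to keep the letters
type <- trimws(gsub("[[:digit:]]", "", x))
# Remove every non-digit to keep the number, then convert it to numeric
num <- as.numeric(gsub("[^[:digit:]]", "", x))

df <- data.frame(Sample = num, Type = type, stringsAsFactors = FALSE)
print(df)
```

Note that the second gsub glues together all digits in a string, so this assumes each value contains a single number.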

stringr

library(stringr)
a <- as.numeric(str_extract(sample, "[0-9]+"))   # first run of digits, as a number
b <- str_extract(sample, "[A-Za-z]+")            # first run of letters
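If you would rather not add the stringr dependency, base R's regexpr plus regmatches does the same "first match" extraction. A minimal sketch, with x, num and type as illustrative names:

```r
x <- c("1000 RV", "456 LV", "678 A")

# regexpr finds the first match in each string; regmatches returns the matched text
num  <- as.numeric(regmatches(x, regexpr("[0-9]+", x)))
type <- regmatches(x, regexpr("[A-Za-z]+", x))
print(num)
print(type)
```

One difference from str_extract: regmatches silently drops strings with no match instead of returning NA, so the result can be shorter than the input.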

Or the approach Psidom mentioned in the comments (I haven't tested it, but I trust him).

Answer 2

This produces a data.frame with a numeric Sample column and a character Type column, as in your example. As others have mentioned, there are many ways to do this.

sample <- c('1000      RV',
            '456       LV',
            '678       A')

A <- strsplit(sample, '\\s+')                    # Split on whitespace; returns a list of length 3
B <- unlist(A)                                   # Flatten the list into a character vector of length 6
C <- matrix(B, ncol = 2, byrow = TRUE)           # Reshape into a 3x2 character matrix, one row per entry
D <- as.data.frame(C, stringsAsFactors = FALSE)  # Convert to a data.frame so columns can have different types

# All together...
D <- as.data.frame(matrix(unlist(strsplit(sample, '\\s+')), ncol = 2, byrow = TRUE),
                   stringsAsFactors = FALSE)

D[ ,1] <- as.numeric(D[ ,1])         # Convert first column to numeric, second remains character
colnames(D) <- c('Sample', 'Type')   # Add column names

> D
  Sample Type
1   1000   RV
2    456   LV
3    678    A
> str(D)
'data.frame':   3 obs. of  2 variables:
 $ Sample: num  1000 456 678
 $ Type  : chr  "RV" "LV" "A"
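Since each entry is just "number, whitespace, letters", read.table can also do the split and the type conversion in one call by treating the vector as lines of text. A minimal sketch; D2 is an illustrative name:

```r
sample <- c('1000      RV',
            '456       LV',
            '678       A')

# read.table splits each line on whitespace and guesses each column's type
D2 <- read.table(text = sample, col.names = c("Sample", "Type"),
                 stringsAsFactors = FALSE)
str(D2)
```

Here Sample is guessed as integer rather than double; wrap it in as.numeric() if you need a double column. This only works when there is whitespace between the number and the letters.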

Answer 3

We can use sub:

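# Example data (assumed here; it matches the output shown below)
df1 <- data.frame(Sample = c("123ABC", "234CBA", "999ETC"), stringsAsFactors = FALSE)
df1$Type <- sub("\\d+", "", df1$Sample)   # remove the first run of digits
df1$Type
#[1] "ABC" "CBA" "ETC"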
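sub can also pull out the number itself by capturing it in a group and replacing the whole string with that group, which gives both columns without any extra package. A minimal sketch, with x, num, type and out as illustrative names:

```r
x <- c("1000 RV", "456 LV", "678 A")

# Capture the leading digits and replace the whole string with that group
num  <- as.numeric(sub("^\\s*([0-9]+).*$", "\\1", x))
# Remove the leading digits and any whitespace after them to keep the letters
type <- sub("^\\s*[0-9]+\\s*", "", x)

out <- data.frame(Sample = num, Type = type, stringsAsFactors = FALSE)
print(out)
```

The same two patterns also work on values without a space, such as "123ABC".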

If we need it as two columns, we can use tstrsplit from data.table:

library(data.table)
# Split on whitespace; assumes Sample holds strings like "1000 RV"
setDT(df1)[, setNames(tstrsplit(Sample, "\\s+"), c("Sample", "Type"))]
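If data.table isn't available, base R's utils::strcapture (R 3.4.0 and later) can split into typed columns in one call. A minimal sketch; x, proto and out are illustrative names, and the pattern assumes each value is digits followed by letters, with or without a space between them:

```r
x <- c("1000 RV", "456 LV", "678A")

# proto defines the output column names and the type each capture is converted to
proto <- data.frame(Sample = numeric(), Type = character(),
                    stringsAsFactors = FALSE)

# Each capture group becomes one column; \\s* makes the separating space optional
out <- utils::strcapture("^\\s*([0-9]+)\\s*([A-Za-z]+)\\s*$", x, proto = proto)
str(out)
```

Values that do not match the pattern come back as a row of NA rather than being dropped, which makes bad entries easy to spot.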

Extracting numbers from cells in R
