Extract all words from a string and create a column with the result

Date: 2016-09-24 22:04:56

Tags: r string dataframe extract alphanumeric

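I have a data frame (data3) with a column named "Collector". This column contains alphanumeric strings, for example "Ruiz and Galvis 650". I need to extract the alphabetic and numeric characters separately and create two new columns: one with the number from the string (ColID) and another with all of the words (Col):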

INPUT:

Collector                       Times     Sample
Ruiz and Galvis 650             9         SP.1              
Smith et al 469                 8         SP.1

Expected output:

Collector                       Times     Sample     ColID    Col
Ruiz and Galvis 650             9         SP.1        650     Ruiz and Galvis
Smith et al 469                 8         SP.1        469     Smith et al

I tried the following, but I get an error when I try to save the file (Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, : unimplemented type 'list' in 'EncodeElement'):

library(stringr)

regexp <- "[[:digit:]]+"
data3$colID <- NA
data3$colID <- str_extract(data3$Collector, regexp)

data3$Col <- NA
regexp <- "[[:alpha:]]+"
data3$Col <- str_extract_all(data3$Collector, regexp)
write.table(data3, file = paste("borrar2", ".csv", sep = ""), quote = TRUE, sep = ",", row.names = FALSE)

2 answers:

Answer 0 (score: 2)

The problem is that str_extract_all does not return a single string, but a list of multiple strings. For example:

> dput(str_extract_all("Ruiz and Galvis 650", "[[:alpha:]]+"))
list(c("Ruiz", "and", "Galvis"))

A data frame with nested elements (as shown above) apparently cannot be saved to a file.

However, if you update the regex pattern to match spaces as well as letters, you can go back to using str_extract instead of str_extract_all.

The key is the space added to the second regex (the one used for Col). It matches all letters and spaces as one string, and allows you to write the data.frame to file.
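As a minimal sketch of the fix described above: the pattern "[[:alpha:] ]+" (letters plus a literal space) is an assumption about what the updated regex looks like, and the str_trim call is an addition to drop the trailing space that such a pattern picks up before the number. The sample data is copied from the question, and the output goes to a temporary file rather than borrar2.csv.

```r
library(stringr)

# Sample data mirroring the question's data3 (values copied from the example)
data3 <- data.frame(Collector = c("Ruiz and Galvis 650", "Smith et al 469"),
                    Times = c(9, 8),
                    Sample = c("SP.1", "SP.1"),
                    stringsAsFactors = FALSE)

# Digits only: str_extract() returns one string per row
data3$ColID <- str_extract(data3$Collector, "[[:digit:]]+")

# Letters and spaces as one run (assumed pattern), then trim the trailing space
data3$Col <- str_trim(str_extract(data3$Collector, "[[:alpha:] ]+"))

# Both new columns are plain character vectors, so write.table() can encode them
out_file <- tempfile(fileext = ".csv")
write.table(data3, file = out_file, quote = TRUE, sep = ",", row.names = FALSE)
```

Because str_extract returns a character vector rather than a list, no column of data3 is nested any more, which is what write.table needs.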

Answer 1 (score: 0)

If your data is as uniform as the example shows, here is another option:

library(stringi)
library(purrr)
library(dplyr)

df <- data.frame(Collector=c("Ruiz and Galvis 650", "Smith et al 469"),
                 Times=c(9, 8),
                 Sample=c("SP.1", "SP.1"),
                 stringsAsFactors=FALSE)

stri_match_first(df$Collector, regex="([[:alpha:][:space:]]+) ([[:digit:]]+)") %>% 
  as.data.frame(stringsAsFactors=FALSE) %>% 
  select(Col=V2, ColID=V3) %>% 
  bind_cols(df) %>% 
  select(-Collector)
##               Col ColID Times Sample
## 1 Ruiz and Galvis   650     9   SP.1
## 2     Smith et al   469     8   SP.1
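The same capture-group idea can also be sketched in base R, without stringi or dplyr. This is a swapped-in technique (sub with backreferences), not the answer's own code, and it assumes every Collector value is words, one space, then a trailing number; the anchored pattern below is an illustration built on that assumption.

```r
df <- data.frame(Collector = c("Ruiz and Galvis 650", "Smith et al 469"),
                 Times = c(9, 8),
                 Sample = c("SP.1", "SP.1"),
                 stringsAsFactors = FALSE)

# Assumed layout: words, a single space, then a trailing number
pattern <- "^([[:alpha:][:space:]]+) ([[:digit:]]+)$"

# sub() replaces the whole match with the referenced capture group
df$Col   <- sub(pattern, "\\1", df$Collector)
df$ColID <- sub(pattern, "\\2", df$Collector)
```

One caveat with this variant: sub leaves a string unchanged when the pattern does not match, so rows that break the assumed layout would carry the full Collector value into both new columns.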