Question

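I am trying to read a csv file into R. The problem is that the file has 2 separators, and I don't know how to read it as a 3-column data frame, i.e. first, second and year. Here is a sample of what the file looks like:

[Alin Deutsch, Mary F. Fernandez, 1998],
[Alin Deutsch, Daniela Florescu, 1998],

I have tried the sep="[" argument, with fread() among others, but it does not work: R just reads each line as a 1-column vector. Thanks.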

Answer 1

You can read the file with sep="," and then remove the extra brackets:

# The trailing comma on each line produces an empty 4th column, dropped below
df <- read.csv(file = textConnection("[Alin Deutsch, Mary F. Fernandez, 1998],
[Alin Deutsch, Daniela Florescu, 1998],"), stringsAsFactors = FALSE, header = FALSE)

df <- df[,-4]

df$V1 <- gsub("\\[","",df$V1)
df$V3 <- gsub("\\]","",df$V3)

names(df) <- c("first","second","year")
df

Output

         first             second  year
1 Alin Deutsch  Mary F. Fernandez  1998
2 Alin Deutsch   Daniela Florescu  1998
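
The leading spaces visible in the second and third columns can be avoided by stripping the brackets and the trailing comma before parsing. A minimal base-R sketch of that idea, assuming the lines are already held in a character vector (the names `raw_lines` and `cleaned` are illustrative and not part of the answer):

```r
# Sample input copied from the question; in practice it might come from readLines().
raw_lines <- c("[Alin Deutsch, Mary F. Fernandez, 1998],",
               "[Alin Deutsch, Daniela Florescu, 1998],")

# Drop the opening "[" and then the closing "]" with its trailing comma and whitespace.
cleaned <- sub("\\]\\s*,?\\s*$", "", sub("^\\s*\\[", "", raw_lines))

# Parse the cleaned text as ordinary CSV; strip.white trims spaces around each field.
df <- read.csv(text = cleaned, header = FALSE, strip.white = TRUE,
               stringsAsFactors = FALSE,
               col.names = c("first", "second", "year"))
df
```

Cleaning first means no spurious fourth column is created, so there is nothing to drop afterwards.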

Answer 2

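1) read.table / sub Read the data using sep = "," and comment.char = "]". This splits the fields and drops the trailing ] and everything after it; we can then remove the [ from the first column (Name1) with sub: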

Lines <- "[Alin Deutsch, Mary F. Fernandez, 1998],  
[Alin Deutsch, Daniela Florescu, 1998],"

DF <- read.table(text = Lines, sep = ",", comment.char = "]", as.is = TRUE,
          strip.white = TRUE, # might not need this one
          col.names = c("Name1", "Name2", "Year"))
DF <- transform(DF, Name1 = sub("[", "", Name1, fixed = TRUE))

giving:

> DF
         Name1             Name2 Year
1 Alin Deutsch Mary F. Fernandez 1998
2 Alin Deutsch  Daniela Florescu 1998

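2) read.pattern Another possibility is to use read.pattern from the gsubfn package. The pattern assumes that each line starts with [, contains three commas, and has a ] just before the last comma. This corresponds to the data in the question; if your data differ, the regular expression will need to be changed.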

library(gsubfn)

read.pattern(text = Lines, pattern = ".(.*?),(.*?),(.*?).,", as.is = TRUE,
        strip.white = TRUE, # might not need this one
        col.names = c("Name1", "Name2", "Year"))

giving the same result.
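
If installing gsubfn is not an option, a similar capture-group approach can be sketched with `strcapture()` from the utils package in recent versions of R. This is only an illustration under the same assumptions about the line format; the `pattern` and `proto` objects below are not part of the answer above:

```r
Lines <- c("[Alin Deutsch, Mary F. Fernandez, 1998],",
           "[Alin Deutsch, Daniela Florescu, 1998],")

# One capture group per field: two names without commas, then a numeric year before "]".
pattern <- "^\\s*\\[([^,]*),\\s*([^,]*),\\s*([0-9]+)\\]"

# Prototype data frame: supplies the column names and the types to convert to.
proto <- data.frame(Name1 = character(), Name2 = character(), Year = integer(),
                    stringsAsFactors = FALSE)

DF <- strcapture(pattern, Lines, proto)
DF
```

Lines that do not match the pattern come back as rows of NA, which makes malformed input easy to spot.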

Answer 3

I tried this solution and it works.

First, I removed the "[" and "]" from the file using gsub:

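# `sample` is assumed to be a character vector holding the lines of the file
clean <- function(x) {
  gsub("\\[|\\]", "", x)  # remove every "[" and "]"
}

sample_clean <- clean(sample)  # gsub is vectorized, so sapply is not needed
head(sample_clean)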

Then, I used str_split_fixed to split the column vector into 3 columns:

library(stringr)  # provides str_split_fixed
data <- str_split_fixed(sample_clean, ",", 3)
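
With the data from the question, splitting into 3 pieces leaves the trailing comma attached to the year, and the result is a character matrix rather than a data frame. A hedged end-to-end sketch of the same approach that handles both points (the names `sample_lines`, `parts` and `result` are illustrative, and the input vector stands in for the file contents):

```r
library(stringr)

# Stand-in for the file contents; in practice these lines might come from readLines().
sample_lines <- c("[Alin Deutsch, Mary F. Fernandez, 1998],",
                  "[Alin Deutsch, Daniela Florescu, 1998],")

# Remove the brackets, then the trailing comma left at the end of each line.
sample_clean <- gsub("\\[|\\]", "", sample_lines)
sample_clean <- sub(",\\s*$", "", sample_clean)

# Split each line into exactly 3 pieces; the result is a character matrix.
parts <- str_split_fixed(sample_clean, ",", 3)

# Trim the surrounding spaces and build a data frame with a numeric year.
result <- data.frame(first  = str_trim(parts[, 1]),
                     second = str_trim(parts[, 2]),
                     year   = as.integer(str_trim(parts[, 3])),
                     stringsAsFactors = FALSE)
result
```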

Reading data from a csv file with multiple separators in R
