Question

I am using some software that outputs its data as a csv file in the following format:

# Parameter 1
ID,Col1,Col2,Col3
1,a,b,c
2,d,e,f
3,g,h,i
[...]
j,x,y,z

# Parameter 2
ID,Col1,Col2,Col3
1,a,b,c
2,d,e,f
3,g,h,i
[...]
k,x,y,z

# Parameter 3
ID,Col1,Col2,Col3
1,a,b,c
2,d,e,f
3,g,h,i
[...]
n,x,y,z

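If I need to read line 10 of parameter 1, I use read.csv('file.csv', header=FALSE, skip=10, nrows=1), which gives me what I want. However, if I want to read the 10th observation of parameter 2, I don't know what integer to pass as skip, because the number of observations in parameter 1 varies. I could solve this if I could find the number of the line that matches the string "# Parameter 2". How do I do that?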

Answer 1

You can use readLines:

# Assuming that what indicates the
#  start of param2 is the following line
param2.indic <- "# Parameter 2"

# read in the raw file
# (forward slashes: in an R string, "\t" and "\f" are escape sequences)
lines <- readLines("path/to/file.csv")

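# find the start of parameter 2
# (fixed = TRUE matches the text literally rather than as a regex)
p2.start <- grep(param2.indic, lines, fixed = TRUE)

# the header is at p2.start + 1, so observation n is at p2.start + 1 + n
n <- 10  # which observation to find
lines[p2.start + n + 1]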

Answer 2

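You can read the lines until you find the one that matches, and then carry on from there.

Example, in Python: I read rows one at a time until I get a match. In this case my file has a long multi-line header that I need to skip, followed by an ordinary spreadsheet-style csv. I'm looking for the header row, which I know has "Sample_ID" as its first element.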

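import csv

csvfile = open('file.csv', newline='')  # placeholder path
csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in csvreader:
    # "row and" skips blank lines, which csv.reader returns as []
    if row and row[0].strip() == 'Sample_ID':
        header = row
        break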

Now that the reader is positioned just past the header row, I can process the rest of the file however I like:

sample_ids = []
for row in csvreader:
    sample_ids.append(row[0])
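
Applied to the file layout in the question, the same read-until-match idea might look like the sketch below. It is only an illustration: the function name read_observation, its parameters, and the sample data are made up here, and it assumes every section consists of a marker line, then a header row, then the observations.

```python
import csv
import io


def read_observation(csvfile, marker, n):
    """Return the n-th data row (1-based) of the section that starts at `marker`."""
    reader = csv.reader(csvfile)
    # Advance until the row whose first cell is the marker, e.g. "# Parameter 2".
    for row in reader:
        if row and row[0].strip() == marker:
            break
    else:
        raise ValueError(f"marker {marker!r} not found")
    header = next(reader)  # column names: ID, Col1, ...
    for i, row in enumerate(reader, start=1):
        if not row or row[0].startswith("#"):
            break  # blank line or next section: this section has ended
        if i == n:
            return dict(zip(header, row))
    raise IndexError(f"section {marker!r} has fewer than {n} observations")


# Made-up sample in the question's layout (not real data).
sample = io.StringIO(
    "# Parameter 1\n"
    "ID,Col1,Col2,Col3\n"
    "1,a,b,c\n"
    "2,d,e,f\n"
    "\n"
    "# Parameter 2\n"
    "ID,Col1,Col2,Col3\n"
    "1,p,q,r\n"
    "2,s,t,u\n"
)
second = read_observation(sample, "# Parameter 2", 2)
print(second)
```

Comparing the whole first cell against the marker, rather than searching for a substring, keeps "# Parameter 2" from also matching a later "# Parameter 20".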

R: read in a csv, find the first line matching a pattern
