How to find specific data in CSV files

Time: 2018-10-18 13:14:31

Tags: r csv import

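I have many CSV files and want to find some data in them. Because the data sits at a different position in each file, I would like to know how to locate the data in the red box across the different CSV files.

Each CSV file contains the same kind of data for different months. My idea is to find EnergyConsumptionElectricityNaturalGasMonthly in the CSV file, get its position back, and then select the red-box data relative to that position.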


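2 answers:

Answer 0 (score: 0)

I would read the contents of the CSV files and subset the term you want. Assuming they all have the same file structure and are stored in the same folder, you can do the following: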

library(data.table) # needed for fread and rbindlist; read.csv also works if you prefer

# create a list of the files in the folder
folder <- 'address_to_folder' # omit the trailing "/"
files <- list.files(path = folder, pattern = "\\.csv$") # pattern is a regular expression, not a glob

# read the files into a list and then combine them into a single table
mycsv <- lapply(paste(folder, files, sep = '/'), fread)
mydata <- rbindlist(mycsv)

# This part depends on the structure of the data:
# you have to check where the column you want is,
# whether it is correctly formatted, and how you can search it
search_result <- mydata[mydata$column == 'search term', ]
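As a self-contained illustration of the read, combine, and filter pattern above, here is a minimal sketch in base R. The file names, the `month` and `kwh` columns, the `source_file` column, and the search value are all made up for the demonstration; real files would need their own column names.

```r
# Hypothetical demo data: two small CSV files in a temporary folder
folder <- file.path(tempdir(), "csv_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(month = c("January", "February"), kwh = c(10, 20)),
          file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(month = c("January", "March"), kwh = c(30, 40)),
          file.path(folder, "b.csv"), row.names = FALSE)

# list the CSV files with their full paths
files <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)

# read one file and remember which file every row came from
read_one <- function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  d$source_file <- basename(f)
  d
}

# read all files and stack them into one data frame
mydata <- do.call(rbind, lapply(files, read_one))

# keep only the rows whose 'month' column equals the search term
search_result <- mydata[mydata$month == "January", ]
print(search_result)
```

Keeping the source file name in its own column makes it possible to tell afterwards which file each matching row came from.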

Answer 1 (score: 0)

Use readLines to read the file line by line:

con <- file("temp2Table.csv", "r")
x <- readLines(con)
close(con)

Then find the line that marks the block we want to subset:

grep("EnergyConsumptionElectricityNaturalGasMonthly", x)
# [1] 16534
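Because grep treats its pattern as a regular expression and may return zero or several matches, it can be worth checking the result before using it as a position. The following is a small sketch under assumptions: the vector `x` here is a made-up stand-in for the output of readLines, and the variable names `marker`, `hits`, and `start` are only illustrative.

```r
# Hypothetical stand-in for the lines returned by readLines()
x <- c("Report: Something Else",
       "a,b,c",
       "Report: EnergyConsumptionElectricityNaturalGasMonthly",
       "header",
       "data")

marker <- "EnergyConsumptionElectricityNaturalGasMonthly"

# fixed = TRUE matches the marker literally instead of as a regular expression
hits <- grep(marker, x, fixed = TRUE)

# stop early if the marker is missing or appears more than once
if (length(hits) != 1) {
  stop("expected exactly one match, found ", length(hits))
}

start <- hits
print(start)
```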

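Once we know the line number, we can subset the header line and the 16 data rows that follow it (offsets 4 to 20 from the matching line) and write them to a file: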

write(x[ grep("EnergyConsumptionElectricityNaturalGasMonthly", x) + 4:20 ], "tempOut.csv")

Then we can read that file like a normal CSV:

dfClean <- read.csv("tempOut.csv")

And subset the columns we need:

dfClean[, 2:3]
#                      X.1 ELECTRICITY.FACILITY..kWh.
# 1                January                   11675.57
# 2               February                    9148.04
# 3                  March                   13862.50
# 4                  April                   16274.57
# 5                    May                   23918.16
# 6                   June                   29293.78
# 7                   July                   32953.04
# 8                 August                   34111.54
# 9              September                   24398.53
# 10               October                   14577.93
# 11              November                   13931.94
# 12              December                   12137.73
# 13                                               NA
# 14 Annual Sum or Average                  236283.34
# 15     Minimum of Months                    9148.04
# 16     Maximum of Months                   34111.54
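Since the question involves many files, the steps above could be wrapped in a function and applied to each file in turn. The sketch below is one possible way to do that, not a definitive implementation: the function name `extract_block`, its `offsets` argument, and the demo file are assumptions made for illustration. It parses the selected lines with `read.csv(text = ...)` so no intermediate file is written.

```r
# Extract the block that follows a marker line and parse it as CSV.
# 'offsets' are line offsets relative to the marker line; the first one
# must point at the header row of the block.
extract_block <- function(path, marker, offsets = 4:20) {
  x <- readLines(path)
  hit <- grep(marker, x, fixed = TRUE)
  if (length(hit) == 0) return(NULL)   # marker not found in this file
  idx <- hit[1] + offsets              # use the first match only
  idx <- idx[idx <= length(x)]         # do not run past the end of the file
  read.csv(text = x[idx], stringsAsFactors = FALSE)
}

# Hypothetical demo file: a marker line, one filler line, then a small table
demo <- tempfile(fileext = ".csv")
writeLines(c("junk,junk",
             "Report: EnergyConsumptionElectricityNaturalGasMonthly",
             "filler",
             "month,kwh",
             "January,10",
             "February,20"), demo)

block <- extract_block(demo, "EnergyConsumptionElectricityNaturalGasMonthly",
                       offsets = 2:4)
print(block)

# Applying it to a vector of file paths would then look like:
# results <- lapply(files, extract_block,
#                   marker = "EnergyConsumptionElectricityNaturalGasMonthly")
```

The offsets stay fixed relative to the marker, so the function still works when the block sits at a different absolute line number in each file.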