我有很多csv文件,我想在其中找到一些数据。因为每个文件中的数据位置都不同,所以我想知道如何在不同的csv文件中的红色框中找到数据。
在csv文件中,它包含与不同月份相同的数据,我有一个想法是在csv文件中找到EnergyconsumptionElectricallyNaturalgasmonthly
,然后反馈位置,根据位置选择红框数据。
答案 0 :(得分:0)
我会阅读csv的内容并子集化您想要的术语。假设它们都具有相同的文件结构并且包含在同一文件夹中,则可以执行以下操作:
library(data.table) # library needed for fread, you can just use read.csv if you prefer
# create a list of the files in the folder
folder <- 'address_to_folder' # skip the last "/"
files <- list.files(path = folder, pattern="*.csv")
# read the files into a list and then transform it into a data.frame
mycsv <- lapply(paste(folder, pattern, sep = '/'), fread)
mydata <- rbindlist(mycsv)
# This part will need interpretation of the data frame,
# you have to see where the column you want is,
# if it is correctly formatted and how you can search it
search_result <- mydata[ mydata$column = 'search term', ]
答案 1 :(得分:0)
使用 readLines 逐行读取文件:
con <- file("temp2Table.csv", "r")
x <- readLines(con)
close(con)
然后找到需要子集的行:
grep("EnergyConsumptionElectricityNaturalGasMonthly", x)
# [1] 16534
一旦我们知道了行号,我们就可以按照下面的16行进行子集化,并且 将其写到文件中:
write(x[ grep("EnergyConsumptionElectricityNaturalGasMonthly", x) + 4:20 ], "tempOut.csv")
然后我们可以像普通的csv一样读取文件:
dfClean <- read.csv("tempOut.csv")
以及我们需要的子集列:
dfClean[, 2:3]
# X.1 ELECTRICITY.FACILITY..kWh.
# 1 January 11675.57
# 2 February 9148.04
# 3 March 13862.50
# 4 April 16274.57
# 5 May 23918.16
# 6 June 29293.78
# 7 July 32953.04
# 8 August 34111.54
# 9 September 24398.53
# 10 October 14577.93
# 11 November 13931.94
# 12 December 12137.73
# 13 NA
# 14 Annual Sum or Average 236283.34
# 15 Minimum of Months 9148.04
# 16 Maximum of Months 34111.54