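When I import a .csv file with read.table by calling
df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE)
and check the summary of the resulting data, I get this (showing only the first 3 of the 45 columns):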
X.run.number. scenario configuration
Min. : 1 "pessimistic":999994 "central":999994
1st Qu.: 650
Median :1299
Mean :1299
3rd Qu.:1949
Max. :2600
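With this data frame I can make nice plots. However, I have 80 .csv files with a total size of 40 GB, so I only want to import specific columns.
I thought this would be easier with fread (from the data.table package). So I imported 5 columns and combined them into one data frame by calling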
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)
The summary of this data frame looks like this (showing 4 of the 5 columns):
[run number] scenario configuration [step]
Length:999994 Length:999994 Length:999994 Length:999994
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
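With this data frame I cannot make the plots that I could make with the read.table version. I think this has to do with the class of the columns' values.
How can I make sure that the data frame created with fread has the same properties as the one created with read.table, so that I can make the plots I want?
Edit
I found that when I first split the .csv into columns in Excel and then use sep = ";" instead of sep = ",", it does work. Strange... And I don't want to manually convert the .csv files into columns in Excel.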
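One visible difference between the two calls above is that the read.table call passes skip = 6 and head = TRUE, while the fread call passes neither and sets header = FALSE. The sketch below shows one way to mirror those read.table arguments in fread. It assumes the data.table package is installed and that each file has 6 lines before the header row; `demo_file` and its contents are made up for illustration (with unquoted values), so it only demonstrates the arguments, not the behaviour on the real files.

```r
# Sketch only: runs when the data.table package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  # Hypothetical stand-in file: 6 metadata lines, a header row, two data rows.
  demo_file <- tempfile(fileext = ".csv")
  writeLines(c(
    paste("metadata line", 1:6),
    "run_number,scenario,tick,financial_balance_SU",
    "1,pessimistic,0,0",
    "2,pessimistic,1,6.079728695488823E9"
  ), demo_file)

  # skip = 6 and header = TRUE mirror the read.table call;
  # select keeps only the columns of interest (by position).
  dt <- data.table::fread(demo_file, skip = 6, header = TRUE,
                          sep = ",", select = c(1, 3, 4))
  str(dt)
}
```

With the header row read as a header instead of as data, fread can detect numeric column types from the data rows in this small example.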
Answer 0 (score: 0)
What you can do is read one file using read.table and keep 10 rows of that file as a template; then you can do the following:
## Read a 10-row sample with read.table to serve as a type template
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
df_needed <- dfshort[1:10, ] ## keep the 10 sample rows (all columns)
template <- subset(df_needed, select = c(columns_required)) ## columns_required is a placeholder: select whatever columns you need
## Read your large files using fread
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)
## changing cols types as per your template
result = data.frame(
lapply(setNames(,names(template)), function(x)
if (x %in% names(df)) as(df[[x]], class(template[[x]]))
else template[[x]][NA_integer_]
), stringsAsFactors = FALSE)
You can then use the result for plotting, because its columns will have the same classes as the ones you got with read.table.
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
template <- copy(dfshort)
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",") ## select takes column positions; colClasses expects type names
df <- do.call("rbind", my.data)
result = data.frame(
lapply(setNames(,names(template)), function(x)
if (x %in% names(df)) as(df[[x]], class(template[[x]]))
else template[[x]][NA_integer_]
), stringsAsFactors = FALSE)
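The template idea above can also be written with explicit conversions per class, which avoids relying on as() for factor columns. This is a minimal sketch in base R, not the answer's exact method: `coerce_like`, `template_demo`, and `df_chr` are hypothetical names, and the small data frames are made up to resemble the columns discussed here.

```r
# Hypothetical helper: give each shared column of `df` the class of the
# same-named column in `template`.
coerce_like <- function(df, template) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  for (col in intersect(names(template), names(df))) {
    target <- template[[col]]
    df[[col]] <- if (is.factor(target)) {
      factor(df[[col]])          # levels come from the data, not the template
    } else if (is.integer(target)) {
      as.integer(df[[col]])
    } else if (is.numeric(target)) {
      as.numeric(df[[col]])
    } else {
      as.character(df[[col]])
    }
  }
  df
}

# Made-up template with the desired classes.
template_demo <- data.frame(
  run_number = 1:2,
  scenario = factor(c("pessimistic", "pessimistic")),
  financial_balance_SU = c(0, 0)
)

# Made-up all-character data, like the fread result.
df_chr <- data.frame(
  run_number = c("23377", "23378"),
  scenario = c("pessimistic", "pessimistic"),
  financial_balance_SU = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

result_demo <- coerce_like(df_chr, template_demo)
str(result_demo)
```

Factor levels are rebuilt from the full data on purpose: a 10-row template may contain only one level (for example "central"), while the complete files contain others (for example "mixed").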
Answer 1 (score: 0)
The first 5 columns of dfshort (which has 45 columns) look like this:
X.run.number. scenario configuration biobased.chemical.industry
1 3 "pessimistic" "central" "modification-dominant"
2 2 "pessimistic" "central" "modification-dominant"
3 3 "pessimistic" "central" "modification-dominant"
4 4 "pessimistic" "central" "modification-dominant"
5 2 "pessimistic" "central" "modification-dominant"
6 1 "pessimistic" "central" "modification-dominant"
7 3 "pessimistic" "central" "modification-dominant"
8 3 "pessimistic" "central" "modification-dominant"
9 2 "pessimistic" "central" "modification-dominant"
10 4 "pessimistic" "central" "modification-dominant"
distributed.sugar.factory.investment.costs
1 70000000
2 70000000
3 70000000
4 70000000
5 70000000
6 70000000
7 70000000
The template looks like this:
run_number scenario configuration tick financial_balance_SU
1 3 "pessimistic" "central" 0 0
2 2 "pessimistic" "central" 0 0
3 3 "pessimistic" "central" 1 0
4 4 "pessimistic" "central" 0 0
5 2 "pessimistic" "central" 1 0
6 1 "pessimistic" "central" 0 0
df looks like this:
run_number scenario configuration tick financial_balance_SU
1: 23377 ""pessimistic"" ""mixed"" 200 6.079728695488823E9
2: 23377 ""pessimistic"" ""mixed"" 201 6.079728695488823E9
3: 23378 ""pessimistic"" ""mixed"" 192 9.10006561818864E9
4: 23377 ""pessimistic"" ""mixed"" 202 6.079728695488823E9
5: 23377 ""pessimistic"" ""mixed"" 203 6.079728695488823E9
6: 23378 ""pessimistic"" ""mixed"" 193 9.10006561818864E9
Edit
str(dfshort)
'data.frame': 10 obs. of 45 variables:
$ X.run.number. : int 3 2 3 4 2 1 3 3 2 4
$ scenario : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
$ configuration : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
$ biobased.chemical.industry : Factor w/ 1 level "\"modification-dominant\"": 1 1 1 1 1 1 1 1 1 1
$ distributed.sugar.factory.investment.costs : int 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000
$ beet.syrups.factory.investment.costs : int 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000
$ ethanol.factory.investment.costs : int 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000
$ market.share.beet.syrups.increase : num 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
$ demand.beets.for.chemical.EU.increase : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
$ transport.costs : int 1 1 1 1 1 1 1 1 1 1
$ washing.at.farmer : Factor w/ 1 level "\"no\"": 1 1 1 1 1 1 1 1 1 1
$ beet.syrups.price.percentage.of.sugar.price : num 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
$ CO2.tax. : Factor w/ 1 level "\"yes\"": 1 1 1 1 1 1 1 1 1 1
$ sugar.tax : num 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
$ CO2.tax : int 13 13 13 13 13 13 13 13 13 13
$ market.share.increase.period : int 10 10 10 10 10 10 10 10 10 10
$ electricity.source : Factor w/ 1 level "\"conventional-mix\"": 1 1 1 1 1 1 1 1 1 1
$ white.sugar.price.EU.maximum : int 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
$ white.sugar.price.EU.minimum : int 200 200 200 200 200 200 200 200 200 200
$ beet.syrups.price.EU.maximum : int 500 500 500 500 500 500 500 500 500 500
$ beet.syrups.price.EU.minimum : int 100 100 100 100 100 100 100 100 100 100
$ ethanol.price.EU.maximum : int 2 2 2 2 2 2 2 2 2 2
$ ethanol.price.EU.minimum : num 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
$ years.taken.into.account : int 5 5 5 5 5 5 5 5 5 5
$ X.step. : int 0 0 1 0 1 0 2 3 2 1
$ financial.balance.farmers : num 0 0 0 0 0 ...
$ diesel.use.farmers : int 0 0 0 0 0 0 0 0 0 0
$ N.use.farmers : int 0 0 0 0 0 0 0 0 0 0
$ financial.balance.SU : num 0 0 0 0 0 ...
$ electricity.use.SU : int 0 0 0 0 0 0 0 0 0 0
$ financial.balance.central.sugar.factories : num 0 0 0 0 0 ...
$ electricity.use.central.sugar.factories : int 0 0 0 0 0 0 0 0 0 0
$ financial.balance.distributed.sugar.factories : num 0 0 0 0 0 ...
$ electricity.use.distributed.sugar.factories : int 0 0 0 0 0 0 0 0 0 0
$ financial.balance.beet.syrups.factories : int 0 0 0 0 0 0 0 0 0 0
$ electricity.use.beet.syrups.factories : int 0 0 0 0 0 0 0 0 0 0
$ financial.balance.ethanol.factories : int 0 0 0 0 0 0 0 0 0 0
$ electricity.use.ethanol.factories : int 0 0 0 0 0 0 0 0 0 0
$ transport.costs.yearly : num 0 0 0 0 0 ...
$ diesel.use.total.transport : num 0 0 0 0 0 ...
$ profit.per.tonne.sugar.beet.central.sugar.factory : num 0 0 0 0 0 ...
$ profit.per.tonne.sugar.beet.distributed.sugar.factory: num 0 0 0 0 0 ...
$ profit.per.tonne.sugar.beet.sugar.from.beet.syrups : int 0 0 0 0 0 0 0 0 0 0
$ profit.per.tonne.sugar.beet.beet.syrups.factory : int 0 0 0 0 0 0 0 0 0 0
$ profit.per.tonne.sugar.beet.ethanol.factory : num 0 0 0 0 0 ...
str(df)
Classes ‘data.table’ and 'data.frame': 19000000 obs. of 5 variables:
$ run_number : chr "23377" "23377" "23378" "23377" ...
$ scenario : chr "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" ...
$ configuration : chr "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" ...
$ tick : chr "200" "201" "192" "202" ...
$ financial_balance_SU: chr "6.079728695488823E9" "6.079728695488823E9" "9.10006561818864E9" "6.079728695488823E9" ...
- attr(*, ".internal.selfref")=<externalptr>
str(template)
'data.frame': 10 obs. of 5 variables:
$ run_number : int 3 2 3 4 2 1 3 3 2 4
$ scenario : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
$ configuration : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
$ tick : int 0 0 1 0 1 0 2 3 2 1
$ financial_balance_SU: num 0 0 0 0 0 ...
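The str(df) output above shows every column as chr, with the text values wrapped in doubled quotes. A minimal base R sketch of one way to clean such a data frame follows: it strips the literal quote characters and then lets type.convert pick integer or numeric where the values allow it. The data frame `df_quoted` is a made-up sample copied from the values shown above, and removing all quotes (rather than keeping them as read.table does) is a choice made here for illustration.

```r
# Made-up sample mirroring the str(df) output: all character columns,
# text values wrapped in doubled quotes.
df_quoted <- data.frame(
  run_number = c("23377", "23378"),
  scenario = c("\"\"pessimistic\"\"", "\"\"pessimistic\"\""),
  tick = c("200", "192"),
  financial_balance_SU = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

cleaned <- df_quoted
# Remove every literal double-quote character from every column.
cleaned[] <- lapply(cleaned, function(col) gsub("\"", "", col, fixed = TRUE))
# Let R infer integer/numeric types; as.is = TRUE keeps text as character.
cleaned[] <- lapply(cleaned, type.convert, as.is = TRUE)
# Turn the categorical column into a factor explicitly.
cleaned$scenario <- factor(cleaned$scenario)

str(cleaned)
```

For a data.table as large as the one shown (19 million rows), converting the columns in place would be more memory-friendly, but the conversion logic stays the same.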