Question

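When I import a .csv file with read.table by calling df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE) and inspect the summary of the resulting data, I get the following (only the first 3 of 45 columns shown):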

 X.run.number. scenario        configuration   
 Min.   :   1 "pessimistic":999994   "central":999994  
 1st Qu.: 650                                            
 Median :1299                                            
 Mean   :1299                                            
 3rd Qu.:1949                                            
 Max.   :2600

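With this data frame I can make nice graphs. However, I have 80 .csv files with a total size of 40 GB, so I want to import only specific columns.

I thought this would be easier with fread (from the data.table package). So I imported 5 columns, combined them into one data frame, and called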

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",") 
df <- do.call("rbind", my.data)

The summary of this data frame looks like this (4 of the 5 columns shown):

[run number]         scenario         configuration         [step]         
 Length:999994      Length:999994      Length:999994      Length:999994     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character

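With this data frame I cannot make the graphs I could make with the read.table version. I think this has to do with the class of the columns' values.

How can I make sure that the data frame created with fread has the same properties as the one created with read.table, so that I can make the graphs I want?

Edit

I found that it does work when I first split the .csv into columns in Excel and then use sep = ";" instead of sep = ",". Strange... and I do not want to convert the .csv files into columns in Excel by hand.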

Answer 1

What you can do is read one file with read.table and keep 10 rows of it as a template; then you can do something like this:

library(data.table)

## Read the first 10 rows of one file with read.table to get the column classes
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
df_needed <- dfshort[1:10]
template <- subset(df_needed, select = c(columns_required)) ## columns_required is a placeholder: select whatever cols you need

## Read your large files using fread
my.files <- list.files(pattern = ".csv")
my.data <- lapply(my.files, fread, header = FALSE, select = c(1, 2, 3, 25, 29), sep = ",")
df <- do.call("rbind", my.data)

## Change the column types to match the template
result <- data.frame(
  lapply(setNames(nm = names(template)), function(x)
    if (x %in% names(df)) as(df[[x]], class(template[[x]]))  ## coerce to the template's class
    else template[[x]][NA_integer_]                          ## column missing in df: fill with a typed NA
  ), stringsAsFactors = FALSE)

You can then use result for plotting, since its columns will have the same classes as those obtained with read.table.

Try this:
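The template idea can be tried on toy data first. The following is a minimal sketch in base R, not the answer's exact code: the two small data frames and the helper convert_like are made up for illustration, and the helper dispatches on the template column's type explicitly instead of going through as().

```r
# Toy stand-ins for the objects in the answer (assumed, not the real data)
template <- data.frame(
  run_number = c(3L, 2L),
  scenario   = factor(c("pessimistic", "pessimistic")),
  tick       = c(0L, 1L),
  balance    = c(0, 0.5)
)
df <- data.frame(
  run_number = c("23377", "23378"),
  scenario   = c("pessimistic", "pessimistic"),
  tick       = c("200", "192"),
  balance    = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert one column to the type of its template column
convert_like <- function(x, like) {
  if (is.factor(like)) {
    factor(x)
  } else if (is.integer(like)) {
    as.integer(x)
  } else if (is.numeric(like)) {
    as.numeric(x)
  } else {
    as.character(x)
  }
}

# Walk over the template's columns and convert the matching columns of df
cols <- names(template)
result <- as.data.frame(
  Map(convert_like, df[cols], template[cols]),
  stringsAsFactors = FALSE
)
str(result)
```

This only works when df and the template share column names, so a df read with header = FALSE would need its columns renamed first.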

dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
template <- copy(dfshort)
my.files <- list.files(pattern = ".csv")
## select (not colClasses) is the fread argument that takes column positions
my.data <- lapply(my.files, fread, header = FALSE, select = c(1, 2, 3, 25, 29), sep = ",")
df <- do.call("rbind", my.data)

result <- data.frame(
  lapply(setNames(nm = names(template)), function(x)
    if (x %in% names(df)) as(df[[x]], class(template[[x]]))
    else template[[x]][NA_integer_]
  ), stringsAsFactors = FALSE)
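One difference between the calls in this thread is that read.table was given skip = 6 and head = TRUE, while fread was given header = FALSE and no skip. If the header row ends up being read as data, every column may come back as character. The sketch below, which assumes data.table is installed and uses a small made-up file instead of the real one, passes the same skip and header settings to fread so that it can detect the column types itself.

```r
library(data.table)

# Made-up stand-in file: 6 metadata lines, then a header row and two data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  paste("metadata line", 1:6),
  '"[run number]","scenario","[step]","balance"',
  '1,"pessimistic",0,0.5',
  '2,"pessimistic",1,6.079728695488823E9'
), tmp)

# skip and header mirror the read.table call; select picks columns by position
dt <- fread(tmp, skip = 6, header = TRUE, sep = ",", select = c(1, 3, 4))
str(dt)

# For many files, rbindlist() combines the per-file tables
all_dt <- rbindlist(lapply(c(tmp, tmp), fread, skip = 6, header = TRUE,
                           sep = ",", select = c(1, 3, 4)))
```

Whether this resolves the issue for the real files depends on their layout, which is only partly visible in the thread.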

Answer 2

The first 5 columns (of 45) of dfshort look like this:

   X.run.number.      scenario configuration biobased.chemical.industry
1              3 "pessimistic"     "central"    "modification-dominant"
2              2 "pessimistic"     "central"    "modification-dominant"
3              3 "pessimistic"     "central"    "modification-dominant"
4              4 "pessimistic"     "central"    "modification-dominant"
5              2 "pessimistic"     "central"    "modification-dominant"
6              1 "pessimistic"     "central"    "modification-dominant"
7              3 "pessimistic"     "central"    "modification-dominant"
8              3 "pessimistic"     "central"    "modification-dominant"
9              2 "pessimistic"     "central"    "modification-dominant"
10             4 "pessimistic"     "central"    "modification-dominant"
   distributed.sugar.factory.investment.costs
1                                    70000000
2                                    70000000
3                                    70000000
4                                    70000000
5                                    70000000
6                                    70000000
7                                    70000000

template looks like this:

 run_number      scenario configuration tick financial_balance_SU
1          3 "pessimistic"     "central"    0                    0
2          2 "pessimistic"     "central"    0                    0
3          3 "pessimistic"     "central"    1                    0
4          4 "pessimistic"     "central"    0                    0
5          2 "pessimistic"     "central"    1                    0
6          1 "pessimistic"     "central"    0                    0

df looks like this:

   run_number        scenario configuration tick financial_balance_SU
1:      23377 ""pessimistic""     ""mixed""  200  6.079728695488823E9
2:      23377 ""pessimistic""     ""mixed""  201  6.079728695488823E9
3:      23378 ""pessimistic""     ""mixed""  192   9.10006561818864E9
4:      23377 ""pessimistic""     ""mixed""  202  6.079728695488823E9
5:      23377 ""pessimistic""     ""mixed""  203  6.079728695488823E9
6:      23378 ""pessimistic""     ""mixed""  193   9.10006561818864E9

Edit

str(dfshort)

'data.frame': 10 obs. of 45 variables:
 $ X.run.number. : int 3 2 3 4 2 1 3 3 2 4
 $ scenario : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
 $ configuration : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
 $ biobased.chemical.industry : Factor w/ 1 level "\"modification-dominant\"": 1 1 1 1 1 1 1 1 1 1
 $ distributed.sugar.factory.investment.costs : int 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000
 $ beet.syrups.factory.investment.costs : int 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000
 $ ethanol.factory.investment.costs : int 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000
 $ market.share.beet.syrups.increase : num 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
 $ demand.beets.for.chemical.EU.increase : num 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
 $ transport.costs : int 1 1 1 1 1 1 1 1 1 1
 $ washing.at.farmer : Factor w/ 1 level "\"no\"": 1 1 1 1 1 1 1 1 1 1
 $ beet.syrups.price.percentage.of.sugar.price : num 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ CO2.tax. : Factor w/ 1 level "\"yes\"": 1 1 1 1 1 1 1 1 1 1
 $ sugar.tax : num 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
 $ CO2.tax : int 13 13 13 13 13 13 13 13 13 13
 $ market.share.increase.period : int 10 10 10 10 10 10 10 10 10 10
 $ electricity.source : Factor w/ 1 level "\"conventional-mix\"": 1 1 1 1 1 1 1 1 1 1
 $ white.sugar.price.EU.maximum : int 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
 $ white.sugar.price.EU.minimum : int 200 200 200 200 200 200 200 200 200 200
 $ beet.syrups.price.EU.maximum : int 500 500 500 500 500 500 500 500 500 500
 $ beet.syrups.price.EU.minimum : int 100 100 100 100 100 100 100 100 100 100
 $ ethanol.price.EU.maximum : int 2 2 2 2 2 2 2 2 2 2
 $ ethanol.price.EU.minimum : num 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ years.taken.into.account : int 5 5 5 5 5 5 5 5 5 5
 $ X.step. : int 0 0 1 0 1 0 2 3 2 1
 $ financial.balance.farmers : num 0 0 0 0 0 ...
 $ diesel.use.farmers : int 0 0 0 0 0 0 0 0 0 0
 $ N.use.farmers : int 0 0 0 0 0 0 0 0 0 0
 $ financial.balance.SU : num 0 0 0 0 0 ...
 $ electricity.use.SU : int 0 0 0 0 0 0 0 0 0 0
 $ financial.balance.central.sugar.factories : num 0 0 0 0 0 ...
 $ electricity.use.central.sugar.factories : int 0 0 0 0 0 0 0 0 0 0
 $ financial.balance.distributed.sugar.factories : num 0 0 0 0 0 ...
 $ electricity.use.distributed.sugar.factories : int 0 0 0 0 0 0 0 0 0 0
 $ financial.balance.beet.syrups.factories : int 0 0 0 0 0 0 0 0 0 0
 $ electricity.use.beet.syrups.factories : int 0 0 0 0 0 0 0 0 0 0
 $ financial.balance.ethanol.factories : int 0 0 0 0 0 0 0 0 0 0
 $ electricity.use.ethanol.factories : int 0 0 0 0 0 0 0 0 0 0
 $ transport.costs.yearly : num 0 0 0 0 0 ...
 $ diesel.use.total.transport : num 0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.central.sugar.factory : num 0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.distributed.sugar.factory: num 0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.sugar.from.beet.syrups : int 0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.beet.syrups.factory : int 0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.ethanol.factory : num 0 0 0 0 0 ...

str(df)

Classes ‘data.table’ and 'data.frame': 19000000 obs. of 5 variables:
 $ run_number : chr "23377" "23377" "23378" "23377" ...
 $ scenario : chr "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" ...
 $ configuration : chr "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" ...
 $ tick : chr "200" "201" "192" "202" ...
 $ financial_balance_SU: chr "6.079728695488823E9" "6.079728695488823E9" "9.10006561818864E9" "6.079728695488823E9" ...
 - attr(*, ".internal.selfref")=<externalptr>

str(template)

'data.frame': 10 obs. of 5 variables:
 $ run_number : int 3 2 3 4 2 1 3 3 2 4
 $ scenario : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
 $ configuration : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
 $ tick : int 0 0 1 0 1 0 2 3 2 1
 $ financial_balance_SU: num 0 0 0 0 0 ...
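The str output shows that every column of df is character and that the text columns carry doubled quote characters, while the template has int, Factor and num columns. A minimal sketch of one way to bring such a table into comparable types follows. It uses base R only and a small made-up data frame with the same column names; the real df is a data.table with 19 million rows, so treat this as an illustration, not a tested solution for that data.

```r
# Toy stand-in mirroring the str(df) output (values copied from the thread)
df <- data.frame(
  run_number           = c("23377", "23378"),
  scenario             = c('""pessimistic""', '""pessimistic""'),
  configuration        = c('""mixed""', '""mixed""'),
  tick                 = c("200", "192"),
  financial_balance_SU = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

# Strip literal quote characters, then let type.convert() pick a type:
# integer, double, or (with as.is = FALSE) factor for non-numeric text
df[] <- lapply(df, function(col) {
  cleaned <- gsub('"', "", col, fixed = TRUE)
  type.convert(cleaned, as.is = FALSE)
})
str(df)
```

Stripping the quotes is a choice made for this sketch: the read.table version keeps one pair of quotes inside the factor levels, so the levels here will not be identical to those in the template.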

Importing with fread vs. read.table, and errors
