Importing with fread vs. read.table, and errors

Time: 2018-02-01 15:59:33

Tags: r import read.table

When I import a .csv file with read.table, calling df <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", head = TRUE), and check a summary of the data, I get this (only the first 3 of 45 columns shown):

 X.run.number. scenario        configuration   
 Min.   :   1 "pessimistic":999994   "central":999994  
 1st Qu.: 650                                            
 Median :1299                                            
 Mean   :1299                                            
 3rd Qu.:1949                                            
 Max.   :2600  

With this data frame I can make nice graphs. However, I have 80 .csv files totalling 40 GB, so I only want to import specific columns.

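I thought this would be easier with fread (from the data.table package). So I imported 5 columns from each file and combined them into one data frame by calling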

my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files,fread, header = FALSE, select = c(1,2,3,25,29), sep=",") 
df <- do.call("rbind", my.data)

The summary of this data frame looks like this (4 of the 5 columns shown):

[run number]         scenario         configuration         [step]         
 Length:999994      Length:999994      Length:999994      Length:999994     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character 

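With this data frame I cannot make the graphs that I could make with the read.table one. I suspect this has to do with the class of the columns' values.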

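How can I make sure that the data frame created with fread has the same properties as the one created with read.table, so that I can make the graphs I want?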

Edit

I found that when I first split the .csv into columns in Excel and then use sep = ";" instead of sep = ",", it does work. Strange... and I don't want to convert the .csv files to columns in Excel by hand.

2 answers:

Answer 0 (score: 0)

What you can do is read one file with read.table and keep 10 rows of it as a template; then you can do the following:

## Read a 10-row sample with read.table to use as a type template
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
## Keep the same columns that fread selects below
template <- dfshort[, c(1, 2, 3, 25, 29)]

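## Read your large files using fread
library(data.table)
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files, fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)
## The next step matches columns by name, so give df the template's column names
setnames(df, names(template))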

## Convert each df column to the class of the same-named template column; columns missing from df become NA
result = data.frame(
  lapply(setNames(,names(template)), function(x) 
    if (x %in% names(df)) as(df[[x]], class(template[[x]])) 
    else template[[x]][NA_integer_]
  ), stringsAsFactors = FALSE)

You can then use the result for plotting, because its columns will have the same classes as the ones you got with read.table.
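To make the idea concrete, here is a minimal, self-contained sketch of the "convert to the template's classes" step on toy data. The data, the column name balance, and the helper convert_like are made up for illustration and are not part of the original code. The sketch also handles factor columns explicitly instead of relying on as() for them, which is a choice of this sketch rather than something the answer states.

```r
# Toy template: a small typed sample, similar to what read.table returns
template <- data.frame(
  run_number = c(3L, 2L),
  scenario   = factor(c("pessimistic", "pessimistic")),
  balance    = c(0, 0.5)
)

# Toy "large" data in which every column arrived as character
df <- data.frame(
  run_number = c("23377", "23378"),
  scenario   = c("pessimistic", "optimistic"),
  balance    = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert x to the class of the template column `like`.
# Factors are built with factor(); other classes go through methods::as().
convert_like <- function(x, like) {
  if (is.factor(like)) factor(x) else methods::as(x, class(like)[1])
}

# Convert only the columns that exist in both data frames
result <- df
for (col in intersect(names(template), names(df))) {
  result[[col]] <- convert_like(df[[col]], template[[col]])
}

# Inspect the resulting column classes
sapply(result, class)
```

Columns are matched by name here, so the two data frames must use the same column names for anything to be converted.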

Try this:

library(data.table)

## 10-row sample read with read.table, used as the type template
dfshort <- read.table("ModelSugar(new) real_thesis_experiment-table_1.csv", skip = 6, sep = ",", nrows = 10, head = TRUE)
template <- copy(dfshort)

## select (not colClasses) takes the column positions to read
my.files <- list.files(pattern=".csv")
my.data <- lapply(my.files, fread, header = FALSE, select = c(1,2,3,25,29), sep=",")
df <- do.call("rbind", my.data)

## Columns are matched by name, so names(df) must match names(template)
result = data.frame(
  lapply(setNames(,names(template)), function(x) 
    if (x %in% names(df)) as(df[[x]], class(template[[x]])) 
    else template[[x]][NA_integer_]
  ), stringsAsFactors = FALSE)
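One possible reason for all-character columns is header = FALSE: the header row is then read as a data row, and a text value in the first row forces the whole column to character. Whether that is what happened with the files in the question cannot be confirmed from the post. The sketch below shows the effect on a small stand-in file; the file contents and the names path, as_data and as_header are invented for illustration.

```r
library(data.table)

# Write a small stand-in file: 6 metadata lines, a header row, then data
path <- tempfile(fileext = ".csv")
writeLines(c(
  paste("metadata line", 1:6),
  "run,scenario,tick,balance",
  "1,pessimistic,0,0.5",
  "2,pessimistic,1,6.079728695488823E9"
), path)

# Header row treated as data: the selected columns come back as character
as_data <- fread(path, skip = 6, header = FALSE, select = c(1, 3, 4), sep = ",")

# Header row treated as a header: column types are detected from the data rows
as_header <- fread(path, skip = 6, header = TRUE, select = c(1, 3, 4), sep = ",")

sapply(as_data, class)
sapply(as_header, class)
```

If this is the cause, reading with header = TRUE (and the same skip as in the read.table call) may make the later class conversion unnecessary.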

Answer 1 (score: 0)

The first 5 columns of dfshort (out of 45) look like this:

   X.run.number.      scenario configuration biobased.chemical.industry
1              3 "pessimistic"     "central"    "modification-dominant"
2              2 "pessimistic"     "central"    "modification-dominant"
3              3 "pessimistic"     "central"    "modification-dominant"
4              4 "pessimistic"     "central"    "modification-dominant"
5              2 "pessimistic"     "central"    "modification-dominant"
6              1 "pessimistic"     "central"    "modification-dominant"
7              3 "pessimistic"     "central"    "modification-dominant"
8              3 "pessimistic"     "central"    "modification-dominant"
9              2 "pessimistic"     "central"    "modification-dominant"
10             4 "pessimistic"     "central"    "modification-dominant"
   distributed.sugar.factory.investment.costs
1                                    70000000
2                                    70000000
3                                    70000000
4                                    70000000
5                                    70000000
6                                    70000000
7                                    70000000

The template looks like this:

  run_number      scenario configuration tick financial_balance_SU
1          3 "pessimistic"     "central"    0                    0
2          2 "pessimistic"     "central"    0                    0
3          3 "pessimistic"     "central"    1                    0
4          4 "pessimistic"     "central"    0                    0
5          2 "pessimistic"     "central"    1                    0
6          1 "pessimistic"     "central"    0                    0

df looks like this:

   run_number        scenario configuration tick financial_balance_SU
1:      23377 ""pessimistic""     ""mixed""  200  6.079728695488823E9
2:      23377 ""pessimistic""     ""mixed""  201  6.079728695488823E9
3:      23378 ""pessimistic""     ""mixed""  192   9.10006561818864E9
4:      23377 ""pessimistic""     ""mixed""  202  6.079728695488823E9
5:      23377 ""pessimistic""     ""mixed""  203  6.079728695488823E9
6:      23378 ""pessimistic""     ""mixed""  193   9.10006561818864E9

Edit

str(dfshort)

'data.frame':   10 obs. of  45 variables:
 $ X.run.number.                                        : int  3 2 3 4 2 1 3 3 2 4
 $ scenario                                             : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
 $ configuration                                        : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
 $ biobased.chemical.industry                           : Factor w/ 1 level "\"modification-dominant\"": 1 1 1 1 1 1 1 1 1 1
 $ distributed.sugar.factory.investment.costs           : int  70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000 70000000
 $ beet.syrups.factory.investment.costs                 : int  1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000
 $ ethanol.factory.investment.costs                     : int  1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000 1500000
 $ market.share.beet.syrups.increase                    : num  0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
 $ demand.beets.for.chemical.EU.increase                : num  0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
 $ transport.costs                                      : int  1 1 1 1 1 1 1 1 1 1
 $ washing.at.farmer                                    : Factor w/ 1 level "\"no\"": 1 1 1 1 1 1 1 1 1 1
 $ beet.syrups.price.percentage.of.sugar.price          : num  0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ CO2.tax.                                             : Factor w/ 1 level "\"yes\"": 1 1 1 1 1 1 1 1 1 1
 $ sugar.tax                                            : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
 $ CO2.tax                                              : int  13 13 13 13 13 13 13 13 13 13
 $ market.share.increase.period                         : int  10 10 10 10 10 10 10 10 10 10
 $ electricity.source                                   : Factor w/ 1 level "\"conventional-mix\"": 1 1 1 1 1 1 1 1 1 1
 $ white.sugar.price.EU.maximum                         : int  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
 $ white.sugar.price.EU.minimum                         : int  200 200 200 200 200 200 200 200 200 200
 $ beet.syrups.price.EU.maximum                         : int  500 500 500 500 500 500 500 500 500 500
 $ beet.syrups.price.EU.minimum                         : int  100 100 100 100 100 100 100 100 100 100
 $ ethanol.price.EU.maximum                             : int  2 2 2 2 2 2 2 2 2 2
 $ ethanol.price.EU.minimum                             : num  0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
 $ years.taken.into.account                             : int  5 5 5 5 5 5 5 5 5 5
 $ X.step.                                              : int  0 0 1 0 1 0 2 3 2 1
 $ financial.balance.farmers                            : num  0 0 0 0 0 ...
 $ diesel.use.farmers                                   : int  0 0 0 0 0 0 0 0 0 0
 $ N.use.farmers                                        : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.SU                                 : num  0 0 0 0 0 ...
 $ electricity.use.SU                                   : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.central.sugar.factories            : num  0 0 0 0 0 ...
 $ electricity.use.central.sugar.factories              : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.distributed.sugar.factories        : num  0 0 0 0 0 ...
 $ electricity.use.distributed.sugar.factories          : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.beet.syrups.factories              : int  0 0 0 0 0 0 0 0 0 0
 $ electricity.use.beet.syrups.factories                : int  0 0 0 0 0 0 0 0 0 0
 $ financial.balance.ethanol.factories                  : int  0 0 0 0 0 0 0 0 0 0
 $ electricity.use.ethanol.factories                    : int  0 0 0 0 0 0 0 0 0 0
 $ transport.costs.yearly                               : num  0 0 0 0 0 ...
 $ diesel.use.total.transport                           : num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.central.sugar.factory    : num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.distributed.sugar.factory: num  0 0 0 0 0 ...
 $ profit.per.tonne.sugar.beet.sugar.from.beet.syrups   : int  0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.beet.syrups.factory      : int  0 0 0 0 0 0 0 0 0 0
 $ profit.per.tonne.sugar.beet.ethanol.factory          : num  0 0 0 0 0 ...

str(df)

Classes ‘data.table’ and 'data.frame':  19000000 obs. of  5 variables:
 $ run_number          : chr  "23377" "23377" "23378" "23377" ...
 $ scenario            : chr  "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" "\"\"pessimistic\"\"" ...
 $ configuration       : chr  "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" "\"\"mixed\"\"" ...
 $ tick                : chr  "200" "201" "192" "202" ...
 $ financial_balance_SU: chr  "6.079728695488823E9" "6.079728695488823E9" "9.10006561818864E9" "6.079728695488823E9" ...
 - attr(*, ".internal.selfref")=<externalptr> 

str(template)

'data.frame':   10 obs. of  5 variables:
 $ run_number          : int  3 2 3 4 2 1 3 3 2 4
 $ scenario            : Factor w/ 1 level "\"pessimistic\"": 1 1 1 1 1 1 1 1 1 1
 $ configuration       : Factor w/ 1 level "\"central\"": 1 1 1 1 1 1 1 1 1 1
 $ tick                : int  0 0 1 0 1 0 2 3 2 1
 $ financial_balance_SU: num  0 0 0 0 0 ...
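Comparing the str() outputs above, df differs from the template in two ways: every column is character, and the text values carry doubled literal quote characters. A minimal sketch of cleaning such columns by hand follows, using a toy copy of a few values from str(df). The helper strip_quotes and the object cleaned are hypothetical names, and removing the quotes entirely is an assumption about the desired result.

```r
# Toy copy of a few values shown in str(df): all columns are character
df <- data.frame(
  run_number = c("23377", "23378"),
  scenario = c('""pessimistic""', '""pessimistic""'),
  tick = c("200", "192"),
  financial_balance_SU = c("6.079728695488823E9", "9.10006561818864E9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove every literal double-quote character
strip_quotes <- function(x) gsub('"', "", x, fixed = TRUE)

# Convert each column to the type its values represent
cleaned <- df
cleaned$run_number <- as.integer(df$run_number)
cleaned$scenario <- factor(strip_quotes(df$scenario))
cleaned$tick <- as.integer(df$tick)
cleaned$financial_balance_SU <- as.numeric(df$financial_balance_SU)

str(cleaned)
```

The scientific-notation strings convert cleanly with as.numeric(), so only the quoted text columns need the extra stripping step.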