Complex list to data frame

Time: 2017-05-18 21:47:10

Tags: r list dataframe dplyr rbind

This seems to be a recurring question, but I have been reading StackOverflow for hours and cannot find a solution, so here it is:

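I have a list of 28 elements named D2; this is its general structure. It is flight data, so both the number of observations and the number of variables differ from element to element.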
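Several elements of D2 are empty list() objects rather than data frames, so a reasonable first step might be to set them aside before combining anything. A small base-R sketch, using a hypothetical stand-in for D2 (the column values are made up for illustration):

```r
# Hypothetical stand-in for D2: a mix of empty list() elements and data frames
D2 <- list(
  flightStatuses = list(),
  flightStatuses = data.frame(flightId = c(1L, 2L), status = c("L", "U"),
                              stringsAsFactors = FALSE),
  flightStatuses = list(),
  flightStatuses = data.frame(flightId = 3L, carrierFsCode = "4K",
                              stringsAsFactors = FALSE)
)

# Keep only the elements that are data frames with at least one row
is_nonempty_df <- function(x) is.data.frame(x) && nrow(x) > 0
D2_flights <- Filter(is_nonempty_df, D2)

length(D2)          # 4
length(D2_flights)  # 2
```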

> str(D2, max.level = 1)
List of 28
 $ flightStatuses: list()
 $ flightStatuses: list()
 $ flightStatuses:'data.frame': 5 obs. of  12 variables:
 $ flightStatuses:'data.frame': 4 obs. of  12 variables:
 $ flightStatuses:'data.frame': 1 obs. of  11 variables:
 $ flightStatuses:'data.frame': 3 obs. of  12 variables:
 $ flightStatuses:'data.frame': 10 obs. of  15 variables:
 $ flightStatuses:'data.frame': 1 obs. of  12 variables:
 $ flightStatuses: list()
 $ flightStatuses:'data.frame': 2 obs. of  11 variables:
etc.

I am trying to put the contents into a single data frame and then save it to CSV.
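The elements have different numbers of variables (11, 12 and 15 in the str() output), which plain rbind() rejects. A minimal base-R sketch of a "fill missing columns with NA, then rbind" step followed by write.csv(), assuming each element has already been reduced to ordinary atomic columns (no nested data frames). The helper bind_rows_fill and the two sample frames are hypothetical; for flat data frames, plyr::rbind.fill() or dplyr::bind_rows() should do the same job.

```r
# Hypothetical helper: row-bind data frames whose column sets differ,
# filling the columns an element lacks with NA
bind_rows_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]  # same column order in every piece
  })
  # unname() so repeated list names (e.g. "flightStatuses") are not turned into row names
  out <- do.call(rbind, unname(filled))
  rownames(out) <- NULL
  out
}

# Made-up flat frames with different columns
a <- data.frame(flightId = c(891368844L, 889954328L), status = c("L", "U"),
                stringsAsFactors = FALSE)
b <- data.frame(flightId = 890867748L, carrierFsCode = "4K",
                stringsAsFactors = FALSE)

flights <- bind_rows_fill(list(a, b))
str(flights)

# Written to a temporary file here; in practice use a quoted path such as "FlightStats.csv"
out_file <- tempfile(fileext = ".csv")
write.csv(flights, file = out_file, row.names = FALSE)
```

Note that the file name passed to write.csv() has to be a quoted string.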

Here is the structure of the third element of the list, as an example:

> str(D2[[3]])
'data.frame':   5 obs. of  12 variables:
 $ flightId              : int  891368844 889954328 889955975 891364679 891364678
 $ carrierFsCode         : chr  "4K" "4N" "5T" "6L" ...
 $ flightNumber          : chr  "901" "207" "444" "414" ...
 $ departureAirportFsCode: chr  "ZFM" "YDA" "YVQ" "ZFM" ...
 $ arrivalAirportFsCode  : chr  "YEV" "YEV" "YEV" "YEV" ...
 $ departureDate         :'data.frame': 5 obs. of  2 variables:
  ..$ dateLocal: chr  "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
  ..$ dateUtc  : chr  "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
 $ arrivalDate           :'data.frame': 5 obs. of  2 variables:
  ..$ dateLocal: chr  "2017-05-11T15:45:00.000" "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" "2017-05-11T16:37:00.000" ...
  ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" "2017-05-11T22:37:00.000Z" ...
 $ status                : chr  "L" "U" "L" "U" ...
 $ operationalTimes      :'data.frame': 5 obs. of  14 variables:
  ..$ scheduledGateDeparture    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T09:00:00.000" "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" "2017-05-11T16:00:00.000" ...
  .. ..$ dateUtc  : chr  "2017-05-11T15:00:00.000Z" "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" "2017-05-11T22:00:00.000Z" ...
  ..$ estimatedRunwayDeparture  :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
  ..$ actualRunwayDeparture     :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:24:00.000" "2017-05-11T09:53:00.000" "2017-05-11T12:27:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:24:00.000Z" "2017-05-11T16:53:00.000Z" "2017-05-11T18:27:00.000Z" NA ...
  ..$ estimatedRunwayArrival    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:45:00.000" "2017-05-11T12:11:00.000" "2017-05-11T13:12:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" "2017-05-11T18:11:00.000Z" "2017-05-11T19:12:00.000Z" NA ...
  ..$ actualRunwayArrival       :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  "2017-05-11T15:45:00.000" NA "2017-05-11T13:12:00.000" NA ...
  .. ..$ dateUtc  : chr  "2017-05-11T21:45:00.000Z" NA "2017-05-11T19:12:00.000Z" NA ...
  ..$ publishedDeparture        :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T09:55:00.000" "2017-05-11T12:30:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T16:55:00.000Z" "2017-05-11T18:30:00.000Z" NA ...
  ..$ publishedArrival          :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
  ..$ flightPlanPlannedDeparture:'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T10:05:00.000" "2017-05-11T12:40:00.000" "2017-05-11T16:15:00.000" ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T17:05:00.000Z" "2017-05-11T18:40:00.000Z" "2017-05-11T22:15:00.000Z" ...
  ..$ scheduledGateArrival      :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:10:00.000" "2017-05-11T13:28:00.000" NA ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:10:00.000Z" "2017-05-11T19:28:00.000Z" NA ...
  ..$ flightPlanPlannedArrival  :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA "2017-05-11T12:25:00.000" "2017-05-11T13:24:00.000" "2017-05-11T16:37:00.000" ...
  .. ..$ dateUtc  : chr  NA "2017-05-11T18:25:00.000Z" "2017-05-11T19:24:00.000Z" "2017-05-11T22:37:00.000Z" ...
  ..$ estimatedGateDeparture    :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T12:20:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T18:20:00.000Z" NA ...
  ..$ actualGateDeparture       :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T12:20:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T18:20:00.000Z" NA ...
  ..$ estimatedGateArrival      :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T13:15:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T19:15:00.000Z" NA ...
  ..$ actualGateArrival         :'data.frame':  5 obs. of  2 variables:
  .. ..$ dateLocal: chr  NA NA "2017-05-11T13:15:00.000" NA ...
  .. ..$ dateUtc  : chr  NA NA "2017-05-11T19:15:00.000Z" NA ...
 $ flightDurations       :'data.frame': 5 obs. of  8 variables:
  ..$ airMinutes             : int  21 NA 45 NA NA
  ..$ scheduledBlockMinutes  : int  NA 75 58 NA NA
  ..$ scheduledAirMinutes    : int  NA 80 44 22 26
  ..$ scheduledTaxiOutMinutes: int  NA 10 10 15 15
  ..$ blockMinutes           : int  NA NA 55 NA NA
  ..$ taxiOutMinutes         : int  NA NA 7 NA NA
  ..$ scheduledTaxiInMinutes : int  NA NA 4 NA NA
  ..$ taxiInMinutes          : int  NA NA 3 NA NA
 $ flightEquipment       :'data.frame': 5 obs. of  2 variables:
  ..$ actualEquipmentIataCode   : chr  "BE1" "HS7" "733" "DHT" ...
  ..$ scheduledEquipmentIataCode: chr  NA "HS7" "733" NA ...
 $ schedule              :'data.frame': 5 obs. of  4 variables:
  ..$ flightType    : chr  NA "J" "J" NA ...
  ..$ serviceClasses: chr  NA "RY" "RFJY" NA ...
  ..$ restrictions  : chr  NA "" "" NA ...
  ..$ uplines       :List of 5
  .. ..$ : NULL
  .. ..$ :'data.frame': 1 obs. of  2 variables:
  .. .. ..$ fsCode  : chr "YXY"
  .. .. ..$ flightId: int 889956597
  .. ..$ :'data.frame': 2 obs. of  2 variables:
  .. .. ..$ fsCode  : chr  "YEG" "YZF"
  .. .. ..$ flightId: int  889954472 889957614
  .. ..$ : NULL
  .. ..$ : NULL

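As you can see, each element of the list is a data frame that itself contains several data frames. I have read the following posts while trying to get all of this into one data frame: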
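The str() output shows why rbind() and write.csv() struggle: columns such as departureDate and operationalTimes are themselves data frames, and schedule$uplines is a list column. A hypothetical base-R helper (flatten_nested is not a library function) could lift nested data-frame columns to the top level with dotted names and collapse any remaining list cells into one ";"-separated string. If the data came from jsonlite::fromJSON(), jsonlite::flatten() handles the nested data frames, but it leaves list columns such as uplines in place. The sample data below is a made-up miniature of D2[[3]].

```r
# Hypothetical helper: turn a data frame whose columns may themselves be
# data frames (or lists) into a flat data frame of atomic columns
flatten_nested <- function(df, prefix = NULL) {
  cols <- list()
  for (nm in names(df)) {
    col  <- df[[nm]]
    name <- if (is.null(prefix)) nm else paste(prefix, nm, sep = ".")
    if (is.data.frame(col)) {
      # nested data frame (e.g. departureDate): recurse and keep its columns
      cols <- c(cols, as.list(flatten_nested(col, name)))
    } else if (is.list(col)) {
      # list column (e.g. schedule$uplines): one string per row, "" for NULL
      cols[[name]] <- vapply(col, function(cell) paste(unlist(cell), collapse = ";"),
                             character(1))
    } else {
      cols[[name]] <- col
    }
  }
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE, row.names = NULL)
}

# Made-up miniature of D2[[3]]: one nested data frame and one list column
d3 <- data.frame(flightId = c(891368844L, 889954328L), status = c("L", "U"),
                 stringsAsFactors = FALSE)
d3$departureDate <- data.frame(
  dateLocal = c("2017-05-11T09:00:00.000", "2017-05-11T09:55:00.000"),
  dateUtc   = c("2017-05-11T15:00:00.000Z", "2017-05-11T16:55:00.000Z"),
  stringsAsFactors = FALSE)
d3$schedule <- data.frame(flightType = c(NA, "J"), stringsAsFactors = FALSE)
d3$schedule$uplines <- list(NULL, data.frame(fsCode = "YXY", flightId = 889956597L,
                                             stringsAsFactors = FALSE))

flat <- flatten_nested(d3)
str(flat)
```

Applied to a real element, flatten_nested(D2[[3]]) should give one row per flight, with columns such as operationalTimes.actualGateArrival.dateUtc. The nesting likely also explains the do.call(rbind, D2[[3]]) error: a data frame is a list of its columns, so do.call() hands each column (some of them data frames of different widths) to rbind() as a separate argument.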

  1. R list to data frame: but that list's elements all have the same length.

  2. https://www.r-bloggers.com/concatenating-a-list-of-data-frames/ : but even when I try it on a single element of the list, such as the example above, I get errors like these:

    > df <- ldply(D2[[3]], rbind)
    Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
    Results must be all atomic, or all data frames
    
    > df <- do.call(rbind, D2[[3]])
    Error in rbind(deparse.level, ...) : 
      numbers of columns of arguments do not match
    
  3. Extracting from Nested list to data frame: this one looks promising, but the explanation is too complicated for me. I am a beginner in R, so I need something in plainer language.

  4. Converting nested list (unequal length) to data frame: this one deals with named vectors rather than data frames. When I tried @MrFlick's solution, I got:

    > df <- rbind.fill(lapply(D2, function(x)as.data.frame(t(x))))
    Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
      missing value where TRUE/FALSE needed
    Called from: as.matrix.data.frame(x)
    
  5. Converting nested list (unequal length) to data frame: when I tried @akrun's answer, I got:

    > indx<-lengths(D2)
    > res<-as.data.frame(do.call(rbind, lapply(D2, `length<-`,max(indx))))
    Error in rbind(deparse.level, ...) : 
      invalid list argument: all variables should have the same length
    > colnames(res)<-names(D2[[which.max(indx)]])
    
  6. List elements to dataframes in R: when I tried @David Arenburg's answer:

    > lapply(D2, as.data.frame.list)
    Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
      arguments imply differing number of rows: 0, 1
    
  7. Other attempts:

    rbind.fill(D2[[3]]) gives the same output as D2[[3]] itself.


    When I tried to take that last output and write it to CSV, thinking it might be easier to handle, I got this:

    > write.csv(D6, file = "FlightStats.csv")   # D6 = D2[[3]]
    Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
      missing value where TRUE/FALSE needed
    

    I also tried melt on the whole list and on a single element of the list, with the same result:

    > melt(D2)   # or melt(D2[[3]])
    Using carrierFsCode, flightNumber, departureAirportFsCode, arrivalAirportFsCode, status as id variables
    Error in eval(substitute(expr), envir, enclos) : 
      Can't melt data.frames with non-atomic 'measure' columns
    In addition: Warning message:
    attributes are not identical across measure variables; they will be dropped
    

    as.data.frame, rbindlist and stack also return error messages.

    I also tried assigning one element of the list to a variable, doing the same with another element, and then combining the two with rbind (carefully choosing elements with the same number of variables), but I still get an error:

    > rbind(D4, D5)
    Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
      duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': ‘1’, ‘2’ 

    It tells me the row names are duplicated, but:

    > rownames(D4)
    [1] "890867748" "889955650"
    > rownames(D5)
    [1] "891368844" "889954328" "889955975" "891364679" "891364678"

    Basically, I need help! How do I get this mess into a data frame?

    Here is some representative data (elements 8 to 10 of the list):
    

0 answers