
Time: 2017-10-27 18:31:05

Tags: scala apache-spark apache-spark-sql

Below is a row from the output of my Spark DataFrame, dfMainOutput.

4295858898,177,SelfSourcedPublic,INC,Cost of sales,Umsatzkosten,,ECOR,false,,,,,false,False,,,,505096,505074,505074,505096,505096,,505074,False,,3014830,,I|!|

Now I want to replace the comma delimiter (,) with |^| and drop one column, DataPartition.


val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

val headerColumn = df.columns.filter(v => (!v.contains("^") && !v.contains("_c"))).toSeq

val header = headerColumn.dropRight(1).mkString("", "|^|", "|!|")

val dfMainOutputFinalWithoutNull = dfMainOutputFinal.withColumn("concatenated", regexp_replace(col("concatenated"), "null", "")).withColumnRenamed("concatenated", header)

  .option("nullValue", "")
  .option("header", "true")
  .option("codec", "gzip")


This is the output I get (the null fields are dropped):

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^|ECOR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3014830|^|I|!|


This is the output I want (null fields kept as empty strings):

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^||^|ECOR|^|False|^||^||^||^||^|False|^|False|^||^||^||^|505096|^|505074|^|505074|^|505096|^|505096|^||^|505074|^|False|^||^|3014830|^||^|I|!|

Also, in the DataFrame output I get false where we want False. Please help me: what am I missing?


 |-- LineItem_organizationId: long (nullable = true)
 |-- LineItem_lineItemId: integer (nullable = true)
 |-- DataPartition: string (nullable = true)
 |-- StatementTypeCode: string (nullable = true)
 |-- LineItemName: string (nullable = true)
 |-- LocalLanguageLabel: string (nullable = true)
 |-- FinancialConceptLocal: string (nullable = true)
 |-- FinancialConceptGlobal: string (nullable = true)
 |-- IsDimensional: boolean (nullable = true)
 |-- InstrumentId: string (nullable = true)
 |-- LineItemSequence: string (nullable = true)
 |-- PhysicalMeasureId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondary: string (nullable = true)
 |-- IsRangeAllowed: boolean (nullable = true)
 |-- IsSegmentedByOrigin: string (nullable = true)
 |-- SegmentGroupDescription: string (nullable = true)
 |-- SegmentChildDescription: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel: string (nullable = true)
 |-- LocalLanguageLabel_languageId: string (nullable = true)
 |-- LineItemName_languageId: string (nullable = true)
 |-- SegmentChildDescription_languageId: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel_languageId: string (nullable = true)
 |-- SegmentGroupDescription_languageId: string (nullable = true)
 |-- SegmentMultipleFundbDescription: string (nullable = true)
 |-- SegmentMultipleFundbDescription_languageId: string (nullable = true)
 |-- IsCredit: string (nullable = true)
 |-- FinancialConceptLocalId: string (nullable = true)
 |-- FinancialConceptGlobalId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondaryId: string (nullable = true)
 |-- FFAction: string (nullable = true)

Sample data where DataPartition=SelfSourcedPublic and StatementTypeCode=INC:


|LineItem_organizationId|LineItem_lineItemId|DataPartition    |StatementTypeCode|LineItemName                                                                              |LocalLanguageLabel                                                                                    |FinancialConceptLocal|FinancialConceptGlobal|IsDimensional|InstrumentId|LineItemSequence|PhysicalMeasureId|FinancialConceptCodeGlobalSecondary|IsRangeAllowed|IsSegmentedByOrigin|SegmentGroupDescription|SegmentChildDescription|SegmentChildLocalLanguageLabel|LocalLanguageLabel_languageId|LineItemName_languageId|SegmentChildDescription_languageId|SegmentChildLocalLanguageLabel_languageId|SegmentGroupDescription_languageId|SegmentMultipleFundbDescription|SegmentMultipleFundbDescription_languageId|IsCredit|FinancialConceptLocalId|FinancialConceptGlobalId|FinancialConceptCodeGlobalSecondaryId|FFAction|
|4295858898             |707                |SelfSourcedPublic|INC              |Revenue from long-term construction contracts                                             |Erlöse aus langfristigen Fertigungsaufträgen                                                          |null                 |ROBR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3015278                 |null                                 |I|!|    |
|4295858898             |3289               |SelfSourcedPublic|INC              |Balancing Item - Net Income available to Controlling Interest                             |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858922             |808                |SelfSourcedPublic|INC              |Income Taxes - Total                                                                      |Ertragsteuern                                                                                         |null                 |XTAX                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3019589                 |null                                 |I|!|    |
|4295858922             |1507               |SelfSourcedPublic|INC              |Balancing Item - Operating Expenses                                                       |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858951             |1574               |SelfSourcedPublic|INC              |Admin/General Expenses                                                                    |null                                                                                                  |null                 |ESGA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018991                 |null                                 |I|!|    |
|4295859007             |1645               |SelfSourcedPublic|INC              |Exploration Expenses - Balancing value                                                    |null                                                                                                  |null                 |EEXP                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018916                 |null                                 |I|!|    |
|4295859038             |954                |SelfSourcedPublic|INC              |Sale Investments                                                                          |null                                                                                                  |null                 |EGFA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018929                 |null                                 |I|!|    |
|4295859038             |1967               |SelfSourcedPublic|INC              |Restructuring Charges/Provisions                                                          |Ergebnis aus Umstrukturierungen                                                                       |null                 |ERES                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3018980                 |null                                 |I|!|    |
|4295859038             |1996               |SelfSourcedPublic|INC              |Diluted Weighted Average Shares on Instrument Level multiplied to its Participation Factor|null                                                                                                  |null                 |DWASEPFI              |false        |8590926849  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |1002023919              |null                                 |I|!|    |
|4295859045             |864                |SelfSourcedPublic|INC              |Results of valuation gains/losses and disposals of non-current securities                 |Ergebnis aus Kursänderungen und Abgängen von Wertpapieren des langfristigen Finanzvermögens („@FVTPL“)|null                 |EGIT                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018932                 |null                                 |I|!|    |
|4295859045             |1092               |SelfSourcedPublic|INC              |Excep. Depreciation                                                                       |null                                                                                                  |null                 |EGLO                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018938                 |null                                 |I|!|    |
|4295859071             |1840               |SelfSourcedPublic|INC              |Other Operating Expense                                                                   |null                                                                                                  |null                 |EOOE                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018974                 |null                                 |I|!|    |
|4295859078             |914                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859106             |514                |SelfSourcedPublic|INC              |Personnel Expenses                                                                        |null                                                                                                  |null                 |ELAS                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018944                 |null                                 |I|!|    |
|4295859106             |903                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859216             |499                |SelfSourcedPublic|INC              |BC - Depreciation of Fixed Assets                                                         |null                                                                                                  |null                 |BCDEP                 |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |False   |null                   |1002023928              |null                                 |I|!|    |
|4295859236             |172                |SelfSourcedPublic|INC              |Total Revenue                                                                             |Ventes                                                                                                |null                 |XTLR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3016345                 |null                                 |I|!|    |
|4295859241             |492                |SelfSourcedPublic|INC              |Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level)        |null                                                                                                  |null                 |XNCNDI                |false        |8589989623  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |1001214357              |null                                 |I|!|    |
|4295859279             |124                |SelfSourcedPublic|INC              |Income Available to Com Excl ExtraOrd                                                     |Toerekenbaar aan de aandeelhouders van de moederonderneming                                           |null                 |XNCN                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |True    |null                   |3016316                 |null                                 |I|!|    |
|4295859298             |488                |SelfSourcedPublic|INC              |Other operating income/expenses                                                           |Other operating expenses                                                                              |null                 |EOIE                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3018969                 |null                                 |I|!|    |


val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))


|DataPartition    |StatementTypeCode|concatenated                                                                                                                                                                                                                                                                                                            |
|SelfSourcedPublic|INC              |4295858898|^|707|^|INC|^|Revenue from long-term construction contracts|^|Erlöse aus langfristigen Fertigungsaufträgen|^|ROBR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3015278|^|I|!|                                                                                      |
|SelfSourcedPublic|INC              |4295858898|^|3289|^|INC|^|Balancing Item - Net Income available to Controlling Interest|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                            |
|SelfSourcedPublic|INC              |4295858922|^|808|^|INC|^|Income Taxes - Total|^|Ertragsteuern|^|XTAX|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3019589|^|I|!|                                                                                                                                             |
|SelfSourcedPublic|INC              |4295858922|^|1507|^|INC|^|Balancing Item - Operating Expenses|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                                                      |
|SelfSourcedPublic|INC              |4295859236|^|172|^|INC|^|Total Revenue |^|Ventes|^|XTLR|^|false|^|false|^|False|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3016345|^|I|!|                                                                                                                                                           |
|SelfSourcedPublic|INC              |4295859241|^|492|^|INC|^|Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level) |^|XNCNDI|^|false|^|8589989623|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|1001214357|^|I|!|                                                                                     |
|SelfSourcedPublic|INC              |4295859279|^|124|^|INC|^|Income Available to Com Excl ExtraOrd|^|Toerekenbaar aan de aandeelhouders van de moederonderneming|^|XNCN|^|false|^|false|^|False|^|505084|^|505074|^|505074|^|505084|^|505084|^|505074|^|True|^|3016316|^|I|!|                                                                               |
|SelfSourcedPublic|INC              |4295859298|^|488|^|INC|^|Other operating income/expenses|^|Other operating expenses|^|EOIE|^|false|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3018969|^|I|!|                                                                                                                                |

So, to be clear: for row 4295858898 | 3289 the LocalLanguageLabel column is null, and that null field is missing from the concatenated output.


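1 Answer:

Answer 0 (score: 2)

The main culprit is that your dataframe contains null values, and concat_ws filters out all nulls. So the solution is to replace every null value with "" before concatenating, which should solve your problem. This should not cause any trouble, because all the columns that contain nulls are of type string in your schema.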


Change

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

to

val dfMainOutputFinal = dfMainOutput.na.fill("").select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))
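Two caveats may be worth keeping in mind. na.fill("") only fills string columns, so a null in a boolean or numeric column would still be skipped by concat_ws. The lowercase false in the question comes from the boolean columns (such as IsDimensional), which Spark renders in lowercase when they are converted to strings. The following is only a minimal sketch of one way to handle both, not necessarily what the answer's author intended: cast every column to string, apply initcap to the boolean ones, and coalesce nulls to "". The three-column sample DataFrame and the local SparkSession are assumptions made for illustration; they stand in for dfMainOutput.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, concat_ws, initcap, lit}
import org.apache.spark.sql.types.BooleanType

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("concat-ws-nulls").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfMainOutput: a string and a boolean column contain nulls.
val sample = Seq(
  (1L, Option("Umsatzkosten"), Option(false)),
  (2L, Option.empty[String], Option.empty[Boolean])
).toDF("id", "label", "flag")

// Plain concat_ws skips null inputs, so the second row loses two fields.
val skipped = sample.select(
  concat_ws("|^|", sample.columns.map(c => col(c)): _*).as("concatenated")
)

// Render each column as a non-null string: booleans become "True"/"False",
// and any null (whatever the column type) becomes an empty string.
val fields: Array[Column] = sample.schema.fields.map { f =>
  val asString = f.dataType match {
    case BooleanType => initcap(col(f.name).cast("string"))
    case _           => col(f.name).cast("string")
  }
  coalesce(asString, lit(""))
}
val kept = sample.select(concat_ws("|^|", fields: _*).as("concatenated"))

val skippedRows = skipped.as[String].collect().toSeq
val keptRows = kept.as[String].collect().toSeq
```

With this sample, the second row should come out as 2 in the first projection and as 2|^||^| in the second, which keeps the field count constant across rows. On the real data, the same mapping would be applied to dfMainOutput.schema.fields after filtering out DataPartition.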
