How to select columns in a Spark DataFrame whose names contain quotes

Time: 2018-10-04 09:32:43

Tags: apache-spark apache-spark-sql apache-spark-dataset

I tried to access the columns "accession" "database" "disease" "ec.code" "omics_type" "species" using
fileDf.select("\"accession\"","\"database\"","\"connections\"")

but I still get an error:

root
 |-- "accession"    "database"  "disease"   "ec.code"   "omics_type"    "species"   "tissue"    "citations.x"   "coding.x"  "ensembl.x" "go.x"  "intact.x"  "kegg.compound.x"   "kegg.glycan.x" "kegg.pathway.x"    "kegg.reaction.x"   "metabolights.x"    "ncbi.x"    "pubchem.compound.x"    "pubchem.substance.x"   "reactome.x"    "reanalysis.x"  "rnacentral.x"  "sgd.x" "sra.x" "uniprot.x" "ajs.connectivity.score"    "citations.y"   "coding.y"  "ensembl.y" "go.y"  "intact.y"  "kegg.compound.y"   "kegg.glycan.y" "kegg.pathway.y"    "kegg.reaction.y"   "metabolights.y"    "ncbi.y"    "pubchem.compound.y"    "pubchem.substance.y"   "reactome.y"    "reanalysis.y"  "rnacentral.y"  "sgd.y" "sra.y" "uniprot.y" "connections": string (nullable = true)

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`"accession"`' given input columns: ["accession" "database"  "disease"   "ec.code"   "omics_type"    "species"   "tissue"    "citations.x"   "coding.x"  "ensembl.x" "go.x"  "intact.x"  "kegg.compound.x"   "kegg.glycan.x" "kegg.pathway.x"    "kegg.reaction.x"   "metabolights.x"    "ncbi.x"    "pubchem.compound.x"    "pubchem.substance.x"   "reactome.x"    "reanalysis.x"  "rnacentral.x"  "sgd.x" "sra.x" "uniprot.x" "ajs.connectivity.score"    "citations.y"   "coding.y"  "ensembl.y" "go.y"  "intact.y"  "kegg.compound.y"   "kegg.glycan.y" "kegg.pathway.y"    "kegg.reaction.y"   "metabolights.y"    "ncbi.y"    "pubchem.compound.y"    "pubchem.substance.y"   "reactome.y"    "reanalysis.y"  "rnacentral.y"  "sgd.y" "sra.y" "uniprot.y" "connections"];;
'Project ['"accession", '"database", '"connections"]
+- AnalysisBarrier
      +- Relation["accession"   "database"  "disease"   "ec.code"   "omics_type"    "species"   "tissue"    "citations.x"   "coding.x"  "ensembl.x" "go.x"  "intact.x"  "kegg.compound.x"   "kegg.glycan.x" "kegg.pathway.x"    "kegg.reaction.x"   "metabolights.x"    "ncbi.x"    "pubchem.compound.x"    "pubchem.substance.x"   "reactome.x"    "reanalysis.x"  "rnacentral.x"  "sgd.x" "sra.x" "uniprot.x" "ajs.connectivity.score"    "citations.y"   "coding.y"  "ensembl.y" "go.y"  "intact.y"  "kegg.compound.y"   "kegg.glycan.y" "kegg.pathway.y"    "kegg.reaction.y"   "metabolights.y"    "ncbi.y"    "pubchem.compound.y"    "pubchem.substance.y"   "reactome.y"    "reanalysis.y"  "rnacentral.y"  "sgd.y" "sra.y" "uniprot.y" "connections"#10] csv

How can I select columns whose names contain quotes in a Spark DataFrame?
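
A note on the schema above: printSchema reports a single string column whose name is the entire header line, so no column called "accession" exists yet, and that is why the select cannot resolve it. This pattern usually means the reader split on the default comma while the file uses a different separator. The sketch below assumes the file is tab-separated (a guess based on the spacing in the header) and uses a small hypothetical three-column sample in place of the real file; `QuotedHeaderExample` and `readTsv` are illustrative names, not from the original post.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object QuotedHeaderExample {
  // Reads a delimited file with a header row. With '"' as the quote character
  // (Spark's CSV default), a header field written as "accession" yields the
  // column name accession, without the surrounding quotes.
  def readTsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("sep", "\t")    // assumption: the file is tab-separated
      .option("quote", "\"")  // the default, spelled out for clarity
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("quoted-header-example")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample standing in for the real file.
    val sample = Files.createTempFile("sample", ".tsv")
    val content =
      "\"accession\"\t\"ec.code\"\t\"connections\"\n" +
      "\"A1\"\t\"1.1.1.1\"\t\"3\"\n"
    Files.write(sample, content.getBytes("UTF-8"))

    val fileDf = readTsv(spark, sample.toString)
    fileDf.printSchema()

    // A name containing a dot needs backticks; otherwise Spark treats the dot
    // as access to a field inside a struct column.
    fileDf.select("accession", "`ec.code`", "connections").show()

    spark.stop()
  }
}
```

If the separator is set correctly, the quotes should no longer be part of the column names, so plain names such as "accession" can be passed to select. Only the dotted names (for example "ec.code" or "citations.x") would still need backticks.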

0 answers:

No answers yet