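I'm working on implementing a basic credit card fraud detection algorithm on data I downloaded from Kaggle.
The column names appear to be wrapped in double quotes, e.g.: "Time", "V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21", "V22", ...
What I want to do is use VectorAssembler to combine all the columns into a single features column for MLlib, like this: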
from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(
    inputCols=["Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount"],
    outputCol="features")
output = assembler.transform(df)
But I get this error:
IllegalArgumentException: 'Field "Time" does not exist.\nAvailable fields: "Time","V1","V2","V3","
I realized this is caused by the double quotes in the column names, because I tried renaming a single column like this:
df1 = df.selectExpr("""'Time' as test""")
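and that worked. However, I have 30 columns in this example and may have more in the next one, so hard-coding a select for every column doesn't seem feasible.
I tried every syntax I could think of, namely: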
inputCols=['"Time"']
inputCols=["'Time'"]
inputCols=[""Time""]
inputCols=["""Time"""]
inputCols=["´Time´"]
But they all give the same error. Is there a solution, or do I have to hard-code a select statement to rename the columns?
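
For reference, here is a minimal sketch of one way to avoid hard-coding the renames. It assumes the double-quote characters are literally part of each column name, as the error message suggests. The helper name `clean_column_names` and the sample list `raw_columns` are hypothetical and only illustrate the idea. The Spark calls are left as comments so the snippet runs without a cluster.

```python
def clean_column_names(names):
    """Strip surrounding whitespace and double quotes from each column name."""
    return [name.strip().strip('"').strip() for name in names]

# Hypothetical sample of what df.columns might contain if the quotes
# are stored as part of the names.
raw_columns = ['"Time"', '"V1"', '"V2"', '"Amount"']
cleaned = clean_column_names(raw_columns)
print(cleaned)

# With a real Spark DataFrame `df`, the cleaned names could be applied
# to every column in one call, with no per-column select:
#   df_clean = df.toDF(*clean_column_names(df.columns))
#   assembler = VectorAssembler(inputCols=[...], outputCol="features")
#   output = assembler.transform(df_clean)
```

`DataFrame.toDF(*names)` returns a new DataFrame with the columns renamed positionally, so the same line works for 30 columns or 300. After that, the plain names such as "Time" should resolve in `inputCols`.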