I have a PySpark DataFrame that contains some columns with the suffix _24:
df.columns = ['timestamp',
'air_temperature_median_24',
'air_temperature_median_6',
'wind_direction_mean_24',
'wind_speed',
'building_id']
I tried to select them with the colRegex method. PySpark itself runs correctly otherwise, so this is most likely a syntax error in how I wrote the pattern, but the call below raises an exception:
ashrae.select(ashrae.colRegex(".+'_24'")).show()
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-103-a8189f0298e6> in <module>
----> 1 ashrae.select(ashrae.colRegex(".+'_24'")).show()
C:\spark\spark-3.0.0-preview-bin-hadoop2.7\python\pyspark\sql\dataframe.py in colRegex(self, colName)
957 if not isinstance(colName, basestring):
958 raise ValueError("colName should be provided as string")
--> 959 jc = self._jdf.colRegex(colName)
960 return Column(jc)
961
C:\spark\spark-3.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src.zip\py4j\java_gateway.py in __call__(self, *args)
1284 answer = self.gateway_client.send_command(command)
1285 return_value = get_return_value(
-> 1286 answer, self.gateway_client, self.target_id, self.name)
1287
1288 for temp_arg in temp_args:
C:\spark\spark-3.0.0-preview-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
96 def deco(*a, **kw):
97 try:
---> 98 return f(*a, **kw)
99 except py4j.protocol.Py4JJavaError as e:
100 converted = convert_exception(e.java_exception)
C:\spark\spark-3.0.0-preview-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o151.colRegex.
: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.charAt(Unknown Source)
at scala.collection.immutable.StringOps$.apply$extension(StringOps.scala:41)
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:202)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:121)
at org.apache.spark.sql.Dataset.resolve(Dataset.scala:259)
at org.apache.spark.sql.Dataset.colRegex(Dataset.scala:1364)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
What causes the exception, and how can the code be corrected?
Answer 0 (score: 2)
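Try the following syntax:
ashrae.select(ashrae.colRegex("`.+_24`")).show()
When using colRegex, the column name (the regular expression) must be wrapped in backticks inside the string. Without them, the traceback shows the string being passed on to Dataset.resolve and parseAttributeName as if it were a literal column name, which is where the StringIndexOutOfBoundsException is thrown.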
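If colRegex keeps misbehaving, the same selection can be worked out in plain Python first and then handed to select. The sketch below is only an illustration: the column list is copied from the question, `backticked` and `matching_columns` are hypothetical helper names, and it assumes colRegex matches the pattern against the whole column name, which `re.fullmatch` mirrors here. No Spark session is created, so the DataFrame calls are shown as comments only.

```python
import re

# Column names copied from the question.
columns = [
    "timestamp",
    "air_temperature_median_24",
    "air_temperature_median_6",
    "wind_direction_mean_24",
    "wind_speed",
    "building_id",
]


def backticked(pattern):
    """Wrap a regex in backticks, the form colRegex expects for a pattern."""
    return "`" + pattern + "`"


def matching_columns(names, pattern):
    """Return the names that match the regex in full (hypothetical helper)."""
    return [name for name in names if re.fullmatch(pattern, name)]


suffix_pattern = ".+_24"
selected = matching_columns(columns, suffix_pattern)

# With a live DataFrame named ashrae (not created here), either call
# should select the same columns:
#   ashrae.select(ashrae.colRegex(backticked(suffix_pattern))).show()
#   ashrae.select(*selected).show()
```

Filtering the names yourself also makes it easy to print `selected` and check which columns the pattern picks up before touching the DataFrame.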