I am trying to remove certain letters from part numbers, but I am having trouble getting it to work correctly.
What I have so far does not work.
Essentially, say I have the following five items:
D39J02GEN
20F934L
2984CPL
29048L20GEN
1120934L
I only want to detect the bold ones; in other words, the parts that end in L with a digit immediately before the L.
Edit: This gets me close, but it still returns rows where something comes after the L, and it does not quite remove the letter L either.
Answer 0 (score: 2)
If you know the value is at the end, do the following:
SELECT LEFT(part, LENGTH(part) - 2)
FROM `table`
WHERE part REGEXP '[0-9]L$';
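If the goal is to actually strip the trailing L rather than just select a shortened value, a minimal sketch of the removal step (reusing the same table and part column as above, and assuming you only want to drop the single final L) could be an UPDATE:

UPDATE `table`
SET part = LEFT(part, CHAR_LENGTH(part) - 1)  -- drop only the last character, the L
WHERE part REGEXP '[0-9]L$';                  -- only rows ending in a digit followed by L

Note that this drops only the final L, whereas LENGTH(part) - 2 in the SELECT above also removes the digit before it; adjust the offset to whichever behavior you actually want.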
If the pattern can occur in the middle of the string, this is trickier.
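For the middle-of-string case, one possible sketch on MySQL 8.0+ (an assumption; earlier versions have no REGEXP_REPLACE) is to remove an L only where it is preceded by a digit, again using the hypothetical table and part names from above:

SELECT REGEXP_REPLACE(part, '(?<=[0-9])L', '') AS cleaned  -- lookbehind: drop L only when it follows a digit
FROM `table`
WHERE part REGEXP '[0-9]L';

Whether the lookbehind syntax is available depends on the ICU regex support in your MySQL build, so treat this as a starting point to test rather than a drop-in solution.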
Answer 1 (score: 1)
If the match always needs to be at the end of the text, something like this should also work.
Query
SELECT
*
FROM
t
WHERE
SUBSTRING(REVERSE(t.text_string), 1, 1) = 'L'
AND
SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0
Result
| text_string |
| ----------- |
| 20F934L |
| 1120934L |
See the demo.
Note
SUBSTRING(REVERSE(t.text_string), 2) >> 0
here basically means CAST(SUBSTRING(REVERSE(t.text_string), 2) AS UNSIGNED).
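One quick way to convince yourself of that equivalence (a standalone sketch, not part of the original answer) is to compare the implicit and explicit casts on literal strings:

SELECT
'439F02' >> 0                  -- numeric prefix: implicitly casts to 439
, CAST('439F02' AS UNSIGNED)   -- explicit cast: also 439
, 'PC4892' >> 0                -- no leading digits: casts to 0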
Why does it work?
I rely on MySQL's loose automatic casting: it converts 439F02 to the integer 439, but it cannot convert PC4892 to an integer, so that value becomes 0 instead.
You can see this in the following result set, produced by this query.
Query
SELECT
*
, SUBSTRING(REVERSE(t.text_string), 1, 1)
, SUBSTRING(REVERSE(t.text_string), 2)
, SUBSTRING(REVERSE(t.text_string), 2) >> 0
, SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0
FROM
t
Result
| text_string | SUBSTRING(REVERSE(t.text_string), 1, 1) | SUBSTRING(REVERSE(t.text_string), 2) | SUBSTRING(REVERSE(t.text_string), 2) >> 0 | SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0 |
| ----------- | --------------------------------------- | ------------------------------------ | ----------------------------------------- | ---------------------------------------------- |
| D39J02GEN | N | EG20J93D | 0 | 0 |
| 20F934L | L | 439F02 | 439 | 1 |
| 2984CPL | L | PC4892 | 0 | 0 |
| 29048L20GEN | N | EG02L84092 | 0 | 0 |
| 1120934L | L | 4390211 | 4390211 | 1 |
Here is the demo, where you can check the above results for yourself.
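To connect this back to the original goal of removing the letter L (not just detecting the matching parts), a hedged sketch combining this filter with a simple trim might look like the following; the table and column names (t, text_string) follow this answer and will likely differ from your schema:

SELECT
text_string
, LEFT(text_string, CHAR_LENGTH(text_string) - 1) AS trimmed  -- part number with the trailing L removed
FROM
t
WHERE
SUBSTRING(REVERSE(t.text_string), 1, 1) = 'L'
AND
SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0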