Question

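I'm trying to remove certain letters from part numbers, but I'm having trouble getting it to work correctly.

This is where I am right now. It doesn't work.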

Essentially, say I have the following five items:

D39J02GEN
20F934L
2984CPL
29048L20GEN
1120934L

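I only want to detect the ones in bold, that is, the ones that end in L with a digit immediately before the L (here 20F934L and 1120934L).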

Edit: This gets close:

But it still shows the ones that have something after the L. It also does not remove the letter L yet.

Answer 1

If you know the value is at the end, do the following:

SELECT LEFT(part, LENGTH(part) - 1)  -- drop only the trailing L, keep the digit before it
FROM `table`
WHERE part REGEXP '[0-9]L$';

If the pattern is in the middle of the string, this gets trickier.
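For anyone who wants to check the rule outside the database, here is a minimal Python sketch of the same end-of-string test. It is only an illustration and not part of the original answer: the helper name `strip_trailing_l` is made up, and it assumes the goal is to drop only the final L while keeping the digit before it.

```python
import re

# Matches a trailing "L" only when a digit sits immediately before it.
TRAILING_L = re.compile(r"(?<=[0-9])L$")


def strip_trailing_l(part: str) -> str:
    """Return part without its final L if a digit precedes it; otherwise unchanged."""
    return TRAILING_L.sub("", part)


# The five sample part numbers from the question.
parts = ["D39J02GEN", "20F934L", "2984CPL", "29048L20GEN", "1120934L"]

# Keep only the parts that match, mapped to their cleaned value.
cleaned = {p: strip_trailing_l(p) for p in parts if TRAILING_L.search(p)}
print(cleaned)  # {'20F934L': '20F934', '1120934L': '1120934'}
```

Dropping the `$` from the pattern would apply the same substitution to a digit followed by L anywhere in the string, which is the "middle of the string" case. Whether that is wanted for values such as 29048L20GEN depends on the data.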

Answer 2

If the match always has to be at the end of the text, something like this should also work.

Query

SELECT 
 *
FROM 
 t 
WHERE
   SUBSTRING(REVERSE(t.text_string), 1, 1) = 'L'
 AND
   SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0

Result

| text_string |
| ----------- |
| 20F934L     |
| 1120934L    |

See the demo.

Note
SUBSTRING(REVERSE(t.text_string), 2) >> 0 here basically means CAST(SUBSTRING(REVERSE(t.text_string), 2) AS UNSIGNED)
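To make the cast behaviour concrete, here is a rough Python approximation of it. The function name `loose_unsigned` is hypothetical, and the sketch models only the "keep the leading run of digits, otherwise 0" behaviour that this answer relies on. It deliberately ignores signs, leading whitespace, exponents, and overflow, so it should not be read as a full description of MySQL's conversion rules.

```python
import re


def loose_unsigned(text: str) -> int:
    """Approximate a loose string-to-integer cast: keep the leading digits, else 0."""
    leading_digits = re.match(r"[0-9]*", text).group()
    return int(leading_digits) if leading_digits else 0


print(loose_unsigned("439F02"))   # 439
print(loose_unsigned("PC4892"))   # 0
print(loose_unsigned("4390211"))  # 4390211
```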

Why does it work?

I rely on MySQL's loose automatic type casting, which converts 439F02 to the INT 439 but cannot convert PC4892 to an INT, so that value becomes 0.

See the following result set, which is based on this query.

Query

SELECT 
 *
 , SUBSTRING(REVERSE(t.text_string), 1, 1)
 , SUBSTRING(REVERSE(t.text_string), 2)
 , SUBSTRING(REVERSE(t.text_string), 2) >> 0
 , SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0
FROM 
 t

Result

| text_string | SUBSTRING(REVERSE(t.text_string), 1, 1) | SUBSTRING(REVERSE(t.text_string), 2) | SUBSTRING(REVERSE(t.text_string), 2) >> 0 | SUBSTRING(REVERSE(t.text_string), 2) >> 0 <> 0 |
| ----------- | --------------------------------------- | ------------------------------------ | ----------------------------------------- | ---------------------------------------------- |
| D39J02GEN   | N                                       | EG20J93D                             | 0                                         | 0                                              |
| 20F934L     | L                                       | 439F02                               | 439                                       | 1                                              |
| 2984CPL     | L                                       | PC4892                               | 0                                         | 0                                              |
| 29048L20GEN | N                                       | EG02L84092                           | 0                                         | 0                                              |
| 1120934L    | L                                       | 4390211                              | 4390211                                   | 1                                              |

Here is the demo, where you can check the results above for yourself.
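The whole filter can also be traced step by step outside the database. The following Python sketch mirrors the reverse, split, and cast steps on the five sample values. The function name `ends_in_digit_then_l` is made up for this illustration, and the cast is approximated as "leading digits, otherwise 0", which is an assumption about the conversion rather than an exact model of it.

```python
import re


def ends_in_digit_then_l(text: str) -> bool:
    """Mirror the answer's filter: last char is L and the reversed rest casts to non-zero."""
    reversed_text = text[::-1]
    last_char = reversed_text[:1]  # like SUBSTRING(REVERSE(s), 1, 1)
    rest = reversed_text[1:]       # like SUBSTRING(REVERSE(s), 2)
    digits = re.match(r"[0-9]*", rest).group()
    as_number = int(digits) if digits else 0  # stands in for: rest >> 0
    return last_char == "L" and as_number != 0


# The five sample part numbers from the question.
parts = ["D39J02GEN", "20F934L", "2984CPL", "29048L20GEN", "1120934L"]
matches = [p for p in parts if ends_in_digit_then_l(p)]
print(matches)  # ['20F934L', '1120934L']
```

One thing the sketch makes visible: a value such as A0L is rejected, because the reversed remainder 0A casts to 0. If part numbers can have only zeros directly before the L, the cast trick would presumably miss them in the same way, and a pattern test like `[0-9]L$` is the safer choice.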
