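I have a table invoices with a column total of type varchar(255). It holds values such as " 500.00", " 5'199.00", " 129.60", " 1.00", and so on. I need to select records and filter on the total column, for example to find the records whose total is at most 180.
I tried: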
SELECT total from invoices WHERE invoices.total <= '180'
But the result is:
125.25
100.50
1593.55 - not correct
4'799.00 - not correct
1.00
-99.00
2406.52 - not correct
How can I fix this and write a correct filter for this column? Thanks!
Answer 0 (score: 1)
You can use the CAST() function to convert the value to a floating-point number.
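Answer 1 (score: 0)
Why are you storing numbers as strings? That is a fundamental problem with your data model, and you should fix it.
Sometimes we are stuck with other people's really, really, really bad decisions. In that case, you can try to work around the problem with an explicit conversion: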
SELECT i.total
FROM invoices i
WHERE CAST(REPLACE(i.total, '''', '') as DECIMAL(20, 4)) <= 180;
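Note that if total contains other unexpected characters, this will return an error (or, in some databases such as MySQL, a truncated value with a warning).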
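One way to check ahead of time whether any rows would trip up the conversion is to list the values that do not look like plain decimals once the apostrophes are removed. This is only a sketch, assuming MySQL's REGEXP operator and assuming that an optional minus sign, digits, and an optional fractional part are the only valid shapes for total:

```sql
-- Assumes MySQL. Lists totals that would not convert cleanly
-- after trimming spaces and removing the apostrophe separators.
SELECT i.total
FROM invoices i
WHERE REPLACE(TRIM(i.total), '''', '') NOT REGEXP '^-?[0-9]+([.][0-9]+)?$';
```

If this returns no rows, the CAST in the query above should be safe for the data currently in the table.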
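Answer 2 (score: 0)
If the string starts with a number and is followed by non-numeric characters, you can convert it to a number explicitly with the CAST() function, or implicitly by adding 0: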
SELECT CAST('1234abc' AS UNSIGNED); -- 1234
SELECT '1234abc'+0; -- 1234
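For the values in the question, this prefix rule is probably not enough on its own. As a small sketch (assuming MySQL, where the conversion skips leading whitespace and stops at the first character that cannot be part of a number), the apostrophe used as a thousands separator cuts the value short:

```sql
-- Assumes MySQL; the literals are sample values taken from the question.
SELECT ' 129.60' + 0;                        -- the whole string is numeric, so nothing is lost
SELECT ' 5''199.00' + 0;                     -- conversion stops at the apostrophe, leaving only the leading 5
SELECT REPLACE(' 5''199.00', '''', '') + 0;  -- removing the apostrophe first lets the full value convert
```

MySQL should also raise a "truncated incorrect value" warning for the second statement, which SHOW WARNINGS would display.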
To extract a number from an arbitrary string, you can add a custom function such as this one:
DELIMITER $$
CREATE FUNCTION `ExtractNumber`(in_string VARCHAR(50))
RETURNS INT
NO SQL
BEGIN
    DECLARE ctrNumber INT;                     -- position of the character in the digit list, 0 if it is not a digit
    DECLARE finNumber VARCHAR(50) DEFAULT '';  -- digits collected so far
    DECLARE sChar VARCHAR(1);                  -- character currently being inspected
    DECLARE inti INTEGER DEFAULT 1;            -- 1-based index into in_string
    IF LENGTH(in_string) > 0 THEN
        WHILE (inti <= LENGTH(in_string)) DO
            SET sChar = SUBSTRING(in_string, inti, 1);
            SET ctrNumber = FIND_IN_SET(sChar, '0,1,2,3,4,5,6,7,8,9');
            -- keep the character only if it is a digit
            IF ctrNumber > 0 THEN
                SET finNumber = CONCAT(finNumber, sChar);
            END IF;
            SET inti = inti + 1;
        END WHILE;
        RETURN CAST(finNumber AS UNSIGNED);
    ELSE
        RETURN 0;
    END IF;
END$$
DELIMITER ;
Once the function is defined, you can use it in your query:
SELECT total from invoices WHERE ExtractNumber(invoices.total) <= 180
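Note that ExtractNumber as written keeps only digits, so a value like " 129.60" would come back as 12960 and the sign of "-99.00" would be lost. A variant that also keeps the decimal point and the minus sign might look like the sketch below. The name ExtractDecimal, the VARCHAR(255) parameter, and the DECIMAL(20, 4) return type are assumptions made for illustration, not part of the original answer:

```sql
DELIMITER $$
-- Hypothetical variant of ExtractNumber: keeps digits, '.', and '-'.
CREATE FUNCTION `ExtractDecimal`(in_string VARCHAR(255))
RETURNS DECIMAL(20, 4)
DETERMINISTIC
NO SQL
BEGIN
    DECLARE cleaned VARCHAR(255) DEFAULT '';  -- characters kept so far
    DECLARE sChar VARCHAR(1);                 -- character currently being inspected
    DECLARE inti INT DEFAULT 1;               -- 1-based index into in_string
    WHILE inti <= CHAR_LENGTH(in_string) DO
        SET sChar = SUBSTRING(in_string, inti, 1);
        -- LOCATE returns 0 when sChar is not one of the allowed characters
        IF LOCATE(sChar, '0123456789.-') > 0 THEN
            SET cleaned = CONCAT(cleaned, sChar);
        END IF;
        SET inti = inti + 1;
    END WHILE;
    -- Nothing numeric found (or NULL input): return NULL rather than 0
    IF cleaned = '' THEN
        RETURN NULL;
    END IF;
    RETURN CAST(cleaned AS DECIMAL(20, 4));
END$$
DELIMITER ;

SELECT total FROM invoices WHERE ExtractDecimal(invoices.total) <= 180;
```

This sketch does not validate the shape of what remains, so a value with two decimal points or a stray minus sign would still convert badly. For data as regular as the sample values, the REPLACE-based cast from the previous answer is likely simpler.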