我运行如下查询:
select *
from sd_sms LEFT JOIN categories_phrases
on sd_sms.body like concat('%',categories_phrases.phrase1,'%')
and sd_sms.body like concat('%',categories_phrases.phrase2,'%')
and sd_sms.body like concat('%',categories_phrases.phrase3,'%')
and sd_sms.body like concat('%',categories_phrases.phrase4,'%')
基本上,如果表A中的一个字段包含表B中的几个短语,它将连接两个表。但现在我需要在Python中执行此操作。
有没有简单的方法可以使用pandas合并两个表,所以它给我相同的结果?
请告知
答案 0 :(得分:1)
此代码示例适用于连接子句中的文本数据和类似条件。
from pandasql import *
import pandas as pd
pysqldf = lambda q: sqldf(q, globals())
df1 = pd.DataFrame({"name": ['Antony', 'Mark', 'Jacob'], "age":
[11,12,13]})
df2 = pd.DataFrame({"name": ['Antony', 'Gill', 'John']})
q = """SELECT * FROM df1 LEFT JOIN df2 ON df1.name LIKE '%' || df2.name || '%'"""
df = pysqldf(q)
这只是一个带有示例数据的虚拟DF,但我对你的问题应用了类似的条件。
希望它有用。
答案 1 :(得分:0)
我不明白您的数据类型是什么,因为您错过了答案中的一些示例数据;但如果您需要使用像sintax这样的SQL查询pandas数据帧,您可以尝试使用pandasql
package.v它基于SQLAlchemy ORM工具。
from pandasql import *
import pandas as pd
pysqldf = lambda q: sqldf(q, globals())
q = """
SELECT
m.date
, m.beef
, b.births
FROM
meat m
LEFT JOIN
births b
ON m.date = b.date
WHERE
m.date > '1974-12-31';
"""
meat = load_meat()
births = load_births()
df = pysqldf(q)
df
date beef births 0 1975-01-01 00:00:00.000000 2106.0 265775.0 1 1975-02-01 00:00:00.000000 1845.0 241045.0 2 1975-03-01 00:00:00.000000 1891.0 268849.0 3 1975-04-01 00:00:00.000000 1895.0 247455.0 4 1975-05-01 00:00:00.000000 1849.0 254545.0 5 1975-06-01 00:00:00.000000 1849.0 254096.0 6 1975-07-01 00:00:00.000000 1916.0 275163.0 7 1975-08-01 00:00:00.000000 1961.0 281300.0 8 1975-09-01 00:00:00.000000 2065.0 270738.0 9 1975-10-01 00:00:00.000000 2270.0 265494.0 10 1975-11-01 00:00:00.000000 1970.0 251973.0 11 1975-12-01 00:00:00.000000 2055.0 260532.0 12 1976-01-01 00:00:00.000000 2208.0 257455.0 13 1976-01-01 00:00:00.000000 2208.0 259173.0 14 1976-02-01 00:00:00.000000 1966.0 236551.0 15 1976-02-01 00:00:00.000000 1966.0 238153.0 16 1976-03-01 00:00:00.000000 2318.0 257951.0 17 1976-03-01 00:00:00.000000 2318.0 261608.0 18 1976-04-01 00:00:00.000000 2015.0 246469.0 19 1976-04-01 00:00:00.000000 2015.0 250992.0 20 1976-05-01 00:00:00.000000 1969.0 256986.0 21 1976-05-01 00:00:00.000000 1969.0 261572.0 22 1976-06-01 00:00:00.000000 2161.0 250525.0 23 1976-06-01 00:00:00.000000 2161.0 255734.0 24 1976-07-01 00:00:00.000000 2111.0 279630.0 25 1976-07-01 00:00:00.000000 2111.0 279744.0 26 1976-08-01 00:00:00.000000 2233.0 279937.0 27 1976-08-01 00:00:00.000000 2233.0 286496.0 28 1976-09-01 00:00:00.000000 2274.0 273750.0 29 1976-09-01 00:00:00.000000 2274.0 283718.0 ... ... ... ... 533 2010-06-01 00:00:00.000000 2320.0 NaN 534 2010-07-01 00:00:00.000000 2229.6 NaN 535 2010-08-01 00:00:00.000000 2286.6 NaN 536 2010-09-01 00:00:00.000000 2252.2 NaN 537 2010-10-01 00:00:00.000000 2234.9 NaN 538 2010-11-01 00:00:00.000000 2235.5 NaN 539 2010-12-01 00:00:00.000000 2270.9 NaN 540 2011-01-01 00:00:00.000000 2122.9 356457.0 541 2011-02-01 00:00:00.000000 2020.4 338521.0 542 2011-03-01 00:00:00.000000 2266.2 350630.0 543 2011-04-01 00:00:00.000000 2052.5 346397.0 544 2011-05-01 00:00:00.000000 2131.9 354886.0 545 2011-06-01 00:00:00.000000 2375.0 348587.0 546 2011-07-01 00:00:00.000000 2134.1 375384.0 547 2011-08-01 00:00:00.000000 2386.9 373333.0 548 2011-09-01 00:00:00.000000 2215.2 367965.0 549 2011-10-01 00:00:00.000000 2215.1 357875.0 550 2011-11-01 00:00:00.000000 2148.8 323788.0 551 2011-12-01 00:00:00.000000 2126.3 353871.0 552 2012-01-01 00:00:00.000000 2113.8 337980.0 553 2012-02-01 00:00:00.000000 2009.0 316641.0 554 2012-03-01 00:00:00.000000 2159.8 347803.0 555 2012-04-01 00:00:00.000000 1990.6 337272.0 556 2012-05-01 00:00:00.000000 2232.0 345257.0 557 2012-06-01 00:00:00.000000 2252.1 346971.0 558 2012-07-01 00:00:00.000000 2200.8 368450.0 559 2012-08-01 00:00:00.000000 2367.5 359554.0 560 2012-09-01 00:00:00.000000 2016.0 361922.0 561 2012-10-01 00:00:00.000000 2343.7 347625.0 562 2012-11-01 00:00:00.000000 2206.6 320195.0
这里是repo:https://github.com/yhat/pandasql和一个很好的快速入门教程http://blog.yhat.com/posts/pandasql-sql-for-pandas-dataframes.html