Comparing very different tables

Time: 2019-03-08 18:36:34

Tags: sql ms-access

I have two tables that were read from separate files (.xlsx and .csv) and imported into MS Access. Their formats are different (which is why I'm having a hard time).

Here is xlsxTable

+--------------------------------------------------------------------------------------+
|    ID    |     Name     |    SSN    |    SSN2    |   Address                         |
+--------------------------------------------------------------------------------------+
| 00012345 | Robert Robin | ThisIsSSN | ThisIsSSN2 | 12345 StreetName St. CityName, KS |
| 00013245 | Pete Peters  | ThisIsSSN | ThisIsSSN2 | 54321 StreetName St. CityName, MO |
| 00012358 | Mike Michaels| ThisIsSSN | ThisIsSSN2 | 69874 StreetName St. CityName, NY |
| 00098755 | Tim Timpson  | ThisIsSSN | ThisIsSSN2 | 15987 StreetName St. CityName, KY |
| 00035784 | Tom Thompson | ThisIsSSN | ThisIsSSN2 | 95123 StreetName St. CityName, CA |
| 00012584 | Will Willers | ThisIsSSN | ThisIsSSN2 | 35789 StreetName St. CityName, WA |
| ........ | ...........  | ......... | .......... | ................................. |

Here is my csvTable

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tracking_number   |   last_name   |   first_name   |  middle_name  |  suffix  | alias_last_name   |   alias_first_name   |    alias_middle_name   |   alias_suffix  |    number   |    number_type     |    dob     |    street   |     city     |  state  | zip  | country | phone |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|            135247 |  Keeves       |   Michael      |               |  Jr      |                   |                      |                        |                 |  ThisIsSSN  | SSN/ITIN           |   1/1/1990 | StreetName  |   CityName   |    NJ   |      |   US    |       |
|            135248 |  Jackson      |   Sue          |    Master     |          |                   |                      |                        |                 |  ThisIsSSN  | SSN/ITIN           | 10/29/1980 | StreetName  |   CityName   |    NY   | zip  |   US    |       |
|            135248 |  Thomspon     |   Dolf         |    Laundry    |          |                   |                      |                        |                 |  DriverNum  | Driver'sLicense    | 11/15/1962 | StreetName  |   CityName   |    KS   |      |   US    |       |
|            135249 |  Peters       |   Pete         |               |          |    Peters         |     Petey            |                        |                 |  ThisIsSSN  | SSN/ITIN           |   5/6/1975 | StreetName  |   CityName   |    PA   | zip  |   US    |       |
|            135250 |  Rogers       |   Steve        |               |          |                   |                      |                        |                 |  ThisIsSSN  | SSN/ITIN           | 12/25/1990 | StreetName  |   CityName   |    CT   | zip  |   US    |       |
|            135250 |  Nikolson     |   Jack         |               |  Jr      |                   |                      |                        |                 |  DriverNum  | Driver'sLicense    |   8/5/1975 | StreetName  |   CityName   |    CA   | zip  |   US    |       |
|            135251 |  Keeves       |   Keanu        |    Neo        |          |                   |                      |                        |                 |  ThisIsSSN  | SSN/ITIN           | 10/30/2000 | StreetName  |   CityName   |    TX   | zip  |   US    |       |
|            135252 |  Starch       |   Tony         |               |          |                   |                      |                        |                 |  ThisIsSSN  | SSN/ITIN           |  9/10/1975 | StreetName  |   CityName   |    NJ   |      |   US    |       |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
| dba_name          |   number      |   number_type  |  incorporated |  street  |       city        |        state         |        zip             |    country      |    phone    |                    |            |             |              |         |      |         |       |
| Mini Mart         |   92585487    |   EIN          |               |  Street  |      CityName     |        state         |        zipNum          |    GT           |             |                    |            |             |              |         |      |         |       |
|                   |   15987548    |   EIN          |               |  street  |      CityName     |        KS            |        zipNum          |    US           |             |                    |            |             |              |         |      |         |       |
| Check Systems     |   35854855    |   EIN          |               |  street  |      CityName     |        CA            |        zipNum          |    US           |             |                    |            |             |              |         |      |         |       |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|

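The dba_name line in the table above is an actual row. For some reason, the file starts a new list, with its own column headers, partway through.

I have to query these tables, and if the name and SSN match, I have to get the name, address, and SSN and then process them (most likely by putting them in another table for export). I have already loaded both tables from the files. I now need to go through them and find the matches. With the sample data, Pete Peters should match here, because his data is in both tables. My expected output should look very similar to the first table: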

|    ID    |     Name     |    SSN    |    SSN2    |   Address                         |

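I currently have an MS Access database containing these tables. However, I'm not sure where to start with SQL when it comes to parsing the data. This could be heavy in terms of performance. I'm just looking for a way to get it working first.

How can I query these two very different tables and pull out only the matching data?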

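1 answer:

Answer 0: (score: 1)

Access has a "Find Duplicates" query wizard. The quickest way to solve your problem is to combine the tables, either by hand or with one or more queries, and then run the wizard. In other words, put all the data in one table and then run the wizard. To break it down step by step, you might get the data from the CSV table with a query like the following: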

SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
HAVING (((Count(csvTable.Number))>1));

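Then create a query with the same structure from the xlsx table. (This query calls the full-name column FullName; in the table shown in the question it is called Name.)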


SELECT Left([xlsxTable].[FullName], InStr([xlsxTable].[FullName], " ") - 1) AS First_Name,
       Right([xlsxTable].[FullName], Len([xlsxTable].[FullName]) - InStr([xlsxTable].[FullName], " ")) AS Last_Name,
       xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Left([xlsxTable].[FullName], InStr([xlsxTable].[FullName], " ") - 1),
         Right([xlsxTable].[FullName], Len([xlsxTable].[FullName]) - InStr([xlsxTable].[FullName], " ")),
         xlsxTable.SSN
HAVING (((Count(xlsxTable.SSN))>1));

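Count > 1 finds the duplicates. Most of the rest is blunt string manipulation that splits the full name into a first name and a last name directly in SQL. Then combine the two queries with a UNION ALL statement so that you can run them together in the SQL pane: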

SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number

UNION ALL 
SELECT Left([xlsxTable].[FullName], InStr([xlsxTable].[FullName], " ") - 1) AS First_Name,
       Right([xlsxTable].[FullName], Len([xlsxTable].[FullName]) - InStr([xlsxTable].[FullName], " ")) AS Last_Name,
       xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Left([xlsxTable].[FullName], InStr([xlsxTable].[FullName], " ") - 1),
         Right([xlsxTable].[FullName], Len([xlsxTable].[FullName]) - InStr([xlsxTable].[FullName], " ")),
         xlsxTable.SSN;

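UNION ALL keeps all duplicates, whereas UNION discards them. I removed the HAVING clauses from the union because I think it works better that way: with only the GROUP BY, each half returns every distinct name and number combination once, so a count greater than 1 in the combined result means the combination appears in both tables. Next, use the "Find Duplicates" wizard on the combined query (saved here as [combine tables]), for example: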

SELECT [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
FROM [combine tables]
GROUP BY [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
HAVING (((Count([combine tables].Number))>1));
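
If the goal is to get back the matching rows in the shape of xlsxTable (ID, Name, SSN, SSN2, Address), a direct join between the two tables may be a simpler alternative to the union-and-count approach. The following is only a sketch and rests on several assumptions that are not confirmed by the question: that the name column in xlsxTable is called Name and always has the form "First Last", that xlsxTable.SSN is stored in the same format as the number column of csvTable, and that only rows with number_type "SSN/ITIN" should be compared. Access SQL does not accept comments inside a query, so these assumptions are listed here rather than in the code. Name and number are bracketed because they are reserved words in Access.

```sql
SELECT x.ID, x.[Name], x.SSN, x.SSN2, x.Address
FROM xlsxTable AS x, csvTable AS c
WHERE c.number_type = "SSN/ITIN"
  AND x.SSN = c.[number]
  AND x.[Name] = c.first_name & " " & c.last_name;
```

With the sample data this would be expected to return the Pete Peters row, provided his SSN is identical in both tables. If the same person can appear more than once in csvTable, changing SELECT to SELECT DISTINCT should collapse the repeated rows. The second list inside csvTable (the rows starting at dba_name) uses a different column layout, so it would probably need to be split into its own table before a query like this gives meaningful results for those rows.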