I have two tables that were read from separate files (.xlsx and .csv) and imported into MS Access. Their formats are different, which is why I am struggling.
Here is xlsxTable:
+--------------------------------------------------------------------------------------+
| ID | Name | SSN | SSN2 | Address |
+--------------------------------------------------------------------------------------+
| 00012345 | Robert Robin | ThisIsSSN | ThisIsSSN2 | 12345 StreetName St. CityName, KS |
| 00013245 | Pete Peters | ThisIsSSN | ThisIsSSN2 | 54321 StreetName St. CityName, MO |
| 00012358 | Mike Michaels| ThisIsSSN | ThisIsSSN2 | 69874 StreetName St. CityName, NY |
| 00098755 | Tim Timpson | ThisIsSSN | ThisIsSSN2 | 15987 StreetName St. CityName, KY |
| 00035784 | Tom Thompson | ThisIsSSN | ThisIsSSN2 | 95123 StreetName St. CityName, CA |
| 00012584 | Will Willers | ThisIsSSN | ThisIsSSN2 | 35789 StreetName St. CityName, WA |
| ........ | ........... | ......... | .......... | ................................. |
Here is my csvTable:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tracking_number | last_name | first_name | middle_name | suffix | alias_last_name | alias_first_name | alias_middle_name | alias_suffix | number | number_type | dob | street | city | state | zip | country | phone |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 135247 | Keeves | Michael | | Jr | | | | | ThisIsSSN | SSN/ITIN | 1/1/1990 | StreetName | CityName | NJ | | US | |
| 135248 | Jackson | Sue | Master | | | | | | ThisIsSSN | SSN/ITIN | 10/29/1980 | StreetName | CityName | NY | zip | US | |
| 135248 | Thomspon | Dolf | Laundry | | | | | | DriverNum | Driver'sLicense | 11/15/1962 | StreetName | CityName | KS | | US | |
| 135249 | Peters | Pete | | | Peters | Petey | | | ThisIsSSN | SSN/ITIN | 5/6/1975 | StreetName | CityName | PA | zip | US | |
| 135250 | Rogers | Steve | | | | | | | ThisIsSSN | SSN/ITIN | 12/25/1990 | StreetName | CityName | CT | zip | US | |
| 135250 | Nikolson | Jack | | Jr | | | | | DriverNum | Driver'sLicense | 8/5/1975 | StreetName | CityName | CA | zip | US | |
| 135251 | Keeves | Keanu | Neo | | | | | | ThisIsSSN | SSN/ITIN | 10/30/2000 | StreetName | CityName | TX | zip | US | |
| 135252 | Starch | Tony | | | | | | | ThisIsSSN | SSN/ITIN | 9/10/1975 | StreetName | CityName | NJ | | US | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
| dba_name | number | number_type | incorporated | street | city | state | zip | country | phone | | | | | | | | |
| Mini Mart | 92585487 | EIN | | Street | CityName | state | zipNum | GT | | | | | | | | | |
| | 15987548 | EIN | | street | CityName | KS | zipNum | US | | | | | | | | | |
| Check Systems | 35854855 | EIN | | street | CityName | CA | zipNum | US | | | | | | | | | |
|...................|...............|................|...............|..........|...................|......................|........................|.................|.............|....................|............|.............|..............|.........|......|.........|.......|
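The dba_name line in the table above is an actual row in the file. For some reason, partway through, the file starts a new list with a different set of columns.
I have to query these tables, and wherever the name and SSN match I need to take the name, address, and SSN and do something with them (most likely put them in another table for export). Both tables are already loaded from the files.
What I need now is to go through them and find the matches. In the sample data, Pete Peters should match, because he appears in both tables. My expected output should look much like the first table: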
| ID | Name | SSN | SSN2 | Address |
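I currently have an MS Access database containing these tables, but I am not sure where to start with the SQL needed to work through the data. This could be expensive in terms of performance; for now I am only looking for a way to get it working.
How can I query these two very different tables and pull out only the matching data?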
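Answer 0 (score: 1)
Access has a Find Duplicates Query Wizard. The quickest way to solve the problem is to combine the tables, either manually or with one or more queries, and then run the wizard. In other words, get all the data into one table and then run the wizard. Breaking that down into steps makes it a bit more involved. You could pull the data from the CSV table with a query like the following: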
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
HAVING (((Count(csvTable.Number))>1));
Then create a query with the same structure from the xlsx table:
SELECT Trim(Left([xlsxTable].[Name],InStr([xlsxTable].[Name]," "))) AS First_Name, Right([xlsxTable].[Name],Len([xlsxTable].[Name])-InStr([xlsxTable].[Name]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Trim(Left([xlsxTable].[Name],InStr([xlsxTable].[Name]," "))), Right([xlsxTable].[Name],Len([xlsxTable].[Name])-InStr([xlsxTable].[Name]," ")), xlsxTable.SSN
HAVING (((Count(xlsxTable.SSN))>1));
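The Count > 1 condition is what finds the duplicates. Most of the rest is clumsy string manipulation that splits the full name into a first name and a last name directly in SQL. Next, combine the queries with a UNION ALL statement so that they run together in the SQL pane: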
SELECT csvTable.First_Name AS First_Name, csvTable.Last_Name AS Last_Name, csvTable.Number AS [Number]
FROM csvTable
GROUP BY csvTable.First_Name, csvTable.Last_Name, csvTable.Number
UNION ALL
SELECT Trim(Left([xlsxTable].[Name],InStr([xlsxTable].[Name]," "))) AS First_Name, Right([xlsxTable].[Name],Len([xlsxTable].[Name])-InStr([xlsxTable].[Name]," ")) AS Last_Name, xlsxTable.SSN AS [Number]
FROM xlsxTable
GROUP BY Trim(Left([xlsxTable].[Name],InStr([xlsxTable].[Name]," "))), Right([xlsxTable].[Name],Len([xlsxTable].[Name])-InStr([xlsxTable].[Name]," ")), xlsxTable.SSN;
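UNION ALL keeps all duplicates, whereas UNION discards them. I removed the HAVING clauses from the union because I think it works better that way: with them in place, each half would return only rows duplicated within its own table, and matches across the two tables would be lost. Save the union as a query (named [combine tables] here), then run the Find Duplicates wizard on it, which produces something like: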
SELECT [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
FROM [combine tables]
GROUP BY [combine tables].First_Name, [combine tables].Last_Name, [combine tables].Number
HAVING (((Count([combine tables].Number))>1));
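If the goal is the output shape from the question (ID, Name, SSN, SSN2, Address), a direct join may be simpler than the duplicates route. This is only a sketch under several assumptions, not a tested solution: that xlsxTable.Name is always "First Last" with a single space, that xlsxTable.SSN and csvTable.number are both stored as text in the same format, and that only csvTable rows with number_type "SSN/ITIN" should be considered (which also skips the driver's license rows and the dba_name section). The aliases x and c are just illustrative names. Access's SQL view does not accept comments, so the assumptions are listed here rather than in the query:

```sql
SELECT x.ID, x.[Name], x.SSN, x.SSN2, x.Address
FROM xlsxTable AS x
INNER JOIN csvTable AS c ON x.SSN = c.[number]
WHERE c.number_type = "SSN/ITIN"
  AND x.[Name] = c.first_name & " " & c.last_name;
```

If SSN2 should also count as a match, or if names can include middle names or suffixes, the join and the name comparison would need to be adjusted accordingly.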