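I have two CSV files, each with many (n) columns. I need to merge them into a single CSV file by joining on a unique column that exists in both input files.
I have searched thoroughly through blogs and websites, and they all point to using a custom .NET activity. So I went through this site,
but I still can't work out which part of the C# code I need to write. Can anyone share code showing how to merge these two CSV files with a custom .NET Activity in Azure Data Factory?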
Answer 0 (score: 1)
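Here is an example of how to join the two files on the ZIP_Code column using U-SQL. The script treats both files as tab-separated. For comma-separated files, use Extractors.Csv() and Outputters.Csv() in place of the Tsv versions. The example assumes both files are stored in Azure Data Lake Storage (ADLS). The script can easily be incorporated into a Data Factory pipeline, for example through the Data Lake Analytics U-SQL activity: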
// Get raw input from file A
@inputA =
EXTRACT
Date_received string,
Product string,
Sub_product string,
Issue string,
Sub_issue string,
Consumer_complaint_narrative string,
Company_public_response string,
Company string,
State string,
ZIP_Code string,
Tags string,
Consumer_consent_provided string,
Submitted_via string,
Date_sent_to_company string,
Company_response_to_consumer string,
Timely_response string,
Consumer_disputed string,
Complaint_ID string
FROM "/input/input48A.txt"
USING Extractors.Tsv();
// Get raw input from file B
@inputB =
EXTRACT Provider_ID string,
Hospital_Name string,
Address string,
City string,
State string,
ZIP_Code string,
County_Name string,
Phone_Number string,
Hospital_Type string,
Hospital_Ownership string,
Emergency_Services string,
Meets_criteria_for_meaningful_use_of_EHRs string,
Hospital_overall_rating string,
Hospital_overall_rating_footnote string,
Mortality_national_comparison string,
Mortality_national_comparison_footnote string,
Safety_of_care_national_comparison string,
Safety_of_care_national_comparison_footnote string,
Readmission_national_comparison string,
Readmission_national_comparison_footnote string,
Patient_experience_national_comparison string,
Patient_experience_national_comparison_footnote string,
Effectiveness_of_care_national_comparison string,
Effectiveness_of_care_national_comparison_footnote string,
Timeliness_of_care_national_comparison string,
Timeliness_of_care_national_comparison_footnote string,
Efficient_use_of_medical_imaging_national_comparison string,
Efficient_use_of_medical_imaging_national_comparison_footnote string,
Location string
FROM "/input/input48B.txt"
USING Extractors.Tsv();
// Join the two files on the ZIP_Code column
@output =
SELECT b.Provider_ID,
b.Hospital_Name,
b.Address,
b.City,
b.State,
b.ZIP_Code,
a.Complaint_ID
FROM @inputA AS a
INNER JOIN
@inputB AS b
ON a.ZIP_Code == b.ZIP_Code
WHERE a.ZIP_Code == "36033";
// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
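This can also be turned into a U-SQL stored procedure that takes the file names and the ZIP code as parameters.
There are certainly several ways to achieve this, each with its own pros and cons. For example, a .NET custom activity may feel more comfortable to someone with a .NET background, but you need compute (such as an Azure Batch pool) to run it. For someone with a SQL/database background who already has an Azure SQL DB in their subscription, importing the files into Azure SQL Database and joining them there is a good option.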
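The join itself is an ordinary inner join, so the logic is the same whichever tool runs it. As an illustration only (this is Python, not U-SQL, and not something Data Factory runs as-is), here is a minimal sketch that mirrors the script above, with the inputs, the join-column positions, and the ZIP code filter as parameters. It assumes headerless tab-separated input, like the Extractors.Tsv() calls above. The function name `inner_join_tsv` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO, Tuple

Row = List[str]


def inner_join_tsv(file_a: TextIO, file_b: TextIO, key_a: int, key_b: int,
                   key_filter: Optional[str] = None) -> List[Tuple[Row, Row]]:
    """Hypothetical helper: inner-join two headerless TSV inputs on one column.

    key_a / key_b are zero-based positions of the join column (e.g. ZIP_Code)
    in each file; key_filter plays the role of the script's WHERE clause.
    """
    # Build phase: index every row of file B by its join key.
    index: Dict[str, List[Row]] = defaultdict(list)
    for row_b in csv.reader(file_b, delimiter="\t"):
        if row_b:  # skip blank lines
            index[row_b[key_b]].append(row_b)

    # Probe phase: look up each row of file A; rows with no match are dropped (INNER JOIN).
    result: List[Tuple[Row, Row]] = []
    for row_a in csv.reader(file_a, delimiter="\t"):
        if not row_a:
            continue
        key = row_a[key_a]
        if key_filter is not None and key != key_filter:
            continue  # same effect as WHERE a.ZIP_Code == "36033"
        for row_b in index.get(key, []):
            result.append((row_a, row_b))
    return result


if __name__ == "__main__":
    # Made-up sample data: ZIP code is column 2 in file A and column 1 in file B.
    file_a = io.StringIO("C1\tMortgage\t36033\nC2\tLoan\t10001\n")
    file_b = io.StringIO("P1\t36033\tHospital X\nP2\t36033\tHospital Y\nP3\t10001\tHospital Z\n")
    for row_a, row_b in inner_join_tsv(file_a, file_b, key_a=2, key_b=1, key_filter="36033"):
        # Project columns the way the SELECT does: Provider_ID, Hospital_Name, ZIP, Complaint_ID.
        print("\t".join([row_b[0], row_b[2], row_b[1], row_a[0]]))
```

Indexing one file in a dictionary and then streaming the other keeps the work roughly linear in the total number of rows. For files too large for memory, the U-SQL or Azure SQL Database routes are the better fit.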