**csv file 1**
date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298
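I have two CSV files that I want to merge on a common column, but that column has a different length in each file. Is there a way to merge/join them without duplicating values?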
**csv file 2**
eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in
**Output.csv**
eventCategory event_type date yearMonth deviceCategory channelGrouping Totalevents
_GW_Legal_RM_false Legal 20160719 201607 desktop Direct 149
_GW_Legal_RM_false Legal 20160719 201607 desktop Organic Search 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Direct 149
_GW_Legal_RM_false Legal 20160719 201607 mobile Social 149
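Whether values get duplicated depends on the lookup table, not on the two files having different lengths. A minimal sketch, using hypothetical miniature versions of the two files: a left merge returns exactly one row per row of file 1 as long as each `eventCategory` appears at most once in file 2, and pandas' `validate="many_to_one"` option turns a violation of that into an error.

```python
import pandas as pd

# Hypothetical miniature versions of csv file 1 and csv file 2
events = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true", "_GW_Legal_RM_false"],
    "Totalevents": [149, 149, 298],
})
lookup = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true", "_GW_Risk_RM_"],
    "event_type": ["Legal", "Risk", "Risk"],
})

# how="left" keeps every row of `events`; validate="many_to_one" asserts that
# each eventCategory occurs at most once in `lookup`
merged = events.merge(lookup, on="eventCategory", how="left", validate="many_to_one")
print(len(merged))  # 3: one output row per input row

# If the lookup table repeats a key, every matching event row is duplicated
dup_lookup = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
print(len(events.merge(dup_lookup, on="eventCategory", how="left")))  # 5

# validate="many_to_one" reports that situation as an explicit error instead
try:
    events.merge(dup_lookup, on="eventCategory", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```

Lookup keys that never occur in file 1 (here `_GW_Risk_RM_`) are simply ignored by a left merge.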
Answer 0 (score: 1)
Use `map` together with `set_index`:
import pandas as pd
from io import StringIO
csv1 = StringIO("""date yearMonth deviceCategory channelGrouping eventCategory Totalevents
20160719 201607 desktop Direct _GW_Legal_RM_false 149
20160719 201607 desktop Direct _GW_Risk_RM_false 298
20160719 201607 desktop Direct _GW_Risk_RM_true 149
20160719 201607 desktop Direct _GW__Product-Sign-In__ 895
20160719 201607 desktop Organic Search _GW_Legal_RM_false 149
20160719 201607 desktop Organic Search _GW_Risk_RM_false 746
20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342
20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044
20160719 201607 mobile Direct _GW_Legal_RM_false 149
20160719 201607 mobile Social _GW_Legal_RM_false 149
20160719 201607 tablet Direct _GW_Legal_RM_false 149
20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW_Legal_RM_false 149
20160720 201607 desktop Direct _GW__Product-Sign-In__ 746
20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149
20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149
20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939
20160720 201607 desktop Organic Search _GW_Risk_RM_false 298""")
csv2= StringIO("""eventCategory event_type
_GW_Legal_RM_false Legal
_GW_Legal_RM_true Legal
_GW_Legal_RM_ Legal
_GW_Risk_RM_false Risk
_GW_Risk_RM_true Risk
_GW_Risk_RM_ Risk
_GW__Product-Sign-In__ Sign-in""")
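# Split on runs of two or more whitespace characters so values containing a single
# space (e.g. "Organic Search") stay intact; this assumes the columns are separated
# by at least two spaces. Regex separators require engine='python'.
df1 = pd.read_csv(csv1, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(csv2, sep=r'\s\s+', engine='python')
# Turn df2 into an eventCategory -> event_type lookup Series and map it onto df1
df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type'])
print(df1)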
Output:
date yearMonth deviceCategory channelGrouping eventCategory Totalevents event_type
0 20160719 201607 desktop Direct _GW_Legal_RM_false 149 Legal
1 20160719 201607 desktop Direct _GW_Risk_RM_false 298 Risk
2 20160719 201607 desktop Direct _GW_Risk_RM_true 149 Risk
3 20160719 201607 desktop Direct _GW__Product-Sign-In__ 895 Sign-in
4 20160719 201607 desktop Organic Search _GW_Legal_RM_false 149 Legal
5 20160719 201607 desktop Organic Search _GW_Risk_RM_false 746 Risk
6 20160719 201607 desktop Organic Search _GW__Product-Sign-In__ 1342 Sign-in
7 20160719 201607 desktop Referral _GW__Product-Sign-In__ 1044 Sign-in
8 20160719 201607 mobile Direct _GW_Legal_RM_false 149 Legal
9 20160719 201607 mobile Social _GW_Legal_RM_false 149 Legal
10 20160719 201607 tablet Direct _GW_Legal_RM_false 149 Legal
11 20160720 201607 desktop Branded Paid Search _GW_Legal_RM_false 149 Legal
12 20160720 201607 desktop Direct _GW_Legal_RM_false 149 Legal
13 20160720 201607 desktop Direct _GW__Product-Sign-In__ 746 Sign-in
14 20160720 201607 desktop Non-Branded Paid Search _GW_Legal_RM_false 149 Legal
15 20160720 201607 desktop Non-Branded Paid Search _GW_Risk_RM_false 149 Risk
16 20160720 201607 desktop Organic Search _GW_Legal_RM_false 1939 Legal
17 20160720 201607 desktop Organic Search _GW_Risk_RM_false 298 Risk
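Because `map` produces exactly one value per row of `df1`, it cannot duplicate rows; the thing to watch for is categories that are missing from the lookup table, which come back as `NaN`. A small sketch with hypothetical data (`_GW_Unlisted_` and the `"Unknown"` label are made up for illustration):

```python
import pandas as pd

# Hypothetical data: the second eventCategory has no entry in the lookup table
df1 = pd.DataFrame({"eventCategory": ["_GW_Legal_RM_false", "_GW_Unlisted_"]})
df2 = pd.DataFrame({"eventCategory": ["_GW_Legal_RM_false"], "event_type": ["Legal"]})

mapping = df2.set_index("eventCategory")["event_type"]
# map returns one value per row of df1, so the row count never changes;
# keys missing from `mapping` become NaN, replaced here with a placeholder label
df1["event_type"] = df1["eventCategory"].map(mapping).fillna("Unknown")
print(df1)
```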
Answer 1 (score: 0)
To expand on ALollz's answer:
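import pandas as pd
# sep=" " would split "Organic Search" into two fields; split on runs of two or
# more spaces instead (assumes the files are laid out like the samples above)
df1 = pd.read_csv("1.csv", sep=r"\s\s+", engine="python")
df2 = pd.read_csv("2.csv", sep=r"\s\s+", engine="python")
# pd.merge takes the left and right frames as separate arguments, not a list
df = pd.merge(df1, df2, on='eventCategory', how='left')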
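A self-contained version of the merge approach, with hypothetical inline samples standing in for `1.csv` and `2.csv`. It assumes the columns are separated by two or more spaces, so the single space inside "Organic Search" is not treated as a delimiter.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample laid out with two or more spaces between columns;
# "Organic Search" contains a single space and must stay one field
csv1 = StringIO(
    "date  channelGrouping  eventCategory  Totalevents\n"
    "20160719  Organic Search  _GW_Legal_RM_false  149\n"
    "20160719  Direct  _GW__Product-Sign-In__  895\n"
)
csv2 = StringIO(
    "eventCategory  event_type\n"
    "_GW_Legal_RM_false  Legal\n"
    "_GW__Product-Sign-In__  Sign-in\n"
)

# A regex separator requires the python parsing engine
df1 = pd.read_csv(csv1, sep=r"\s{2,}", engine="python")
df2 = pd.read_csv(csv2, sep=r"\s{2,}", engine="python")

# Left merge: every row of df1 is kept and gains an event_type column
df = pd.merge(df1, df2, on="eventCategory", how="left")
print(df)
```

For this one-column lookup, the result matches the `map` answer; `merge` becomes more convenient when file 2 carries several columns to bring over.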
Answer 2 (score: 0)
import pandas as pd
# The default sep=',' assumes the files are genuinely comma-separated
df1 = pd.read_csv("csv1.csv")
df2 = pd.read_csv("csv2.csv")
df = pd.merge(df1, df2, on='eventCategory', how='left')
A few modifications to @FrankZhu's answer.
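The desired Output.csv in the question puts `eventCategory` and `event_type` first. A minimal sketch of reordering the merged columns to match, assuming `merged` is the result of one of the merges above (rebuilt here as a hypothetical one-row stand-in). `to_csv` without a path returns the CSV text; pass a path such as `"Output.csv"` to write a file instead.

```python
import pandas as pd

# Hypothetical one-row stand-in for the result of the left merge
merged = pd.DataFrame({
    "date": [20160719], "yearMonth": [201607], "deviceCategory": ["desktop"],
    "channelGrouping": ["Direct"], "eventCategory": ["_GW_Legal_RM_false"],
    "Totalevents": [149], "event_type": ["Legal"],
})

# Move the key and looked-up columns to the front, keep the rest in order
front = ["eventCategory", "event_type"]
ordered = merged[front + [c for c in merged.columns if c not in front]]

# Without a path, to_csv returns the text; to_csv("Output.csv", index=False) writes a file
csv_text = ordered.to_csv(index=False)
print(csv_text)
```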