Merging two CSV files with Python or pandas

Time: 2018-09-18 13:13:02

Tags: python pandas csv

**csv file 1**

date    yearMonth   deviceCategory  channelGrouping eventCategory   Totalevents
20160719    201607  desktop Direct  _GW_Legal_RM_false  149
20160719    201607  desktop Direct  _GW_Risk_RM_false   298
20160719    201607  desktop Direct  _GW_Risk_RM_true    149
20160719    201607  desktop Direct  _GW__Product-Sign-In__  895
20160719    201607  desktop Organic Search  _GW_Legal_RM_false  149
20160719    201607  desktop Organic Search  _GW_Risk_RM_false   746
20160719    201607  desktop Organic Search  _GW__Product-Sign-In__  1342
20160719    201607  desktop Referral    _GW__Product-Sign-In__  1044
20160719    201607  mobile  Direct  _GW_Legal_RM_false  149
20160719    201607  mobile  Social  _GW_Legal_RM_false  149
20160719    201607  tablet  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop Branded Paid Search _GW_Legal_RM_false  149
20160720    201607  desktop Direct  _GW_Legal_RM_false  149
20160720    201607  desktop Direct  _GW__Product-Sign-In__  746
20160720    201607  desktop Non-Branded Paid Search _GW_Legal_RM_false  149
20160720    201607  desktop Non-Branded Paid Search _GW_Risk_RM_false   149
20160720    201607  desktop Organic Search  _GW_Legal_RM_false  1939
20160720    201607  desktop Organic Search  _GW_Risk_RM_false   298

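I have two CSV files that I want to merge on a common column, but that column has a different length in each file. Is there a way to merge them without duplicating values?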

**csv file 2**

eventCategory   event_type
_GW_Legal_RM_false  Legal
_GW_Legal_RM_true   Legal
_GW_Legal_RM_   Legal
_GW_Risk_RM_false   Risk
_GW_Risk_RM_true    Risk
_GW_Risk_RM_    Risk
_GW__Product-Sign-In__  Sign-in

Output.csv

eventCategory   event_type  date    yearMonth   deviceCategory  channelGrouping Totalevents
 _GW_Legal_RM_false Legal   20160719    201607  desktop Direct  149
 _GW_Legal_RM_false Legal   20160719    201607  desktop Organic Search  149
 _GW_Legal_RM_false Legal   20160719    201607  mobile  Direct  149
 _GW_Legal_RM_false Legal   20160719    201607  mobile  Social  149
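The layout of Output.csv above can be sketched with a left merge followed by a column reorder. This is only a sketch: the two frames below are small in-memory stand-ins for the files (a subset of the rows shown), the names `events`, `lookup` and `column_order` are made up for illustration, and sorting by eventCategory is a guess based on the sample output.

```python
import pandas as pd

# Stand-in for csv file 1 (a few rows only).
events = pd.DataFrame({
    "date": [20160719, 20160719, 20160719],
    "yearMonth": [201607, 201607, 201607],
    "deviceCategory": ["desktop", "desktop", "mobile"],
    "channelGrouping": ["Direct", "Organic Search", "Social"],
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_false", "_GW_Legal_RM_false"],
    "Totalevents": [149, 746, 149],
})

# Stand-in for csv file 2: one row per eventCategory.
lookup = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Legal_RM_true", "_GW_Risk_RM_false"],
    "event_type": ["Legal", "Legal", "Risk"],
})

# Left merge: keep every row of events and attach event_type where the key matches.
merged = events.merge(lookup, on="eventCategory", how="left")

# Reorder the columns to match Output.csv and group rows by category (assumed ordering).
column_order = ["eventCategory", "event_type", "date", "yearMonth",
                "deviceCategory", "channelGrouping", "Totalevents"]
output = (merged[column_order]
          .sort_values("eventCategory", kind="stable")
          .reset_index(drop=True))
```

Because each eventCategory appears only once in the lookup table, the result has the same number of rows as the first file, even though the two key columns have different lengths.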

3 answers:

Answer 0 (score: 1)

Use `map` together with `set_index`:

import pandas as pd
from io import StringIO

csv1 = StringIO("""date    yearMonth   deviceCategory  channelGrouping  eventCategory   Totalevents
20160719    201607  desktop  Direct  _GW_Legal_RM_false  149
20160719    201607  desktop  Direct  _GW_Risk_RM_false   298
20160719    201607  desktop  Direct  _GW_Risk_RM_true    149
20160719    201607  desktop  Direct  _GW__Product-Sign-In__  895
20160719    201607  desktop  Organic Search  _GW_Legal_RM_false  149
20160719    201607  desktop  Organic Search  _GW_Risk_RM_false   746
20160719    201607  desktop  Organic Search  _GW__Product-Sign-In__  1342
20160719    201607  desktop  Referral    _GW__Product-Sign-In__  1044
20160719    201607  mobile  Direct  _GW_Legal_RM_false  149
20160719    201607  mobile  Social  _GW_Legal_RM_false  149
20160719    201607  tablet  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop  Branded Paid Search  _GW_Legal_RM_false  149
20160720    201607  desktop  Direct  _GW_Legal_RM_false  149
20160720    201607  desktop  Direct  _GW__Product-Sign-In__  746
20160720    201607  desktop  Non-Branded Paid Search  _GW_Legal_RM_false  149
20160720    201607  desktop  Non-Branded Paid Search  _GW_Risk_RM_false   149
20160720    201607  desktop  Organic Search  _GW_Legal_RM_false  1939
20160720    201607  desktop  Organic Search  _GW_Risk_RM_false   298""")

csv2= StringIO("""eventCategory   event_type
_GW_Legal_RM_false  Legal
_GW_Legal_RM_true   Legal
_GW_Legal_RM_   Legal
_GW_Risk_RM_false   Risk
_GW_Risk_RM_true    Risk
_GW_Risk_RM_    Risk
_GW__Product-Sign-In__  Sign-in""")

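# Columns are separated by two or more spaces; a regex separator needs the python engine.
df1 = pd.read_csv(csv1, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(csv2, sep=r'\s\s+', engine='python')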

df1['event_type'] = df1['eventCategory'].map(df2.set_index('eventCategory')['event_type'])

df1

Output:

        date  yearMonth deviceCategory          channelGrouping           eventCategory  Totalevents event_type
0   20160719     201607        desktop                   Direct      _GW_Legal_RM_false          149      Legal
1   20160719     201607        desktop                   Direct       _GW_Risk_RM_false          298       Risk
2   20160719     201607        desktop                   Direct        _GW_Risk_RM_true          149       Risk
3   20160719     201607        desktop                   Direct  _GW__Product-Sign-In__          895    Sign-in
4   20160719     201607        desktop           Organic Search      _GW_Legal_RM_false          149      Legal
5   20160719     201607        desktop           Organic Search       _GW_Risk_RM_false          746       Risk
6   20160719     201607        desktop           Organic Search  _GW__Product-Sign-In__         1342    Sign-in
7   20160719     201607        desktop                 Referral  _GW__Product-Sign-In__         1044    Sign-in
8   20160719     201607         mobile                   Direct      _GW_Legal_RM_false          149      Legal
9   20160719     201607         mobile                   Social      _GW_Legal_RM_false          149      Legal
10  20160719     201607         tablet                   Direct      _GW_Legal_RM_false          149      Legal
11  20160720     201607        desktop      Branded Paid Search      _GW_Legal_RM_false          149      Legal
12  20160720     201607        desktop                   Direct      _GW_Legal_RM_false          149      Legal
13  20160720     201607        desktop                   Direct  _GW__Product-Sign-In__          746    Sign-in
14  20160720     201607        desktop  Non-Branded Paid Search      _GW_Legal_RM_false          149      Legal
15  20160720     201607        desktop  Non-Branded Paid Search       _GW_Risk_RM_false          149       Risk
16  20160720     201607        desktop           Organic Search      _GW_Legal_RM_false         1939      Legal
17  20160720     201607        desktop           Organic Search       _GW_Risk_RM_false          298       Risk
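One thing worth checking with the map approach: a category that is missing from the second file ends up as NaN in event_type rather than raising an error. A minimal sketch of how to list such categories, assuming small made-up frames (`_GW_Unknown_` is a hypothetical category that is not in the question's data):

```python
import pandas as pd

df1 = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Unknown_", "_GW_Risk_RM_true"],
    "Totalevents": [149, 10, 149],
})
df2 = pd.DataFrame({
    "eventCategory": ["_GW_Legal_RM_false", "_GW_Risk_RM_true"],
    "event_type": ["Legal", "Risk"],
})

# Build a Series keyed by eventCategory and use it as the lookup for map.
mapping = df2.set_index("eventCategory")["event_type"]
df1["event_type"] = df1["eventCategory"].map(mapping)

# Categories with no entry in the lookup are left as NaN; collect them for review.
unmatched = df1.loc[df1["event_type"].isna(), "eventCategory"].unique().tolist()
```

The row count of df1 is unchanged either way, since map only adds a column.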

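Answer 1 (score: 0)

To expand on ALollz's reply:

import pandas as pd
# sep must match the delimiter actually used in the files
df1 = pd.read_csv("1.csv", sep=" ")
df2 = pd.read_csv("2.csv", sep=" ")

# pd.merge takes the left and right frames as separate arguments, not as a list
df = pd.merge(df1, df2, on='eventCategory', how='left')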
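On the worry about duplicated values: a left merge only multiplies rows when the same eventCategory appears more than once in the second file. As a sketch, the `validate` argument of `pd.merge` can guard against that. The tiny frames and the names `df2_dup` and `df2_clean` below are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"eventCategory": ["a", "a", "b"], "Totalevents": [1, 2, 3]})

# A lookup table that accidentally lists category "a" twice.
df2_dup = pd.DataFrame({"eventCategory": ["a", "a", "b"],
                        "event_type": ["X", "X", "Y"]})

# Without a check, each "a" row in df1 matches two lookup rows, so rows multiply.
unchecked = pd.merge(df1, df2_dup, on="eventCategory", how="left")

# validate="many_to_one" makes pandas raise if the right-hand keys are not unique.
try:
    pd.merge(df1, df2_dup, on="eventCategory", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

# Dropping duplicate keys from the lookup restores a one-row-per-event result.
df2_clean = df2_dup.drop_duplicates(subset="eventCategory")
checked = pd.merge(df1, df2_clean, on="eventCategory", how="left",
                   validate="many_to_one")
```

With the lookup table from the question, where every eventCategory is listed once, the check passes and the merged frame keeps the row count of the first file.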

Answer 2 (score: 0)

import pandas as pd

df1 = pd.read_csv("csv1.csv")

df2 = pd.read_csv("csv2.csv")

df = pd.merge(df1, df2, on='eventCategory', how='left')

A few modifications to @FrankZhu's answer.
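Note that `pd.read_csv` splits on commas by default, while the samples in the question look whitespace- or tab-separated. The post does not say which delimiter the real files use, so the following is only a sketch that assumes tab-delimited input; the inline strings stand in for the files, and the result is written to an in-memory buffer instead of Output.csv.

```python
import io
import pandas as pd

# Assumed tab-delimited content standing in for csv1.csv and csv2.csv.
csv1_text = ("date\tchannelGrouping\teventCategory\tTotalevents\n"
             "20160719\tOrganic Search\t_GW_Legal_RM_false\t149\n")
csv2_text = ("eventCategory\tevent_type\n"
             "_GW_Legal_RM_false\tLegal\n")

# sep="\t" keeps values such as "Organic Search" in a single column.
df1 = pd.read_csv(io.StringIO(csv1_text), sep="\t")
df2 = pd.read_csv(io.StringIO(csv2_text), sep="\t")

df = pd.merge(df1, df2, on="eventCategory", how="left")

# index=False leaves the pandas row index out of the written file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
output_text = buffer.getvalue()
```

To write an actual file, pass a path such as "Output.csv" to `to_csv` in place of the buffer.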