How to add a DataFrame column from CSV data

Asked: 2016-11-25 00:49:20

Tags: python python-2.7 pandas dataframe

I have this CSV data:

index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min

and a DataFrame holding the clustering results:

index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2

Is it possible to add a username column to the pd.DataFrame based on the CSV data?

2 Answers:

Answer 0: (score: 1)

I use `DataFrame.merge`, joining the two frames on their indexes. Here is the code:

>>> import StringIO as sio
>>> import pandas as pd

>>> s1='''index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min'''
>>> s2 = '''index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2'''

>>> df1=pd.read_csv(sio.StringIO(s1), index_col=0, delim_whitespace=True)
>>> df2=pd.read_csv(sio.StringIO(s2), index_col=0, delim_whitespace=True)

>>> df1
      username
index
1        ailee
2         yura
3         sony
4         lily
5         alex
6        eunji
7         hyun
8        jingo
9          kim
10         min
>>> df2
       cluster
index
1            1
3            1
5            1
7            1
8            1
9            2
4            2
2            2
10           2
6            2

>>> df1.merge(df2, left_index=True, right_index=True)
      username  cluster
index
1        ailee        1
3         sony        1
5         alex        1
7         hyun        1
8        jingo        1
9          kim        2
4         lily        2
2         yura        2
10         min        2
6        eunji        2
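
The session above relies on the Python 2 `StringIO` module. A minimal Python 3 sketch of the same idea follows, assuming a reasonably recent pandas. The names `users` and `clusters` are illustrative, and `how="left"` is an addition here (not part of the answer) so that every cluster row is kept in its original order.

```python
import io

import pandas as pd

# Same whitespace-separated sample data as in the answer.
users_text = """index   username
1       ailee
2       yura
3       sony
4       lily
5       alex
6       eunji
7       hyun
8       jingo
9       kim
10      min"""
clusters_text = """index   cluster
1        1
3        1
5        1
7        1
8        1
9        2
4        2
2        2
10       2
6        2"""

# sep=r"\s+" splits on runs of whitespace; the first column becomes the index.
users = pd.read_csv(io.StringIO(users_text), sep=r"\s+", index_col=0)
clusters = pd.read_csv(io.StringIO(clusters_text), sep=r"\s+", index_col=0)

# Align the two frames on their index labels; a left merge keeps all rows of clusters.
merged = clusters.merge(users, left_index=True, right_index=True, how="left")
print(merged)
```

With a real file, the path to the CSV would take the place of `io.StringIO(...)`.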

Answer 1: (score: 0)

You can use join, which aligns the two frames on their indexes:

print (df2.join(df1))
       cluster username
index                  
1            1    ailee
3            1     sony
5            1     alex
7            1     hyun
8            1    jingo
9            2      kim
4            2     lily
2            2     yura
10           2      min
6            2    eunji
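
As a small sketch of how `join` behaves when the indexes do not fully overlap: `DataFrame.join` defaults to a left join, so rows without a match get NaN rather than being dropped. The frames below are made-up illustration data, and the id 99 is a hypothetical row with no username.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames.
users = pd.DataFrame(
    {"username": ["ailee", "yura", "sony"]},
    index=pd.Index([1, 2, 3], name="index"),
)
clusters = pd.DataFrame(
    {"cluster": [1, 2, 1]},
    index=pd.Index([1, 2, 99], name="index"),  # 99 has no entry in users
)

# Default how="left": every row of clusters is kept; unmatched usernames are NaN.
left_joined = clusters.join(users)

# how="inner": only index labels present in both frames are kept.
inner_joined = clusters.join(users, how="inner")

print(left_joined)
print(inner_joined)
```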

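Or use map. Mapping by the cluster column looks up each cluster value (1 or 2) in the index of df1, so only the index-based map gives each row its own username: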

#map by column cluster
df2['username'] = df2.cluster.map(df1.username)
#map by index
df2['username1'] = df2.index.to_series().map(df1.username)
print (df2)
       cluster username username1
index                            
1            1    ailee     ailee
3            1    ailee      sony
5            1    ailee      alex
7            1    ailee      hyun
8            1    ailee     jingo
9            2     yura       kim
4            2     yura      lily
2            2     yura      yura
10           2     yura       min
6            2     yura     eunji
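
The difference between the two columns comes from what gets looked up in the index of df1. A minimal sketch with made-up miniature frames (the names `users`, `clusters`, and `by_cluster` are illustrative) shows this, using the fact that assigning a Series to a DataFrame column aligns on index labels:

```python
import pandas as pd

# Hypothetical miniature versions of the two frames.
users = pd.DataFrame(
    {"username": ["ailee", "yura", "sony", "lily"]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)
clusters = pd.DataFrame(
    {"cluster": [1, 1, 2, 2]},
    index=pd.Index([1, 3, 4, 2], name="index"),
)

# Plain assignment aligns on index labels, so each row gets its own username.
clusters["username"] = users["username"]

# Mapping the cluster column instead looks up the values 1 and 2 in users' index,
# so every row in cluster 1 gets user 1 and every row in cluster 2 gets user 2.
clusters["by_cluster"] = clusters["cluster"].map(users["username"])

print(clusters)
```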