How to combine two columns into one in Pandas

Posted: 2014-06-09 20:23:05

Tags: python pandas

I have this data:

1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f

I want a two-column list of years and items, like this:

1975,a
1975,b
...

I have this code:

import pandas

# Set column names
colnames=['Date','Item1','Item2']

# read csv adding column names
data = pandas.read_csv('/Users/Simon/Dropbox/Work/Datasets/lagtest.csv', names=colnames)

# create a dataframe with info on dates for first column
datelist1 = data[['Date', 'Item1']]

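# create a dataframe with info on dates for second column
datelist2 = data[['Date', 'Item2']]

# stack the two frames (DataFrame.append was removed in pandas 2.0; pandas.concat behaves the same here)
bigdatelist = pandas.concat([datelist1, datelist2])

print(bigdatelist)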

But it gives me this:

   Date Item1 Item2
0  1975     a   NaN
1  1976     b   NaN
2  1977     b   NaN
3  1977     a   NaN
4  1978     c   NaN
5  1979     e   NaN
6  1980     a   NaN
0  1975   NaN     b
1  1976   NaN     c
2  1977   NaN     a
3  1977   NaN     b
4  1978   NaN     d
5  1979   NaN     f
6  1980   NaN     f

I want the row numbers to run continuously and the last two columns to be merged into one. Any suggestions?
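For comparison, here is a minimal sketch (not taken from either answer) of how the asker's own concat approach can be made to work: the NaNs appear because the two slices have different column names, so renaming both item columns to one shared name lines them up, and ignore_index=True renumbers the rows. It assumes the sample data is inlined with io.StringIO instead of reading the CSV path, and the column name Item is a hypothetical choice.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without the CSV file
raw = """1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f
"""
data = pd.read_csv(io.StringIO(raw), names=['Date', 'Item1', 'Item2'])

# Give both slices the same column names so concat stacks them into one column
# ('Item' is a hypothetical name chosen for illustration)
part1 = data[['Date', 'Item1']].rename(columns={'Item1': 'Item'})
part2 = data[['Date', 'Item2']].rename(columns={'Item2': 'Item'})

# ignore_index=True renumbers the rows 0..13 instead of repeating 0..6 twice
bigdatelist = pd.concat([part1, part2], ignore_index=True)
print(bigdatelist)
```

All Item1 values come first, then all Item2 values, in the same order as the melt answer's output.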

2 answers:

Answer 0 (score: 2)

You're looking for pd.melt.

Assuming you have this as a dataframe:

>>> df
   Date item1 item2
0  1975     a     b
1  1976     b     c
2  1977     b     a
3  1977     a     b
4  1978     c     d
5  1979     e     f
6  1980     a     f

[7 rows x 3 columns]

Now use:

pd.melt(df, id_vars='Date')[['Date', 'value']]

to get what you need.

>>> pd.melt(df, id_vars='Date')[['Date','value']]
    Date value
0   1975     a
1   1976     b
2   1977     b
3   1977     a
4   1978     c
5   1979     e
6   1980     a
7   1975     b
8   1976     c
9   1977     a
10  1977     b
11  1978     d
12  1979     f
13  1980     f

[14 rows x 2 columns]
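A self-contained version of this answer, as a sketch: the question's sample data is inlined with io.StringIO (an assumption, in place of the CSV file), and the column names follow the answer's lowercase item1/item2. The final stable sort on Date is an optional addition not in the answer; it groups each year's items together, matching the layout the question asked for.

```python
import io

import pandas as pd

raw = """1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f
"""
# Column names follow the answer's example
df = pd.read_csv(io.StringIO(raw), names=['Date', 'item1', 'item2'])

# melt turns every non-id column into (variable, value) rows;
# id_vars='Date' keeps the year attached to each item
long_df = pd.melt(df, id_vars='Date')[['Date', 'value']]
print(long_df)

# Optional: a stable sort (mergesort) on Date keeps item1 before item2
# within each year, and ignore_index=True renumbers the rows 0..13
by_year = long_df.sort_values('Date', kind='mergesort', ignore_index=True)
print(by_year)
```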

Answer 1 (score: 0)

You can use set_index and stack, which keeps the order you want:

In [4227]: df.set_index('Date').stack().reset_index(name='value')[['Date', 'value']]
Out[4227]:
    Date value
0   1975     a
1   1975     b
2   1976     b
3   1976     c
4   1977     b
5   1977     a
6   1977     a
7   1977     b
8   1978     c
9   1978     d
10  1979     e
11  1979     f
12  1980     a
13  1980     f
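A runnable sketch of this answer, assuming the question's sample data is inlined with io.StringIO rather than read from the CSV file, and using the lowercase column names from the first answer's example. Because stack walks the frame row by row, both items of each row end up next to each other, which is why this keeps the year-by-year order.

```python
import io

import pandas as pd

raw = """1975,a,b
1976,b,c
1977,b,a
1977,a,b
1978,c,d
1979,e,f
1980,a,f
"""
df = pd.read_csv(io.StringIO(raw), names=['Date', 'item1', 'item2'])

# set_index moves Date out of the columns; stack then pivots item1/item2
# into a second index level, giving one entry per (Date, item) pair in row order
stacked = df.set_index('Date').stack()

# reset_index turns both index levels back into columns; the unnamed second
# level (the original column name) is dropped by the column selection
result = stacked.reset_index(name='value')[['Date', 'value']]
print(result)
```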