Convert dataframe rows with lists of items into rows with item pairs

Time: 2014-06-11 05:07:30

Tags: python pandas

df1 looks like this:

dateA; item1; item2; item3; item4; itemN...
dateB; item5; item2; item3; item6; itemN...

df2 should look like this:

dateA; item1; item2
dateA; item1; item3
dateA; item1; item4
dateA; item2; item3
etc.

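Each row in df2 should have three columns, and df2 as a whole should contain, for each date, every pair of items that appear together on the same row in df1.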

2 answers:

Answer 0: (score: 2)

Is this what you want?

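import io
from itertools import chain, combinations

import pandas as pd

data = """dateA; item1; item2; item3; item4; itemN
dateB; item5; item2; item3; item6; itemN
"""

# Parse the text; the first column (the date) becomes the index
df = pd.read_csv(io.StringIO(data), sep=";", header=None, skipinitialspace=True, index_col=0)

# Select every pair of item columns, flattened into one long list of columns
df2 = df[list(chain.from_iterable(combinations(df.columns, 2)))]

# Label the columns (pair number, "A"/"B") so that stacking gives one row per pair
df2.columns = pd.MultiIndex.from_product([range(df2.shape[1] // 2), ["A", "B"]])

print(df2.stack(level=0))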

Output:

             A      B
0                    
dateA 0  item1  item2
      1  item1  item3
      2  item1  item4
      3  item1  itemN
      4  item2  item3
      5  item2  item4
      6  item2  itemN
      7  item3  item4
      8  item3  itemN
      9  item4  itemN
dateB 0  item5  item2
      1  item5  item3
      2  item5  item6
      3  item5  itemN
      4  item2  item3
      5  item2  item6
      6  item2  itemN
      7  item3  item6
      8  item3  itemN
      9  item6  itemN
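
As a quick sanity check on the output above: each row of df1 holds five items, so itertools.combinations should yield C(5, 2) = 10 pairs per date, which matches the ten rows numbered 0 to 9. A minimal sketch in plain Python, using the items of the dateA row:

```python
from itertools import combinations
from math import comb

items = ["item1", "item2", "item3", "item4", "itemN"]
pairs = list(combinations(items, 2))

# combinations keeps the input order and never pairs an item with itself,
# so the number of pairs equals the binomial coefficient C(n, 2)
print(len(pairs), comb(len(items), 2))
print(pairs[:4])
```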

Edit

Since not every row has the same number of items, you need the following code instead:

import io
import itertools

import pandas as pd

txt = """1975;a;b
1976;b;c;d;e;f
1977;b;a
1977;a;b;g
1978;c;d
1979;e;f;b;k
1980;a;f"""

f = io.StringIO(txt)  # change this line to f = open("yourfile.csv")
result = []
for line in f:
    # The first field is the year; the remaining fields are the items
    data = line.strip().split(";")
    year = data[0]
    # Emit one (year, item, item) row for every pair of items on the line
    for row in itertools.combinations(data[1:], 2):
        result.append((year,) + row)
df = pd.DataFrame(result)
print(df)
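
The loop above can also be wrapped in a small helper that returns a DataFrame with named columns. This is only a sketch: the function name rows_to_pairs and the column names date, item_a and item_b are made up for illustration and do not come from the question.

```python
import io
from itertools import combinations

import pandas as pd


def rows_to_pairs(lines, sep=";"):
    """Build one (date, item_a, item_b) row per pair of items on each line.

    The function and column names are hypothetical, chosen for this sketch.
    """
    records = []
    for line in lines:
        fields = [field.strip() for field in line.strip().split(sep)]
        if len(fields) < 3:
            continue  # skip blank lines and lines with fewer than two items
        date, items = fields[0], fields[1:]
        for a, b in combinations(items, 2):
            records.append((date, a, b))
    return pd.DataFrame(records, columns=["date", "item_a", "item_b"])


txt = """1975;a;b
1976;b;c;d;e;f
1977;b;a"""

pairs = rows_to_pairs(io.StringIO(txt))
print(pairs)
```

Any iterable of text lines works as input, so an open file object can be passed in place of the io.StringIO wrapper.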

Answer 1: (score: 0)

from pandas import read_csv
import io
data = read_csv('/Users/Simon/Dropbox/Work/Datasets/lagtest.csv')
df = read_csv(io.BytesIO(data), sep=";", header=None, skipinitialspace=True, index_col=0)

The code above stops with the error:

TypeError: 'DataFrame' does not have the buffer interface

Printing 'data' gives:

         1975;a;b
0  1976;b;c;d;e;f
1        1977;b;a
2      1977;a;b;g
3        1978;c;d
4    1979;e;f;b;k
5        1980;a;f

[6 rows x 1 columns]
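
The TypeError most likely comes from passing a DataFrame to io.BytesIO, which expects bytes: the first read_csv call has already parsed the file, treating 1975;a;b as the header row. A minimal sketch of reading the file line by line instead, assuming it holds the semicolon-separated lines printed above. A temporary file stands in for the real path here, and the sample contents are an assumption reconstructed from that printout.

```python
import os
import tempfile
from itertools import combinations

import pandas as pd

# Stand-in for the real CSV: write the sample lines to a temporary file
sample = (
    "1975;a;b\n1976;b;c;d;e;f\n1977;b;a\n1977;a;b;g\n"
    "1978;c;d\n1979;e;f;b;k\n1980;a;f\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

result = []
with open(path) as f:  # open the file directly; no read_csv or io wrapper
    for line in f:
        data = line.strip().split(";")
        # One (year, item, item) tuple per pair of items on the line
        for pair in combinations(data[1:], 2):
            result.append((data[0],) + pair)
os.remove(path)  # clean up the temporary stand-in file

df = pd.DataFrame(result)
print(df)
```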