df1 looks like this:
dateA; item1; item2; item3; item4; itemN...
dateB; item5; item2; item3; item6; itemN...
df2 should look like this:
dateA; item1; item2
dateA; item1; item3
dateA; item1; item4
dateA; item2; item3
etc.
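Every row of df2 should have three columns, and df2 as a whole should contain, for each date, every pair of items that appear together on the same row of df1.

Answer 0 (score: 2)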
Is this what you want?
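import io
from itertools import chain, combinations

import pandas as pd

data = """dateA; item1; item2; item3; item4; itemN
dateB; item5; item2; item3; item6; itemN
"""
# Read the sample text; the first column (the date) becomes the index
df = pd.read_csv(io.StringIO(data), sep=";", header=None, skipinitialspace=True, index_col=0)
# Select the columns pairwise, (1, 2), (1, 3), ..., flattened into one list
df2 = df[list(chain.from_iterable(combinations(df.columns, 2)))]
# Label each pair of columns as (pair number, "A" or "B")
df2.columns = pd.MultiIndex.from_product([range(df2.shape[1] // 2), ["A", "B"]])
# Move the pair number into the index so that every row holds one pair
print(df2.stack(level=0))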
Output:
             A      B
0
dateA 0  item1  item2
      1  item1  item3
      2  item1  item4
      3  item1  itemN
      4  item2  item3
      5  item2  item4
      6  item2  itemN
      7  item3  item4
      8  item3  itemN
      9  item4  itemN
dateB 0  item5  item2
      1  item5  item3
      2  item5  item6
      3  item5  itemN
      4  item2  item3
      5  item2  item6
      6  item2  itemN
      7  item3  item6
      8  item3  itemN
      9  item6  itemN
Edit:
Because not every row has the same number of items, you need the following code instead:
import io
import itertools

import pandas as pd

txt = """1975;a;b
1976;b;c;d;e;f
1977;b;a
1977;a;b;g
1978;c;d
1979;e;f;b;k
1980;a;f"""
f = io.StringIO(txt)  # change this line to f = open("yourfile.csv")
result = []
for line in f:
    data = line.strip().split(";")
    year = data[0]
    # Emit one (year, item, item) row for every pair of items on the line
    for row in itertools.combinations(data[1:], 2):
        result.append((year,) + row)
df = pd.DataFrame(result)
print(df)
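The loop above can also be wrapped in a small helper that names the three columns. This is only a sketch: the function name pairs_from_lines and the column names date, item_a and item_b are made up for illustration and do not come from the question.

```python
import io
import itertools

import pandas as pd


def pairs_from_lines(lines, sep=";"):
    """Return one row per (date, item_a, item_b) combination (hypothetical helper)."""
    records = []
    for line in lines:
        fields = [field.strip() for field in line.strip().split(sep)]
        if len(fields) < 3:
            continue  # a date plus at least two items is needed to form a pair
        date, items = fields[0], fields[1:]
        for item_a, item_b in itertools.combinations(items, 2):
            records.append((date, item_a, item_b))
    # Column names are assumptions chosen for readability
    return pd.DataFrame(records, columns=["date", "item_a", "item_b"])


sample = io.StringIO("1975;a;b\n1976;b;c;d;e;f\n1977;b;a\n")
df2 = pairs_from_lines(sample)
print(df2.head())
```

Any iterable of text lines should work as input, so an open file handle can be passed in place of the StringIO sample.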

Answer 1 (score: 0)
from pandas import read_csv
import io
data = read_csv('/Users/Simon/Dropbox/Work/Datasets/lagtest.csv')
df = read_csv(io.BytesIO(data), sep=";", header=None, skipinitialspace=True, index_col=0)
The code above stops with the error:
TypeError: 'DataFrame' does not have the buffer interface
Printing data gives something like this:
1975;a;b
0 1976;b;c;d;e;f
1 1977;b;a
2 1977;a;b;g
3 1978;c;d
4 1979;e;f;b;k
5 1980;a;f
[6 rows x 1 columns]
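The TypeError most likely comes from the fact that read_csv has already returned a DataFrame, whereas io.BytesIO expects raw bytes. One way around it is to skip the first read_csv call and read the file line by line. The sketch below assumes a semicolon-separated file; it writes a small temporary file as a stand-in for the real CSV, and the column names are invented for illustration.

```python
import itertools
import os
import tempfile

import pandas as pd

sample = "1975;a;b\n1976;b;c;d;e;f\n1977;b;a\n"

# Stand-in for the real CSV file; replace `path` with your own file path
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

rows = []
with open(path) as f:  # open the file directly instead of wrapping a DataFrame
    for line in f:
        fields = line.strip().split(";")
        year, items = fields[0], fields[1:]
        # combinations() yields nothing when a line has fewer than two items
        for pair in itertools.combinations(items, 2):
            rows.append((year,) + pair)

os.remove(path)  # clean up the temporary stand-in file

# Column names are assumptions, not taken from the original post
pairs = pd.DataFrame(rows, columns=["year", "item_a", "item_b"])
print(pairs)
```

Reading line by line also avoids the problem of rows with differing numbers of fields, which read_csv does not handle gracefully without extra arguments.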