I have 4 CSV files, each containing a single column. Each column holds one part of a name (4 parts):
CSV 1:
first_name
michael
madonna
steve
albert
CSV 2:
second_name
luke
han
kurt
CSV 3:
first_last_name
jackson
jobs
skywalker
CSV 4:
second_last_name
solo
cobain
einstein
The end result I want is every possible combination across all 4 columns (the 4 CSVs):
first_name,second_name,first_last_name,second_last_name
michael,luke,jackson,solo
michael,luke,jackson,cobain
michael,luke,jackson,einstein
michael,luke,jobs,solo
michael,luke,jobs,cobain
michael,luke,jobs,einstein
michael,luke,skywalker,solo
michael,luke,skywalker,cobain
michael,luke,skywalker,einstein
...
Using pandas I loaded each CSV into a DataFrame, but I don't know how to combine all four. How can I do this?
Answer 0 (score: 2)
import functools
import itertools
import pandas as pd
def cartesian(df1, df2):
    # Pair every row of df1 with every row of df2
    rows = itertools.product(df1.iterrows(), df2.iterrows())
    # Series.append was removed in pandas 2.0, so pd.concat joins the two rows
    df = pd.DataFrame([pd.concat([left, right]) for (_, left), (_, right) in rows])
    return df.reset_index(drop=True)
df1 = pd.read_csv('first_name.csv')
df2 = pd.read_csv('second_name.csv')
df3 = pd.read_csv('first_last_name.csv')
df4 = pd.read_csv('second_last_name.csv')
# Fold the four frames pairwise into one frame holding all combinations
combined = functools.reduce(cartesian, [df1, df2, df3, df4])
# index=False keeps the row index out of the output file
combined.to_csv('combined.csv', index=False)
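The row-by-row product above can be slow on larger inputs. As a possible alternative, pandas 1.2 and later support a cross join through merge(how='cross'). This is a minimal sketch, assuming that pandas version and using small in-memory frames with the question's sample values in place of the four CSV files:

```python
import functools

import pandas as pd

# In-memory stand-ins for the four single-column CSV files
df1 = pd.DataFrame({"first_name": ["michael", "madonna", "steve", "albert"]})
df2 = pd.DataFrame({"second_name": ["luke", "han", "kurt"]})
df3 = pd.DataFrame({"first_last_name": ["jackson", "jobs", "skywalker"]})
df4 = pd.DataFrame({"second_last_name": ["solo", "cobain", "einstein"]})

# how="cross" pairs every row of the left frame with every row of the right one
combined = functools.reduce(
    lambda left, right: left.merge(right, how="cross"),
    [df1, df2, df3, df4],
)

print(len(combined))    # number of combinations
print(combined.head())
```

With the real files, each frame would come from pd.read_csv as in the answer above, and combined.to_csv('combined.csv', index=False) would write the result.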
Answer 1 (score: 0)
Use itertools.product to do the heavy lifting.
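import pandas as pd
from itertools import product
# Read the single column of each file (data1.csv ... data4.csv)
frames = [pd.read_csv('data{}.csv'.format(i), header=0) for i in range(1, 5)]
lists = [df.iloc[:, 0].tolist() for df in frames]
# One comma-joined string per combination
combined = [','.join(items) for items in product(*lists)]
# Write one column per name part; writing the joined strings would quote each row
pd.DataFrame(list(product(*lists)), columns=[df.columns[0] for df in frames]).to_csv('combined.csv', index=False)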
If you only want the list form, use combined. It looks like:
['michael,luke,jackson,solo',
'michael,luke,jackson,cobain',
'michael,luke,jackson,einstein',
'michael,luke,jobs,solo',
'michael,luke,jobs,cobain',
'michael,luke,jobs,einstein',
'michael,luke,skywalker,solo',
'michael,luke,skywalker,cobain',
'michael,luke,skywalker,einstein',
...
Alternatively, the last line writes the combined values to a CSV file.
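If the number of combinations grows large, building the full list first may use more memory than necessary. One option is to stream the rows from itertools.product straight into a csv.writer. This is only a sketch: the helper name write_combinations is hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io
from itertools import product


def write_combinations(columns, out):
    """Write a header row, then every combination of the column values, to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(columns))                # header: the column names
    writer.writerows(product(*columns.values()))  # rows are generated lazily


# Sample values from the question; in practice these would be read from the CSVs
columns = {
    "first_name": ["michael", "madonna", "steve", "albert"],
    "second_name": ["luke", "han", "kurt"],
    "first_last_name": ["jackson", "jobs", "skywalker"],
    "second_last_name": ["solo", "cobain", "einstein"],
}

buffer = io.StringIO()
write_combinations(columns, buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
print(lines[1])
```

To write to disk instead, pass a file opened with open('combined.csv', 'w', newline='') as the out argument.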