How do I concatenate combinations of rows from two different data frames?

Time: 2019-05-22 16:07:10

Tags: python pandas

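I have two data frames with different column names. I want to create a new data frame whose columns are the columns of both data frames concatenated. Its rows should be every possible combination of a row from the first data frame with a row from the second, so the result has len(df1) * len(df2) rows.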


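For example, given

df1 = pd.DataFrame({'A': ['1', '2']})
df2 = pd.DataFrame({'B': ['a', 'b', 'c']})

the result should have columns A and B, with one row for each pairing of a value from A with a value from B.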

3 answers:

Answer 0 (score: 3)

Use itertools.product():

import itertools
import pandas as pd

pd.DataFrame(list(itertools.product(df1.A, df2.B)), columns=['A', 'B'])

   A  B
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
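
The snippet above pairs one column from each data frame. When the frames have several columns, the same idea can be applied to whole rows. The following is only a sketch under that assumption: `cross_rows` is a hypothetical helper name, the extra column `X` is made up for illustration, and the two frames are assumed to have no column names in common.

```python
import itertools
import pandas as pd

def cross_rows(left, right):
    """Pair every row of `left` with every row of `right` (hypothetical helper)."""
    # itertuples(index=False, name=None) yields each row as a plain tuple
    pairs = itertools.product(left.itertuples(index=False, name=None),
                              right.itertuples(index=False, name=None))
    # Concatenate the two row tuples to form one combined row
    rows = [l + r for l, r in pairs]
    return pd.DataFrame(rows, columns=list(left.columns) + list(right.columns))

# Illustrative data: 'X' is an assumed extra column, not part of the question
df1 = pd.DataFrame({'A': ['1', '2'], 'X': [10, 20]})
df2 = pd.DataFrame({'B': ['a', 'b', 'c']})

result = cross_rows(df1, df2)
print(result.shape)  # (6, 3)
```

Building the full list of rows in memory is fine for small inputs like these; for large frames the row count grows as len(left) * len(right), so the result can become very large quickly.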

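Answer 1 (score: 0)

The product() function does what you want:

pd.DataFrame(list(itertools.product(df1.A, df2.B)), columns=['A', 'B'])

The itertools documentation describes product() as roughly equivalent to the following pure-Python code: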

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
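
The nested loop in that definition means the last iterable varies fastest, which is why the rows in the answers come out as 1a, 1b, 1c, 2a, 2b, 2c. A small check using the values from the question (passed as plain lists here rather than data frame columns):

```python
import itertools

# Same values as df1.A and df2.B in the question
pairs = list(itertools.product(['1', '2'], ['a', 'b', 'c']))
print(pairs)
# [('1', 'a'), ('1', 'b'), ('1', 'c'), ('2', 'a'), ('2', 'b'), ('2', 'c')]

# repeat=2 is the same as passing the iterable twice
bits = list(itertools.product(range(2), repeat=2))
print(bits)
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```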

Answer 2 (score: 0)

You can use pd.MultiIndex.from_product():

(pd.DataFrame(index=pd.MultiIndex.from_product([df1['A'], df2['B']],
                                               names=['A', 'B']))
   .reset_index())

Output:

    A   B
0   1   a
1   1   b
2   1   c
3   2   a
4   2   b
5   2   c
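
The approaches above each take a single column from each data frame. As a possible alternative that works on whole frames, pandas also supports a cross join through merge. This is a sketch rather than part of any of the answers: it assumes pandas 1.2 or later for how='cross', and the `_key` column in the fallback is a made-up temporary name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2']})
df2 = pd.DataFrame({'B': ['a', 'b', 'c']})

# Cross join: every row of df1 paired with every row of df2
# (assumes pandas >= 1.2, where how='cross' is available)
crossed = df1.merge(df2, how='cross')

# Fallback for older pandas versions: join on a temporary constant key
# ('_key' is a hypothetical column name chosen for this sketch)
legacy = (df1.assign(_key=1)
             .merge(df2.assign(_key=1), on='_key')
             .drop(columns='_key'))

print(crossed.shape)  # (6, 2)
```

Both variants keep all columns of both frames, so they extend to frames with more than one column, provided the column names do not clash.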