Question

Cleaning up a SharePoint list so it can be uploaded to MSSQL with proper table relationships.

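Basically, the two DataFrames (data and config) share some common columns (Country, Business). I want to insert a new column into datadf that, for each row, holds the index of the configdf row whose Country and Business values match that row.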

DataFrame data:

-----|---------|----------|-----
 ... | Country | Business | ...
-----|---------|----------|-----
     |    A    |     1    |
-----|---------|----------|-----
     |    A    |     1    |
-----|---------|----------|-----
     |    A    |     2    |
-----|---------|----------|-----
     |    A    |     2    |
-----|---------|----------|-----
     |    B    |     1    |
-----|---------|----------|-----
     |    B    |     1    |
-----|---------|----------|-----
     |    B    |     2    |
-----|---------|----------|-----
     |    C    |     1    |
-----|---------|----------|-----
     |    C    |     2    |
-----|---------|----------|-----

DataFrame config (ID = index):

----|---------|----------|-----
 ID | Country | Business | ...
----|---------|----------|-----
  1 |    A    |     1    |
----|---------|----------|-----
  2 |    A    |     2    |
----|---------|----------|-----
  3 |    B    |     1    |
----|---------|----------|-----
  4 |    B    |     2    |
----|---------|----------|-----
  5 |    C    |     1    |
----|---------|----------|-----
  6 |    C    |     2    |
----|---------|----------|-----

What I want to add to DataFrame data:

-----|---------|----------|-----------|-----
 ... | Country | Business | config_ID | ... 
-----|---------|----------|-----------|-----
     |    A    |     1    |     1     |
-----|---------|----------|-----------|-----
     |    A    |     1    |     1     |
-----|---------|----------|-----------|-----
     |    A    |     2    |     2     |
-----|---------|----------|-----------|-----
     |    A    |     2    |     2     |
-----|---------|----------|-----------|-----
     |    B    |     1    |     3     |
-----|---------|----------|-----------|-----
     |    B    |     1    |     3     |
-----|---------|----------|-----------|-----
     |    B    |     2    |     4     |
-----|---------|----------|-----------|-----
     |    C    |     1    |     5     |
-----|---------|----------|-----------|-----
     |    C    |     2    |     6     |
-----|---------|----------|-----------|-----

----Found something that works----

datadf['config_ID'] =  datadf.apply(lambda x: configdf[(configdf.country == x.country) & (configdf.business_unit == x.business_unit)].index[0], axis=1)

It gets the job done, though I'm open to other suggestions, especially if they work with df.insert().
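One way to combine the lookup with df.insert() is sketched below. It assumes the frames look like the tables above (columns named 'Country' and 'Business', configdf indexed by a 1-based ID) and that each (Country, Business) pair appears only once in configdf. Instead of filtering configdf once per row, it builds a dict from each pair to its index label and looks every datadf row up in it; the names `lookup` and `config_ids` are hypothetical.

```python
import pandas as pd

# Hypothetical frames rebuilt from the tables in the question
datadf = pd.DataFrame({'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                       'Business': [1, 1, 2, 2, 1, 1, 2, 1, 2]})
configdf = pd.DataFrame({'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'Business': [1, 2, 1, 2, 1, 2]},
                        index=pd.Index(range(1, 7), name='ID'))

# Map each (Country, Business) pair to its configdf index label.
# Assumes every pair occurs at most once in configdf.
lookup = dict(zip(zip(configdf['Country'], configdf['Business']), configdf.index))

# Look up every datadf row; pairs missing from configdf become None
config_ids = [lookup.get(key) for key in zip(datadf['Country'], datadf['Business'])]

# df.insert(loc, column, value) places the new column right after 'Business'
datadf.insert(datadf.columns.get_loc('Business') + 1, 'config_ID', config_ids)
print(datadf)
```

If some rows have no matching pair, the None values make config_ID an object column, so it may be worth checking for them before uploading to MSSQL.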

Answer 1

You can use numpy.where to find the matching row in the other DataFrame.

For example:

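import numpy as np
import pandas as pd

datadf = pd.DataFrame([['USA','Business1'],['AUS','Business2'],['UK','Business3'],['IND','Business4']],
                      columns=['country','business'])
configdf = pd.DataFrame([['AUS','Business2'],['IND','Business4'],['USA','Business1'],['UK','Business3']],
                        columns=['country','business'])

# x == configdf compares row x with every configdf row cell by cell;
# np.where(...)[0][0] is the position of the first configdf row with a matching cell
datadf['new_col'] = datadf.apply(lambda x: np.where(x == configdf)[0][0], axis=1)
print(datadf)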

Output:

  country   business  new_col
0     USA  Business1        2
1     AUS  Business2        0
2      UK  Business3        3
3     IND  Business4        1
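Note that `x == configdf` checks each column separately, so `np.where(...)[0][0]` returns the first configdf row where either column matches, not necessarily both. That works for the example above because every country appears only once, but not for the question's data. A small check, assuming the lowercase column names used in this answer (the `row` Series is a hypothetical stand-in for one datadf row):

```python
import numpy as np
import pandas as pd

configdf = pd.DataFrame({'country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'business': [1, 2, 1, 2, 1, 2]})
row = pd.Series({'country': 'A', 'business': 2})

# Same element-wise comparison as x == configdf: True wherever a single cell matches
any_cell = np.where(configdf == row)[0][0]

# Require both columns to match on the same row
both = np.where((configdf['country'] == row['country']) &
                (configdf['business'] == row['business']))[0][0]

print(any_cell, both)  # 0 1
```

Row 0 (A, 1) already matches on country, so the element-wise version stops there, while the combined condition finds row 1 (A, 2).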

EDIT 1:

In your case, where a row must match on both columns, you can use

datadf['new_col'] = datadf.apply(lambda x: (np.where((x['country'] == configdf['country']) & (x['business'] == configdf['business']))[0][0]),axis=1)

Output for the example datadf and configdf from the question:

  country business  new_col
0       A        1        0
1       A        1        0
2       A        2        1
3       A        2        1
4       B        1        2
5       B        1        2
6       B        2        3
7       C        1        4
8       C        2        5
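If the per-row apply gets slow on a large list, a vectorized alternative (a swapped-in technique, not part of this answer) is pandas.MultiIndex.get_indexer, which looks up all key pairs at once. This sketch assumes the same lowercase column names and that each (country, business) pair is unique in configdf, since get_indexer requires unique keys:

```python
import pandas as pd

datadf = pd.DataFrame({'country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                       'business': [1, 1, 2, 2, 1, 1, 2, 1, 2]})
configdf = pd.DataFrame({'country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'business': [1, 2, 1, 2, 1, 2]})

# Build MultiIndexes of the key pairs on both sides
config_keys = pd.MultiIndex.from_frame(configdf[['country', 'business']])
data_keys = pd.MultiIndex.from_frame(datadf[['country', 'business']])

# Integer position of each data pair within config_keys (-1 if the pair is absent)
datadf['new_col'] = config_keys.get_indexer(data_keys)
print(datadf)
```

Unlike np.where(...)[0][0], which raises IndexError when a row has no match, get_indexer returns -1 for missing pairs, so unmatched rows are easy to filter afterwards.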

Answer 2

Here is a solution using pandas merge:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge

import pandas as pd

# make the two dataframes
data = pd.DataFrame({'Country':['A','A','A','A','B','B','B','C','C'],
                     'Business':[1,1,2,2,1,1,2,1,2]})

configdf = pd.DataFrame({'Country':['A','A','B','B','C','C'],
                         'Business':[1,2,1,2,1,2]})

# move the index into a regular column (named 'index')
configdf.reset_index(inplace=True)

# merge the two dataframes based on the selected columns.
newdf = data.merge(configdf, on=['Country', 'Business'])
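A variant of the merge above, sketched with a few hypothetical choices: the 'index' column is renamed to config_ID, how='left' keeps rows of data that have no match (with NaN, which turns config_ID into a float column) instead of dropping them as the default inner join does, and validate='many_to_one' raises a MergeError if a (Country, Business) pair appears more than once in configdf rather than silently duplicating rows of data.

```python
import pandas as pd

data = pd.DataFrame({'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                     'Business': [1, 1, 2, 2, 1, 1, 2, 1, 2]})
configdf = pd.DataFrame({'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'Business': [1, 2, 1, 2, 1, 2]})

# Expose configdf's index as a column named config_ID (hypothetical name)
config = configdf.reset_index().rename(columns={'index': 'config_ID'})

# Left join keeps every row of data; validate checks the config keys are unique
newdf = data.merge(config, on=['Country', 'Business'],
                   how='left', validate='many_to_one')
print(newdf)
```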

Pandas: add a column with the index of the matching row in another DataFrame
