Question

I have dataframe 1:

          Place     
  0       New York
  1       Los Angeles 1 
  2       Los Angeles- 2 
  3       Dallas -1
  4       Dallas - 2
  5       Dallas3

Dataframe 2:

Place          target    value1     value2
New York        1000       a          b
Los Angeles     1500       c          d
Dallas 1        2000       e          f

Desired dataframe:

Place          target       value1     value2
New York        1000           a           b
Los Angeles 1   750            c           d
Los Angeles- 2  750            c           d
Dallas -1       666.6          e           f
Dallas - 2      666.6          e           f
Dallas3         666.6          e           f

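Explanation: we need to merge dataframe1 and dataframe2 on "Place". dataframe1 contains 1 New York, 2 Los Angeles, and 3 Dallas entries, but dataframe2 contains only one of each. So we divide the target by the number of times each place occurs in df1 (counting by name only, ignoring the numbers) and assign value1 and value2 to the corresponding places.

Is it possible to use a regular expression that accounts for spelling variations, spaces, and special characters to get the desired dataframe?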

Answer 1

Here is an exact solution:

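import pandas as pd

def extract_city(col):
    # Keep only the leading run of letter-only words, e.g. 'Los Angeles- 2' -> 'Los Angeles'
    return col.str.extract(r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)')[0]

# Merge on the extracted city names rather than on the raw Place strings
df = pd.merge(df1, df2, left_on=extract_city(df1['Place']), right_on=extract_city(df2['Place']))

# Drop the helper key column and df2's Place; keep df1's Place
df = df.drop(['key_0', 'Place_y'], axis=1).rename({'Place_x' : 'Place'}, axis=1)

# Split each target evenly among the rows that share a city
df['target'] /= df.groupby(extract_city(df['Place']))['Place'].transform('count')

df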
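For a quick check, the same idea can be run end to end on the sample data from the question. This is only a sketch under a few assumptions: the column in df2 is named 'target' (lowercase, as in the question's table), and the helper column `city` and the variable names `left`, `right`, and `result` are hypothetical names introduced here. It merges on an explicit key column instead of passing arrays to `left_on`/`right_on`, so there is no generated key column to drop afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({'Place': ['New York', 'Los Angeles 1', 'Los Angeles- 2',
                              'Dallas -1', 'Dallas - 2', 'Dallas3']})
df2 = pd.DataFrame({'Place': ['New York', 'Los Angeles', 'Dallas 1'],
                    'target': [1000, 1500, 2000],
                    'value1': ['a', 'c', 'e'],
                    'value2': ['b', 'd', 'f']})

def extract_city(col):
    # Keep only the leading run of letter-only words
    return col.str.extract(r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)')[0]

# 'city' is a hypothetical helper column used only as the merge key
left = df1.assign(city=extract_city(df1['Place']))
right = df2.drop(columns='Place').assign(city=extract_city(df2['Place']))

# A left join keeps every row of df1, in df1's order
result = left.merge(right, on='city', how='left')

# Divide each target by the number of df1 rows sharing that city
result['target'] = result['target'] / result.groupby('city')['target'].transform('count')
result = result.drop(columns='city')

print(result)
```

With the sample data, the three Dallas rows each receive 2000 / 3 and the two Los Angeles rows each receive 1500 / 2, while the Place values stay exactly as written in df1.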

Answer 2

Another way to do this is as follows:

import pandas as pd
df1 = pd.DataFrame({'Place':['New York','Los Angeles 1','Los Angeles- 2','Dallas -1','Dallas - 2','Dallas3']})

print (df1)

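#create a column to compare both dataframes. Remove numeric, - and space values
#regex=True is required: recent pandas versions treat the pattern as a literal string by default
df1['Place_compare'] = df1.Place.str.replace(r'\d+|-| ', '', regex=True)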


df2 = pd.DataFrame({'Place':['New York','Los Angeles','Dallas 1'],
                    'target':[1000,1500,2000],
                    'value1':['a','c','e'],
                    'value2':['b','d','f']})

print (df2)

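#create a column to compare both dataframes. Remove numeric, - and space values
#regex=True is required: recent pandas versions treat the pattern as a literal string by default
df2['Place_compare'] = df2.Place.str.replace(r'\d+|-| ', '', regex=True)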

#count number of times the unique values of Place occurs in df1. assign to df2
df2['counts'] = df2['Place_compare'].map(df1['Place_compare'].value_counts())

#calculate new target based on number of occurrences of Place in df1
df2['new_target'] = (df2['target'] / df2['counts']).round(2)

#repeat the rows by the number of times each place appears in counts
df2 = df2.reindex(df2.index.repeat(df2['counts']))

#drop temp columns
df2.drop(['counts','Place_compare','target'], axis=1, inplace=True)

#rename new_target as target
df2 = df2.rename({'new_target': 'target'}, axis=1)
print (df2)

The output will be:

Dataframe1:

            Place
0        New York
1   Los Angeles 1
2  Los Angeles- 2
3       Dallas -1
4      Dallas - 2
5         Dallas3

Dataframe2:

         Place  target value1 value2
0     New York    1000      a      b
1  Los Angeles    1500      c      d
2     Dallas 1    2000      e      f

Updated DataFrame with repeated rows:

         Place value1 value2  target
0     New York      a      b  1000.00
1  Los Angeles      c      d   750.00
1  Los Angeles      c      d   750.00
2     Dallas 1      e      f   666.67
2     Dallas 1      e      f   666.67
2     Dallas 1      e      f   666.67
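
Note that this output carries df2's spelling of Place, whereas the desired dataframe keeps df1's. One possible variation, sketched below, starts from df1 and looks up df2's values by the normalized name. The helper `normalize` and the names `lookup`, `keys`, `counts`, and `out` are hypothetical names introduced for this illustration, and the sketch assumes every normalized name in df1 has exactly one match in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'Place': ['New York', 'Los Angeles 1', 'Los Angeles- 2',
                              'Dallas -1', 'Dallas - 2', 'Dallas3']})
df2 = pd.DataFrame({'Place': ['New York', 'Los Angeles', 'Dallas 1'],
                    'target': [1000, 1500, 2000],
                    'value1': ['a', 'c', 'e'],
                    'value2': ['b', 'd', 'f']})

def normalize(col):
    # Strip digits, hyphens and whitespace: 'Los Angeles- 2' -> 'LosAngeles'
    return col.str.replace(r'[\d\s-]+', '', regex=True)

# Index df2 by the normalized name so it can act as a lookup table
lookup = df2.drop(columns='Place').set_index(normalize(df2['Place']))

keys = normalize(df1['Place'])
counts = keys.map(keys.value_counts())  # number of df1 rows per city

# Start from df1 so the original Place spellings are preserved
out = df1.copy()
out['target'] = (keys.map(lookup['target']) / counts).round(2)
out['value1'] = keys.map(lookup['value1'])
out['value2'] = keys.map(lookup['value2'])

print(out)
```

Mapping through a lookup table avoids the repeat-and-reindex step, at the cost of one `map` call per column carried over from df2.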

Splitting a value among names in Python
