I have dataframe1:
Place
0 New York
1 Los Angeles 1
2 Los Angeles- 2
3 Dallas -1
4 Dallas - 2
5 Dallas3
dataframe2:
Place        target  value1  value2
New York     1000    a       b
Los Angeles  1500    c       d
Dallas 1     2000    e       f
Desired dataframe:
Place           target  value1  value2
New York        1000    a       b
Los Angeles 1   750     c       d
Los Angeles- 2  750     c       d
Dallas -1       666.6   e       f
Dallas - 2      666.6   e       f
Dallas3         666.6   e       f
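Explanation: we have to merge dataframe1 and dataframe2 on "Place". dataframe1 has 1 New York, 2 Los Angeles and 3 Dallas rows, but dataframe2 has only one row for each. So we divide target by the number of times each place occurs in df1 (counting by name only, ignoring the numbers) and assign value1 and value2 to the corresponding places.
Is it possible to use a regular expression that accounts for all the spelling variations, spaces and special characters, and get the desired dataframe?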
Answer 0 (score: 0)
Here is a solution:
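import pandas as pd

def extract_city(col):
    # Keep only the leading run of letters (words separated by whitespace).
    return col.str.extract(r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)')[0]

# Merge on the extracted city name; the array-like key shows up as a 'key_0' column.
df = pd.merge(df1, df2, left_on=extract_city(df1['Place']), right_on=extract_city(df2['Place']))
df = df.drop(['key_0', 'Place_y'], axis=1).rename({'Place_x': 'Place'}, axis=1)
# Split each target evenly across the df1 rows that share a city.
df['target'] /= df.groupby(extract_city(df['Place']))['Place'].transform('count')
df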
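For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea using the sample data from the question. The helper name `city_key` and the explicit `city` column are my own assumptions for illustration, not part of the answer above. Using a named key column avoids relying on the auto-generated merge column names.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Place": ["New York", "Los Angeles 1", "Los Angeles- 2",
                              "Dallas -1", "Dallas - 2", "Dallas3"]})
df2 = pd.DataFrame({"Place": ["New York", "Los Angeles", "Dallas 1"],
                    "target": [1000, 1500, 2000],
                    "value1": ["a", "c", "e"],
                    "value2": ["b", "d", "f"]})

def city_key(col):
    # Hypothetical helper: leading run of letters, with whitespace-separated words.
    return col.str.extract(r"([a-zA-Z]+(?:\s+[a-zA-Z]+)*)")[0]

# Attach the extracted city name to both frames as an explicit key column.
left = df1.assign(city=city_key(df1["Place"]))
right = df2.drop(columns="Place").assign(city=city_key(df2["Place"]))

# A left merge keeps every df1 row, in df1's order, with df1's spelling of Place.
merged = left.merge(right, on="city", how="left")

# Divide each target by the number of df1 rows that share the same city.
merged["target"] = merged["target"] / merged.groupby("city")["Place"].transform("count")

result = merged.drop(columns="city")
print(result)
```

Note that the pattern only handles names made of ASCII letters and spaces. A misspelled city (for example "Dalas") would get its own key and would not match, so real spelling correction would need something beyond this regex.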
Answer 1 (score: 0)
Another way to do this is as follows:
import pandas as pd

df1 = pd.DataFrame({'Place': ['New York', 'Los Angeles 1', 'Los Angeles- 2', 'Dallas -1', 'Dallas - 2', 'Dallas3']})
print(df1)
# Create a column to compare both dataframes: remove digits, hyphens and spaces.
# regex=True is needed because newer pandas versions treat the pattern as a literal string by default.
df1['Place_compare'] = df1.Place.str.replace(r'\d+|-| ', '', regex=True)
df2 = pd.DataFrame({'Place': ['New York', 'Los Angeles', 'Dallas 1'],
                    'target': [1000, 1500, 2000],
                    'value1': ['a', 'c', 'e'],
                    'value2': ['b', 'd', 'f']})
print(df2)
# Create the same comparison column on df2.
df2['Place_compare'] = df2.Place.str.replace(r'\d+|-| ', '', regex=True)
# Count how many times each normalized place occurs in df1 and assign the count to df2.
df2['counts'] = df2['Place_compare'].map(df1['Place_compare'].value_counts())
# Calculate the new target based on the number of occurrences of the place in df1.
df2['new_target'] = (df2['target'] / df2['counts']).round(2)
# Repeat each row by the number of times given in counts.
df2 = df2.reindex(df2.index.repeat(df2['counts']))
# Drop the temporary columns.
df2 = df2.drop(['counts', 'Place_compare', 'target'], axis=1)
# Rename new_target to target.
df2 = df2.rename({'new_target': 'target'}, axis=1)
print(df2)
The output will be:
Dataframe1:
Place
0 New York
1 Los Angeles 1
2 Los Angeles- 2
3 Dallas -1
4 Dallas - 2
5 Dallas3
Dataframe2:
   Place        target  value1  value2
0  New York     1000    a       b
1  Los Angeles  1500    c       d
2  Dallas 1     2000    e       f
Updated DataFrame with repeated rows:
   Place        value1  value2  target
0  New York     a       b       1000.00
1  Los Angeles  c       d       750.00
1  Los Angeles  c       d       750.00
2  Dallas 1     e       f       666.67
2  Dallas 1     e       f       666.67
2  Dallas 1     e       f       666.67
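Note that this output repeats the Place values from dataframe2, whereas the desired dataframe keeps the spellings from dataframe1 ("Los Angeles- 2", "Dallas3", and so on). A possible variation, sketched below under my own assumptions, merges on the normalized key instead of repeating rows. The names `normalize` and `key` are hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Place": ["New York", "Los Angeles 1", "Los Angeles- 2",
                              "Dallas -1", "Dallas - 2", "Dallas3"]})
df2 = pd.DataFrame({"Place": ["New York", "Los Angeles", "Dallas 1"],
                    "target": [1000, 1500, 2000],
                    "value1": ["a", "c", "e"],
                    "value2": ["b", "d", "f"]})

def normalize(col):
    # Hypothetical helper: strip digits, hyphens and spaces (same pattern as above).
    return col.str.replace(r"\d+|-| ", "", regex=True)

df1_keyed = df1.assign(key=normalize(df1["Place"]))
df2_keyed = df2.drop(columns="Place").assign(key=normalize(df2["Place"]))

# Number of df1 rows per normalized place, used to split the target.
counts = df1_keyed["key"].value_counts()
df2_keyed["target"] = (df2_keyed["target"] / df2_keyed["key"].map(counts)).round(2)

# A left merge keeps df1's row order and its original Place spellings.
out = df1_keyed.merge(df2_keyed, on="key", how="left").drop(columns="key")
print(out)
```

Because the merge starts from df1, every df1 row survives even if its key has no match in dataframe2. Such rows would simply carry NaN in target, value1 and value2.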