Assign column headers to another dataframe based on a condition in Python

Time: 2020-04-15 11:51:24

Tags: python pandas

df1:


df2:

string            country
i live in NY
chicago is best
delhi is xyz

Output:
df1:

usa      india  china  france
NY       Delhi  xyz    paris
Chicago
SF

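Basically, if a string in one dataframe contains a substring listed under a column of the other dataframe, I need to assign that column's header to it.
P.S. df2 contains many columns, so I cannot list them one by one.

1 answer:

Answer 0: (score: 0)

First, I would build a dictionary that maps each city to its country, so the lookup is fast. Then you can go through each word of a string and check whether it is in the dictionary.

Depending on how complex your strings are, you should check some edge cases yourself, such as strings that contain punctuation or more than one city.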

import pandas as pd

# Lookup table: one column per country, with that country's cities as values.
df2 = pd.DataFrame({
    'usa': ['ny', 'chicago'],
    'india': ['delhi', 'mumbai'],
    'china': ['beijing', 'shanghai'],
    'france': ['paris', 'toulouse']
})

# Invert the table into a city -> country dictionary for fast lookups.
city_country = {
    city: country
    for country, cities in df2.to_dict(orient='list').items()
    for city in cities
}

def get_country_for_string(string):
    # Return the country of the first word that is a known city, else None.
    for word in string.split():
        if word.lower() in city_country:
            return city_country[word.lower()]
    return None

df1 = pd.DataFrame()
df1['strings'] = ['i live in NY', 'chicago is best', 'delhi is xyz']
df1['country'] = df1['strings'].map(get_country_for_string)
print(df1)

Output

           strings country
0     i live in NY     usa
1  chicago is best     usa
2     delhi is xyz   india
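
The edge cases mentioned above (punctuation, or several cities in one string) could be handled roughly as follows. This is only a sketch under assumptions: the names `lookup`, `pattern`, and `find_countries` and the sample strings are made up for illustration, and it swaps the word-by-word check for a single regular expression with word boundaries.

```python
import re
import pandas as pd

# Hypothetical lookup table: one column per country, cities as values.
# Building it from Series lets the columns have different lengths
# (shorter columns are padded with NaN).
lookup = pd.DataFrame({
    'usa': pd.Series(['NY', 'Chicago', 'SF']),
    'india': pd.Series(['Delhi']),
    'france': pd.Series(['Paris']),
})

# Map each lowercased city to its country, skipping the NaN padding.
city_country = {
    city.lower(): country
    for country in lookup.columns
    for city in lookup[country].dropna()
}

# One alternation over all cities, longest first. The \b word boundaries
# keep 'ny' from matching inside 'many' and ignore adjacent punctuation.
cities = sorted(city_country, key=len, reverse=True)
pattern = re.compile(
    r'\b(' + '|'.join(re.escape(c) for c in cities) + r')\b',
    flags=re.IGNORECASE,
)

def find_countries(text):
    """Return the distinct countries mentioned in text, in order of appearance."""
    countries = []
    for match in pattern.findall(text):
        country = city_country[match.lower()]
        if country not in countries:
            countries.append(country)
    return countries

strings = pd.DataFrame({'strings': [
    'I live in NY.', 'Chicago, then Delhi!', 'so many options',
]})
strings['countries'] = strings['strings'].map(find_countries)
print(strings)
```

Returning a list rather than a single value keeps every match when a string names more than one city; if only one country is wanted, taking the first element (or None for an empty list) gives the same behaviour as the answer's function.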