df1:
df2:
string country
i live in NY
chicago is best
delhi is xyz
Output:
df1:
usa      india  china  france
NY       Delhi  xyz    paris
Chicago
SF
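Basically, I need to assign the column header to the other dataframe if that df contains a substring defined in that column.
P.S. df2 contains many columns, so I can't list them one by one.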
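Because the country/city table can have many columns of different lengths, one way to avoid naming each column is to melt the wide frame into (country, city) pairs. This is a minimal sketch, not part of the original post; it assumes the table from the question is loaded into a frame with the hypothetical name `countries`, with shorter columns padded by missing values.

```python
import pandas as pd

# Hypothetical name for the question's country/city table.
# Columns have different lengths, so shorter ones are padded with None.
countries = pd.DataFrame({
    'usa': ['NY', 'Chicago', 'SF'],
    'india': ['Delhi', None, None],
    'china': ['xyz', None, None],
    'france': ['paris', None, None],
})

# melt() turns every column into (country, city) rows without naming any
# column explicitly; dropna() discards the padding cells.
pairs = countries.melt(var_name='country', value_name='city').dropna(subset=['city'])

# Lower-case the city names so later lookups can be case-insensitive.
city_country = dict(zip(pairs['city'].str.lower(), pairs['country']))
print(city_country)
```

The resulting dict maps each lower-cased city to its column header, however many columns the table has.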
Answer (score: 0)
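First, I would create a dictionary that lets you quickly look up the country for a given city. Then you can go through each word of the string and check whether it is in the dictionary.
Depending on how complex your strings are, you should check some edge cases yourself, such as strings that contain punctuation or more than one city.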
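Those two edge cases could be handled roughly as follows. This is a hedged sketch, not the answerer's code: it assumes a lower-cased `city_country` dict like the one in the answer, uses a hypothetical helper `get_countries_for_string`, and treats a "word" as a run of ASCII letters, so digits, accented letters and multi-word city names are not handled.

```python
import re

# Lower-cased city -> country lookup, same shape as the answer's dict.
city_country = {
    'ny': 'usa', 'chicago': 'usa',
    'delhi': 'india', 'mumbai': 'india',
    'beijing': 'china', 'shanghai': 'china',
    'paris': 'france', 'toulouse': 'france',
}

def get_countries_for_string(string):
    """Hypothetical helper: return every country whose city appears in
    `string`, in order of first appearance, without duplicates.

    Words are pulled out with a regex, so trailing punctuation such as
    "NY," or "Chicago!" no longer prevents a match.
    """
    countries = []
    for word in re.findall(r"[a-z]+", string.lower()):
        country = city_country.get(word)
        if country is not None and country not in countries:
            countries.append(country)
    return countries

print(get_countries_for_string("I moved from Delhi to NY, then Chicago!"))
# ['india', 'usa']
```

Returning a list instead of a single value lets the caller decide what to do when a string mentions cities in several countries.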
import pandas as pd

# Lookup table: one column per country, listing its (lower-case) cities
df2 = pd.DataFrame({
    'usa': ['ny', 'chicago'],
    'india': ['delhi', 'mumbai'],
    'china': ['beijing', 'shanghai'],
    'france': ['paris', 'toulouse']
})

# Invert the table into a {city: country} dict for fast lookups
city_country = {
    city: country
    for country, cities in df2.to_dict(orient='list').items()
    for city in cities
}

def get_country_for_string(string):
    # Return the country of the first word that is a known city, else None
    for word in string.split():
        if word.lower() in city_country:
            return city_country[word.lower()]
    return None

df1 = pd.DataFrame()
df1['strings'] = ['i live in NY', 'chicago is best', 'delhi is xyz']
df1['country'] = df1['strings'].map(get_country_for_string)
print(df1)
Output
           strings country
0     i live in NY     usa
1  chicago is best     usa
2     delhi is xyz   india
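For large frames, a vectorized alternative to looping word by word is to build one regular expression from all city names and let pandas extract the first match in each row. This is a sketch under stated assumptions, not the answerer's method: it uses a small lower-cased `city_country` dict, the `strings` column name from the answer, and `\b` word boundaries; rows with no known city end up as NaN rather than None.

```python
import re
import pandas as pd

# Lower-cased city -> country lookup, as in the answer
city_country = {'ny': 'usa', 'chicago': 'usa', 'delhi': 'india', 'paris': 'france'}

df = pd.DataFrame({'strings': ['i live in NY', 'chicago is best', 'delhi is xyz', 'hello world']})

# One alternation of every city. Longer names are tried first so a name is not
# shadowed by a shorter one it starts with; \b stops "ny" from matching
# inside words such as "any".
pattern = r'\b(' + '|'.join(
    re.escape(city) for city in sorted(city_country, key=len, reverse=True)
) + r')\b'

df['country'] = (
    df['strings'].str.lower()
    .str.extract(pattern, expand=False)  # first matching city per row, NaN if none
    .map(city_country)                   # city -> country; NaN stays NaN
)
print(df)
```

Like the answer's function, this keeps only the first city found in each string.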