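I'm trying to create a function that builds a dataframe from the rows of another dataframe whose specific column contains any word from a given list.
In my example, I want a dataframe built from the rows where the words "tfl" and "electronics" appear in the "description" column of the "uncategorised" dataframe.
The point of the function is that I want to be able to run it on different word lists, so that I end up with separate dataframes, each containing only the rows with the words I want.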
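One way to run the same selection over several word lists is to keep the lists in a dictionary and build one dataframe per entry. This is a sketch that assumes pandas. `CATEGORIES` and `categorise_all` are hypothetical names, and joining the escaped words with `|` into a single regular expression is my own choice, not something from the question. With this approach a row can appear in more than one category.

```python
import re

import pandas as pd

# Hypothetical category table: each key names an output dataframe and each
# value is the word list used to select rows for it.
CATEGORIES = {
    "Telephone": ["tfl", "electronics"],
    "Subscriptions": ["subscription", "itunes"],
}


def categorise_all(df, categories, column="description"):
    """Map each category name to the rows of df whose `column` contains
    at least one of that category's words (rows may match several categories)."""
    result = {}
    for name, words in categories.items():
        # Escape each word so characters such as '.' or '+' match literally,
        # then join them into one alternation pattern, e.g. "tfl|electronics".
        pattern = "|".join(re.escape(w) for w in words)
        # na=False treats missing descriptions as non-matches.
        mask = df[column].str.contains(pattern, na=False)
        result[name] = df[mask].reset_index(drop=True)
    return result


# A few rows taken from the question's reproducible data.
uncategorised = pd.DataFrame({
    "date": ["05/04/2017", "06/04/2017", "08/04/2017", "10/04/2017"],
    "description": ["tfl", "mu subscription", "ac electronics ", "itunes"],
    "paid out": [3.0, 16.9, 16.39, 12.99],
})

dfs = categorise_all(uncategorised, CATEGORIES)
print(dfs["Telephone"])
```

Adding another category only means adding another key to the dictionary; the function itself does not change.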
import pandas as pd
words_Telephone = ["tfl", "electronics"]
df_Telephone = pd.DataFrame(columns=['date', 'description', 'paid out'])
def categorise(word_list, df_name):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df"""
    for word in word_list:
        # df_name is rebound on every iteration, so only the rows matching
        # the last word are returned
        df_name = uncategorised[uncategorised['description'].str.contains(word)]
    return(df_name)
#apply the function
categorise(words_Telephone, df_Telephone)
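In the code above, `df_name` is rebound on each pass through the loop, so only the matches for the last word ("electronics") come back. Rebinding the parameter also leaves the caller's `df_Telephone` untouched, and the return value of `categorise(...)` is never stored. A minimal sketch of one fix, assuming pandas, is to OR the per-word boolean masks together and assign the result. Passing the source dataframe in as a parameter (`source`) instead of reading the global `uncategorised` is my own change.

```python
import pandas as pd


def categorise(word_list, source):
    """Return the rows of `source` whose 'description' contains any word in word_list."""
    # Start from an all-False mask and OR in each word's matches, so earlier
    # words are not discarded when the loop moves on.
    mask = pd.Series(False, index=source.index)
    for word in word_list:
        # regex=False treats the word as plain text; na=False counts missing
        # descriptions as non-matches instead of putting NaN into the mask.
        mask |= source["description"].str.contains(word, regex=False, na=False)
    return source[mask]


uncategorised = pd.DataFrame({
    "date": ["05/04/2017", "06/04/2017", "08/04/2017"],
    "description": ["tfl", "mu subscription", "ac electronics "],
    "paid out": [3.0, 16.9, 16.39],
})

# Store the return value: reassigning a parameter inside the function
# does not change the caller's variable.
df_Telephone = categorise(["tfl", "electronics"], uncategorised)
print(df_Telephone)
```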
I expect a dataframe containing the following:
d = {'date': {0: '05/04/2017',
1: '06/04/2017',
2: '08/04/2017',
3: '08/04/2017',
4: '08/04/2017',
5: '10/04/2017'},
'description': {0: 'tfl',
1: 'tfl',
2: 'tfl',
3: 'tfl',
4: 'ac electronics',
5: 'ac electronics'},
'index': {0: 1, 1: 3, 2: 4, 3: 5, 4: 6, 5: 8},
'paid out': {0: 3.0,
1: 4.3,
2: 6.1,
3: 1.5,
4: 16.39,
5: 20.4}}
Reproducible df:
d = {'date': {0: '05/04/2017',
1: '06/04/2017',
2: '06/04/2017',
3: '08/04/2017',
4: '08/04/2017',
5: '08/04/2017',
6: '10/04/2017',
7: '10/04/2017',
8: '10/04/2017'},
'description': {0: 'tfl',
1: 'mu subscription',
2: 'tfl',
3: 'tfl',
4: 'tfl',
5: 'ac electronics ',
6: 'itunes',
7: 'ac electronics ',
8: 'google adwords'},
'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9},
'paid out': {0: 3.0,
1: 16.9,
2: 4.3,
3: 6.1,
4: 1.5,
5: 16.39,
6: 12.99,
7: 20.4,
8: 39.68}}
uncategorised = pd.DataFrame(d)
Solution:
def categorise(word_list):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df then deletes from the uncategorised df"""
    global uncategorised
    new_dfs = []
    for word in word_list:
        mask = uncategorised['description'].str.contains(word)
        new_dfs.append(uncategorised[mask])
        # drop the matched rows so they are not picked up again later
        uncategorised = uncategorised[~mask]
    # the global uncategorised is already updated, so only the new df is returned
    return pd.concat(new_dfs).reset_index(drop=True)
#apply the function
df_Telephone = categorise(words_Telephone)
df_Telephone
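The `global` statement works, but the same "move the matches out of uncategorised" behaviour can be had by returning both pieces, which also makes it easy to run several word lists in turn. This is a sketch assuming pandas; `split_by_words` and the `categories` dict are hypothetical names. Unlike a per-category filter, each row ends up in at most one category: the first one, in dict order, whose words match it.

```python
import pandas as pd


def split_by_words(df, word_list, column="description"):
    """Return (matched, remaining): rows whose `column` contains any word in
    word_list, and all other rows. df itself is not modified."""
    mask = pd.Series(False, index=df.index)
    for word in word_list:
        mask |= df[column].str.contains(word, regex=False, na=False)
    return df[mask].reset_index(drop=True), df[~mask]


uncategorised = pd.DataFrame({
    "date": ["05/04/2017", "06/04/2017", "08/04/2017", "10/04/2017", "10/04/2017"],
    "description": ["tfl", "mu subscription", "ac electronics ", "itunes", "google adwords"],
    "paid out": [3.0, 16.9, 16.39, 12.99, 39.68],
})

# Hypothetical categories, applied in order; each pass only sees the rows
# that earlier passes left behind.
categories = {
    "Telephone": ["tfl", "electronics"],
    "Subscriptions": ["subscription", "itunes"],
}
categorised = {}
for name, words in categories.items():
    categorised[name], uncategorised = split_by_words(uncategorised, words)

print(categorised["Telephone"])
print(uncategorised)  # rows that no category claimed
```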
Answer (score: 1):
import pandas as pd
words_Telephone = ["tfl", "electronics"]
original_df = pd.DataFrame.from_dict({
    'date': ['05/04/2017', '06/04/2017', '06/04/2017', '08/04/2017', '08/04/2017',
             '08/04/2017', '10/04/2017', '10/04/2017', '10/04/2017'],
    'description': ['tfl', 'mu subscription', 'tfl', 'tfl', 'tfl', 'ac electronics',
                    'itunes', 'ac electronics', 'google adwords'],
    'paid out': [3.0, 16.9, 4.3, 6.1, 1.5, 16.39, 12.99, 20.4, 39.68]})
def categorise(word_list, original_df):
    """ takes the denoted terms from original_df and puts the matching rows into a new df"""
    new_dfs = []
    for word in word_list:
        # collect the matches for every word instead of overwriting them
        new_dfs.append(original_df[original_df['description'].str.contains(word)])
    # drop=True discards the old row labels instead of adding them as a column
    return pd.concat(new_dfs).reset_index(drop=True)
#apply the function
df_Telephone = categorise(words_Telephone, original_df)
print(df_Telephone)
         date     description  paid out
0  05/04/2017             tfl      3.00
1  06/04/2017             tfl      4.30
2  08/04/2017             tfl      6.10
3  08/04/2017             tfl      1.50
4  08/04/2017  ac electronics     16.39
5  10/04/2017  ac electronics     20.40
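Because this answer concatenates one slice per word, the result is grouped by word rather than by original row order, and a description containing two of the words (for example a hypothetical "tfl electronics") is returned twice. If that matters, one option, assuming pandas, is to drop repeated index labels and sort by the original index before resetting it. The extra row and its amount in the sample data below are made up for illustration.

```python
import pandas as pd


def categorise(word_list, original_df):
    """Rows of original_df whose 'description' contains any word in word_list,
    each row once and in its original order."""
    new_dfs = [
        original_df[original_df["description"].str.contains(word, regex=False, na=False)]
        for word in word_list
    ]
    combined = pd.concat(new_dfs)
    # A row matching several words appears once per word with the same index
    # label; keep the first copy, then restore the original row order.
    combined = combined[~combined.index.duplicated()].sort_index()
    return combined.reset_index(drop=True)


# "tfl electronics" (and its amount) is a hypothetical row that matches both words.
original_df = pd.DataFrame({
    "description": ["tfl", "itunes", "tfl electronics", "ac electronics"],
    "paid out": [3.0, 12.99, 5.0, 16.39],
})

print(categorise(["tfl", "electronics"], original_df))
```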