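I have a dataframe like the one below. For each row, I want to count how many times the value in the Jan column occurs in the corresponding cell of the URL column, and how many times it occurs in the URL column as a whole.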
I want to create 3 columns: 'found in cell', 'found in column', and 'distinct finds'.
For example, when we search for the value try from the first cell of the column Jan, it should return 1 in 'found in cell', 2 in 'found in column', and 2 in 'distinct finds', because the word was found in 2 rows.
When we search for the value why from the second cell of the column Jan, it should return 0 in 'found in cell', 2 in 'found in column', and 2 in 'distinct finds'.
I know how to search within a string. But how can I search within a cell and within a whole column?
import pandas as pd

s = "ea2017-104.pdf bb cc for why"
s.lower().count("why")  # count occurrences of a word within a string

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)
df

# per-cell count of a fixed word; the .str accessor is needed before count
df['column_find'] = df['URL'].str.lower().str.count('why')
The final output will have 3 additional columns, as shown below:
found_inCell found_in_column distinct_finds
2 3 2
0 2 2
0 1 1
I get an error when I run the code and one of the cells in Jan is empty / np.nan:
import numpy as np

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
{'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
{'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why' }]
df = pd.DataFrame(sales)
df
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
Answer 0 (score: 2)
Here is one way.
df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))
# Feb Jan URL account found_inCell \
# 0 200 .jones try ea2018-001.pdf try bbbbb why 3 1
# 1 210 why try 1 0
# 2 90 bbbbb ea2017-104.pdf bb cc for why 2 0
# found_in_column distinct_finds
# 0 2 2
# 1 2 2
# 2 1 1
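The lambdas above fail when Jan contains np.nan, because str.count and str.contains need a string to search for. A minimal sketch of a NaN-tolerant variant follows. The helper names count_in_cell, count_in_column and count_rows_with are hypothetical, and returning 0 for a missing search term is an assumption about the desired behaviour, not something stated in the question. It also sums per-cell counts instead of joining the URLs into one string, which avoids matches that span two neighbouring cells, and it passes regex=False so the search term is treated as literal text.

```python
import numpy as np
import pandas as pd

sales = [
    {'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
    {'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
    {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'},
]
df = pd.DataFrame(sales)


def count_in_cell(row):
    # Assumption: a missing search term counts as 0 occurrences.
    if pd.isna(row['Jan']):
        return 0
    return row['URL'].count(row['Jan'])


def count_in_column(term):
    if pd.isna(term):
        return 0
    # Sum the per-cell counts rather than joining all URLs into one string.
    return int(sum(url.count(term) for url in df['URL']))


def count_rows_with(term):
    if pd.isna(term):
        return 0
    # regex=False makes the term a literal substring, not a pattern.
    return int(df['URL'].str.contains(term, regex=False).sum())


df['found_inCell'] = df.apply(count_in_cell, axis=1)
df['found_in_column'] = df['Jan'].apply(count_in_column)
df['distinct_finds'] = df['Jan'].apply(count_rows_with)
```

This sketch assumes the URL column itself has no missing values; if it can, filling it first with df['URL'].fillna('') would be one option.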