Finding a string in a pandas DataFrame column and cell

Time: 2018-02-21 21:30:11

Tags: python pandas search text

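I have a dataframe like the one below. For each value in column Jan, I want to find how many times it occurs in the whole URL column and in the corresponding cell of the URL column.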

I want to create 3 columns: found in cell, found in column, and distinct finds. For example, when we search for the value try from the first cell of column Jan, it should return 1 in found in cell, 2 in found in column, and 2 in distinct finds, because the word was found in 2 rows. When we search for the value why from the second cell of column Jan, it should return 0 in found in cell, 2 in found in column, and 2 in distinct finds.

I know how to search within a string. But how can I search within a cell and within a column?

s="ea2017-104.pdf bb cc for why"
s.lower().count("why")  # find text within a string

import pandas as pd

sales = [{'account': '3', 'Jan': 'try', 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'why', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)
df

# count 'why' in each URL cell (the per-cell count needs the .str accessor)
df['column_find'] = df['URL'].str.lower().str.count('why')

The final output will have 3 additional columns, as shown below:

found_inCell    found_in_column    distinct_finds
2               3                  2
0               2                  2
0               1                  1

Update

I get an error when I run the code and one of the cells is empty / np.nan:

import numpy as np
import pandas as pd

sales = [{'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
         {'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
         {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'}]
df = pd.DataFrame(sales)
df

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

1 Answer:

Answer 0: (score: 2)

Here is one way.

df['found_inCell'] = df.apply(lambda row: row['URL'].count(row['Jan']), axis=1)
df['found_in_column'] = df['Jan'].apply(lambda x: ''.join(df['URL'].tolist()).count(x))
df['distinct_finds'] = df['Jan'].apply(lambda x: sum(df['URL'].str.contains(x)))

#           Feb    Jan                           URL account  found_inCell  \
# 0  200 .jones    try  ea2018-001.pdf try bbbbb why       3             1   
# 1         210    why                          try        1             0   
# 2          90  bbbbb  ea2017-104.pdf bb cc for why       2             0   

#    found_in_column  distinct_finds  
# 0                2               2  
# 1                2               2  
# 2                1               1
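
For the np.nan case from the update, the three lambdas fail because str.count and str.contains are handed a float instead of a string. The following is only a sketch of one possible workaround, not part of the original answer. It assumes a missing or empty Jan value should simply yield 0 in all three columns, and that every non-missing URL is a string. The helper is_term and the variable urls are names made up for this illustration. It also sums per-cell counts instead of joining all URLs into one string, so a term cannot match across the boundary between two cells.

```python
import numpy as np
import pandas as pd

sales = [
    {'account': '3', 'Jan': np.nan, 'Feb': '200 .jones', 'URL': 'ea2018-001.pdf try bbbbb why try'},
    {'account': '1', 'Jan': 'try', 'Feb': '210', 'URL': 'try '},
    {'account': '2', 'Jan': 'bbbbb', 'Feb': '90', 'URL': 'ea2017-104.pdf bb cc for why'},
]
df = pd.DataFrame(sales)

# Assumption: a missing URL is treated as an empty string.
urls = df['URL'].fillna('')


def is_term(value):
    """Hypothetical helper: True only for non-empty strings (so NaN is skipped)."""
    return isinstance(value, str) and value != ''


# Occurrences of the Jan value inside the URL cell of the same row.
df['found_inCell'] = [
    url.count(term) if is_term(term) else 0
    for term, url in zip(df['Jan'], urls)
]

# Occurrences of the Jan value across every cell of the URL column.
df['found_in_column'] = [
    sum(url.count(term) for url in urls) if is_term(term) else 0
    for term in df['Jan']
]

# Number of URL cells (rows) that contain the Jan value at least once.
df['distinct_finds'] = [
    sum(term in url for url in urls) if is_term(term) else 0
    for term in df['Jan']
]

print(df[['Jan', 'found_inCell', 'found_in_column', 'distinct_finds']])
```

Matching here is a case-sensitive literal substring match, as in the code above it. If case should be ignored, both the Jan value and the URL could be lower-cased first.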