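I have a dataframe containing a list of items whose values I want to sum whenever they partially match an entry in another list. What I am trying to do is the following:
I have an implementation in Excel that I want to replicate in pandas.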
=sumif(b7:b32,"*"&b3&"*",c7:c32)
https://www.webanalyticsworld.net/2014/07/using-ngrams-to-analyse-keyword-performance.html
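import pandas as pd
import numpy as np
import re

# Load the search terms report, skipping the 2 header and 2 footer rows.
sqr = pd.read_csv('Search terms report (7).csv', encoding='utf-8', skiprows=2, skipfooter=2, engine='python')
search_terms = sqr['Search term']
required_ngrams = 2
ngram_holder = []
ngrams = []

def generate_ngrams(s, n):
    # Lowercase, replace non-alphanumeric characters with spaces, then tokenize.
    s = s.lower()
    s = re.sub(r'[^a-zA-Z0-9\s]', ' ', s)
    tokens = [token for token in s.split(" ") if token != ""]
    # Zip n shifted copies of the token list to form the n-grams.
    ngrams = zip(*[tokens[i:] for i in range(n)])
    return [" ".join(ngram) for ngram in ngrams]

# Build the n-grams for every non-empty search term.
for x in search_terms:
    if len(x) > 0:
        ngram_holder.append(generate_ngrams(x, required_ngrams))

# Flatten the per-term lists into a single list of n-grams.
for x in ngram_holder:
    if x:
        for y in x:
            ngrams.append(y)

# Deduplicated n-gram list (the "keywords").
df = pd.DataFrame(ngrams)
df = df.drop_duplicates()
df.columns = ['Search term ngram']

# Search terms together with their metrics.
sqr = sqr[['Search term','Cost','Clicks','orders','Online Sales']]
sqr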
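This code produces two lists: one contains the keywords (the n-grams in df), and the other contains the search terms along with their data (sqr).
What I want to do is a SUMIF across the two: for each keyword, pull through every search term that partially matches it and sum that term's data.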
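
To make the target behaviour concrete, here is a minimal sketch of a wildcard SUMIF in pandas. It is only an illustration under stated assumptions: the sample rows are made up and stand in for the real report, only the Cost and Clicks columns are summed, and the loop-based approach is one possible way to do it, not necessarily the most efficient one. The match is a plain case-insensitive substring test, which is how "*"&b3&"*" behaves in Excel.

```python
import pandas as pd

# Hypothetical sample data standing in for the real search terms report.
sqr = pd.DataFrame({
    "Search term": ["red running shoes", "running shoes sale", "blue shoes"],
    "Cost": [1.0, 2.0, 4.0],
    "Clicks": [10, 20, 40],
})

# Hypothetical n-gram list, shaped like the deduplicated `df` from the question.
df = pd.DataFrame({
    "Search term ngram": ["running shoes", "red running", "shoes sale", "blue shoes"]
})

# Columns to sum; extend with 'orders' and 'Online Sales' for the real report.
metric_cols = ["Cost", "Clicks"]

# Lowercase once so the substring test is case-insensitive.
terms = sqr["Search term"].str.lower()

rows = []
for ngram in df["Search term ngram"]:
    # Literal substring match (regex=False); missing search terms never match.
    mask = terms.str.contains(ngram.lower(), regex=False, na=False)
    # Sum each metric over the matching search terms.
    totals = sqr.loc[mask, metric_cols].sum()
    rows.append({"Search term ngram": ngram, **totals.to_dict()})

result = pd.DataFrame(rows)
print(result)
```

Because this is a substring test, an n-gram also matches inside longer words (for example "red run" would match "red running shoes"). If whole-word matching is wanted instead, the n-grams of each search term would need to be compared token by token rather than with a substring search.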