Here is my data:
id keyword
1 transfer
2 atm transfer
3 atm
4 ulta transfer
5 transfer transfer
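I want to remove the word transfer, but if transfer is the only word left, it should be kept. The other word is atm: when only transfer and atm are left, we choose atm. If a keyword appears more than once, it should appear only once.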
Here is my code:
df['keyword_2'] = df['keyword'].mask(df['keyword'] != 'transfer', df['keyword'].str.replace('transfer', '').str.strip())
My output:
My expected output:
id keyword keyword_2
1 transfer transfer
2 atm transfer atm
3 atm atm
4 ulta transfer transfer
5 transfer transfer
I hope the question is clear.
Answer 0 (score: 2)
You can try Series.apply:
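def remove_transfer(x):
    # Split the keyword string into individual words
    l = x.split()
    # A single word (including a lone 'transfer') is returned unchanged
    if len(l) == 1:
        return x
    # To drop every occurrence instead of only the first one:
    # l = [el for el in l if el != 'transfer']
    # list.remove drops only the first match and raises ValueError if it is absent
    if 'transfer' in l:
        l.remove('transfer')
    return ' '.join(l)

df['keyword_2'] = df['keyword'].apply(remove_transfer)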
id keyword keyword_2
1 transfer transfer
2 atm transfer atm
3 atm atm
4 ulta transfer ulta
5 transfer
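The commented-out list comprehension above hints at a variant that drops every occurrence. A minimal sketch of that idea follows, assuming the goal is to remove all transfer tokens, keep each remaining word once in its original order, and fall back to transfer when nothing else is left. The name clean_keyword is hypothetical and not part of the answer.

```python
def clean_keyword(text, word="transfer"):
    """Drop every `word` token and repeated words; keep `word` if nothing else remains."""
    tokens = text.split()
    if not tokens:
        # Empty or whitespace-only input is returned unchanged
        return text
    seen = set()
    kept = []
    for token in tokens:
        # Skip the unwanted word and any word already kept
        if token == word or token in seen:
            continue
        seen.add(token)
        kept.append(token)
    # Fall back to the word itself when it was the only content
    return " ".join(kept) if kept else word

samples = ["transfer", "atm transfer", "atm", "ulta transfer", "transfer transfer"]
results = [clean_keyword(s) for s in samples]
print(results)  # ['transfer', 'atm', 'atm', 'ulta', 'transfer']
```

Because it is a plain function on strings, it could be passed to Series.apply in the same way as remove_transfer.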
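Answer 1 (score: 2)
Split each value on whitespace, compare the resulting set with the set containing only transfer, and assign the values with numpy.where: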
import numpy as np
# True where the row contains anything other than just 'transfer'
mask = df['keyword'].str.split().apply(set) != set(['transfer'])
df['keyword1'] = np.where(mask,
                          df['keyword'].str.replace('transfer', '').str.strip(),
                          'transfer')
print(df)
id keyword keyword1
0 1 transfer transfer
1 2 atm transfer atm
2 3 atm atm
3 4 ulta transfer ulta
4 5 transfer transfer transfer
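To see why the set comparison leaves a single transfer in the last row, the mask can be sketched in plain Python without pandas. This is only an illustration: the list samples copies the question's keyword column, and contains_other_words is a name assumed here.

```python
# Values copied from the question's keyword column
samples = ["transfer", "atm transfer", "atm", "ulta transfer", "transfer transfer"]

def contains_other_words(text, word="transfer"):
    """Return True when the value is anything other than repetitions of `word`."""
    # Splitting on whitespace and building a set collapses repeated words,
    # so "transfer transfer" becomes {"transfer"}
    return set(text.split()) != {word}

mask = [contains_other_words(s) for s in samples]
print(mask)  # [False, True, True, True, False]
```

Rows where the mask is False consist only of transfer, which is why numpy.where assigns the literal 'transfer' to them.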
Answer 2 (score: 2)
Use Series.apply with a lambda function:
pat = 'transfer'
df['keyword2'] = df['keyword'].apply(lambda x: x if x == pat else x.replace(pat, '', 1).strip())
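Note that str.replace matches substrings, so a longer word that merely contains the pattern would also be altered. A minimal sketch of a whole-word variant using the standard re module follows. remove_whole_word is a hypothetical helper, and transferred is a made-up value that does not occur in the question's data.

```python
import re

def remove_whole_word(text, word="transfer"):
    """Remove `word` only where it stands alone as a word."""
    # \b matches a word boundary, so 'transferred' is left untouched
    pattern = r"\b" + re.escape(word) + r"\b"
    # Replace each match with a space, then collapse the leftover whitespace
    cleaned = " ".join(re.sub(pattern, " ", text).split())
    # If the value held nothing but `word`, keep a single copy of it
    if not cleaned and re.search(pattern, text):
        return word
    return cleaned

print(remove_whole_word("atm transfer"))
print(remove_whole_word("transferred atm"))
```

It could be applied in the same way, for example df['keyword2'] = df['keyword'].apply(remove_whole_word).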
Answer 3 (score: 1)
I can imagine that the word you want to replace may occur more than twice in your data. In that case you can use the function below and then apply it with .apply.
Imagine your data looks like this:
keyword
0 transfer
1 atm transfer
2 atm
3 ulta transfer
4 transfer transfer transfer
We see that index 4 contains your keyword three times.
So we need a more robust solution, like the following:
# Function to remove a word
def remove_word(x, word):
    # A value that is exactly the word is kept as it is
    if x == word:
        return x
    # More than two occurrences: remove all but the last one
    elif x.count(word) > 2:
        return x.replace(word, '', x.count(word)-1).strip()
    # Otherwise remove the first occurrence only
    else:
        return x.replace(word, '', 1).strip()
# Apply the function
df['keyword_2'] = df.keyword.apply(lambda x: remove_word(x, 'transfer'))
Output: