How to remove a word only when it is an additional keyword, without removing duplicated keywords

Date: 2019-02-25 10:30:03

Tags: pandas dataframe

Here is my data:

id  keyword
1   transfer
2   atm transfer
3   atm
4   ulta transfer
5   transfer transfer
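
For anyone who wants to reproduce this, the table above can be built as a DataFrame roughly like this. It is a minimal sketch: the column names come from the table, and the variable name `df` matches the code further down.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "keyword": [
        "transfer",
        "atm transfer",
        "atm",
        "ulta transfer",
        "transfer transfer",
    ],
})
print(df)
```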

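I want to remove the word transfer when it appears together with another word. If transfer is the only word left, it should stay. For example, when a keyword contains only atm and transfer, we keep atm. If the keyword transfer appears more than once, it should be kept once.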

Here is my code:

df['keyword_2'] = df['keyword'].mask(df['keyword'] != 'transfer', df['keyword'].str.replace('transfer', '').str.strip())

My expected output:

id  keyword            keyword_2
1   transfer           transfer
2   atm transfer       atm
3   atm                atm
4   ulta transfer      transfer
5   transfer transfer  

I hope the question is clear.

4 answers:

Answer 0 (score: 2)

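You can try Series.apply:

def remove_transfer(x):
    l = x.split()
    # A single word (including a lone 'transfer') is returned unchanged
    if len(l) == 1:
        return x
    # To drop every occurrence instead of only the first, use:
    # l = [el for el in l if el != 'transfer']
    if 'transfer' in l:
        l.remove('transfer')  # list.remove drops only the first match
    return ' '.join(l)

df['keyword_2'] = df['keyword'].apply(remove_transfer)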

id  keyword       keyword_2
1   transfer      transfer
2   atm transfer  atm
3   atm           atm
4   ulta transfer ulta
5   transfer

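Answer 1 (score: 2)

Compare the set of whitespace-separated words in each value with the set containing only 'transfer', then pick the result with numpy.where:

import numpy as np

# True where the row contains something other than just 'transfer'
mask = df['keyword'].str.split().apply(set) != set(['transfer'])
df['keyword1'] = np.where(mask, df['keyword'].str.replace('transfer', '').str.strip(),
                                'transfer')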
print (df)
   id            keyword  keyword1
0   1           transfer  transfer
1   2       atm transfer       atm
2   3                atm       atm
3   4      ulta transfer      ulta
4   5  transfer transfer  transfer
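
To see why the set comparison also catches repeated keywords, here is a plain-Python sketch of the same test. The list `keywords` is an illustrative stand-in for the `keyword` column and is not part of the original answer.

```python
# Stand-in for df['keyword'] from the question
keywords = [
    "transfer",
    "atm transfer",
    "atm",
    "ulta transfer",
    "transfer transfer",
]

# Splitting on whitespace and building a set collapses duplicates,
# so 'transfer transfer' becomes {'transfer'} just like 'transfer'.
only_transfer = [set(k.split()) == {"transfer"} for k in keywords]
print(only_transfer)  # [True, False, False, False, True]
```

Rows where this is True are the ones that np.where fills with a single 'transfer'.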

Answer 2 (score: 2)

Use Series.apply with a lambda function:

pat = 'transfer'
df['keyword2'] = df['keyword'].apply(lambda x: x if x == pat else x.replace(pat, '', 1).strip())
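
This answer does not show its result, so here is a small self-contained check against the question's sample data. The DataFrame construction is an assumption based on the table in the question.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({
    "keyword": [
        "transfer",
        "atm transfer",
        "atm",
        "ulta transfer",
        "transfer transfer",
    ],
})

pat = "transfer"
# Keep a lone 'transfer'; otherwise remove the first occurrence and trim spaces
df["keyword2"] = df["keyword"].apply(
    lambda x: x if x == pat else x.replace(pat, "", 1).strip()
)
print(df["keyword2"].tolist())
# ['transfer', 'atm', 'atm', 'ulta', 'transfer']
```

Because only the first occurrence is removed, a value such as 'transfer transfer transfer' would still come out as 'transfer transfer'.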

Answer 3 (score: 1)

I can imagine that the word you want to remove may occur more than twice in your data. In that case you can use the function below and call it through .apply.

Imagine your data looks like this:

    keyword
0   transfer
1   atm transfer
2   atm
3   ulta transfer
4   transfer transfer transfer

We see that index 4 contains your keyword 3 times.

So we need a more robust solution, as follows:

# Function to remove a word
def remove_word(x, word):
    # A lone keyword is kept as-is
    if x == word:
        return x
    # Keyword repeated more than twice: remove all but the last occurrence
    elif x.count(word) > 2:
        return x.replace(word, '', x.count(word) - 1).strip()
    # Otherwise remove the first occurrence only
    else:
        return x.replace(word, '', 1).strip()

# Apply the function
df['keyword_2'] = df.keyword.apply(lambda x: remove_word(x, 'transfer'))
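
Note that remove_word works on substrings and still keeps one transfer when another word is present (for example in 'atm transfer transfer'). If that is not wanted, a token-based variant is one possible alternative. This is only a sketch and not part of the original answer; the name `drop_extra_word` is hypothetical.

```python
def drop_extra_word(text, word="transfer"):
    """Drop `word` when other words exist; otherwise keep it exactly once."""
    tokens = text.split()
    # Every token that is not the keyword
    others = [t for t in tokens if t != word]
    if others:
        # Other words exist, so the keyword is dropped entirely
        return " ".join(others)
    # Only the keyword (possibly repeated) is present: keep it once.
    # An empty or whitespace-only string is returned unchanged.
    return word if tokens else text


samples = [
    "transfer",
    "atm transfer",
    "atm",
    "ulta transfer",
    "transfer transfer transfer",
    "atm transfer transfer",
]
print([drop_extra_word(s) for s in samples])
# ['transfer', 'atm', 'atm', 'ulta', 'transfer', 'atm']

# With pandas it could be applied as:
# df['keyword_2'] = df['keyword'].apply(drop_extra_word)
```

This variant compares whole tokens, so a word that merely contains "transfer" (such as "transfers") is left alone, and repeated non-keyword words are not deduplicated.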