Question

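I want to iterate over each row of a dataframe and, if the value in a column matches a string in a list, add an element to a new column. In this example, I want to add a new column that categorizes the products: if a row's value matches an entry in one of the lists, the category is "drinks" or "food"; if there is no match, the category is "Other".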



The desired output is:

Price Product Category
1     Juice   drinks
5     book    Other
3     Pen     Other

My main problem is that I don't know how to match patterns between the lists and the rows. I also tried the following, but it did not work:

list_drinks={'Water','Juice','Tea'}
list_food={'Apple','Orange'}
data = {'Price':  ['1', '5','3'], 'Product': ['Juice','book', 'Pen']}
for (i,j) in itertools.zip_longest(list_drinks,list_food):
    for index in data.index: 
        if(j in data.loc[index,'product']):
            data["Category"] = "Food"
        elif(i in data.loc[index,'product']):
            data["Category"] ="drinks"
        else:
            data["Category"]="Other"

Answer 1

No loop is needed. You can combine .isin() with np.select() to return a result based on conditions. See the code below:

import pandas as pd
import numpy as np
list_drinks=['Water','Juice','Tea']
list_food=['Apple','Orange']
data = {'Price':  ['1', '5','3'],
    'Product': ['Juice','book','Pen']}
df = pd.DataFrame(data)
df['Category'] = np.select([(df['Product'].isin(list_drinks)),
               (df['Product'].isin(list_food))],
              ['drinks',
              'food'], 'Other')
df
Out[1]: 
  Price Product Category
0     1   Juice   drinks
1     5    book    Other
2     3     Pen    Other
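
One detail worth knowing about np.select(): the conditions are checked in order, and the first one that is true decides the result for that row; rows that match no condition get the default. The sketch below illustrates this under one assumption that is not in the question: 'Tea' is added to the food list purely to create an overlap.

```python
import numpy as np
import pandas as pd

list_drinks = ['Water', 'Juice', 'Tea']
# Hypothetical overlap: 'Tea' is added here only to show the ordering rule.
list_food = ['Apple', 'Orange', 'Tea']

df = pd.DataFrame({'Product': ['Tea', 'Apple', 'Pen']})

# Conditions are evaluated in list order; the first true one wins.
conditions = [df['Product'].isin(list_drinks),
              df['Product'].isin(list_food)]
choices = ['drinks', 'food']

# Rows where no condition is true fall back to the default value.
df['Category'] = np.select(conditions, choices, default='Other')
print(df)
```

Here 'Tea' ends up as drinks because the drinks condition comes first; swapping the order of the two conditions (and choices) would label it food instead.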

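Below, I break the code down in more detail so you can see how it works. I also made a small change based on your comment: I use a list comprehension with `in` to check whether any value from a list appears as a substring of the dataframe value. To improve the match rate, I also convert both sides to lowercase with .lower() before comparing: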

import pandas as pd
import numpy as np
list_drinks=['Water','Juice','Tea']
list_food=['Apple','Orange']
data = {'Price':  ['1', '5','3'],
    'Product': ['green Juice','book','oRange you gonna say banana']}
df = pd.DataFrame(data)
c1 = (df['Product'].apply(lambda x: len([y for y in list_drinks if y.lower() in x.lower()]) > 0))
c2 = (df['Product'].apply(lambda x: len([y for y in list_food if y.lower() in x.lower()]) > 0))
r1 = 'drinks'
r2 = 'food'

conditions = [c1,c2]
results= [r1,r2]

df['Category'] = np.select(conditions, results, 'Other')
df
Out[1]: 
  Price                      Product Category
0     1                  green Juice   drinks
1     5                         book    Other
2     3  oRange you gonna say banana     food
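
The same case-insensitive substring check can probably also be written with pandas' own Series.str.contains(), which avoids the lambda. This is only a sketch of an alternative, not the approach used above; the helper name any_of is made up for illustration, and re.escape() is used so that list entries are treated as literal text rather than regex syntax.

```python
import re
import numpy as np
import pandas as pd

list_drinks = ['Water', 'Juice', 'Tea']
list_food = ['Apple', 'Orange']
df = pd.DataFrame({'Price': ['1', '5', '3'],
                   'Product': ['green Juice', 'book',
                               'oRange you gonna say banana']})

def any_of(words):
    # Hypothetical helper: builds a pattern such as 'Water|Juice|Tea'.
    return '|'.join(re.escape(w) for w in words)

# case=False makes the substring search case-insensitive.
c1 = df['Product'].str.contains(any_of(list_drinks), case=False, regex=True)
c2 = df['Product'].str.contains(any_of(list_food), case=False, regex=True)

df['Category'] = np.select([c1, c2], ['drinks', 'food'], default='Other')
print(df)
```

Keep in mind that any substring match can over-match: 'Tea' is also found inside a word like 'Steak'. If that matters for your data, wrapping each word in \b word boundaries in the pattern is one way to tighten it.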

Answer 2

Here is another option:

import pandas as pd

list_drinks={'Water','Juice','Tea'}
list_food={'Apple','Orange'}
data = pd.DataFrame({'Price':  ['1', '5','3'], 'Product': ['Juice','book', 'Pen']})
category = list()
for prod in data['Product']: 
    if prod in list_food:
        category.append("Food")
    elif prod in list_drinks:
        category.append("drinks")
    else:
        category.append("Other")
data['Category']= category
print(data)

Output:

Price  Product Category
 1      Juice    drinks
 5      book     Other
 3      Pen      Other
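
If you prefer to avoid the explicit loop for exact matches, one possible variation is to build a lookup dictionary and use Series.map(). This is a sketch rather than part of the answer above; the name category_of is made up for illustration. Food entries are added last so they take precedence over drinks, mirroring the order of the if/elif checks.

```python
import pandas as pd

list_drinks = {'Water', 'Juice', 'Tea'}
list_food = {'Apple', 'Orange'}

# Hypothetical lookup table: product name -> category.
category_of = {name: 'drinks' for name in list_drinks}
# Food is added second, so it wins if a name appears in both sets.
category_of.update({name: 'Food' for name in list_food})

data = pd.DataFrame({'Price': ['1', '5', '3'],
                     'Product': ['Juice', 'book', 'Pen']})

# map() yields NaN for products missing from the dict; fillna() labels them.
data['Category'] = data['Product'].map(category_of).fillna('Other')
print(data)
```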

Iterate over each row of a dataframe and add an element to the dataframe based on a condition
