Question

I have two DataFrames:

[in] print(testing_df.head(n=5))
print(product_combos1.head(n=5))

[out]
                     product_id  length
transaction_id                         
001                      (P01,)       1
002                  (P01, P02)       2
003             (P01, P02, P09)       3
004                  (P01, P03)       2
005             (P01, P03, P05)       3

             product_id  count  length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

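I want to return the highest-frequency row of product_combos that contains the strings from testing_df and has length len(testing_df + 1). So for example, for transaction_id 001 I want to return product_combos[3] (though only P09).

For the first part (comparing by length only) I tried the following, but it returns an error: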

# Return the product combos values that are of the appropriate length and the strings match
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k)

Answer 1

You can't create a list from a non-iterable like this. Try replacing matches = list(k) with matches = [k]. The parentheses are also redundant: you can replace if (i)+1 == (k): with if i + 1 == k:.
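To illustrate the difference, here is a minimal sketch. It assumes a plain Python int as a stand-in for one value of the length column (in the question the value comes from a pandas column, but list() rejects a single integer in the same way). The name error_message is only used for illustration.

```python
k = 2  # stand-in for one value from product_combos1['length'] (assumption)

try:
    matches = list(k)        # list() needs an iterable, so an int raises TypeError
except TypeError as exc:
    error_message = str(exc)

matches = [k]                # a list literal wraps the single value instead
print(matches)               # prints [2]
```

Note that matches = [k] still overwrites matches on every hit, so only the last matching value survives the loop.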

Answer 2

Just use the .append() method. I'd also suggest setting matches to an empty list at the top, so that you don't get duplicates when you re-run the cell.

# Setup

import pandas as pd

# Build each frame in one constructor call. Attribute-style assignment
# (product_combos1.count = [...]) would replace the DataFrame.count method
# instead of filling the 'count' column.
testing_df = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'length': [1, 2],
})
product_combos1 = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'count': [100, 5000],
    'length': [3, 1],
})

# Matching

matches = []
for i in testing_df['length']:
    for k in product_combos1['length']:
        if i+1 == k:
            matches.append(k)

Let me know if this works, or if there's anything else! Good luck!
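Going one step beyond the length comparison, the full goal described in the question (the highest-count combo that contains a transaction's products plus one extra item) could be sketched roughly as follows. This is only one possible approach, not something from the original posts: best_combo and recommended are hypothetical names, and the data is limited to the rows printed in the question.

```python
import pandas as pd

# First two transactions and the five combo rows shown in the question.
testing_df = pd.DataFrame(
    {"product_id": [("P01",), ("P01", "P02")], "length": [1, 2]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)
product_combos1 = pd.DataFrame({
    "product_id": [("P06", "P09"), ("P01", "P05", "P06", "P09"),
                   ("P01", "P06"), ("P01", "P09"), ("P02", "P09")],
    "count": [36340, 10085, 36337, 49897, 11573],
    "length": [2, 4, 2, 2, 2],
})


def best_combo(products, combos):
    """Return the highest-count combo holding `products` plus one item, or None."""
    wanted = set(products)
    right_length = combos["length"] == len(products) + 1
    contains_all = combos["product_id"].apply(lambda combo: wanted.issubset(combo))
    candidates = combos[right_length & contains_all]
    if candidates.empty:
        return None
    # idxmax gives the index label of the row with the largest count
    return candidates.loc[candidates["count"].idxmax(), "product_id"]


recommended = {
    tid: best_combo(products, product_combos1)
    for tid, products in testing_df["product_id"].items()
}
print(recommended)
```

With this sample data, transaction 001 maps to ('P01', 'P09') and transaction 002 has no candidate of length 3, so it maps to None. The single new product can then be read off as the set difference between the returned combo and the transaction's own products.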

Pandas: compare rows of DataFrames based on a condition and return a set
