Pandas: compare rows of two DataFrames and return a set based on conditions

Time: 2017-08-05 16:30:40

Tags: python pandas dataframe

I have two DataFrames:

[in] print(testing_df.head(n=5))
print(product_combos1.head(n=5))

[out]
                     product_id  length
transaction_id                         
001                      (P01,)       1
002                  (P01, P02)       2
003             (P01, P02, P09)       3
004                  (P01, P03)       2
005             (P01, P03, P05)       3

             product_id  count  length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

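I want to return the highest-frequency rows of product_combos1 that match each testing_df row and contain one more string than that row (its length + 1). So, for example, for transaction_id 001 I want to return product_combos[3], that is (P01, P09), although I really only need P09.

For the first part (comparing on length alone) I tried:

# Return the product combos values that are of the appropriate length and the strings match
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k)

However, this returns an error.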

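2 answers:

Answer 0: (score: 0)

You can't create a list from a non-iterable like this. Try replacing matches = list(k) with matches = [k]. The parentheses are also redundant: you can replace if (i)+1 == (k): with if i + 1 == k: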
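To see the difference in isolation, here is a minimal sketch. It uses a plain Python int as a stand-in for one value of product_combos1['length'] (an assumption for illustration; in the question the values come from a pandas column, but list() rejects any single number in the same way).

```python
# Hypothetical stand-in for one value taken from product_combos1['length'].
k = 3

error_message = None
try:
    matches = list(k)        # list() expects an iterable; a single number is not one
except TypeError as exc:
    error_message = str(exc)

matches = [k]                # a list literal wraps the single value instead

print(error_message)
print(matches)
```

Note that matches = [k] still overwrites the list on every hit, so only the last matching value is kept.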

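Answer 1: (score: 0)

Just use the .append() method. I would also suggest initializing "matches" as an empty list at the top, so that you don't get duplicates when you re-run the cell.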

# Setup

import pandas as pd

# Build each frame from a dict instead of assigning columns as attributes:
# product_combos1.count = [...] would replace the DataFrame.count method on
# that object rather than fill the 'count' column.
testing_df = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'length': [1, 2],
})
product_combos1 = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'count': [100, 5000],
    'length': [3, 1],
})

# Matching

# Start from an empty list so re-running the cell does not pile up duplicates.
matches = []
for i in testing_df['length']:
    for k in product_combos1['length']:
        if i + 1 == k:
            matches.append(k)
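The loop above only compares lengths. For the full goal in the question (the highest-count combination that contains every product of a transaction plus exactly one more), a rough sketch could look like the following. The data is copied from the sample output in the question, best_extension is a hypothetical helper name, and reading "match" as a subset test is my assumption about what is wanted.

```python
import pandas as pd

# Sample rows copied from the question's printed output.
testing_df = pd.DataFrame(
    {"product_id": [("P01",), ("P01", "P02")]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)
product_combos1 = pd.DataFrame({
    "product_id": [
        ("P06", "P09"),
        ("P01", "P05", "P06", "P09"),
        ("P01", "P06"),
        ("P01", "P09"),
        ("P02", "P09"),
    ],
    "count": [36340, 10085, 36337, 49897, 11573],
})
product_combos1["length"] = product_combos1["product_id"].apply(len)


def best_extension(basket, combos):
    """Hypothetical helper: highest-count combo holding `basket` plus one extra item."""
    wanted = set(basket)
    right_length = combos["length"] == len(basket) + 1
    contains_basket = combos["product_id"].apply(lambda combo: wanted.issubset(combo))
    candidates = combos[right_length & contains_basket]
    if candidates.empty:
        return None  # no combo of the required length contains the basket
    return candidates.loc[candidates["count"].idxmax(), "product_id"]


# Map each transaction_id to its best matching combo (or None).
best_matches = {
    transaction_id: best_extension(basket, product_combos1)
    for transaction_id, basket in testing_df["product_id"].items()
}
print(best_matches)
```

With these sample rows, transaction 001 maps to ('P01', 'P09'), while 002 has no length-3 candidate and maps to None. To keep only the added product, take set(combo) - set(basket) on the result.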

Let me know if this works, or if there is anything else! Good luck!