I have two DataFrames:
[in] print(testing_df.head(n=5))
print(product_combos1.head(n=5))
[out]
               product_id  length
transaction_id
001                (P01,)       1
002            (P01, P02)       2
003       (P01, P02, P09)       3
004            (P01, P03)       2
005       (P01, P03, P05)       3
             product_id  count  length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2
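I want to return the row of product_combos with the highest frequency that contains len(testing_df + 1) strings, that is, one more product than the testing_df row. So for example, for transaction_id 001 I want to return product_combos[3] (although only P09).
For the first part (comparing on length only) I tried:
# Return the product combos values that are of the appropriate length and the strings match
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k)
However, this returns an error.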
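Answer 0: (score: 0)
You cannot create a list from a non-iterable like that. Try replacing matches = list(k) with matches = [k].
Those parentheses are also redundant: you can replace if (i)+1 == (k): with if i + 1 == k:.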
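To see why the original line fails, here is a minimal sketch. It uses a plain Python int as a stand-in for one value from the length column (an assumption; the NumPy integers stored in a pandas column raise the same kind of error). list() needs an iterable, whereas [k] simply wraps a single value.

```python
k = 2  # stand-in for one value from product_combos1['length']

try:
    matches = list(k)  # list() expects an iterable, so an int raises TypeError
except TypeError as exc:
    error_message = str(exc)
    matches = [k]  # brackets build a one-element list from a single value

print(error_message)
print(matches)
```

Note that matches = [k] still overwrites the list on every match, so only the last match survives the loop.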
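Answer 1: (score: 0)
Just use the .append() method. I would also suggest setting "matches" to an empty list at the top, so that you do not get duplicates when you re-run the cell.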
# Setup
import pandas as pd

testing_df = pd.DataFrame(columns = ['product_id','length'])
testing_df['product_id'] = [('P01',),('P01', 'P02')]
testing_df['length'] = [1,2]
product_combos1 = pd.DataFrame(columns = ['product_id','count','length'])
product_combos1['length'] = [3,1]
product_combos1['product_id'] = [('P01',),('P01', 'P02')]
# Bracket assignment is needed here: product_combos1.count = ... would
# shadow the DataFrame.count method instead of filling the column
product_combos1['count'] = [100,5000]
# Matching
matches = []
for i in testing_df['length']:
    for k in product_combos1['length']:
        if i+1 == k:
            matches.append(k)
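The loop above only compares lengths. As a rough sketch of the full task described in the question, the following assumes that "contains" means every product in the transaction also appears in the combo, and that ties in count do not matter. The helper name best_combo is hypothetical, and the data is a toy copy of the frames printed in the question.

```python
import pandas as pd

# Toy data modelled on the frames printed in the question
testing_df = pd.DataFrame(
    {"product_id": [("P01",), ("P01", "P02")], "length": [1, 2]},
    index=pd.Index(["001", "002"], name="transaction_id"),
)
product_combos1 = pd.DataFrame(
    {
        "product_id": [("P06", "P09"), ("P01", "P05", "P06", "P09"),
                       ("P01", "P06"), ("P01", "P09"), ("P02", "P09")],
        "count": [36340, 10085, 36337, 49897, 11573],
        "length": [2, 4, 2, 2, 2],
    }
)

def best_combo(products, length, combos):
    """Hypothetical helper: most frequent combo with exactly one extra product."""
    # Keep combos that are one item longer and contain every given product
    mask = (combos["length"] == length + 1) & combos["product_id"].apply(
        lambda combo: set(products).issubset(combo)
    )
    candidates = combos[mask]
    if candidates.empty:
        return None  # no combo extends this transaction
    # idxmax gives the index label of the candidate with the highest count
    return candidates.loc[candidates["count"].idxmax(), "product_id"]

results = {
    tid: best_combo(row["product_id"], row["length"], product_combos1)
    for tid, row in testing_df.iterrows()
}
print(results)
```

Looping with iterrows is fine for small frames; for large ones a vectorised approach would likely be faster, but it is harder to read as a first version.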
Let me know if this works, or if there is anything else! Good luck!