Question

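I know this question may not make much sense as stated, but I hope the example below clarifies it. I need to take one string from the sentA column and compare it against every string in sentB. The example below shows the dataframe, which I have defined as questions.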

sentA     sentB
str1      str1
str2      str2
          str3

The code I am currently using only works when the columns are the same length, as shown below:

def compare(row):
    sentA = row[0]
    return pd.Series([simalarity_funct(sentA, sentB) for sentB in questions['sentB']])

results = questions.apply(compare, axis=1).T

For str1 in sentA, this code produces 3 outputs (its similarity to str1, str2 and str3 in sentB) and places them in one column.

Here is another, simplified code example based on an input df of numbers:

num1    num2 
   3       5    
   4       6
           7

def multiply(num1, num2):
    return num1*num2

def compare(row):
    num1 = row[0]
    # I would like to prevent this next statement from passing a "NaN" to the
    # multiply function. The empty cells will always be at the end of the column.
    return pd.Series([multiply(num1, num2) for num2 in numbers['num2']])

results = numbers.apply(compare, axis=1).T
print(results)
15     20     NaN
18     24     NaN
21     28     NaN

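The underlying problem is that my similarity function throws an error if it is given bad data. The simplest fix I can think of is to not feed it bad data in the first place. Is there a way to modify the last step so that it does not pass "NaN" to the similarity function?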

Answer 1

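def compare(row):
    num1 = row['num1']
    # Only iterate over the non-null values of num2
    return pd.Series([multiply(num1, num2) for num2 in numbers[numbers.num2.notnull()].num2])

# Skip rows where num1 is null, so NaN is never passed as the first argument
numbers[numbers.num1.notnull()].apply(compare, axis=1).T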
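
For reference, here is a self-contained sketch of the same idea that drops the missing cells from each column once, up front, instead of filtering inside the function. The sample data mirrors the numbers example from the question. The names left and right are only illustrative, and multiply stands in for the real similarity function.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question: num1 has one value fewer than num2.
numbers = pd.DataFrame({"num1": [3, 4, np.nan], "num2": [5, 6, 7]})


def multiply(num1, num2):
    # Stand-in for the real similarity function.
    return num1 * num2


# Drop the empty cells from each column independently (illustrative names).
left = numbers["num1"].dropna()
right = numbers["num2"].dropna()

# One column per num1 value, one row per num2 value; NaN never reaches multiply.
results = pd.DataFrame(
    {label: [multiply(a, b) for b in right] for label, a in left.items()},
    index=right.index,
)
print(results)
```

Because each column is cleaned separately, this should work no matter which of the two columns is shorter.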

How to iterate a function over uneven columns in Python?
