Question

I am opening a text file with pandas as follows.

import pandas as pd

# quoting=3 is csv.QUOTE_NONE: quote characters are treated as ordinary text
input_data = pd.read_csv('input.tsv', header=0, delimiter="\t", quoting=3)
L = input_data["title"] + '. ' + input_data["description"]

I found that some of my texts are equal to nan, so I tried the following.

import math
for text in L:

    if not math.isnan(text):
        print(text)

However, this raised the following error: TypeError: must be real number, not str

Is there a way in Python to detect which of these string values are nan?

My TSV looks like this:

id  title   description major   minor
27743058    Partial or total open meniscectomy? : A prospective, randomized study.  In order to compare partial with total meniscectomy a prospective clinical study of 200 patients was carried out. At arthrotomy 100 patients were allocated to each type of operation. The two groups did not differ in duration of symptoms, age distribution, or sex ratio. The operations were performed as conventional arthrotomies. One hundred and ninety two of the patients were seen at follow up 2 and 12 months after operation. There was no difference in the period off work between the two groups. One year after operation, 6 of the 98 patients treated with partial meniscectomy had undergone further operation. In all posterior tears were found at both procedures. Among the 94 patients undergoing total meniscectomy, 4 required further operation. In each, part of the posterior horn had been left at the primary procedure. One year after operation significantly more patients who had undergone partial meniscectomy had been relieved of symptoms. However, the two groups did not show any difference in the degree of radiological changes present.    ### ###
27743057        Synovial oedema is a frequent complication in arthroscopic procedures performed with normal saline as the irrigating fluid. The authors have studied the effect of saline solution, Ringer lactate, 5% Dextran and 10% Dextran in normal saline on 12 specimens of human synovial membrane. They found that 10% Dextran in normal saline decreases the water content of the synovium without causing damage, and recommend this solution for procedures lasting longer than 30 minutes. ### ###

Answer 1

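Your first problem is that math.isnan() does not accept strings as input; try math.isnan('any string') to see this for yourself. The nan entries in L are floats (the + concatenation yields nan whenever title or description is missing), but the other entries are str, so the loop fails as soon as it reaches a real string.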
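If you want to keep the loop, a minimal sketch is to swap math.isnan() for pandas.isna(), which accepts both str and float values. The small DataFrame below is a hypothetical stand-in for input.tsv (the shortened texts are made up from the sample rows), with the second row missing its title as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for input.tsv: the second row has no title
input_data = pd.DataFrame({
    "title": ["Partial or total open meniscectomy?", np.nan],
    "description": ["In order to compare partial with total meniscectomy...",
                    "Synovial oedema is a frequent complication..."],
})

# Concatenating with + propagates NaN, so L mixes str and float values
L = input_data["title"] + ". " + input_data["description"]

kept = []
for text in L:
    # pd.isna handles both str and float; math.isnan raises TypeError on str
    if not pd.isna(text):
        kept.append(text)
        print(text)
```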

Since you are already working with a pandas DataFrame, it is better to handle this with pandas itself. For example:

df.dropna()           # drop rows that contain any NaN (axis=0 is the default)
df.dropna(axis=1)     # drop columns that contain any NaN

Note that dropna() has several very useful parameters (for example how, thresh and subset), so be sure to check its docstring or the corresponding manual entry.
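Applied to the question's columns, a sketch could use subset= so that only title and description are checked, either before concatenating or by dropping the NaN entries from L afterwards. The inline DataFrame is again a hypothetical stand-in for input.tsv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for input.tsv (same column names as the question)
input_data = pd.DataFrame({
    "id": [27743058, 27743057],
    "title": ["Partial or total open meniscectomy?", np.nan],
    "description": ["In order to compare partial with total meniscectomy...",
                    "Synovial oedema is a frequent complication..."],
})

# Option 1: drop rows where either text column is missing, then concatenate
clean = input_data.dropna(subset=["title", "description"])
L = clean["title"] + ". " + clean["description"]

# Option 2: concatenate first, then drop the NaN entries from the Series
L_all = input_data["title"] + ". " + input_data["description"]
L_nonan = L_all.dropna()
print(L_nonan.tolist())
```

Restricting the check with subset= matters if other columns (such as major and minor in the sample) may also be empty; without it, those gaps would drop rows too.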

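As a general tip: when working with pandas, whatever you want to do is usually easier with native pandas functions. Pandas is the gold standard for this kind of work, so if what you want to do makes sense, chances are the pandas community has already thought of (and implemented) it.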

Answer 2

Your dataframe is hard to reproduce, so here is an example df:

import numpy as np
import pandas as pd

df = pd.DataFrame([["11", "1", np.nan], [np.nan, "1", "2"], ["abc", "def", "ijk"]],
                  columns=["a", "b", "c"])
>>> df

    a   b   c
0   11  1   NaN
1   NaN 1   2
2   abc def ijk

From the docs for df.dropna():

df.dropna()

This returns every row that has no nan in any column. Output:

    a   b   c
2   abc def ijk

To keep only the columns without any nan:

df.dropna(axis=1)

    b
0   1
1   1
2   def

To find the rows that contain nan:

df_nan = df.drop(df.dropna().index)
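An alternative worth knowing (not part of the original answer) is a boolean mask built with isna().any(axis=1). For this example it gives the same rows; unlike drop(), it does not rely on index labels being unique, since with duplicate labels drop() would remove every row sharing a label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["11", "1", np.nan], [np.nan, "1", "2"], ["abc", "def", "ijk"]],
                  columns=["a", "b", "c"])

# Approach from the answer: drop every row that survives dropna()
df_nan = df.drop(df.dropna().index)

# Boolean-mask alternative: keep rows where at least one column is NaN
df_nan_mask = df[df.isna().any(axis=1)]

print(df_nan_mask.index.tolist())  # [0, 1]
```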

Also check the how= parameter, which, together with the chosen axis, lets you drop a row/column when any or all of its values are NA.
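A small sketch of the difference between how="any" (the default) and how="all", using a made-up numeric DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [2.0, 3.0, np.nan]})

any_dropped = df.dropna(how="any")       # drops rows 1 and 2 (each has a NaN)
all_dropped = df.dropna(how="all")       # drops only row 2 (all values are NaN)
cols_all = df.dropna(axis=1, how="all")  # no column is entirely NaN, so both stay

print(any_dropped.index.tolist())  # [0]
print(all_dropped.index.tolist())  # [0, 1]
print(cols_all.columns.tolist())   # ['a', 'b']
```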

How to detect strings with nan values in Python