Python + Pandas + Dataframe + CSV: code removes all rows from the dataframe instead of only the specified ones

Time: 2018-10-05 11:32:59

Tags: python pandas csv dataframe

I wrote code that drops every row with NaN in the category_id column, and it works as expected:

   #removal of rows in dataframe that have NaN values in 'category_id' column

   #data = data[np.isfinite(data['category_id'])]
   data = data[data['category_id'].notnull()]

   print(data['category_id'].shape)
   data.to_csv('dataset.csv', encoding='utf-8', index=False)
   print(type(data['category_id']))

Output:

(778,)
<class 'pandas.core.series.Series'>

Next, I wrote code to keep only the rows whose category_id is one of the values in a list:

#selecting rows of the dataset whose 'category_id' column has values mentioned in a list


category_ids = [19, 22, 2, 30, 23]
data = data[data.category_id.isin(category_ids)]
print(data.shape) 

data.to_csv('dataset.csv', encoding='utf-8', index=False)

Output:

(0, 164)

So it produces an empty dataframe and an empty CSV. Why?

1 Answer:

Answer 0: (score: 3)

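The problem is that the values in the category_id column are strings, not integers, so none of them match the integers in your list.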

print (data.category_id.dtype)
object
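
To see the mismatch in isolation, here is a minimal sketch. The Series `s` and its values are made up for illustration and are not the asker's data; it only shows that `isin` compares by value and type, so the string '19' does not match the integer 19.

```python
import pandas as pd

# Hypothetical stand-in for the category_id column, stored as strings.
s = pd.Series(['19', '22', '7', '30'], dtype=object, name='category_id')

print(s.dtype)  # object

# Integers never equal their string counterparts, so nothing matches.
mask_int = s.isin([19, 22, 2, 30, 23])
print(mask_int.sum())  # 0

# The same values written as strings do match.
mask_str = s.isin(['19', '22', '2', '30', '23'])
print(mask_str.sum())  # 3
```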

So you need to convert the values in the list to strings:

category_ids = ['19', '22', '2', '30', '23']
data = data[data.category_id.isin(category_ids)]

Or convert the column to integers with Series.astype:

category_ids = [19, 22, 2, 30, 23]
data = data[data.category_id.astype(int).isin(category_ids)]
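
Both fixes can be checked side by side on a small frame. This is only a sketch: the frame, the `title` column, and the variable names `by_str`, `by_int`, and `by_num` are invented for the example. The third variant uses `pd.to_numeric` with `errors='coerce'`, which is not part of the fixes above; it is shown as one possible alternative for the case where some entries cannot be parsed as integers, where `astype(int)` would raise a ValueError.

```python
import pandas as pd

# Hypothetical data with category_id stored as strings.
data = pd.DataFrame({
    'category_id': pd.Series(['19', '22', '7', '30'], dtype=object),
    'title': ['a', 'b', 'c', 'd'],
})
category_ids = [19, 22, 2, 30, 23]

# Fix 1: turn the lookup values into strings so they match the column.
by_str = data[data['category_id'].isin([str(c) for c in category_ids])]

# Fix 2: cast the column to int before comparing with the integer list.
by_int = data[data['category_id'].astype(int).isin(category_ids)]

# Alternative (an assumption, not from the answer): unparseable entries
# become NaN instead of raising, and NaN matches nothing in the list.
by_num = data[pd.to_numeric(data['category_id'], errors='coerce').isin(category_ids)]

print(by_str.shape)  # (3, 2)
print(by_int.shape)  # (3, 2)
print(by_num.shape)  # (3, 2)
```

Note that none of these variants changes the stored column, so category_id is still written to the CSV as it was read.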