Question

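I'm new to Pandas and would like to know how to delete specific rows by row ID. At the moment I have a CSV file containing data about different students. The CSV file has no header row.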

data.csv:

John    21 34 87 ........ #more than 100 columns of data
Abigail 18 45 53 ........ #more than 100 columns of data
Norton  19 45 12 ........ #more than 100 columns of data

data.py:

I have a list that holds a few names.

names = ['Jonathan', 'Abigail', 'Cassandra', 'Ezekiel']

I opened the CSV file in Python and used a list comprehension to read all the names in the first column, storing them in a list assigned to the variable 'student_list'.

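Now, for every element in student_list, if that element does not appear in the 'names' list, I want to delete its row from the CSV file. In this example I want to delete John and Norton, because they do not appear in the names list. How can I do this with pandas? Or is there a better option than pandas for this problem?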

I tried the following code:

csv_filename = data.csv
with open(csv_filename, 'r') as readfile:
    reader = csv.reader(readfile, delimiter=',')
    student_list = [row[0] for row in reader]  #returns John, Abigail and Norton.

for student in student_list:
    if student not in names:
        id = student_list.index(student) #grab the index of the student in student list who's not found in the names list.

        #using pandas
        df = pd.read_csv(csv_filename) #read data.csv file
        df.drop(df.index[id], in_place = True) #delete the row id for the student who does not exist in names list.
        df.to_csv(csv_filename, index = False, sep=',')  #close the csv file with no index
    else:
        print("Student name found in names list")

I can't get the data deleted correctly. Can someone explain why?

Answer 1

You can simply use a filter to exclude the IDs you don't want.

Example:

import pandas as pd
from io import StringIO

data = """
1,John
2,Beckey
3,Timothy
"""

df = pd.read_csv(StringIO(data), sep=',', header=None, names=['id', 'name'])


unwanted_ids = [3]

new_df = df[~df.id.isin(unwanted_ids)]

You can also filter, take the index of the matching rows, and drop those rows from the original dataframe in place. Example:

df.drop(df[df.id.isin([3])].index, inplace=True)
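
As a quick check, the two approaches above should leave the same rows. The sketch below rebuilds the sample frame and compares them; the variable names `filtered` and `dropped` are only illustrative, and `drop` is called without `inplace=True` here so that the original frame stays available for comparison.

```python
import pandas as pd
from io import StringIO

data = """
1,John
2,Beckey
3,Timothy
"""

df = pd.read_csv(StringIO(data), sep=',', header=None, names=['id', 'name'])
unwanted_ids = [3]

# Approach 1: boolean mask, returns a new frame and leaves df untouched
filtered = df[~df['id'].isin(unwanted_ids)]

# Approach 2: look up the index labels of the unwanted rows and drop them
dropped = df.drop(df[df['id'].isin(unwanted_ids)].index)

# Both frames should hold the same rows with the same index labels
print(filtered.equals(dropped))
print(filtered['name'].tolist())
```

The mask version is usually the shorter one to read; the `drop` version is mainly useful when you want to modify the existing frame with `inplace=True`.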

Update for the updated question:

df = pd.read_csv(csv_filename, sep='\t', header=None, names=['name', 'age'])
# keep only names wanted and reset index starting from 0
# drop=True makes sure to drop old index and not add it as column
df = df[df.name.isin(names)].reset_index(drop=True)
# if you really want index starting from 1 you can use this
df.index = df.index + 1
df.to_csv(csv_filename, index = False, sep=',')
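
The update above names only two columns, while the file in the question has no header and more than 100 columns. A minimal sketch for that case, assuming a comma-separated file (the delimiter used in the question's `csv.reader` call), is to leave the columns unnamed and filter on column 0. The in-memory `sample` stands in for data.csv, and its rows, like the `kept` and `output` names, are made up for illustration.

```python
import pandas as pd
from io import StringIO

names = ['Jonathan', 'Abigail', 'Cassandra', 'Ezekiel']

# Hypothetical stand-in for data.csv: no header row, name in the first column
sample = "John,21,34,87\nAbigail,18,45,53\nNorton,19,45,12\n"

# header=None keeps the first row as data; columns are labelled 0, 1, 2, ...
df = pd.read_csv(StringIO(sample), header=None)

# Keep only the rows whose first column appears in names
kept = df[df[0].isin(names)]

# header=False and index=False reproduce the original headerless layout
output = kept.to_csv(index=False, header=False)
print(output)
```

To write the result to disk instead of a string, pass a file path as the first argument to `to_csv`.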
