Get a list of values from a CSV file

Time: 2017-01-31 13:16:19

Tags: python csv

I have a CSV file containing a list of orders, like this:

CUSTOMER_CODE,CUSTOMER_NAME,NAME,PRODUCT
1044, C1, Name1, Arduino,
1044, C1, Name1, ESP8266,
1048, C2, Name1, Arduino Uno,
1042, C3, Name1, ESP32,
1049, C4, Name1, Arduino Mega,
1042, C3, Name1, Nexus 4,

Now I want to extract only the list of distinct customer codes, [1042, 1044, 1048, 1049]

and not

[1042, 1044 ,1044,1044,1044,1044,1044,1044,1048,1048,1048,1048,1048,1048,1048,1049 etc.]

#!/usr/bin/python
import MySQLdb, csv
CUSTOMER_CODES = []

with open('Customers.csv','r') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        if len(CUSTOMER_CODES) == 0:
            #adding 1st value
            CUSTOMER_CODES.append(int(row['CUSTOMER_CODE']))
        for i in range(0,len(CUSTOMER_CODES)):
            #check each value of table
            print CUSTOMER_CODES
            if CUSTOMER_CODES[i] == int(row['CUSTOMER_CODE']):
                print "Code is already here "+ str(row['CUSTOMER_CODE'])
            else:
                CUSTOMER_CODES.append(int(row['CUSTOMER_CODE']))

Instead of output like this:

[1044, 1045, 1047....]

I get this:

[1044, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1045, 1047, 1047, 1047, 1047, 1047, 1047, 1047, 1047, 1047,

2 answers:

Answer 0 (score: 5)

Simply use a set instead of a list:

#!/usr/bin/python
import MySQLdb, csv
CUSTOMER_CODES = set()

with open('Customers.csv','r') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        CUSTOMER_CODES.add(int(row['CUSTOMER_CODE']))

Or use a set comprehension (assuming Python 2.7+, where set comprehensions are available):

#!/usr/bin/python
import MySQLdb, csv

with open('Customers.csv','r') as csvfile:
    reader = csv.DictReader(csvfile)
    CUSTOMER_CODES = {int(row['CUSTOMER_CODE']) for row in reader}


If you want a sorted list, add CUSTOMER_CODES = sorted(CUSTOMER_CODES)
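
As a self-contained sketch of the set-plus-sorted approach, the following reads the sample rows from an in-memory string rather than from Customers.csv, so it can be run without any file on disk. The names SAMPLE_CSV and unique_customer_codes are illustrative only and are not part of the original answer; the snippet assumes Python 3.

```python
import csv
import io

# Sample rows copied from the question; in real use, open 'Customers.csv' instead.
SAMPLE_CSV = """CUSTOMER_CODE,CUSTOMER_NAME,NAME,PRODUCT
1044, C1, Name1, Arduino,
1044, C1, Name1, ESP8266,
1048, C2, Name1, Arduino Uno,
1042, C3, Name1, ESP32,
1049, C4, Name1, Arduino Mega,
1042, C3, Name1, Nexus 4,
"""


def unique_customer_codes(csvfile):
    """Return the distinct CUSTOMER_CODE values of an open CSV file, sorted."""
    reader = csv.DictReader(csvfile)
    # The set drops duplicates; sorted() turns it back into an ordered list.
    return sorted({int(row['CUSTOMER_CODE']) for row in reader})


codes = unique_customer_codes(io.StringIO(SAMPLE_CSV))
print(codes)  # [1042, 1044, 1048, 1049]
```

A set answers "is this code already present?" in one step, which is what the nested loop in the question was trying to do by comparing against every stored element in turn.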

Answer 1 (score: 1)

Another solution, using pandas:

import pandas as pd

# Import your csv file into a DataFrame
df = pd.read_csv('yourfile.csv')

# Extract the column you want and export it to a list
a = df['CUSTOMER_CODE'].tolist()

# Sort it in place (list.sort() returns None, so do not reassign the result)
a.sort()

This returns:

In [29]: a
Out[29]: [1042, 1042, 1044, 1044, 1048, 1049]

Edit: to remove duplicates:

a = df['CUSTOMER_CODE'].drop_duplicates().tolist()

Then sort the list.
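
Putting the pandas steps together, a minimal sketch might look like the following. It assumes pandas is installed and reads from an in-memory string; SAMPLE_CSV is an illustrative name, not something from the original answer. Note one assumption in particular: the sample rows here omit the trailing commas shown in the question. If every data row ends with a delimiter, read_csv may treat the first column as the index, in which case passing index_col=False is the documented way to prevent that.

```python
import io

import pandas as pd

# Sample rows based on the question, WITHOUT the trailing commas (an assumption;
# with trailing commas, read_csv may need index_col=False).
SAMPLE_CSV = """CUSTOMER_CODE,CUSTOMER_NAME,NAME,PRODUCT
1044, C1, Name1, Arduino
1044, C1, Name1, ESP8266
1048, C2, Name1, Arduino Uno
1042, C3, Name1, ESP32
1049, C4, Name1, Arduino Mega
1042, C3, Name1, Nexus 4
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Drop repeated codes, convert to a plain Python list, then sort it.
codes = sorted(df['CUSTOMER_CODE'].drop_duplicates().tolist())
print(codes)  # [1042, 1044, 1048, 1049]
```

Using sorted() returns a new list, which avoids the pitfall of reassigning the result of list.sort(), a method that sorts in place and returns None.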