Print the value from a different column alongside the values in a list

Time: 2018-08-27 22:58:08

Tags: python pandas csv datetime

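This code is meant to track which deliveries were late. I want to find every late delivery and the purchase order number associated with it. My current code produces a list of the number of days late for the rows in a given range. However, because those values are stored in a list, I can't find the purchase order number that goes with each late delivery. I would like to print, in the terminal, the purchase order number together with the number of days the delivery was late. (From there, I will use an if statement to look only at values > 0, so that I see only the late deliveries.) My question is: how do I print the purchase order number with the number of days late next to it? I don't know how to do this, because all of the "days late" values are stored in a list.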

import csv
import pandas as pd
import datetime


def calculate(number):   
    fileread = pd.read_csv('otd.csv', encoding='latin-1')
    Deliveryvalue = fileread['Delivery Date']
    Desiredvalue = fileread['source desired delivery date']



    date_format = '%m/%d/%Y'

    date1 = datetime.datetime.strptime(Deliveryvalue[number], date_format)
    date2 = datetime.datetime.strptime(Desiredvalue[number], date_format)

    diff= date1 - date2
    diff2 = diff.days



    return diff2

delays = []
for i in range(1, 20):
    delays.append(calculate(i))

for y in delays:
    if y > 1:
        print(delays)

This prints:

[0, 0, 0, 0, 0, 0, 0, 0, -7, 3, 50, 0, 0, 0, 0, 0, 1, -9, 0]
# the negative numbers are early deliveries 

Here is a dummy example of my CSV file:

[image: dummy example of the CSV file]

3 answers:

Answer 0 (score: 2)

import pandas as pd

# change names appropriately
PURCHASE_ORDER = 'Purchase Order'
DELIVERY_DATE = 'Delivery Date'
DESIRED_DATE = 'Desired Date'
DELAYED_DAYS = 'Delayed Days'

df = pd.read_csv('otd.csv', index_col=PURCHASE_ORDER)

-

>> df
                  Delivery Date Desired Date
Purchase Order
001               2014-12-31   2014-12-31
002               2014-12-31   2014-12-31
003               2015-01-05   2015-01-05
004               2015-01-05   2015-01-05
005               2015-02-12   2015-02-11
006               2016-02-13   2016-02-11

The last two deliveries are late.

df[DELIVERY_DATE] = pd.to_datetime(df[DELIVERY_DATE])
df[DESIRED_DATE] = pd.to_datetime(df[DESIRED_DATE])
df[DELAYED_DAYS] = df[DELIVERY_DATE] - df[DESIRED_DATE]
late_threshold = pd.Timedelta(days=0)
late_deliveries = df[DELAYED_DAYS] > late_threshold

-

>> df[late_deliveries].drop([DELIVERY_DATE, DESIRED_DATE], axis=1)

                     Delayed Days
Purchase Order             
005                  1 days
006                  2 days
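
If plain integers are wanted instead of Timedelta values, so that each purchase order number can be printed next to its day count as the question asks, the difference can be converted with `.dt.days`. The following is only a minimal sketch: it assumes the column names from the constants above, and it reads a small inline CSV (three of the rows shown above) in place of otd.csv.

```python
import io
import pandas as pd

# Inline stand-in for otd.csv (an assumption for this sketch);
# the rows are taken from the frame shown above.
SAMPLE_CSV = """Purchase Order,Delivery Date,Desired Date
001,2014-12-31,2014-12-31
005,2015-02-12,2015-02-11
006,2016-02-13,2016-02-11
"""

# Reading the order number as a string keeps its leading zeros.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"Purchase Order": str})
df["Delivery Date"] = pd.to_datetime(df["Delivery Date"])
df["Desired Date"] = pd.to_datetime(df["Desired Date"])

# .dt.days converts the Timedelta column into whole days as integers.
df["Delayed Days"] = (df["Delivery Date"] - df["Desired Date"]).dt.days

# Keep only late deliveries and pair each order number with its delay.
late = df[df["Delayed Days"] > 0]
late_pairs = [(order, int(days))
              for order, days in zip(late["Purchase Order"], late["Delayed Days"])]

for order, days in late_pairs:
    print(order, days)
```

With this sample, order 001 is on time and is filtered out, while 005 and 006 are printed with delays of 1 and 2 days.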

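Answer 1 (score: 2)

It seems you want this to be part of the calculate() function, so that you can run the function on other components. Maybe try something like this: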

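# at the end of calculate(), in place of `return diff2`:
data = {}
data['ordernum'] = 'ordernum'      # placeholder for the row's purchase order number
data['delayed_days'] = 'diff2'     # placeholder for the computed difference in days

return data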

So basically, each time it loops over a row of the DataFrame, it returns a Python dictionary (similar to JSON).

Here is the code I played around with:

import csv
import pandas as pd
import datetime


def calculate(row):   
    Deliveryvalue = row['delivery']
    Desiredvalue = row['desired']

    date_format = '%m/%d/%Y'

    date1 = datetime.datetime.strptime(Deliveryvalue, date_format)
    date2 = datetime.datetime.strptime(Desiredvalue, date_format)

    diff= date1 - date2
    diff2 = diff.days

    data = {}
    data['ordernum'] = row['order']
    data['delayed_days'] = diff2

    return data

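df = pd.read_csv('otd.csv')  # expects 'order', 'delivery' and 'desired' columns

results = []

for index, row in df.iterrows():
    results.append(calculate(row))

# print one dictionary per line
for data in results:
    print(data)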

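I changed how the function works, so it now iterates over the rows of the DataFrame. If I have interpreted your question correctly, this should be the solution.

Output: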

{'ordernum': 1, 'delayed_days': 0}
{'ordernum': 2, 'delayed_days': 0}
{'ordernum': 3, 'delayed_days': 0}
{'ordernum': 4, 'delayed_days': 0}
{'ordernum': 5, 'delayed_days': 0}
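
To show only the late deliveries, the list of dictionaries can be filtered on `delayed_days > 0`. Below is a minimal sketch of that step. It reuses the `order`, `delivery` and `desired` keys from the code above, but the three sample rows are invented for illustration, and plain dictionaries stand in for DataFrame rows.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"


def calculate(row):
    """Return the order number and the number of days late for one row."""
    delivered = datetime.strptime(row["delivery"], DATE_FORMAT)
    desired = datetime.strptime(row["desired"], DATE_FORMAT)
    return {"ordernum": row["order"],
            "delayed_days": (delivered - desired).days}


# Hypothetical rows (not from the original CSV): on time, late, early.
rows = [
    {"order": 1, "delivery": "12/31/2014", "desired": "12/31/2014"},
    {"order": 2, "delivery": "02/12/2015", "desired": "02/11/2015"},
    {"order": 3, "delivery": "02/04/2015", "desired": "02/11/2015"},
]

results = [calculate(row) for row in rows]

# Positive values are late; zero is on time; negative values are early.
late = [r for r in results if r["delayed_days"] > 0]

for r in late:
    print(r["ordernum"], r["delayed_days"])
```

Only order 2 is printed here, with a delay of 1 day; the on-time and early orders are dropped by the filter.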

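Answer 2 (score: 2)

I took a different approach: I zipped your columns together and then compared them. Sorry if the headers look odd, I couldn't see the whole thing. I added 2 rows to your CSV file to include late deliveries. Those orders are: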

Purchase order number    Delivery Date    Source Desired Deliv
17                       2/10/2018        2/5/2018
18                       7/16/2017        7/14/2017

See below:

import pandas as pd
from datetime import datetime
from datetime import timedelta
import csv

df = pd.read_csv('./Desktop/dummy.csv')
late_items = []
date_format = '%m/%d/%Y'

for x,y,z in zip(df['Purchase order number'], df['Delivery Date'], df['Source desired delive']):
    actual_deliv_date = datetime.strptime(y, date_format)       
    supposed_deliv_date = datetime.strptime(z, date_format)    
    diff_deliv_date = supposed_deliv_date - actual_deliv_date
    if diff_deliv_date < timedelta(0):
        late_items.append([x, diff_deliv_date]) 
print(late_items)

Output:

[[17, datetime.timedelta(-5)], [18, datetime.timedelta(-2)]]

Or, add a "Diff Deliv Date" column to the original df this way:

diff_delivery_date = []
date_format = '%m/%d/%Y'
for x,y,z in zip(df['Purchase order number'], df['Delivery Date'], df['Source desired delive']):
    actual_deliv_date = datetime.strptime(y, date_format)
    supposed_deliv_date = datetime.strptime(z, date_format)
    diff_deliv_date = supposed_deliv_date - actual_deliv_date
    diff_delivery_date.append(diff_deliv_date)

df['Diff Deliv Date'] = diff_delivery_date
df.loc[df['Diff Deliv Date'] < timedelta(0)] # To get only those values less than 0 for late deliveries.
#df option to output whole df with on time and late deliveries.

Output:

    Purchase order number Delivery Date Source desired delive Diff Deliv Date

5                     17     2/10/2018              2/5/2018         -5 days
6                     18     7/16/2017             7/14/2017         -2 days
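
Note that this answer subtracts the delivery date from the desired date, so late deliveries come out negative. A possible variant, sketched below, flips the subtraction so that late deliveries are positive (the convention used in the question) and replaces the loop with vectorized `pd.to_datetime` calls. It assumes the same truncated column names as the code above and an inline CSV in place of dummy.csv; row 16 is an invented on-time row, while rows 17 and 18 are the two added orders.

```python
import io
import pandas as pd

# Inline stand-in for dummy.csv (an assumption for this sketch).
# Row 16 is made up; rows 17 and 18 are the two late orders above.
SAMPLE_CSV = """Purchase order number,Delivery Date,Source desired delive
16,1/5/2015,1/5/2015
17,2/10/2018,2/5/2018
18,7/16/2017,7/14/2017
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
date_format = "%m/%d/%Y"

# Parse both date columns in one vectorized call each.
delivered = pd.to_datetime(df["Delivery Date"], format=date_format)
desired = pd.to_datetime(df["Source desired delive"], format=date_format)

# Delivery minus desired: positive means late, negative means early.
df["Days Late"] = (delivered - desired).dt.days

late = df.loc[df["Days Late"] > 0, ["Purchase order number", "Days Late"]]
print(late.to_string(index=False))
```

With this sample, orders 17 and 18 are reported as 5 and 2 days late, matching the magnitudes in the output above.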