Create a new column or list using datetime

Date: 2017-10-24 17:13:48

Tags: python python-3.x pandas numpy

I have a problem. My data looks like this:

0   A
1   B
2   2015-01-02
3   A
4   B
5   2015-01-03
6   B
7   C
8   2015-01-04

I would like to get a new column or list of this form:

0   2015-01-02
1   2015-01-02
2   2015-01-02
3   2015-01-03
4   2015-01-03
5   2015-01-03
6   2015-01-04
7   2015-01-04
8   2015-01-04

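The point is that every record above a given date must be changed to that date, and the same applies to each following date. For now I write a new list to a separate file, but what I would really like is a new column in the existing file. The number of records between consecutive dates can of course vary.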
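A minimal sketch of the "new column in the existing file" variant, assuming a single-column CSV whose column is named `value` (a hypothetical name; the real file layout is not shown). An in-memory `io.StringIO` stands in for the real path. The idea: parse every cell as a date, let the labels become NaT, then back-fill so each row takes the next date below it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the asker's CSV; the column name "value" is assumed
raw = "value\nA\nB\n2015-01-02\nA\nB\n2015-01-03\nB\nC\n2015-01-04\n"
df = pd.read_csv(io.StringIO(raw))

# Labels fail the explicit format and become NaT; real dates are parsed.
# bfill() then copies each date up into the NaT rows above it.
df["date"] = pd.to_datetime(df["value"], format="%Y-%m-%d", errors="coerce").bfill()

# Write the original column plus the new one back out as CSV
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

Passing an explicit `format` is a design choice here: it keeps pandas from guessing a date format from the first label.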

I am working with CSV files.

http://uploadfile.pl/pokaz/1246822---xx3t.html

The data is very irregular; I am trying to build a list of products by date.

My initial code, which builds the list from the first column:

import pandas as pd
import numpy as np
import seaborn as sns
sns.set()

# Raw strings avoid accidental backslash escapes in Windows paths
df = pd.read_csv(r"C:\Users\dell\Desktop\alko_del2.csv", sep=';')

# Normalise product-group labels and drop rows that are entirely empty
df = df.replace(['destylowane', 'alkoholowe'], [np.nan, np.nan], regex=True)
df = df.replace(['napoje'], ['WODKA'], regex=True)
df = df.replace(['wina'], ['WINO'], regex=True)
df = df.dropna(how='all')

# Drop report header rows; each mask is built from the frame being filtered
df2 = df.loc[~(df == 'SN:').any(axis=1)]
df3 = df2.loc[~(df2 == 'Lp').any(axis=1)]
df4 = df3.loc[~(df3 == 'zakupu').any(axis=1)]
df5 = df4.loc[~(df4 == 'netto').any(axis=1)]
print(df5)

h = []
for idx in range(len(df5)):
    # Keep only the non-empty cells of this row (assumed to be exactly 10)
    row = df5.iloc[[idx]].dropna(axis=1, how="any")
    row.columns = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    # Keep only the first cell: the group name or the date
    row = row.drop(columns=['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
    h.append(row.to_string(header=False))
m = '\n'.join(h)

with open(r"C:\Users\dell\Desktop\lista_1.csv", "w") as output:
    output.write(m)
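The four chained `.loc` filters above can also be collapsed into one boolean mask with `DataFrame.isin`, which, like `df == 'SN:'`, matches whole cells exactly. A minimal sketch on a hypothetical miniature of the report (the column names `c0` to `c2` and the row contents are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature report: two header rows mixed in with data rows
df = pd.DataFrame({
    "c0": ["SN:", "Lp", "PIWO", "2015-01-11"],
    "c1": ["010B-3F87-ECBF-816F", "Sklep", "188", "218"],
    "c2": [np.nan, "Kod:", "1,96", "3,33"],
})

# Marker strings that identify header rows (taken from the question's filters)
header_markers = ["SN:", "Lp", "zakupu", "netto"]

# True for every row where at least one cell equals one of the markers
is_header = df.isin(header_markers).any(axis=1)
clean = df.loc[~is_header]
print(clean)
```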

2 answers:

Answer 0 (score: 2)

Going only by your sample data, this produces your expected output:

df.date=pd.to_datetime(df.date,errors='coerce').bfill()
df
Out[71]: 
        date
0 2015-01-02
1 2015-01-02
2 2015-01-02
3 2015-01-03
4 2015-01-03
5 2015-01-03
6 2015-01-04
7 2015-01-04
8 2015-01-04
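A small sketch of the two steps in the answer, split apart so the intermediate NaT values are visible. It uses the question's sample plus one hypothetical trailing label (`"D"`) after the last date, to show a limitation: `bfill()` only fills from a later date, so rows after the final date stay NaT.

```python
import pandas as pd

# The question's sample plus one hypothetical trailing label after the last date
values = pd.Series(["A", "B", "2015-01-02", "A", "B", "2015-01-03",
                    "B", "C", "2015-01-04", "D"])

# Step 1: only real dates survive the parse; labels become NaT
parsed = pd.to_datetime(values, format="%Y-%m-%d", errors="coerce")

# Step 2: each NaT takes the next valid date below it
filled = parsed.bfill()

# The trailing "D" has no later date to borrow from, so it stays NaT
print(filled.isna().sum())  # prints 1
```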

Answer 1 (score: 0)

Success :) At first the data looked like this:

napoje  alkoholowe  destylowane 30  11,86   355,94  17,06   416,03  511,7   47,54%  14,44%  60  
PIWO    188 1,96    369,11  3   459,1   564,72  52,46%  19,60%  89,9            


SN: 010B-3F87-ECBF-816F Sprzedaż    grupy   towarów 0   wg  okresów strona  1   z   8   
Lp  Sklep   Kod:    Nazwa:  Ilość   Cena    Wartość Cena    Wartość Wartość Udział  Marża   Kwota
zakupu  zakupu  sprzed. sprzed. sprzed. %   w   %   marży               
netto   netto   brutto  netto   brutto  sprzed.                         
2015-01-11  218 3,33    725,05  4,94    875,13  1076,42 0,12%   17,15%  150         

napoje  alkoholowe  destylowane 17  14,95   254,07  20,55   284,09  349,43  51,57%  10,57%  30  
PIWO    122 1,69    206,66  2,69    266,79  328,16  48,43%  22,54%  60,1            

2015-01-12  139 3,31    460,73  4,87    550,88  677,59  0,08%   16,36%  90,1

Now:

              data  grupa
    24  2015-01-11  WODKA
    25  2015-01-11  PIWO
    26  2015-01-11  RAZEM
    27  2015-01-12  WODKA
    28  2015-01-12  PIWO
    29  2015-01-12  RAZEM

Full code:

import pandas as pd
import numpy as np
import seaborn as sns
sns.set()

# Raw strings avoid accidental backslash escapes in Windows paths
df = pd.read_csv(r"C:\Users\dell\Desktop\alko_del2.csv", sep=';')

df = df.replace(['destylowane', 'alkoholowe'], [np.nan, np.nan], regex=True)
df = df.replace(['napoje'], ['WODKA'], regex=True)
df = df.replace(['wina'], ['WINO'], regex=True)
df = df.dropna(how='all')

# Drop report header rows; each mask is built from the frame being filtered
df2 = df.loc[~(df == 'SN:').any(axis=1)]
df3 = df2.loc[~(df2 == 'Lp').any(axis=1)]
df4 = df3.loc[~(df3 == 'zakupu').any(axis=1)]
df5 = df4.loc[~(df4 == 'netto').any(axis=1)]

h1 = []
for idx in range(len(df5)):
    # Keep only the non-empty cells of this row (assumed to be exactly 10)
    df6 = df5.iloc[[idx]].dropna(axis=1, how="any")
    df6.columns = ['grupa', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    df6 = df6.drop(columns=['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
    # Write the 'grupa' header only for the first row
    h1.append(df6.to_string(header=(idx == 0), index=False))
g1 = '\n'.join(h1)

with open(r"C:\Users\dell\Desktop\lista_1.csv", 'w') as file_handler:
    file_handler.write(g1)

x = pd.read_csv(r"C:\Users\dell\Desktop\lista_1.csv", sep=';')
# Copy the group column; non-dates become NaT, then each date is back-filled upward
x.insert(0, 'data', x['grupa'])
x['data'] = pd.to_datetime(x['data'], errors='coerce').bfill()
# Rows that held a date are the per-day totals
x.loc[pd.to_datetime(x['grupa'], errors='coerce').notnull(), 'grupa'] = 'RAZEM'
x.to_csv(r"C:\Users\dell\Desktop\lista_2.csv")

Not pretty, but it works :) Now I can safely add columns for the other values.
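The per-row loop and the temporary `lista_1.csv` file could possibly be avoided: the loop keeps the first non-empty cell of each row, which `df5.bfill(axis=1).iloc[:, 0]` also yields. A minimal sketch, assuming a hypothetical miniature of the cleaned frame `df5` (made-up rows shaped like the report above), that then applies the same back-fill and `RAZEM` steps as the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the cleaned report: the group name or the date
# sits in the first non-empty cell of each row
df5 = pd.DataFrame([
    [np.nan, "WODKA", "30", "11,86"],
    ["PIWO", "188", "1,96", np.nan],
    ["2015-01-11", "218", "3,33", "725,05"],
    [np.nan, "WODKA", "17", "14,95"],
    ["PIWO", "122", "1,69", np.nan],
    ["2015-01-12", "139", "3,31", "460,73"],
])

# First non-empty cell of every row, without a per-row loop or a temporary file
grupa = df5.bfill(axis=1).iloc[:, 0]

# Same two steps as the answer: back-fill dates, then label date rows as totals
data = pd.to_datetime(grupa, format="%Y-%m-%d", errors="coerce")
x = pd.DataFrame({"data": data.bfill(), "grupa": grupa})
x.loc[data.notna(), "grupa"] = "RAZEM"
print(x)
```

Whether this matches the real file depends on every row having its label as the first non-empty cell, which is also what the original loop assumes.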