I have a problem. I have data in the following form:
0 A
1 B
2 2015-01-02
3 A
4 B
5 2015-01-03
6 B
7 C
8 2015-01-04
I would like to get a new column or list of the form:
0 2015-01-02
1 2015-01-02
2 2015-01-02
3 2015-01-03
4 2015-01-03
5 2015-01-03
6 2015-01-04
7 2015-01-04
8 2015-01-04
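The point is that every record above a given date must be changed to that date, and the same goes for each subsequent date. In this case I created a new list in a new file, but what I would most like is a new column in the existing file. Of course, the number of records between dates may vary.
I am working with a CSV file.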
http://uploadfile.pl/pokaz/1246822---xx3t.html
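The data is very irregular, and I am trying to build a list of products by date.
My initial code, which produces a list from the first column, is: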
import pandas as pd
import numpy as np
import seaborn as sns
sns.set()
# Every backslash is doubled so that "\d" is not read as an escape sequence.
df = pd.read_csv("C:\\Users\\dell\\Desktop\\alko_del2.csv", sep=';')
# Blank out or rename the group labels, then drop rows that are entirely empty.
df = df.replace(['destylowane', 'alkoholowe'], [np.nan, np.nan], regex=True)
df = df.replace(['napoje'], ['WODKA'], regex=True)
df = df.replace(['wina'], ['WINO'], regex=True)
df = df.dropna(how='all')
# Remove the repeated page-header rows; each mask is built from the frame it filters.
df2 = df.loc[~(df == 'SN:').any(axis=1)]
df3 = df2.loc[~(df2 == 'Lp').any(axis=1)]
df4 = df3.loc[~(df3 == 'zakupu').any(axis=1)]
df5 = df4.loc[~(df4 == 'netto').any(axis=1)]
print(df5)
h = []
for n in range(len(df5)):
    # Keep the non-empty cells of row n, then only the first of them.
    row = df5.iloc[[n]].dropna(axis=1, how="any")
    row.columns = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    row = row.drop(columns=['j', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'c'])
    h.append(row.to_string(header=False))
m = '\n'.join(h)
with open("C:\\Users\\dell\\Desktop\\lista_1.csv", "w") as output:
    output.write(m)
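The four chained header filters above could probably be collapsed into a single `isin` test. This is only a minimal sketch on a made-up two-column frame (the column names `c0`, `c1` and the cell values are assumptions, not the real file), showing that rows containing any of the marker strings are dropped in one step:

```python
import pandas as pd

# Made-up frame: rows 1 and 3 stand in for the repeated page-header lines.
df = pd.DataFrame({
    "c0": ["WODKA", "SN:", "PIWO", "Lp", "2015-01-11"],
    "c1": ["30", "010B", "188", "Sklep", "218"],
})

# Marker strings taken from the filters in the question.
header_markers = ["SN:", "Lp", "zakupu", "netto"]

# True for every row in which any cell equals one of the markers exactly.
is_header = df.isin(header_markers).any(axis=1)
cleaned = df.loc[~is_header]
print(cleaned)
```

Like `df == 'SN:'`, `isin` compares whole cells, so a marker that is only part of a longer string would not match.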
Answer 0 (score: 2)
I will only reproduce your expected output, based on your sample data:
df.date = pd.to_datetime(df.date, errors='coerce').bfill()
df
Out[71]:
date
0 2015-01-02
1 2015-01-02
2 2015-01-02
3 2015-01-03
4 2015-01-03
5 2015-01-03
6 2015-01-04
7 2015-01-04
8 2015-01-04
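For anyone who wants to try this without the original file, here is a self-contained sketch of the same idea on the sample data from the question. The column names `value` and `date` are assumptions, and an explicit `format` is passed on the assumption that every date is written as `YYYY-MM-DD`:

```python
import pandas as pd

# Sample data from the question: labels, with a date closing each group.
df = pd.DataFrame({"value": ["A", "B", "2015-01-02",
                             "A", "B", "2015-01-03",
                             "B", "C", "2015-01-04"]})

# Anything that is not a date becomes NaT because of errors="coerce".
parsed = pd.to_datetime(df["value"], format="%Y-%m-%d", errors="coerce")

# bfill() copies the next valid date upwards over the NaT rows above it.
df["date"] = parsed.bfill()
print(df)
```

Because the fill runs until the next valid date, the number of rows between two dates does not matter. Writing `df` back out with `to_csv` would then keep the new column next to the original one.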
Answer 1 (score: 0)
Success :) At first, the data looked like this:
napoje alkoholowe destylowane 30 11,86 355,94 17,06 416,03 511,7 47,54% 14,44% 60
PIWO 188 1,96 369,11 3 459,1 564,72 52,46% 19,60% 89,9
SN: 010B-3F87-ECBF-816F Sprzedaż grupy towarów 0 wg okresów strona 1 z 8
Lp Sklep Kod: Nazwa: Ilość Cena Wartość Cena Wartość Wartość Udział Marża Kwota
zakupu zakupu sprzed. sprzed. sprzed. % w % marży
netto netto brutto netto brutto sprzed.
2015-01-11 218 3,33 725,05 4,94 875,13 1076,42 0,12% 17,15% 150
napoje alkoholowe destylowane 17 14,95 254,07 20,55 284,09 349,43 51,57% 10,57% 30
PIWO 122 1,69 206,66 2,69 266,79 328,16 48,43% 22,54% 60,1
2015-01-12 139 3,31 460,73 4,87 550,88 677,59 0,08% 16,36% 90,1
Now:
data grupa
24 2015-01-11 WODKA
25 2015-01-11 PIWO
26 2015-01-11 RAZEM
27 2015-01-12 WODKA
28 2015-01-12 PIWO
29 2015-01-12 RAZEM
Full code:
import pandas as pd
import numpy as np
import seaborn as sns
from datetime import datetime as dt
sns.set()
# Every backslash is doubled so that "\d" is not read as an escape sequence.
df = pd.read_csv("C:\\Users\\dell\\Desktop\\alko_del2.csv", sep=';')
# Blank out or rename the group labels, then drop rows that are entirely empty.
df = df.replace(['destylowane', 'alkoholowe'], [np.nan, np.nan], regex=True)
df = df.replace(['napoje'], ['WODKA'], regex=True)
df = df.replace(['wina'], ['WINO'], regex=True)
df = df.dropna(how='all')
# Remove the repeated page-header rows; each mask is built from the frame it filters.
df2 = df.loc[~(df == 'SN:').any(axis=1)]
df3 = df2.loc[~(df2 == 'Lp').any(axis=1)]
df4 = df3.loc[~(df3 == 'zakupu').any(axis=1)]
df5 = df4.loc[~(df4 == 'netto').any(axis=1)]
columns = ['grupa', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
h1 = []
for n in range(len(df5)):
    # Keep the non-empty cells of row n; the first one is the group label or the date.
    df6 = df5.iloc[[n]].dropna(axis=1, how="any")
    df6.columns = columns
    df6 = df6.drop(columns=columns[1:])
    # Only the first row is written together with the 'grupa' header.
    h1.append(df6.to_string(header=(n == 0), index=False))
g1 = '\n'.join(h1)
with open("C:\\Users\\dell\\Desktop\\lista_1.csv", 'w') as file_handler:
    file_handler.write(g1)
# Read the one-column list back, copy it into 'data' and back-fill the dates.
x = pd.read_csv("C:\\Users\\dell\\Desktop\\lista_1.csv", sep=';')
x.insert(0, 'data', x['grupa'])
x.data = pd.to_datetime(x.data, errors='coerce').bfill()
# Rows whose 'grupa' value parses as a date are relabelled 'RAZEM'.
x.loc[pd.to_datetime(x['grupa'], errors='coerce').notnull(), 'grupa'] = 'RAZEM'
x.to_csv("C:\\Users\\dell\\Desktop\\lista_2.csv")
Not very pretty, but it works :) Now I can safely add columns with the other values.
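For comparison, the same `data` / `grupa` table can probably be built in memory, without the intermediate `lista_1.csv`. This is only a sketch under assumptions: `raw` is a made-up stand-in for `df5` with invented column names, the label or date is taken to be the first non-empty cell of each row, and dates are assumed to be written as `YYYY-MM-DD`:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df5: the label sits in a different column per row.
raw = pd.DataFrame({
    "c0": ["WODKA", np.nan, np.nan, "WODKA", np.nan, np.nan],
    "c1": [np.nan, "PIWO", "2015-01-11", np.nan, "PIWO", "2015-01-12"],
    "c2": ["30", "188", "218", "17", "122", "139"],
})

# First non-null value of each row: fill each row from the right, take column 0.
first = raw.bfill(axis=1).iloc[:, 0]

# Cells that are not dates become NaT.
dates = pd.to_datetime(first, format="%Y-%m-%d", errors="coerce")

out = pd.DataFrame({
    "data": dates.bfill(),                        # date that closes each group
    "grupa": first.mask(dates.notna(), "RAZEM"),  # date rows relabelled
})
print(out)
```

The row-wise `bfill(axis=1)` replaces the per-row loop, and `mask` does the job of the `x.loc[...] = 'RAZEM'` assignment.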