Melting a wide df with pandas

Time: 2019-12-15 22:20:05

Tags: python python-3.x pandas

I am reading a CSV from disk. print(pd.read_csv('data.csv')) gives:

    Unnamed:0    Company1    Company2    Company3 ...
0   2019-01-01   €100,000    €100,000    €100,000
1   2019-01-02   €100,000    €100,000    €100,000
2   2019-01-03   €100,000    €100,000    €100,000
3   2019-01-04   €100,000    €100,000    €100,000

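The CSV being read is the output of a df produced upstream, where the unnamed date column was the index. My problem is that I have 70+ companies and therefore 70+ columns. When I write this to a table, I want the company names to fall under a "company_name" column, and the values currently under "Company1", "Company2", etc. to fall under a "predicted" column. I build the final df first and then write it to the table with Spark.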

This is the format I want:

date         company_name    predicted
2019-01-01   Company1        €100,000
2019-01-01   Company2        €100,000
2019-01-01   Company3        €100,000
2019-01-02   Company1        €100,000
2019-01-02   Company2        €100,000
2019-01-02   Company3        €100,000

I have tried this:

my_dict = pd.read_csv('data.csv')
df = pd.DataFrame(my_dict)
df.rename(columns={'Unnamed:0': 'date'}, inplace=True)
df = df.melt(id_vars=['date'], value_vars=df.columns[1:], var_name='company_name', 
value_name='predicted')
df.sort_values(by=['date'], inplace=True)
print(df)

This almost works, but the date column contains NaN values:

        date   company_name   predicted
0       NaN    Company1       €100,000
1       NaN    Company1       €100,000
2       NaN    Company1       €100,000
3       NaN    Company1       €100,000
4       NaN    Company1       €100,000

2 Answers:

Answer 0: (score: 0)

This seems to work:

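import pandas as pd

df = pd.read_csv('data.csv')
# The header-less first column is labelled 'Unnamed: 0' (note the space)
df.rename(columns={'Unnamed: 0': 'yyyy_mm_dd'}, inplace=True)
df = df.melt(id_vars=['yyyy_mm_dd'])
# melt names its output columns 'variable' and 'value' by default
df.rename(columns={'variable': 'company', 'value': 'predicted'}, inplace=True)

# Strip the currency symbol; the values stay strings such as '100,000'
df['predicted'] = df['predicted'].str.replace('€', '')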

Output df:

       yyyy_mm_dd      company        predicted
0      2019-12-10      Company1       100,000
1      2019-12-11      Company1       100,000
2      2019-12-12      Company1       100,000
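
A likely reason the attempt in the question left no usable date column: pandas labels a header-less first column 'Unnamed: 0', with a space after the colon, and rename silently skips keys that match nothing. The sketch below illustrates this with a made-up in-memory CSV standing in for data.csv. The sample rows and the variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv: the first header cell is empty
# because the upstream frame was written together with its index.
csv_text = """,Company1,Company2
2019-01-01,"€100,000","€100,000"
2019-01-02,"€100,000","€100,000"
"""

df = pd.read_csv(io.StringIO(csv_text))
first_col = df.columns[0]  # label that pandas generated for the empty header

# A key without the space matches nothing, and rename ignores it by default.
unchanged = df.rename(columns={'Unnamed:0': 'date'})
# The exact label (with the space) is renamed as intended.
renamed = df.rename(columns={'Unnamed: 0': 'date'})

print(repr(first_col))
print(list(unchanged.columns))
print(list(renamed.columns))
```

Passing errors='raise' to rename turns such a silent mismatch into a KeyError, which may make this kind of typo easier to spot.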

Answer 1: (score: 0)

You can melt it like this:

# read_csv labels the header-less column 'Unnamed: 0' (with a space)
df.rename(columns={'Unnamed: 0': 'date'}, inplace=True)
df.melt(col_level=0, id_vars='date').sort_values(by='date').reset_index(drop=True)

          date  variable     value
0   2019-01-01  Company1  €100,000
1   2019-01-01  Company2  €100,000
2   2019-01-01  Company3  €100,000
3   2019-01-02  Company1  €100,000
4   2019-01-02  Company2  €100,000
5   2019-01-02  Company3  €100,000
6   2019-01-03  Company1  €100,000
7   2019-01-03  Company2  €100,000
8   2019-01-03  Company3  €100,000
9   2019-01-04  Company1  €100,000
10  2019-01-04  Company2  €100,000
11  2019-01-04  Company3  €100,000
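
Putting the pieces together, one possible end-to-end sketch is shown here. It is not taken from either answer. It assumes the CSV looks like the sample in the question, reads the unnamed column back as the index so its generated label never has to be typed, and names the melted columns directly through var_name and value_name. The in-memory CSV and the extra predicted_num column are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, shaped like the sample in the question.
csv_text = """,Company1,Company2,Company3
2019-01-01,"€100,000","€100,000","€100,000"
2019-01-02,"€100,000","€100,000","€100,000"
"""

# index_col=0 restores the unnamed first column as the index.
wide = pd.read_csv(io.StringIO(csv_text), index_col=0)
wide.index.name = 'date'

long_df = (
    wide.reset_index()
        .melt(id_vars='date', var_name='company_name', value_name='predicted')
        .sort_values(['date', 'company_name'])
        .reset_index(drop=True)
)

# Optional: convert strings such as '€100,000' into the integer 100000.
long_df['predicted_num'] = (
    long_df['predicted']
        .str.replace('€', '', regex=False)
        .str.replace(',', '', regex=False)
        .astype(int)
)

print(long_df)
```

Sorting on both date and company_name keeps the row order deterministic; sorting on date alone leaves the order within a date up to the sort algorithm.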