I'm still new to Python and need some help.
My data is in CSV format and looks like this:
Month YEAR AZ-Phoenix CA-Los Angeles CA-San Diego CA-San Francisco CO-Denver DC-Washington
January 1987 59.33 54.67 46.61 50.20
February 1987 59.65 54.89 46.87 49.96 64.77
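The city columns need to be merged so that the city name and its value appear in columns 2 and 3, with column 1 repeated n times (once per city).
The output should be: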
Month YEAR
January 1987 AZ-Phoenix
January 1987 CA-Los Angeles 59.33
January 1987 CA-San Diego 54.67
January 1987 CA-San Francisco 46.61
January 1987 CO-Denver 50.20
How can I achieve this with a CSV reader?
Answer 0: (score: 2)
Use read_csv with a tab separator (\t). If the separator is two or more whitespace characters instead, use piRSquared's solution:
import pandas as pd
# 'filename.csv' is a placeholder; the original snippet omitted the file path
df = pd.read_csv('filename.csv', sep='\t')
I think you need:
df = df.set_index('YEAR').stack(dropna=False).reset_index()
df.columns = ['YEAR','A','B']
print (df)
YEAR A B
0 January 1987 AZ-Phoenix 59.33
1 January 1987 CA-Los Angeles 54.67
2 January 1987 CA-San 46.61
3 January 1987 Diego 50.20
4 January 1987 CA-San Francisco NaN
5 January 1987 CO-Denver NaN
6 January 1987 DC-Washington NaN
7 February 1987 AZ-Phoenix 59.65
8 February 1987 CA-Los Angeles 54.89
9 February 1987 CA-San 46.87
10 February 1987 Diego 49.96
11 February 1987 CA-San Francisco 64.77
12 February 1987 CO-Denver NaN
13 February 1987 DC-Washington NaN
# if you need to remove rows with NaN
df = df.set_index('YEAR').stack().reset_index()
df.columns = ['YEAR','A','B']
print (df)
YEAR A B
0 January 1987 AZ-Phoenix 59.33
1 January 1987 CA-Los Angeles 54.67
2 January 1987 CA-San 46.61
3 January 1987 Diego 50.20
4 February 1987 AZ-Phoenix 59.65
5 February 1987 CA-Los Angeles 54.89
6 February 1987 CA-San 46.87
7 February 1987 Diego 49.96
8 February 1987 CA-San Francisco 64.77
Another solution with melt:
df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A')
print (df)
YEAR A B
0 January 1987 AZ-Phoenix 59.33
1 February 1987 AZ-Phoenix 59.65
2 January 1987 CA-Los Angeles 54.67
3 February 1987 CA-Los Angeles 54.89
4 January 1987 CA-San 46.61
5 February 1987 CA-San 46.87
6 January 1987 Diego 50.20
7 February 1987 Diego 49.96
8 January 1987 CA-San Francisco NaN
9 February 1987 CA-San Francisco 64.77
10 January 1987 CO-Denver NaN
11 February 1987 CO-Denver NaN
12 January 1987 DC-Washington NaN
13 February 1987 DC-Washington NaN
# if you need to remove rows with NaN
df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A').dropna(subset=['B'])
print (df)
YEAR A B
0 January 1987 AZ-Phoenix 59.33
1 February 1987 AZ-Phoenix 59.65
2 January 1987 CA-Los Angeles 54.67
3 February 1987 CA-Los Angeles 54.89
4 January 1987 CA-San 46.61
5 February 1987 CA-San 46.87
6 January 1987 Diego 50.20
7 February 1987 Diego 49.96
9 February 1987 CA-San Francisco 64.77
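Note that melt returns the rows column by column (all months for one city, then the next city), while the question lists every city for January first. A minimal sketch of one way to get that row-by-row order back, assuming pandas 1.1 or newer (for the ignore_index argument) and a small in-memory frame with made-up column choices standing in for the real file:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the parsed CSV (assumption: 'YEAR' holds "Month Year" strings).
df = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CO-Denver": [np.nan, 64.77],
})

long_df = (
    # ignore_index=False keeps the original row labels (0, 1, 0, 1, ...)
    pd.melt(df, id_vars="YEAR", var_name="A", value_name="B", ignore_index=False)
    # a stable sort on those labels groups rows by source row, keeping column order
    .sort_index(kind="mergesort")
    # drop city/month pairs that have no value
    .dropna(subset=["B"])
    .reset_index(drop=True)
)
print(long_df)
```

The stable sort matters here: it keeps the cities in their original column order within each month.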
Answer 1: (score: 2)
Option 1
Using pd.melt
pd.melt(df, 'YEAR')
YEAR variable value
0 January 1987 AZ-Phoenix 59.33
1 February 1987 AZ-Phoenix 59.65
2 January 1987 CA-Los Angeles 54.67
3 February 1987 CA-Los Angeles 54.89
4 January 1987 CA-San Diego 46.61
5 February 1987 CA-San Diego 46.87
6 January 1987 CA-San Francisco 50.20
7 February 1987 CA-San Francisco 49.96
8 January 1987 CO-Denver NaN
9 February 1987 CO-Denver 64.77
10 January 1987 DC-Washington NaN
11 February 1987 DC-Washington NaN
Option 2
Rebuild the frame with numpy tools
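import numpy as np
import pandas as pd

# columns.drop keeps the original column order, so the labels line up with the raveled values
cols = df.columns.drop('YEAR')
pd.DataFrame(dict(
    YEAR=df.YEAR.values.repeat(len(cols)),
    B=df[cols].values.ravel(),
    A=np.tile(cols.values, len(df)),
))[['YEAR', 'A', 'B']]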
YEAR A B
0 January 1987 AZ-Phoenix 59.33
1 January 1987 CA-Los Angeles 54.67
2 January 1987 CA-San Diego 46.61
3 January 1987 CA-San Francisco 50.20
4 January 1987 CO-Denver NaN
5 January 1987 DC-Washington NaN
6 February 1987 AZ-Phoenix 59.65
7 February 1987 CA-Los Angeles 54.89
8 February 1987 CA-San Diego 46.87
9 February 1987 CA-San Francisco 49.96
10 February 1987 CO-Denver 64.77
11 February 1987 DC-Washington NaN
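To see why repeat, tile and ravel line up, here is a small self-contained sketch on a two-city frame. The frame and the names value_cols, n_rows and n_cols are made up for illustration; the point is only that each YEAR is repeated once per city while the city labels are tiled once per row, which matches the row-major order of ravel.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (assumption: same layout as the parsed CSV).
df = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
})

value_cols = df.columns.drop("YEAR")      # city columns, original order kept
n_rows, n_cols = len(df), len(value_cols)

long_df = pd.DataFrame({
    # each YEAR appears n_cols times in a row: Jan, Jan, Feb, Feb
    "YEAR": np.repeat(df["YEAR"].to_numpy(), n_cols),
    # the city labels cycle once per source row: AZ, CA, AZ, CA
    "A": np.tile(value_cols.to_numpy(), n_rows),
    # ravel flattens row by row, so values follow the same pattern
    "B": df[value_cols].to_numpy().ravel(),
})
print(long_df)
```

Unlike melt, this produces the rows month by month, which is the order the question asks for.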
Setup
# 'filename.csv' is a placeholder; the original snippet omitted the file path
df = pd.read_csv('filename.csv', sep=r'\s{2,}', engine='python')
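If you want to try the setup without a file on disk, the following sketch parses an in-memory string instead. The sample text is an assumption about the layout: fields separated by two or more spaces, with single spaces inside values such as "January 1987" and "CA-Los Angeles". Rows with fewer fields than the header are padded with NaN in the trailing columns.

```python
import io

import pandas as pd

# Assumed layout: fields are separated by two or more spaces.
raw = "\n".join([
    "YEAR  AZ-Phoenix  CA-Los Angeles  CA-San Diego  CA-San Francisco  CO-Denver  DC-Washington",
    "January 1987  59.33  54.67  46.61  50.20",
    "February 1987  59.65  54.89  46.87  49.96  64.77",
])

# A regex separator needs the python engine; the raw string avoids escape warnings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")
print(df)
```

With a single-space separator the multi-word headers would be split apart, which is likely why "CA-San" and "Diego" show up as separate columns in the first answer's output.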