I am trying to work out how to compare these two CSV files by looking up rows with Pandas:
File1:
---------------------------------------------------------------
Day Mth Yr Hr Min Loc_Nu Lat Long Rain
---------------------------------------------------------------
1 1 2005 9 30 12456 -34.9211 138.6216 Yes
1 1 2005 9 45 12375 -34.9211 138.6216 Yes
1 12 1998 17 5 12376 -34.9211 138.6216 No
File2:
----------------------------------------------------------------------
date 12375 12376 12456
----------------------------------------------------------------------
1/1/2005 9:30 NA NA 0.2
1/1/2005 10:00 NA 0 NA
1/1/2005 10:30 0 NA 0.6
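I want to match the Loc_Nu and the time in file1 against file2, and count how many of the matched values are NA, 0, and >0. So far, this is my script: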
import pandas as pd
file1 = pd.read_csv(r'E:\project\test\file1.csv')
print(file1)
file2 = pd.read_csv(r'E:\project\test\file2.csv')
print(file2)
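I have to give the full directory path. Without it, I cannot print file1 and file2.
Answer 0 (score: 0)
You can try this solution; if anything is unclear, ask in the comments: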
import pandas as pd
import datetime
df = pd.read_csv(r'E:\project\test\file1.csv')
df1 = pd.read_csv(r'E:\project\test\file2.csv')
# parse the date column of file2 as datetime
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%Y %H:%M")
# use date as the index, stack the location columns into rows (keeping NaN values), then reset the index
df1 = df1.set_index("date").stack(dropna=False).reset_index()
# name the resulting columns
df1.columns = ['date','Loc_Nu', 'values']
# convert Loc_Nu to int so it has the same type as in file1 for merging
df1['Loc_Nu'] = df1['Loc_Nu'].astype(int)
# build a datetime column in file1 from its date parts, then drop those parts
df['date'] = df[['Yr', 'Mth', 'Day', 'Hr', 'Min']].apply(lambda s : datetime.datetime(*s),axis = 1)
df = df.drop(['Yr', 'Mth', 'Day', 'Hr', 'Min'], axis=1)
print(df)
# Loc_Nu Lat Long Rain date
#0 12456 -34.9211 138.6216 Yes 2005-01-01 09:30:00
#1 12375 -34.9211 138.6216 Yes 2005-01-01 09:45:00
#2 12376 -34.9211 138.6216 No 1998-12-01 17:05:00
print(df1)
# date Loc_Nu values
#0 2005-01-01 09:30:00 12375 NaN
#1 2005-01-01 09:30:00 12376 NaN
#2 2005-01-01 09:30:00 12456 0.2
#3 2005-01-01 10:00:00 12375 NaN
#4 2005-01-01 10:00:00 12376 0.0
#5 2005-01-01 10:00:00 12456 NaN
#6 2005-01-01 10:30:00 12375 0.0
#7 2005-01-01 10:30:00 12376 NaN
#8 2005-01-01 10:30:00 12456 0.6
# inner join of df and df1 on the columns date and Loc_Nu (only exact matches are kept)
df2 = pd.merge(df, df1, on=['date', 'Loc_Nu'])
# optionally reorder the columns
df2 = df2[['date','Loc_Nu','Lat','Long','Rain','values']]
print(df2)
# date Loc_Nu Lat Long Rain values
#0 2005-01-01 09:30:00 12456 -34.9211 138.6216 Yes 0.2
# show the matched rows and count them for values == 0, values > 0, and NaN
print(df2.loc[df2['values'] == 0 ])
print(len(df2.loc[df2['values'] == 0 ].index))
print(df2.loc[df2['values'] > 0 ])
print(len(df2.loc[df2['values'] > 0 ].index))
print(df2.loc[df2['values'].isnull()])
print(len(df2.loc[df2['values'].isnull()].index))
#Empty DataFrame
#Columns: [date, Loc_Nu, Lat, Long, Rain, values]
#Index: []
#0
# date Loc_Nu Lat Long Rain values
#0 2005-01-01 09:30:00 12456 -34.9211 138.6216 Yes 0.2
#1
#Empty DataFrame
#Columns: [date, Loc_Nu, Lat, Long, Rain, values]
#Index: []
#0
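The same steps can be condensed into one function that returns the three counts directly. This is only a minimal sketch, not a replacement for the code above: it assumes the two files are comma-separated with the headers shown in the question, it reads inline copies of the sample rows instead of the files on disk, and the names FILE1, FILE2 and match_and_count are made up for illustration. It uses melt in place of stack to reshape file2, and the how parameter is an optional addition: how="left" keeps file1 rows that have no exact match in file2, so they are counted as NA.

```python
import io

import pandas as pd

# Inline copies of the sample rows from the question (assumed comma-separated).
FILE1 = """Day,Mth,Yr,Hr,Min,Loc_Nu,Lat,Long,Rain
1,1,2005,9,30,12456,-34.9211,138.6216,Yes
1,1,2005,9,45,12375,-34.9211,138.6216,Yes
1,12,1998,17,5,12376,-34.9211,138.6216,No
"""
FILE2 = """date,12375,12376,12456
1/1/2005 9:30,NA,NA,0.2
1/1/2005 10:00,NA,0,NA
1/1/2005 10:30,0,NA,0.6
"""


def match_and_count(file1_text, file2_text, how="inner"):
    """Match file1 rows to file2 on (date, Loc_Nu) and count NA, 0 and >0 values."""
    df = pd.read_csv(io.StringIO(file1_text))
    df1 = pd.read_csv(io.StringIO(file2_text))

    # Assemble a datetime column in file1 from its separate date parts.
    parts = df[["Yr", "Mth", "Day", "Hr", "Min"]].rename(
        columns={"Yr": "year", "Mth": "month", "Day": "day",
                 "Hr": "hour", "Min": "minute"}
    )
    df["date"] = pd.to_datetime(parts)

    # Reshape file2 from one column per location to one row per (date, location).
    df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%Y %H:%M")
    long_df = df1.melt(id_vars="date", var_name="Loc_Nu", value_name="values")
    long_df["Loc_Nu"] = long_df["Loc_Nu"].astype(int)

    # Exact match on both keys; how="left" also keeps unmatched file1 rows.
    merged = df.merge(long_df, on=["date", "Loc_Nu"], how=how)

    values = merged["values"]
    return {
        "NA": int(values.isna().sum()),
        "0": int((values == 0).sum()),
        ">0": int((values > 0).sum()),
    }


print(match_and_count(FILE1, FILE2))
# {'NA': 0, '0': 0, '>0': 1}
print(match_and_count(FILE1, FILE2, how="left"))
# {'NA': 2, '0': 0, '>0': 1}
```

With the default inner join the counts agree with the output above (0, 1, 0). The left join shows that two of the three file1 rows have no exactly matching timestamp in file2.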