I have two DataFrames, df1 and df2, as shown below:
df1=pd.read_csv("abc.csv")
print (df1.head(10))
df2=pd.read_csv("xyz.csv")
print (df2.head(10))
A B
0 2019-01-01 03:56:29 197.199997
1 2019-01-01 04:02:29 197.186142
2 2019-01-02 06:24:29 196.857986
3 2019-01-02 06:42:29 196.816376
4 2019-01-03 11:52:29 196.100006
5 2019-01-03 12:00:30 196.015961
6 2019-01-04 14:18:30 194.566376
7 2019-01-04 14:38:30 194.356293
8 2019-01-04 19:48:30 191.100006
9 2019-01-05 19:56:30 191.081512
C D
0 2019-01-1 18:00:00 1333
1 2019-01-2 19:00:00 1.18
2 2019-01-3 20:00:00 1666667
3 2019-01-4 21:00:00 0
4 2019-01-5 22:00:00 1
5 2019-01-6 23:00:00 1.5
6 2019-01-7 00:00:00 109
7 2019-01-8 01:00:00 200
8 2019-01-9 02:00:00 192
9 2019-01-10 03:00:00 1.700000
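df2 holds hourly averaged data. How can I select the rows of df1 whose date (ignoring the time of day) is a date on which column "D" of df2 is greater than 2? The output should look like this: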
A B
0 2019-01-01 03:56:29 197.199997
1 2019-01-01 04:02:29 197.186142
2 2019-01-03 11:52:29 196.100006
4 2019-01-03 12:00:30 196.015961
I tried:
final_data=pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1)
final_data=final_data[final_data["D"] > 2]
but I don't get the correct output. Can someone help me solve this?
Answer 0 (score: 1)
You can try the following approach:
import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")

# Parse the timestamps; both columns carry a time of day, so no date-only format is forced
df2['C'] = pd.to_datetime(df2['C'])
df1['A'] = pd.to_datetime(df1['A'])

# Collect the (year, month, day) of every df2 row whose D exceeds 2
dates = []
for ind in df2.index:
    if df2['D'][ind] > 2:
        date_tup = (df2['C'][ind].year, df2['C'][ind].month, df2['C'][ind].day)
        dates.append(date_tup)

# Drop every df1 row whose date is not in that list
for ind in df1.index:
    date_tup = (df1['A'][ind].year, df1['A'][ind].month, df1['A'][ind].day)
    if date_tup not in dates:
        df1 = df1.drop([ind])

print(df1)
file1.csv:
A,B
2019-01-01 03:56:29,197.199997
2019-01-01 04:02:29,197.186142
2019-01-02 06:24:29,196.857986
2019-01-02 06:42:29,196.816376
2019-01-03 11:52:29,196.100006
2019-01-03 12:00:30,196.015961
2019-01-04 14:18:30,194.566376
2019-01-04 14:38:30,194.356293
2019-01-04 19:48:30,191.100006
2019-01-05 19:56:30,191.081512
file2.csv:
C,D
2019-01-01 18:00:00,1333
2019-01-02 19:00:00,1.18
2019-01-03 20:00:00,1666667
2019-01-04 21:00:00,0
2019-01-05 22:00:00,1
2019-01-06 23:00:00,1.5
2019-01-07 00:00:00,109
2019-01-08 01:00:00,200
2019-01-09 02:00:00,192
2019-01-10 03:00:00,1.700000
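The two loops above can probably be replaced by a vectorized comparison on the calendar day. The following is only a sketch under a few assumptions: the CSV contents are inlined through `io.StringIO` so it runs without files, and the names `FILE1`, `FILE2`, `days_over_2` and `result` are made up for illustration.

```python
import io
import pandas as pd

# Same contents as file1.csv and file2.csv, inlined so the sketch is self-contained
FILE1 = """A,B
2019-01-01 03:56:29,197.199997
2019-01-01 04:02:29,197.186142
2019-01-02 06:24:29,196.857986
2019-01-02 06:42:29,196.816376
2019-01-03 11:52:29,196.100006
2019-01-03 12:00:30,196.015961
2019-01-04 14:18:30,194.566376
2019-01-04 14:38:30,194.356293
2019-01-04 19:48:30,191.100006
2019-01-05 19:56:30,191.081512
"""
FILE2 = """C,D
2019-01-01 18:00:00,1333
2019-01-02 19:00:00,1.18
2019-01-03 20:00:00,1666667
2019-01-04 21:00:00,0
2019-01-05 22:00:00,1
2019-01-06 23:00:00,1.5
2019-01-07 00:00:00,109
2019-01-08 01:00:00,200
2019-01-09 02:00:00,192
2019-01-10 03:00:00,1.700000
"""

df1 = pd.read_csv(io.StringIO(FILE1))
df2 = pd.read_csv(io.StringIO(FILE2))
df1["A"] = pd.to_datetime(df1["A"])
df2["C"] = pd.to_datetime(df2["C"])

# Days (time set to midnight) on which df2's D exceeds 2
days_over_2 = df2.loc[df2["D"] > 2, "C"].dt.normalize()

# Keep the df1 rows whose day is one of those days
result = df1[df1["A"].dt.normalize().isin(days_over_2)]
print(result)
```

With this sample data the rows from 2019-01-01 and 2019-01-03 are kept, and they retain their original index labels from df1.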
Answer 1 (score: 0)
Try this:
final_data=pd.concat([df1.reset_index(drop=True),df2.reset_index(drop=True)],axis=1)
final_data=final_data.loc[final_data["D"] > 2,['A','B']]
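Selecting rows with a boolean mask and columns with a list in one step goes through `.loc`. Note also that `pd.concat(..., axis=1)` pairs rows by position, not by date. A small sketch with three rows taken from the sample data (the name `selected` is made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": ["2019-01-01 03:56:29", "2019-01-01 04:02:29", "2019-01-02 06:24:29"],
    "B": [197.199997, 197.186142, 196.857986],
})
df2 = pd.DataFrame({
    "C": ["2019-01-01 18:00:00", "2019-01-02 19:00:00", "2019-01-03 20:00:00"],
    "D": [1333, 1.18, 1666667],
})

# Row i of df1 is placed next to row i of df2, whatever their dates are
final_data = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)

# Boolean row mask plus column list in a single .loc call
selected = final_data.loc[final_data["D"] > 2, ["A", "B"]]
print(selected)
```

Here the row for 2019-01-02 in df1 is kept because it sits beside the 2019-01-03 row of df2, so this approach only matches the question's goal when the two frames already line up row by row.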
Answer 2 (score: 0)
Assuming the indexes of the two DataFrames match and you only want to keep the information from df1, then:
df1[df2['D'] > 2]
should do the trick.
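When the indexes do not already line up by date, one option is to build a mask that shares df1's index by looking up each row's day in df2. This is a sketch, not a definitive solution. It assumes that a day qualifies if any df2 row for that day has D greater than 2 (the per-day maximum is used), and the names `d_per_day`, `mask` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": pd.to_datetime([
        "2019-01-01 03:56:29", "2019-01-02 06:24:29",
        "2019-01-03 11:52:29", "2019-01-05 19:56:30",
    ]),
    "B": [197.199997, 196.857986, 196.100006, 191.081512],
})
df2 = pd.DataFrame({
    "C": pd.to_datetime([
        "2019-01-01 18:00:00", "2019-01-02 19:00:00",
        "2019-01-03 20:00:00", "2019-01-04 21:00:00",
    ]),
    "D": [1333, 1.18, 1666667, 0],
})

# Assumption: reduce df2 to one value per calendar day by taking the maximum D
d_per_day = df2.groupby(df2["C"].dt.normalize())["D"].max()

# Look up each df1 row's day; days missing from df2 become NaN and compare as False
mask = df1["A"].dt.normalize().map(d_per_day) > 2

result = df1[mask]
print(result)
```

Because `mask` is derived from df1 itself, it has df1's index, so the `df1[mask]` selection does not depend on df2 having the same number of rows or the same ordering.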