I have a DataFrame:
Date Articles
2010-01-04 ((though, reliant, advertis, revenu, internet,...
2010-01-05 ((googl, expect, nexus, one, rival, iphon, hel...
2010-01-06 ((while, googl, introduc, first, piec, hardwar...
2010-01-07 ((googl, form, energi, subsidiari, appli, gove...
2010-01-08 ((david, pogu, review, googl, new, offer, nexu...
2010-01-12 ((the, compani, agre, hand, list, book, scan, ...
The dates are the index, and each Articles value is a tuple of tuples.
I have another DataFrame:
Date Price
2010-01-08 602.020
2010-01-15 580.000
2010-01-22 550.010
2010-01-29 529.944
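Here the dates are also the index, but they are spaced one week apart.
My problem is that I want to create another column in the second DataFrame containing all the articles that belong to the week indicated by its index. For example, for the first row of my second DataFrame I want all the articles from my first DataFrame dated before 2010-01-08 (that would be the first 4 entries of my first DataFrame). Likewise, for 2010-01-15 I need all the articles from 2010-01-08 through 2010-01-14, and so on.
Any help would be greatly appreciated. Thanks.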
Answer 0 (score: 1)
We can use IntervalIndex.from_breaks and pd.cut:
import pandas as pd

df1 = pd.DataFrame({'Articles':
{pd.Timestamp('2010-01-04 00:00:00'): [0, 1],
pd.Timestamp('2010-01-05 00:00:00'): [2, 3],
pd.Timestamp('2010-01-06 00:00:00'): [4, 5],
pd.Timestamp('2010-01-07 00:00:00'): [6, 7],
pd.Timestamp('2010-01-08 00:00:00'): [8, 9],
pd.Timestamp('2010-01-12 00:00:00'): [10, 11]}})
Articles
2010-01-04 [0, 1]
2010-01-05 [2, 3]
2010-01-06 [4, 5]
2010-01-07 [6, 7]
2010-01-08 [8, 9]
2010-01-12 [10, 11]
mybins = pd.IntervalIndex.from_breaks(
pd.date_range("2010-1-1", periods=5, freq="7D"),
closed="left"
)
df1["bin"] = pd.cut(df1.index, bins=mybins)
df1.groupby("bin")["Articles"].sum()
bin
[2010-01-01, 2010-01-08) [0, 1, 2, 3, 4, 5, 6, 7]
[2010-01-08, 2010-01-15) [8, 9, 10, 11]
[2010-01-15, 2010-01-22) None
[2010-01-22, 2010-01-29) None
Name: Articles, dtype: object
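The question asks for the result as a new column on the second DataFrame. The following is a minimal sketch of one way to get there with the same pd.cut idea, not the answerer's code. It assumes the first week starts 7 days before the first price date, derives the break points from df2's index instead of a hard-coded date_range, and concatenates the lists in plain Python.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Articles": [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                          "2010-01-07", "2010-01-08", "2010-01-12"]),
)
df2 = pd.DataFrame(
    {"Price": [602.020, 580.000, 550.010, 529.944]},
    index=pd.to_datetime(["2010-01-08", "2010-01-15",
                          "2010-01-22", "2010-01-29"]),
)

# Assumption: the first week starts 7 days before the first price date.
breaks = df2.index.insert(0, df2.index[0] - pd.Timedelta(days=7))
bins = pd.IntervalIndex.from_breaks(breaks, closed="left")

# pd.cut returns a Categorical; its codes give the bin position (-1 = no bin).
codes = pd.cut(df1.index, bins=bins).codes

# Concatenate the article lists bin by bin.
collected = [[] for _ in range(len(bins))]
for code, articles in zip(codes, df1["Articles"]):
    if code >= 0:
        collected[code].extend(articles)

# Bin i ends (exclusively) at df2.index[i], so the lists line up with df2's rows.
df2["Articles"] = pd.Series(collected, index=df2.index)
print(df2)
```

Weeks without any article end up with an empty list rather than None in this variant.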
Answer 1 (score: 0)
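Here is a two-step solution using merge_asof with allow_exact_matches=False, so that each article row is matched to the first price whose date is strictly greater than (not equal to) the article row's date.
.agg(sum) relies on the fact that adding two tuples concatenates them into a single tuple.
Assuming your DataFrames are named df and df2: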
# Test data adapted from your examples.
# Sorry that this is difficult to copy-paste into pandas
df
Articles
2010-01-04 (though, reliant, advertis, revenu, internet)
2010-01-05 ((googl, expect, nexus), (one, rival, iphon))
2010-01-06 ((while, googl, introduc), (first,), (piec, hardwar))
2010-01-07 ((googl, form), (energi, subsidiari), (appli,))
2010-01-08 ((david, pogu, review), (googl, new, offer))
2010-01-12 ((the, compani), (agre, hand, list), (book, scan))
df2
Price
2010-01-08 602.020
2010-01-15 580.000
2010-01-22 550.010
2010-01-29 529.944
# Solution
price2articles = (pd.merge_asof(df,
df2,
left_index=True,
right_index=True,
allow_exact_matches=False,
direction='forward')
.groupby('Price')
.agg(sum))
result = pd.merge(df2, price2articles, left_on='Price', right_index=True)
# To see full contents of wide data, set
# pd.options.display.max_colwidth = 150 or higher (-1 for no limit)
result
Articles
2010-01-08 (though, reliant, advertis, revenu, internet, (googl, expect, nexus), (one, rival, iphon), (while, googl, introduc), (first,), (piec, hardwar), (googl, form), (energi, subsidiari), (appli,))
2010-01-15 ((david, pogu, review), (googl, new, offer), (the, compani), (agre, hand, list), (book, scan))
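The two facts this answer relies on can be checked in isolation. The sketch below uses made-up one-element tuples (the names articles and prices are only for illustration) to show that a forward merge_asof with allow_exact_matches=False sends an article dated exactly on a price date to the next price, and that adding tuples concatenates them.

```python
import pandas as pd

# Hypothetical toy data: one article the day before, on, and after a price date.
articles = pd.DataFrame(
    {"Articles": [("a",), ("b",), ("c",)]},
    index=pd.to_datetime(["2010-01-07", "2010-01-08", "2010-01-12"]),
)
prices = pd.DataFrame(
    {"Price": [602.020, 580.000]},
    index=pd.to_datetime(["2010-01-08", "2010-01-15"]),
)

# Match each article to the first price date strictly after its own date.
matched = pd.merge_asof(
    articles,
    prices,
    left_index=True,
    right_index=True,
    allow_exact_matches=False,
    direction="forward",
)
print(matched["Price"].tolist())

# Tuple addition concatenates, which is what summing the groups exploits.
combined = ("a",) + ("b", "c")
print(combined)
```

The article dated 2010-01-08 is paired with the 2010-01-15 price, which matches the weekly split the question asks for.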
Answer 2 (score: 0)
I think you need to bin df1 by the values of df2['Date'] (with pd.cut), group by those bins, and concatenate the tuples into lists:
print (df1)
Date Articles
0 2010-01-04 ((t, r), (s, q))
1 2010-01-07 ((g, f), (y, l))
2 2010-01-08 ((d, p), (t, o))
3 2010-01-12 ((t, c), (r, p))
b = pd.concat([df2['Date'],
pd.Series(pd.to_datetime(['1970-01-01','2100-01-01']))]).sort_values()
df1['Dates'] = pd.cut(df1['Date'], bins=b, labels=b[1:], right=False)
df3 = (df1.groupby('Dates')['Articles']
.apply(lambda x: [i for s in x for i in s])
.iloc[:-1]
.reset_index())
print (df3)
Dates Articles
0 2010-01-08 [(t, r), (s, q), (g, f), (y, l)]
1 2010-01-15 [(d, p), (t, o), (t, c), (r, p)]
2 2010-01-22 []
3 2010-01-29 []
Finally, if you want to filter out the empty lists:
df3 = df3[df3['Articles'].astype(bool)]
print (df3)
Dates Articles
0 2010-01-08 [(t, r), (s, q), (g, f), (y, l)]
1 2010-01-15 [(d, p), (t, o), (t, c), (r, p)]
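The right=False argument in the pd.cut call is what pushes an article dated exactly on a price date into the following week. A small sketch with plain integers (illustrative values, not data from the question) shows the difference between left-closed and right-closed bins:

```python
import pandas as pd

values = pd.Series([1, 5, 9])

# right=False gives left-closed bins: [0, 5) and [5, 10).
left_closed = pd.cut(values, bins=[0, 5, 10], right=False)

# The default right=True gives right-closed bins: (0, 5] and (5, 10].
right_closed = pd.cut(values, bins=[0, 5, 10])

# The boundary value 5 lands in the second bin only when right=False.
print(left_closed.cat.codes.tolist())
print(right_closed.cat.codes.tolist())
```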
Answer 3 (score: 0)
Maybe this fairly simple two-liner would also work (it relies on the calendar week not breaking on 2010-01-08, but around January 11):
m = {ind:dfx['Articles'].tolist() for ind,dfx in df1.groupby(df1.index.week)}
df2['new'] = pd.Series(df2.index.week).map(m).values
If you want the split to fall on the actual day, we can modify this code to use integer division of the day of the year:
m = {ind+1:dfx['Articles'].tolist() for ind,dfx in df1.groupby((df1.index.dayofyear-1)//7)}
df2['new'] = pd.Series(df2.index.week).map(m).values
Full example:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is gone from current pandas

data1 = '''\
Date Articles
2010-01-04 1
2010-01-05 2
2010-01-06 3
2010-01-07 4
2010-01-08 5'''
data2 = '''\
Date Price
2010-01-08 602.020
2010-01-15 580.000
2010-01-22 550.010
2010-01-29 529.944'''
df1 = pd.read_csv(StringIO(data1), sep=r'\s+', index_col='Date', parse_dates=['Date'])
df2 = pd.read_csv(StringIO(data2), sep=r'\s+', index_col='Date', parse_dates=['Date'])
# Note: DatetimeIndex.week was removed in pandas 2.0; this line needs an older pandas.
m = {ind:dfx['Articles'].tolist() for ind,dfx in df1.groupby(df1.index.week)}
df2['new'] = pd.Series(df2.index.week).map(m).values
df2:
Price new
Date
2010-01-08 602.020 [1, 2, 3, 4, 5]
2010-01-15 580.000 NaN
2010-01-22 550.010 NaN
2010-01-29 529.944 NaN
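Since DatetimeIndex.week is no longer available in pandas 2.0 and later, the same calendar-week grouping can be sketched with Timestamp.isocalendar() and plain Python. This is an adaptation under that assumption, not the answerer's code, and the names week_of and by_week are hypothetical helpers.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Articles": [1, 2, 3, 4, 5]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06",
                          "2010-01-07", "2010-01-08"]),
)
df2 = pd.DataFrame(
    {"Price": [602.020, 580.000, 550.010, 529.944]},
    index=pd.to_datetime(["2010-01-08", "2010-01-15",
                          "2010-01-22", "2010-01-29"]),
)


def week_of(ts):
    """Return the ISO calendar week number of a timestamp."""
    return ts.isocalendar()[1]


# Collect the articles of each ISO week.
by_week = {}
for ts, article in zip(df1.index, df1["Articles"]):
    by_week.setdefault(week_of(ts), []).append(article)

# Look up each price date's week; weeks without articles stay missing.
df2["new"] = [by_week.get(week_of(ts)) for ts in df2.index]
print(df2)
```

As in the original two-liner, the article dated 2010-01-08 lands in the same week as the 2010-01-08 price, so this variant groups by calendar week rather than by "strictly before the price date".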