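I have a very large dataset (test) with about 1 million rows. I want to update one column ('Date') in it. I only want to put 3 dates in the 'Date' column:
2014-04-01, 2014-05-01, 2014-06-01
So each row gets one date, and the same three dates repeat every 3 rows.
I have tried this: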
for i in range(0,len(test),3):
    if(i <= len(test)):
        test['Date'][i] = '2014-04-01'
        test['Date'][i+1] = '2014-05-01'
        test['Date'][i+2] = '2014-06-01'
I get this warning:
__main__:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
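I have gone through the link but could not solve my problem. I also searched on Google and found some suggestions, such as calling copy() on the dataset before slicing, but neither that nor the other solutions helped.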
Answer 0 (score: 2)
I believe what you want is np.tile:
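import numpy as np
import pandas as pd

# df is your existing DataFrame (called test in the question)
dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')
# Tile the 3 dates enough times to cover every row, then trim to len(df)
repeated_dates = np.tile(dates, len(df) // 3 + 1)[:len(df)]
df['dates'] = repeated_dates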
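This builds an array with the three dates repeated, trimmed to the length of the DataFrame, and assigns it to a column of the DataFrame in a single step.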
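As a self-contained check of the idea above, here is a minimal sketch that uses modular indexing (row k takes date k % 3) rather than np.tile, so nothing has to be over-allocated and trimmed. The helper name fill_cyclic_dates, the 'value' column and the 7-row frame are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd


def fill_cyclic_dates(df, dates, column="Date"):
    """Return a copy of df whose `column` cycles through `dates` row by row."""
    values = pd.to_datetime(pd.Series(dates)).to_numpy()
    # Row k receives values[k % len(values)], so the pattern repeats every len(values) rows
    positions = np.arange(len(df)) % len(values)
    out = df.copy()
    # Whole-column assignment: no chained indexing, so no SettingWithCopyWarning
    out[column] = values[positions]
    return out


# Hypothetical small frame; 7 rows is deliberately not a multiple of 3
test = pd.DataFrame({"value": range(7)})
result = fill_cyclic_dates(test, ["2014-04-01", "2014-05-01", "2014-06-01"])
print(result)
```

Because the whole column is assigned at once, this should also scale to about a million rows without a Python-level loop.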
Answer 1 (score: 1)
You can also look at islice and cycle from itertools, which let you cycle through a list or Series for the length of the DataFrame.
import numpy as np
import pandas as pd
from itertools import islice, cycle

dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')
df = pd.DataFrame(np.random.randint(0, 50, 50).reshape(10, 5))
# cycle() repeats the dates endlessly; islice() stops after len(df) items
df['dates'] = list(islice(cycle(dates), len(df)))
print(df)
    0   1   2   3   4      dates
0  45   3  13  24  13 2014-04-01
1  30  44   6  17  24 2014-05-01
2  47  22  16  28  12 2014-06-01
3  11  13  10   0  47 2014-04-01
4  32  12  49  14   2 2014-05-01
5  15   6  21  17  49 2014-06-01
6  49  49  28  18   9 2014-04-01
7  18  35  35  40   7 2014-05-01
8  44  15  13  49  28 2014-06-01
9   9  14  36  36   6 2014-04-01
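To see what islice and cycle do on their own, here is a small sketch on a plain list of strings, without pandas. The names labels, n_rows and filled are placeholders chosen for this example.

```python
from itertools import cycle, islice

labels = ["2014-04-01", "2014-05-01", "2014-06-01"]
n_rows = 7  # hypothetical row count, deliberately not a multiple of 3

# cycle() yields the labels endlessly; islice() takes only the first n_rows items
filled = list(islice(cycle(labels), n_rows))
print(filled)
```

The result has exactly n_rows entries and restarts the pattern after every third item. Note that the list is built in pure Python, so for around a million rows a NumPy-based approach is likely to be faster.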