So friends is a column in which every row holds a list, e.g. df['friends'][0] = [id1, id2, ..., idn]. I am trying to count the number of friends in a separate column, e.g. df['friend_counts'][0] = n.
Here is what I did. I have used this code on other datasets, but for some reason it takes forever here, even though the dataset only has 300,000 rows.
df_user['friend_counts'] = df_user['friends'].apply(lambda x: len(df_user.friends[x]))
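Inside df_user['friends'].apply(...), the lambda receives each cell's value, so x is already the list of friend ids and len(x) is the count. Indexing df_user.friends with that list adds a label lookup against the whole Series on every one of the 300,000 rows, which is a likely source of the slowdown. A minimal sketch on a made-up three-row frame, assuming every cell holds a Python list:

```python
import pandas as pd

# Hypothetical toy data: each cell of 'friends' holds a list of ids.
df_user = pd.DataFrame({"friends": [["a", "b", "c"], [], ["d"]]})

# x is already the list, so its length is the friend count.
df_user["friend_counts"] = df_user["friends"].apply(len)

# Built-in alternative: the .str accessor's len() also works on list cells.
df_user["friend_counts_str"] = df_user["friends"].str.len()

print(df_user)
```

Both columns should hold 3, 0 and 1; neither version touches the index of df_user.friends.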
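Also, for some reason the following code creates a season column but leaves it unfilled, i.e. it contains only blanks. This is puzzling, because I ran exactly the same code on every other dataset. Did they change the .apply() method?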
import pandas as pd

# Convert 'date' to a datetime object
df_reviews["date"] = pd.to_datetime(df_reviews["date"])
# Split 'date' -> 'weekday', 'month', 'year'
df_reviews["weekday"] = df_reviews["date"].dt.day_name()  # .dt.weekday_name was removed in pandas 1.0
df_reviews["month"] = df_reviews["date"].dt.month
df_reviews["year"] = df_reviews["date"].dt.year
### Helper function
def season_converter(month_name):
    """ Returns the season a particular month is in """
    season = ""
    #Winter
    if month_name in ['Jan', 'Feb', 'Dec']:
        season = "Winter"
    #Spring
    if month_name in ['Mar', 'Apr', 'May']:
        season = "Spring"
    #Summer
    if month_name in ['Jun', 'Jul', 'Aug'] :
        season = "Summer"
    #Fall
    if month_name in ['Sep', 'Oct', 'Nov']:
        season = "Fall"
    #Other
    if month_name == "NA":
        season = "NA"
    return season
#Create a new column that holds seasonal information
df_reviews['season'] = df_reviews['month'].apply(lambda x: season_converter(x))
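A likely cause of the blank season column: dt.month yields integers 1 to 12, while season_converter compares its argument against strings such as 'Jan', so no branch matches and every row keeps the initial empty string. The sketch below (sample dates are made up) reproduces this, then, as one possible fix rather than the answer's method, feeds the helper three-letter abbreviations derived with dt.month_name().str[:3], which assumes pandas' default English month names. The helper is condensed but returns the same values as the original.

```python
import pandas as pd

def season_converter(month_name):
    """Condensed version of the question's helper: abbreviation -> season."""
    if month_name in ["Jan", "Feb", "Dec"]:
        return "Winter"
    if month_name in ["Mar", "Apr", "May"]:
        return "Spring"
    if month_name in ["Jun", "Jul", "Aug"]:
        return "Summer"
    if month_name in ["Sep", "Oct", "Nov"]:
        return "Fall"
    if month_name == "NA":
        return "NA"
    return ""  # same fallback as the original's initial season = ""

# Hypothetical sample dates, for illustration only.
df_reviews = pd.DataFrame(
    {"date": pd.to_datetime(["2018-01-15", "2018-04-01", "2018-10-31"])}
)
df_reviews["month"] = df_reviews["date"].dt.month  # integers: 1, 4, 10

# Integers never equal 'Jan', 'Apr', ..., so every result is ''.
blank = df_reviews["month"].apply(season_converter)

# Possible fix: pass three-letter month names instead of numbers.
abbr = df_reviews["date"].dt.month_name().str[:3]  # 'Jan', 'Apr', 'Oct'
df_reviews["season"] = abbr.apply(season_converter)
print(df_reviews["season"].tolist())  # ['Winter', 'Spring', 'Fall']
```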
Answer 0 (score: 0)
I suggest using map with a dictionary for better performance:
d = {1:'Winter', 2:'Winter', 12:'Winter', 3: 'Spring', .... np.nan:'NA', 'NA':'NA'}
df_reviews['season'] = df_reviews['month'].map(d)
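A fuller sketch of the dictionary approach, with the month groups from the question's helper written out (Dec to Feb Winter, Mar to May Spring, Jun to Aug Summer, Sep to Nov Fall). The name month_to_season and the sample months are illustrative. Instead of np.nan and 'NA' keys, this version fills unmatched months with fillna('NA'); that is a choice made for the sketch, not the answer's code.

```python
import pandas as pd

# Hypothetical name; the groups mirror season_converter's month lists.
month_to_season = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Sample integer months, as produced by .dt.month.
df_reviews = pd.DataFrame({"month": [1, 4, 7, 10, 12]})

# map() looks each value up in the dict; months with no key (for example
# NaN from a missing date) come back as NaN, which fillna turns into 'NA'.
df_reviews["season"] = df_reviews["month"].map(month_to_season).fillna("NA")
print(df_reviews["season"].tolist())
# ['Winter', 'Spring', 'Summer', 'Fall', 'Winter']
```

The speed-up comes from map doing the lookups in one call instead of invoking a Python function per row through apply.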
Another solution, if numeric seasons are acceptable:
df_reviews['season'] = (df_reviews['month'] % 12 + 3) // 3
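The arithmetic works because % 12 turns December into 0, so each run of three months (Dec to Feb, Mar to May, Jun to Aug, Sep to Nov) lands on the same integer after adding 3 and floor-dividing by 3: 1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall. A quick check over all twelve months, plus an optional lookup (the dictionary below is illustrative) to turn the numbers back into names:

```python
import pandas as pd

months = pd.Series(range(1, 13))

# (month % 12 + 3) // 3 -> 1 Winter, 2 Spring, 3 Summer, 4 Fall
season_num = (months % 12 + 3) // 3
print(season_num.tolist())  # [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1]

# Illustrative mapping from season number back to a name.
season_label = season_num.map({1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"})
```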