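I have a data frame loaded from an Excel spreadsheet, and I have counted how often each domain occurs in it. I want to add each domain's frequency count to the rows for that domain.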
index domain extractor Frequency
0 linkedin.com skipped 2
1 facebook.com skipped 5
2 hi5.com skipped 1
....
Here is the code that computes the frequencies and tries to add them to the corresponding domains.
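from collections import Counter

# Count how many times each domain occurs
cnt = Counter()
for row_index, row in df.iterrows():
    cnt[row['domain']] += 1

# Attempt to write each count back to its domain (this is the step that does not work)
for i in cnt:
    frequency = cnt
    if i in row['domain']:
        df['Frequency'] = df.loc[:(cnt[i])]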
When I print the frequency from the data frame, I get:
Index url Frequency
0 https://www.linkedin.com/in/dgerstenblatt 0
1 http://www.linkedin.com/in/darren-cfbs-5465872 1
2 http://www.hi5.com/friend/p39168004--profile--... 2
3 http://license.reg.state.ma.us/pubLic/pubLicen... 3
4 http://license.reg.state.ma.us/pubLic/pubLicen... 4
5 http://profiles.friendster.com/3523606 5
6 http://www.lenoxadvisors.com/biographies/darre... NaN
7 http://10digits.us/n/Darren_Gerstenblatt/Newto... NaN
8 http://www.facebook.com/people/_/692786728 NaN
Answer 0 (score: 0)
As Nehal said, this is the correct solution: stackoverflow.com/q/22391433/1005215
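
For reference, one common pandas way to attach a per-domain count to every row is a grouped `transform`. The following is a minimal sketch, not necessarily the exact code from the linked question. The sample rows are made up for illustration, and it assumes the data frame has a `domain` column as in the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's 'domain' and 'extractor' columns.
df = pd.DataFrame({
    "domain": ["linkedin.com", "facebook.com", "linkedin.com", "hi5.com"],
    "extractor": ["skipped"] * 4,
})

# Count the rows in each domain group and broadcast that count
# back to every row belonging to the group.
df["Frequency"] = df.groupby("domain")["domain"].transform("count")

print(df)
```

Because `transform` returns a result aligned with the original index, no explicit loop over rows is needed, and each row ends up with the number of times its own domain appears.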