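I'm fairly new to pandas. I have a project in which I get a dataframe containing links and their respective metrics. I also collect country data for each link; when that data is parsed, it comes back as a list of dictionaries containing country codes and their corresponding click counts.
What I want to do is add the country codes as columns to the existing bitly links dataframe, and then save each country's click count in the row of its specific bitly link. It would be great if someone could help me with this.
The pandas dataframe of bitly_links: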
index | link | long_url | created_at | link_clicks |
------|-------------|---------------------|---------------------|-------------|
0 | bit.ly/aaaa | https://example.com | 2020-04-01 10:54:33 | 150 |
1 | bit.ly/bbbb | https://example.com | 2020-04-01 10:54:33 | 20 |
2 | bit.ly/cccc | https://example.com | 2020-04-01 10:54:33 | 15 |
3 | bit.ly/dddd | https://example.com | 2020-04-01 10:54:33 | 13 |
The Python list of countries for one specific bitly link (e.g. bit.ly/aaaa):
countries_data = [
{'country': 'US', 'clicks': 150}, {'country': 'UK', 'clicks': 20},
{'country': 'AU', 'clicks': 45}, {'country': 'ZS', 'clicks': 31}
]
index | country | clicks |
------|---------|--------|
0 | US | 150 |
1 | UK | 20 |
2 | AU | 45 |
3 | ZS | 31 |
The new dataframe I want to produce:
index | link | long_url | created_at | link_clicks | US | UK | AU | ZS |
------|-------------|---------------------|---------------------|-------------|----|----|----|----|
0 | bit.ly/aaaa | https://example.com | 2020-04-01 10:54:33 | 110 | 20 | 30 | 10 | 50 |
1 | bit.ly/bbbb | https://example.com | 2020-04-01 10:54:33 | 89 | 25 | 41 | 11 | 12 |
2 | bit.ly/cccc | https://example.com | 2020-04-01 10:54:33 | 81 | 10 | 27 | 31 | 14 |
3 | bit.ly/dddd | https://example.com | 2020-04-01 10:54:33 | 126 | 11 | 74 | 31 | 10 |
Answer 0 (score: 1)
I think what you need to do is tidy the country-level click data for each link:
# This example uses two lists of country-level data, one per link, but the
# same approach extends to any number of links:
import pandas as pd
countries_data1 = [
{'country': 'US', 'clicks': 150}, {'country': 'UK', 'clicks': 20},
{'country': 'AU', 'clicks': 45}, {'country': 'ZS', 'clicks': 31}
]
countries_data2 = [
{'country': 'US', 'clicks': 150}, {'country': 'UK', 'clicks': 20},
{'country': 'AU', 'clicks': 45}, {'country': 'ZS', 'clicks': 31}
]
# transform to dataframe, add variable link, and concat
countries_data1 = pd.DataFrame(countries_data1).assign(link="bit.ly/aaaa")
countries_data2 = pd.DataFrame(countries_data2).assign(link="bit.ly/bbbb")
# concatenate the per-link dataframes; in practice, pass the full list of
# dataframes (one per link), this example only has 2
df = pd.concat([countries_data1, countries_data2])
# then reshape to wide format with pivot_table; aggfunc="sum" keeps the click
# counts as integers (the default, "mean", would return floats)
df = df.pivot_table(index="link", values="clicks", columns="country", aggfunc="sum")
You will get this table:
country AU UK US ZS
link
bit.ly/aaaa 45 20 150 31
bit.ly/bbbb 45 20 150 31
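With many links, the same long-then-wide step can be written as a single comprehension. This is only a minimal sketch: it assumes the parsed country lists are held in a dict keyed by link, and the name `countries_by_link` and the click values are made up for illustration.

```python
import pandas as pd

# Hypothetical input: one list of {'country', 'clicks'} dicts per link
countries_by_link = {
    "bit.ly/aaaa": [{"country": "US", "clicks": 150}, {"country": "UK", "clicks": 20}],
    "bit.ly/bbbb": [{"country": "US", "clicks": 5}, {"country": "AU", "clicks": 45}],
}

# Long format: one row per (link, country) pair
long_df = pd.concat(
    [pd.DataFrame(rows).assign(link=link) for link, rows in countries_by_link.items()],
    ignore_index=True,
)

# Wide format: one row per link, one column per country.
# fill_value=0 puts 0 where a link has no clicks from a given country.
wide_df = long_df.pivot_table(
    index="link", columns="country", values="clicks", aggfunc="sum", fill_value=0
)
print(wide_df)
```

Links do not need to share the same set of countries; any country missing for a link simply ends up as 0 in that row.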
# assume your first table (simplified) is:
table = pd.DataFrame({"link": ["bit.ly/aaaa", "bit.ly/bbbb"],
"link_clicks": [150,20]})
# set the index for link
table = table.set_index("link")
# then do an outer join on link
merge_df = pd.concat([table, df], join="outer", axis=1)
merge_df.head()
You get this result:
link_clicks AU UK US ZS
link
bit.ly/aaaa 150 45 20 150 31
bit.ly/bbbb 20 45 20 150 31
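If you would rather keep the original integer index and the `link` column, as in the table from the question, a left merge on `link` is an alternative to the index-based concat. Again, this is a sketch under assumptions: `bitly_links` and `wide_df` are stand-in names, the data is a trimmed copy of the example, and filling missing countries with 0 is my choice rather than something the question specifies.

```python
import pandas as pd

# Simplified stand-in for the original links dataframe
bitly_links = pd.DataFrame({
    "link": ["bit.ly/aaaa", "bit.ly/bbbb", "bit.ly/cccc"],
    "link_clicks": [150, 20, 15],
})

# Stand-in for the pivoted country table (indexed by link)
wide_df = pd.DataFrame(
    {"AU": [45, 45], "UK": [20, 20], "US": [150, 150], "ZS": [31, 31]},
    index=pd.Index(["bit.ly/aaaa", "bit.ly/bbbb"], name="link"),
)

# Left merge keeps every row of bitly_links, even links without country data
result = bitly_links.merge(wide_df.reset_index(), on="link", how="left")

# Links with no country data get NaN after the merge; replace with 0 clicks
country_cols = list(wide_df.columns)
result[country_cols] = result[country_cols].fillna(0).astype(int)
print(result)
```

Here `bit.ly/cccc` has no country data, so its country columns are all 0 while its other columns are left untouched.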