I have a dataframe:
Counties Numbers
Yabucoa Municipio, Puerto Rico 7766
Marion County, West Virginia 8756
Barbour County, Alabama 33445
Santa Cruz County, Arizona 447
Navajo County, Arizona 1500
Denver County, Colorado 67990
I am trying to sort it so that the state names are in alphabetical order and the counties are sorted within each state:
Counties Numbers
Barbour County, Alabama 33445
Navajo County, Arizona 1500
Santa Cruz County, Arizona 447
Denver County, Colorado 67990
Yabucoa Municipio, Puerto Rico 7766
Marion County, West Virginia 8756
Code for the dataframe:
import pandas as pd

df_test = pd.DataFrame([
{'Counties': 'Yabucoa Municipio, Puerto Rico','Numbers': 7766},
{'Counties': 'Marion County, West Virginia','Numbers': 8756},
{'Counties': 'Barbour County, Alabama','Numbers': 33445},
{'Counties': 'Santa Cruz County, Arizona','Numbers': 447},
{'Counties': 'Navajo County, Arizona','Numbers': 1500},
{'Counties': 'Denver County, Colorado','Numbers': 67990}
])
I tried using sort and split, but it does not give the expected output:
df_test['Counties'] = df_test['Counties'].apply(lambda x: ','.join(sorted(x.split(','))))
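The attempt above reorders the pieces inside each string, not the rows of the dataframe. The following minimal sketch applies the same lambda to two of the question's rows to illustrate the effect. The variable name rewritten is illustrative only.

```python
import pandas as pd

df_test = pd.DataFrame([
    {"Counties": "Yabucoa Municipio, Puerto Rico", "Numbers": 7766},
    {"Counties": "Barbour County, Alabama", "Numbers": 33445},
])

# Same transformation as the attempt: split each string on ",",
# sort the resulting pieces, and join them back together.
rewritten = df_test["Counties"].apply(lambda x: ",".join(sorted(x.split(","))))

# " Puerto Rico" sorts before "Yabucoa Municipio" because of its leading space,
# so each string is rearranged, but the rows keep their original order.
print(rewritten.tolist())
print(rewritten.index.tolist())
```

Sorting the rows needs a row-level operation such as sort_values on keys derived from the column, rather than a per-string apply.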
How should I do this? Any help is appreciated. Thanks!
Answer (score: 1)
One possible solution is:
import pandas as pd

df = pd.DataFrame(
[
{"Counties": "Yabucoa Municipio, Puerto Rico", "Numbers": 7766},
{"Counties": "Marion County, West Virginia", "Numbers": 8756},
{"Counties": "Barbour County, Alabama", "Numbers": 33445},
{"Counties": "Santa Cruz County, Arizona", "Numbers": 447},
{"Counties": "Navajo County, Alabama", "Numbers": 1500},
{"Counties": "Denver County, Colorado", "Numbers": 67990},
]
)
Then create a key to reorder by:
re_order_key = (
df["Counties"]
.str.split(",", expand=True)
.rename(columns={0: "county", 1: "state"})
.sort_values(by=["state", "county"])
)
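To see what the key holds, the sketch below rebuilds it from the answer's data and prints its index, which is the new row order. Because the split is on "," alone, every state keeps a leading space (for example " Alabama"). That is harmless here since all states share it, but it is worth knowing if the key is reused.

```python
import pandas as pd

df = pd.DataFrame([
    {"Counties": "Yabucoa Municipio, Puerto Rico", "Numbers": 7766},
    {"Counties": "Marion County, West Virginia", "Numbers": 8756},
    {"Counties": "Barbour County, Alabama", "Numbers": 33445},
    {"Counties": "Santa Cruz County, Arizona", "Numbers": 447},
    {"Counties": "Navajo County, Alabama", "Numbers": 1500},
    {"Counties": "Denver County, Colorado", "Numbers": 67990},
])

re_order_key = (
    df["Counties"]
    .str.split(",", expand=True)                  # column 0: county, column 1: state
    .rename(columns={0: "county", 1: "state"})
    .sort_values(by=["state", "county"])          # state first, then county within state
)

print(re_order_key.index.tolist())    # original row labels in the desired order
print(re_order_key["state"].tolist()) # note the leading space on each state
```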
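Use its index with iloc (this works because df has the default RangeIndex, so the index labels are also the row positions):
df.iloc[re_order_key.index, :].reset_index(drop=True)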
gives:
Counties Numbers
0 Barbour County, Alabama 33445
1 Navajo County, Alabama 1500
2 Santa Cruz County, Arizona 447
3 Denver County, Colorado 67990
4 Yabucoa Municipio, Puerto Rico 7766
5 Marion County, West Virginia 8756
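A variant that does not rely on index labels matching row positions is to sort by temporary helper columns and then drop them. The sketch below uses the question's df_test. It is one alternative, not the answer's method. Two details are assumptions: it splits on the last comma with rsplit(",", n=1) and strips whitespace. The names _state, _county and parts are hypothetical.

```python
import pandas as pd

df_test = pd.DataFrame([
    {"Counties": "Yabucoa Municipio, Puerto Rico", "Numbers": 7766},
    {"Counties": "Marion County, West Virginia", "Numbers": 8756},
    {"Counties": "Barbour County, Alabama", "Numbers": 33445},
    {"Counties": "Santa Cruz County, Arizona", "Numbers": 447},
    {"Counties": "Navajo County, Arizona", "Numbers": 1500},
    {"Counties": "Denver County, Colorado", "Numbers": 67990},
])

# Split each "<county>, <state>" string once, on the last comma.
parts = df_test["Counties"].str.rsplit(",", n=1)

result = (
    df_test.assign(
        _state=parts.str[1].str.strip(),    # hypothetical helper column: state name
        _county=parts.str[0].str.strip(),   # hypothetical helper column: county name
    )
    .sort_values(["_state", "_county"])     # states alphabetically, counties within each state
    .drop(columns=["_state", "_county"])    # remove the helper columns again
    .reset_index(drop=True)
)
print(result)
```

Because the sort happens on df_test itself, the result does not depend on whether the original index is a default RangeIndex.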