I have the following DataFrame (it may grow in both rows and "Info" columns):
City Country Info1 Info2
BCN Spain 3 5.6
Moscow Russia 4 7
I am trying to split the information as follows:
[
{Info1: 3,
City: BCN,
Country: Spain},
{Info2: 5.6,
City: BCN,
Country: Spain},
{Info1: 4,
City: Moscow,
Country: Russia},
{Info2: 7,
City: Moscow,
Country: Russia}
]
This works:

import pandas as pd

dict = {'city': ["BCN", "Moscow"],
        'country': ["Spain", "Russia"],
        'inf_1': [3, 5],
        'inf_2': [4, 7]}
# we make the dict a dataframe
df = pd.DataFrame(dict)
# We make a list of the indicators
columns = list(df)[2:]
j = 0
i = 0
for rows in df.itertuples():
    for col in columns:
        print(" ")
        print("city: " + str(rows.city))
        print("country: " + str(rows.country))
        print("ind_id: " + str(columns[j]))
        print("value: " + str(df[col][i]))
        print(" ")
        j = j + 1
    j = 0
    i = i + 1
However, this code doesn't look very nice to me. Since I'm new to pandas, is there a way to write more elegant code that gives the same result?
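Answer 0: (score: 3)
If you can accept a few minor adjustments to the output, you can use melt and to_dict directly to get a separate dictionary for each piece of information: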
>>> df.melt(['City', 'Country']).to_dict('records')
[{'City': 'BCN', 'Country': 'Spain', 'value': 3.0, 'variable': 'Info1'},
{'City': 'Moscow', 'Country': 'Russia', 'value': 4.0, 'variable': 'Info1'},
{'City': 'BCN', 'Country': 'Spain', 'value': 5.6, 'variable': 'Info2'},
{'City': 'Moscow', 'Country': 'Russia', 'value': 7.0, 'variable': 'Info2'}]
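If the exact shape from the question is needed (the Info column name as the key, grouped row by row), the melted records can be reshaped in one more step. The following is only a sketch, assuming the column names from the question's table; `id_cols`, `long_df`, and `records` are illustrative names, not part of the original answer. Note that melt stacks Info1 and Info2 into a single column, so the integer values come out as floats.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "City": ["BCN", "Moscow"],
        "Country": ["Spain", "Russia"],
        "Info1": [3, 4],
        "Info2": [5.6, 7],
    }
)

# Columns that identify a row; every other column is treated as an "Info" column.
id_cols = ["City", "Country"]

# ignore_index=False keeps the original row labels, so a stable sort on the
# index regroups the melted records row by row (BCN first, then Moscow).
long_df = df.melt(id_vars=id_cols, ignore_index=False).sort_index(kind="mergesort")

# Turn each {'variable': name, 'value': v} pair into {name: v} plus the id columns.
records = [
    {rec["variable"]: rec["value"], **{col: rec[col] for col in id_cols}}
    for rec in long_df.to_dict("records")
]

for rec in records:
    print(rec)
```

Because the Info columns are whatever is left after `id_cols`, this should keep working as more rows or Info columns are added.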
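Answer 1: (score: 0)
For a solution that is not specific to pandas, this split_rows function works with any iterable of namedtuples (or with anything else, if you change the rd = ... line).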
import pandas as pd


def split_rows(namedtuple_iterable, cols):
    for row in namedtuple_iterable:
        rd = row._asdict()
        cs = [(col, rd.pop(col)) for col in cols]
        for key, value in cs:
            yield {**rd, key: value}


df = pd.DataFrame(
    {
        "city": ["BCN", "Moscow"],
        "country": ["Spain", "Russia"],
        "inf_1": [3, 5],
        "inf_2": [4, 7],
    }
)

for sr in split_rows(df.itertuples(), ("inf_1", "inf_2")):
    print(sr)
Output:
{'Index': 0, 'city': 'BCN', 'country': 'Spain', 'inf_1': 3}
{'Index': 0, 'city': 'BCN', 'country': 'Spain', 'inf_2': 4}
{'Index': 1, 'city': 'Moscow', 'country': 'Russia', 'inf_1': 5}
{'Index': 1, 'city': 'Moscow', 'country': 'Russia', 'inf_2': 7}
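To illustrate the claim that the function is not tied to pandas, here is a small sketch that feeds it plain namedtuples from the standard library. The `Row` type and the `rows` list are hypothetical stand-ins for the DataFrame rows; `split_rows` is repeated so the snippet runs on its own.

```python
from collections import namedtuple


def split_rows(namedtuple_iterable, cols):
    for row in namedtuple_iterable:
        rd = row._asdict()
        # Pull the columns to split out of the row dict...
        cs = [(col, rd.pop(col)) for col in cols]
        # ...and emit one dict per split column, each with the shared fields.
        for key, value in cs:
            yield {**rd, key: value}


# Hypothetical namedtuple type mirroring the DataFrame columns above.
Row = namedtuple("Row", ["city", "country", "inf_1", "inf_2"])
rows = [Row("BCN", "Spain", 3, 4), Row("Moscow", "Russia", 5, 7)]

result = list(split_rows(rows, ("inf_1", "inf_2")))
for item in result:
    print(item)
```

These dicts have no 'Index' key because the namedtuples have no such field. With the DataFrame, passing index=False to itertuples, as in split_rows(df.itertuples(index=False), ("inf_1", "inf_2")), should drop it in the same way.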