I have a file that has multiple sets of data separated by rows. It looks something like:
country1
0.9
1.3
2.9
1.1
...
country2
4.1
3.1
0.2
...
I would like to use Pandas to read the whole file into multiple dataframes, where each dataframe corresponds to a country. Is there any easy way to do this? Each country has a different number of entries.
Answer 0 (score: 6)
You can create a mask with to_numeric and errors='coerce', which yields NaN wherever the row holds a country name instead of a number. Then locate those rows with isnull and build group ids with cumsum:
import pandas as pd
import io
temp=u"""country1
0.9
1.3
2.9
1.1
country2
4.1
3.1
0.2"""
# after testing, replace io.StringIO(temp) with your filename
df = pd.read_csv(io.StringIO(temp), index_col=None, header=None)
print (df)
          0
0  country1
1       0.9
2       1.3
3       2.9
4       1.1
5  country2
6       4.1
7       3.1
8       0.2
mask = pd.to_numeric(df.iloc[:,0], errors='coerce').isnull().cumsum()
print (mask)
0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    2
Name: 0, dtype: int32
Finally, use a list comprehension to build the list of DataFrames:
dfs = [g[1:].rename(columns={0:g.iloc[0].values[0]}) for i, g in df.groupby(mask)]
print (dfs)
print (dfs[0])
  country1
1      0.9
2      1.3
3      2.9
4      1.1
print (dfs[1])
  country2
6      4.1
7      3.1
8      0.2
If you need to reset the index:
dfs = [g[1:].rename(columns={0:g.iloc[0].values[0]}).reset_index(drop=True) for i, g in df.groupby(mask)]
print (dfs)
print (dfs[0])
  country1
0      0.9
1      1.3
2      2.9
3      1.1
print (dfs[1])
  country2
0      4.1
1      3.1
2      0.2
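A small variation on the above (not part of the original answer): if you'd rather look each country up by name than by list position, the same mask can feed a dict comprehension. dfs_by_name is a name I've chosen for illustration; astype(float) is added because read_csv leaves the mixed column as strings.

```python
import io
import pandas as pd

temp = u"""country1
0.9
1.3
2.9
1.1
country2
4.1
3.1
0.2"""

df = pd.read_csv(io.StringIO(temp), header=None)
# NaN at country-name rows -> True in isnull -> running group id via cumsum
mask = pd.to_numeric(df.iloc[:, 0], errors='coerce').isnull().cumsum()

# key each group by its first row (the country name), drop that header row,
# and convert the remaining string values to floats
dfs_by_name = {
    g.iloc[0, 0]: g[1:].rename(columns={0: g.iloc[0, 0]})
                       .reset_index(drop=True)
                       .astype(float)
    for _, g in df.groupby(mask)
}
```

Then dfs_by_name["country2"] gives that country's values directly.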
Answer 1 (score: 1)
Pandas supports standard file formats like csv and json, and this is not one of those. I'm going to assume reformatting the file by hand is a waste of time and suggest you parse the file yourself, using with open(...) as f: and f.readlines(), into Python objects.
Say you've done that and ended up with a dict shaped like data below. One caveat: DataFrame.from_dict() requires all the lists to be the same length, and your countries have different numbers of entries, so wrap each list in a pd.Series instead, which pads the shorter columns with NaN:
data = { "countryName1": [0.9, 1.3, ...], "countryName2": [...]}
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})