CSV file (sample1.csv):
Location_City, Location_State, Name, hobbies
Los Angeles, CA, John, "['Music', 'Running']"
Texas, TX, Jack, "['Swimming', 'Trekking']"
I want to transform the hobbies column of the CSV into the following output:
Location_City, Location_State, Name, hobbies
Los Angeles, CA, John, Music
Los Angeles, CA, John, Running
Texas, TX, Jack, Swimming
Texas, TX, Jack, Trekking
I have read the CSV into a DataFrame, but I don't know how to do the transformation:
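import pandas as pd
# read_csv already returns a DataFrame; skipinitialspace=True drops the blank
# after each comma so the quoted hobbies field is read as a single column
df = pd.read_csv("sample1.csv", skipinitialspace=True)
df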
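The sample file has a space after every comma. By default pandas only treats a quote as quoting when it is the first character of a field, so the leading space can make the comma inside the list split the hobbies value. A minimal sketch, assuming the file content shown above (inlined with io.StringIO instead of reading sample1.csv from disk), of reading it with skipinitialspace=True:

```python
import io

import pandas as pd

# The sample CSV from the question, inlined so the sketch runs without sample1.csv
csv_text = """Location_City, Location_State, Name, hobbies
Los Angeles, CA, John, "['Music', 'Running']"
Texas, TX, Jack, "['Swimming', 'Trekking']"
"""

# skipinitialspace=True skips the blank after each delimiter, so the column
# names have no leading space and the quoted list stays in one field
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df.columns.tolist())  # ['Location_City', 'Location_State', 'Name', 'hobbies']
print(df['hobbies'].tolist())
```

Each hobbies value is still a plain string at this point, which is why both answers first have to turn it into a list.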
Answer 0 (score: 1)
You can use findall or extractall to get the lists from the hobbies column, then flatten them with chain.from_iterable and repeat the values of the other columns:
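import pandas as pd
from itertools import chain

# Extract every single-quoted item from each hobbies string -> Series of lists
# (np.object was removed in NumPy 1.24; the result is already object dtype)
a = df['hobbies'].str.findall("'(.*?)'")
# Number of hobbies per row, used to repeat the other columns
lens = a.str.len()

df1 = pd.DataFrame({
    'Location_City' : df['Location_City'].values.repeat(lens),
    'Location_State' : df['Location_State'].values.repeat(lens),
    'Name' : df['Name'].values.repeat(lens),
    # Flatten the list of lists into a single list of hobbies
    'hobbies' : list(chain.from_iterable(a.tolist())),
})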
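The pattern "'(.*?)'" does the actual extraction: it matches a single quote, then lazily captures everything up to the next single quote. A minimal sketch with the standard re module on one sample value (Series.str.findall applies the same regex row by row). It assumes no hobby itself contains an apostrophe, since that would split the item incorrectly:

```python
import re

# One raw value from the hobbies column
raw = "['Music', 'Running']"

# Non-greedy group: capture the shortest text between a pair of single quotes
hobbies = re.findall(r"'(.*?)'", raw)
print(hobbies)  # ['Music', 'Running']

# A greedy group would instead run from the first quote to the last one
greedy = re.findall(r"'(.*)'", raw)
print(greedy)  # ["Music', 'Running"]
```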
Or create a Series with extractall, drop the extra index level that numbers the matches, and join it to the original DataFrame:
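# pop() removes hobbies from df; extractall yields one row per match,
# indexed by (original row, match number)
df1 = (df.join(df.pop('hobbies').str.extractall("'(.*?)'")[0]
                 .reset_index(level=1, drop=True)
                 .rename('hobbies'))
         .reset_index(drop=True))
print(df1)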
Location_City Location_State Name hobbies
0 Los Angeles CA John Music
1 Los Angeles CA John Running
2 Texas TX Jack Swimming
3 Texas TX Jack Trekking
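To see why the extra index level is dropped, here is a small sketch (the two hobbies strings are rebuilt inline as an assumption) that inspects the intermediate Series from extractall. Its index pairs the original row label with a match level numbering each hit in that row; dropping match leaves repeated row labels (0, 0, 1, 1), which is what lets join duplicate each original row once per hobby:

```python
import pandas as pd

s = pd.Series(["['Music', 'Running']", "['Swimming', 'Trekking']"])

# extractall returns one row per match; column 0 holds the captured group
matches = s.str.extractall("'(.*?)'")[0]
print(list(matches.index.names))  # [None, 'match']
print(matches.tolist())           # ['Music', 'Running', 'Swimming', 'Trekking']

# Dropping the 'match' level (level=1) leaves only the original row labels
flat = matches.reset_index(level=1, drop=True)
print(flat.index.tolist())        # [0, 0, 1, 1]
```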
Answer 1 (score: 0)
We can solve this with the pandas.DataFrame.explode function, which was introduced in version 0.25.0. If you have that version or later, you can use the following code.
Reference for explode: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
import pandas as pd
import ast
data = {
    'Location_City': ['Los Angeles', 'Texas'],
    'Location_State': ['CA', 'TX'],
    'Name': ['John', 'Jack'],
    'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"]
}
df = pd.DataFrame(data)
# Convert the string representation of each list into an actual list object
df['hobbies'] = df['hobbies'].apply(ast.literal_eval)
# Exploding the list
df = df.explode('hobbies')
print(df)
Location_City Location_State Name hobbies
0 Los Angeles CA John Music
0 Los Angeles CA John Running
1 Texas TX Jack Swimming
1 Texas TX Jack Trekking
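explode keeps the original row labels (0, 0, 1, 1). If you want a fresh 0..n-1 index like the output of Answer 0, DataFrame.explode also accepts ignore_index=True (available since pandas 1.1.0). A sketch that goes straight from the CSV to the exploded result; the file content is inlined with io.StringIO as an assumption, and skipinitialspace=True handles the spaces after the commas:

```python
import ast
import io

import pandas as pd

csv_text = """Location_City, Location_State, Name, hobbies
Los Angeles, CA, John, "['Music', 'Running']"
Texas, TX, Jack, "['Swimming', 'Trekking']"
"""

# skipinitialspace=True keeps the quoted list in a single column
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Parse "['Music', 'Running']" into the list ['Music', 'Running']
df['hobbies'] = df['hobbies'].apply(ast.literal_eval)

# One row per hobby; ignore_index=True renumbers the rows 0..n-1
df = df.explode('hobbies', ignore_index=True)
print(df)
```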