I received the following response from an API request:
<movies>
<movie>
<rating>5</rating>
<name>star wars</name>
</movie>
<movie>
<rating>8</rating>
<name>jurassic park</name>
</movie>
</movies>
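Is there a way to take this information, extract the rating and name values, and store them in a pandas Series?
The final result would look like this: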
Movie Rating
5 - star Wars
8 - Jurassic park
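You'll notice that I've taken each value found in the response and combined them into a single column. For example, I want to concatenate "5", " - ", and "star wars" into one string.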
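Answer 0 (score: 1)
Is this what you're looking for? I've explained each step in the code comments. There was one part I didn't know how to do, but I researched it and figured it out.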
import pandas as pd
#One row per line of the XML response
df = pd.DataFrame({'Data' : ['<movies>','<movie>','<rating>5</rating>',
'<name>star wars</name>', '</movie>',
'<movie>', '<rating>8</rating>', '<name>jurassic park</name>',
'</movie>', '</movies>']})
#Filter for the relevant rows of data based upon the logic of the pattern. I have also
#done an optional reset of the index.
df = df.loc[df['Data'].str.contains('>.*<', regex=True)].reset_index(drop=True)
#For the rows we just filtered for, keep only the text between the tags by
#capturing it with a regex group
df['Data'] = df['Data'].str.extract('>(.*)<', expand=False)
#Use join with shift and add_suffix. Credit to @joelostblom:
#https://stackoverflow.com/questions/47450259/merge-row-with-next-row-in-dataframe-pandas
df = df.add_suffix('1').join(df.shift(-1).add_suffix('2'))
#Filter for numeric rows only; copy so that the column assignment below does not
#trigger a SettingWithCopyWarning
df = df.loc[df['Data1'].str.isnumeric()].copy()
#Combine Columns with desired format
df['Movie Rating'] = df['Data1'] + ' - ' + df['Data2']
#Filter for only relevant column and print dataframe
df = df[['Movie Rating']]
print(df)
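
If the response is available as a single XML string, it may be simpler to let an XML parser do the tag handling instead of regex. The following is only a minimal sketch, assuming the response has exactly the shape shown in the question; `xml_text` and the helper name `movie_ratings` are made up for illustration, and the standard-library `xml.etree.ElementTree` module is my suggestion, not something from the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample response copied from the question; a real one would come from the API call.
xml_text = """<movies>
<movie>
<rating>5</rating>
<name>star wars</name>
</movie>
<movie>
<rating>8</rating>
<name>jurassic park</name>
</movie>
</movies>"""


def movie_ratings(xml_text: str) -> pd.Series:
    """Return one 'rating - name' string per <movie> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = [
        # findtext returns the text of the first matching child element
        f"{movie.findtext('rating')} - {movie.findtext('name')}"
        for movie in root.iter("movie")
    ]
    return pd.Series(rows, name="Movie Rating")


ratings = movie_ratings(xml_text)
print(ratings)
```

This gives a Series directly, which is what the question asked for, and it does not depend on each tag sitting on its own line. If a `<movie>` were missing its `<rating>` or `<name>`, `findtext` would return `None` and the string would contain "None", so real data may need an extra check.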