I get this data from a scraping tool and want to turn it into a tidy DataFrame.
What I have now:
+-----------+------------------------------------------+-------------+---------------------+
| HotelName | RoomType                                 | RoomFloor   | RoomPrice           |
+-----------+------------------------------------------+-------------+---------------------+
| Hotel1    | Standard,Standard,Standard,Deluxe,Deluxe | 10F,20F     | 100,105,108,200,205 |
| Hotel2    | Standard,Standard,Deluxe,Deluxe,Grande   | 30F,40F,50F | 90,95,250,240,300   |
+-----------+------------------------------------------+-------------+---------------------+
What I ultimately want is:
+-----------+----------+-----------+-----------+
| HotelName | RoomType | RoomFloor | RoomPrice |
+-----------+----------+-----------+-----------+
| Hotel1    | Standard | 10F       | 100       |
| Hotel1    | Standard | 10F       | 105       |
| Hotel1    | Standard | 10F       | 108       |
| Hotel1    | Deluxe   | 20F       | 200       |
| Hotel1    | Deluxe   | 20F       | 205       |
| Hotel2    | Standard | 30F       | 90        |
| Hotel2    | Standard | 30F       | 95        |
| Hotel2    | Deluxe   | 40F       | 250       |
| Hotel2    | Deluxe   | 40F       | 240       |
| Hotel2    | Grande   | 50F       | 300       |
+-----------+----------+-----------+-----------+
I'm new to Python and can't work this out. Can anyone help? Many thanks!
Answer 0 (score: 0)
I think a loop gives more readable code:
import pandas as pd

data = []
for idx, row in df.iterrows():
    # Split each comma-separated cell into its individual values
    room_types = pd.Series(row['RoomType'].split(','))
    room_floors = row['RoomFloor'].split(',')
    room_prices = row['RoomPrice'].split(',')
    # Pair each distinct room type (in order of first appearance) with a floor
    mapping = dict(zip(room_types.unique(), room_floors))
    # Expand the floors so that there is one entry per room
    room_floors = room_types.map(mapping)
    for rm_type, rm_floor, rm_price in zip(room_types, room_floors, room_prices):
        data.append((row['HotelName'], rm_type, rm_floor, rm_price))

pd.DataFrame(data, columns=['HotelName', 'RoomType', 'RoomFloor', 'RoomPrice'])
Out[56]:
HotelName RoomType RoomFloor RoomPrice
0 Hotel1 Standard 10F 100
1 Hotel1 Standard 10F 105
2 Hotel1 Standard 10F 108
3 Hotel1 Deluxe 20F 200
4 Hotel1 Deluxe 20F 205
5 Hotel2 Standard 30F 90
6 Hotel2 Standard 30F 95
7 Hotel2 Deluxe 40F 250
8 Hotel2 Deluxe 40F 240
9 Hotel2 Grande 50F 300
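This iterates over the rows of the DataFrame and, for each hotel, builds lists of room types, room floors and room prices. mapping = dict(zip(room_types.unique(), room_floors)) creates a mapping from room type to room floor; it relies on each distinct room type, in order of first appearance, corresponding to one listed floor. Using that mapping, room_floors = room_types.map(mapping) produces a Series with one floor per room. Since room_types, room_floors and room_prices now have the same length, you can iterate over them together and append each record as a tuple. Finally, the last line converts the list of tuples into a tidy DataFrame.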
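The mapping step described above can be sketched in isolation. The helper name floors_per_room is hypothetical and not part of the answer, and the length check is an added assumption: zip would otherwise silently drop unmatched room types or floors.

```python
import pandas as pd

def floors_per_room(room_type_csv, room_floor_csv):
    """Return one floor per room (hypothetical helper).

    Assumes the i-th distinct room type, in order of first appearance,
    is located on the i-th listed floor.
    """
    room_types = pd.Series(room_type_csv.split(','))
    floors = room_floor_csv.split(',')
    distinct_types = room_types.unique()  # keeps order of first appearance
    if len(distinct_types) != len(floors):
        # Added check: zip() would silently truncate on a mismatch
        raise ValueError('number of distinct room types and floors differ')
    mapping = dict(zip(distinct_types, floors))
    return room_types.map(mapping).tolist()

hotel1_floors = floors_per_room('Standard,Standard,Standard,Deluxe,Deluxe', '10F,20F')
print(hotel1_floors)  # ['10F', '10F', '10F', '20F', '20F']
```

If a hotel lists the same room type on two different floors, this mapping cannot represent it, so the input would have to carry one floor per room instead.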
Answer 1 (score: 0)
A solution for the case where RoomFloor is already given separately for each room:
print (df)
HotelName RoomType RoomFloor \
0 Hotel1 Standard,Standard,Standard,Deluxe,Deluxe 10F,10F,10F,20F,20F
1 Hotel2 Standard,Standard,Deluxe,Deluxe,Grande 30F,30F,40F,40F,50F
RoomPrice
0 100,105,108,200,205
1 90,95,250,240,300
cols = ['RoomType','RoomFloor','RoomPrice']
a = df[cols].apply(lambda x: x.str.split(',', expand=True).stack()).reset_index(1, drop=True)
df = df.drop(cols, axis=1).join(a).reset_index(drop=True)
print (df)
HotelName RoomType RoomFloor RoomPrice
0 Hotel1 Standard 10F 100
1 Hotel1 Standard 10F 105
2 Hotel1 Standard 10F 108
3 Hotel1 Deluxe 20F 200
4 Hotel1 Deluxe 20F 205
5 Hotel2 Standard 30F 90
6 Hotel2 Standard 30F 95
7 Hotel2 Deluxe 40F 250
8 Hotel2 Deluxe 40F 240
9 Hotel2 Grande 50F 300
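For comparison, the same reshaping can probably be written with DataFrame.explode. This is a sketch rather than the answer's method, and it assumes pandas 1.3 or newer (where explode accepts a list of columns) and a RoomFloor column that already holds one floor per room. The names split_df and tidy are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'HotelName': ['Hotel1', 'Hotel2'],
    'RoomType': ['Standard,Standard,Standard,Deluxe,Deluxe',
                 'Standard,Standard,Deluxe,Deluxe,Grande'],
    'RoomFloor': ['10F,10F,10F,20F,20F', '30F,30F,40F,40F,50F'],
    'RoomPrice': ['100,105,108,200,205', '90,95,250,240,300'],
})

cols = ['RoomType', 'RoomFloor', 'RoomPrice']
# Turn each comma-separated string into a list of values
split_df = df.assign(**{c: df[c].str.split(',') for c in cols})
# Explode the three list columns together; the lists in a row must have equal length
tidy = split_df.explode(cols, ignore_index=True)
print(tidy)
```

All values stay strings here; RoomPrice would still need a conversion such as astype(int) before doing arithmetic on it.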
Answer 2 (score: 0)
I tried to reproduce the DataFrame; I assume it matches the one posted:
import pandas as pd

raw_data = {'HotelName': ['Hotel1', 'Hotel2'],
            'RoomType': ['Standard,Standard,Standard,Deluxe,Deluxe',
                         'Standard,Standard,Deluxe,Deluxe,Grande'],
            'RoomFloor': ['10F,20F', '30F,40F,50F'],
            'RoomPrice': ['100,105,108,200,205', '90,95,250,240,300']}
data = pd.DataFrame(raw_data)
I think the 'ordered_set' module may help; hopefully the following code solves your problem:
from ordered_set import OrderedSet  # the module is 'ordered_set', not 'orderedset'

cols_ordered = ['HotelName', 'RoomType', 'RoomFloor', 'RoomPrice']
# Split every cell into a list; HotelName becomes a one-element list.
# (applymap is deprecated since pandas 2.1 in favour of DataFrame.map.)
data = data[cols_ordered].applymap(lambda x: x.split(','))
# Number of rooms per hotel = length of the longest list in the row
dummies = data.applymap(len).max(axis=1)
for i in data.index:
    room_type, room_floor = data.loc[i, ['RoomType', 'RoomFloor']]
    # Pair each distinct room type (in order of appearance) with a floor
    type_floor_dict = dict(zip(OrderedSet(room_type), room_floor))
    # Use .at rather than chained indexing, which may not write back to data
    data.at[i, 'RoomFloor'] = [type_floor_dict[t] for t in room_type]
    data.at[i, 'HotelName'] = data.at[i, 'HotelName'] * int(dummies[i])
new_data = [pd.DataFrame(data.loc[i].tolist(), index=cols_ordered).T for i in data.index]
new_data = pd.concat(new_data, ignore_index=True)
print(new_data)
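If installing a third-party package is not desirable, the order-preserving de-duplication can likely be done with the standard library instead. The sketch below swaps OrderedSet for dict.fromkeys, which keeps insertion order on Python 3.7 and later; the function name expand_hotel is hypothetical and not part of the answer.

```python
def expand_hotel(name, room_type_csv, room_floor_csv, room_price_csv):
    """Return one (hotel, type, floor, price) tuple per room (hypothetical helper)."""
    room_types = room_type_csv.split(',')
    floors = room_floor_csv.split(',')
    prices = room_price_csv.split(',')
    # dict.fromkeys drops duplicates while keeping first-appearance order
    distinct_types = list(dict.fromkeys(room_types))
    # Assumption: the i-th distinct room type is on the i-th listed floor
    type_to_floor = dict(zip(distinct_types, floors))
    return [(name, t, type_to_floor[t], p) for t, p in zip(room_types, prices)]

rows = expand_hotel('Hotel2',
                    'Standard,Standard,Deluxe,Deluxe,Grande',
                    '30F,40F,50F',
                    '90,95,250,240,300')
for r in rows:
    print(r)
```

The resulting list of tuples can be passed to pd.DataFrame with a columns argument, in the same way as in Answer 0.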
Answer 3 (score: -1)
I came up with this solution; please take a look:
def func(row):
    records = []  # one dict per room
    room_types = row['RoomType'].split(",")
    room_prices = row['RoomPrice'].split(",")
    room_floors = row['RoomFloor'].split(",")
    current_room_type = room_types[0]
    j = 0  # index of the current floor
    for index, x in enumerate(room_types):
        # Move on to the next floor whenever the room type changes
        if current_room_type != x:
            j += 1
            current_room_type = x
        records.append({"HotelName": row["HotelName"], "RoomType": x,
                        "RoomPrice": room_prices[index], "RoomFloor": room_floors[j]})
    return records

# apply() yields one list of records per hotel; sum() concatenates those lists
print(pd.DataFrame(df.apply(func, axis=1).sum()))
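The floor counter in this answer advances once per run of identical consecutive room types. As a rough sketch of that same idea, itertools.groupby can do the run detection; the helper name floors_by_run is hypothetical.

```python
from itertools import groupby

def floors_by_run(room_types, floors):
    """Assign the i-th floor to the i-th run of equal consecutive room types."""
    result = []
    # groupby() without a key groups consecutive equal items into runs
    for floor_index, (_room_type, run) in enumerate(groupby(room_types)):
        run_length = len(list(run))
        result.extend([floors[floor_index]] * run_length)
    return result

run_floors = floors_by_run(
    ['Standard', 'Standard', 'Standard', 'Deluxe', 'Deluxe'],
    ['10F', '20F'],
)
print(run_floors)  # ['10F', '10F', '10F', '20F', '20F']
```

Unlike the mapping-based answers, this treats a room type that reappears after a different type as a new run, so it would need one more floor in the list and raises IndexError if none is available.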