我在数据框中有一个列,其中列出了在酒店位置找到的便利设施。我需要计算每行中有多少便利设施,然后将其转换为数字,然后再用这些数字创建另一列。
> airbnb$amenities[1:25]
[1] "{TV,Internet,Wifi,\"Air conditioning\",\"Paid parking off premises\",Breakfast,Heating,\"Smoke detector\",\"Carbon monoxide detector\",\"First aid kit\",\"Safety card\",\"Fire extinguisher\",Essentials,Shampoo,\"Lock on bedroom door\",\"24-hour check-in\",Hangers,\"Hair dryer\",Iron,\"Laptop friendly workspace\",\"translation missing: en.hosting_amenity_49\",\"translation missing: en.hosting_amenity_50\",\"Private entrance\",\"Hot water\",\"Patio or balcony\",\"Garden or backyard\",\"Luggage dropoff allowed\",\"Well-lit path to entrance\",\"Host greets you\"}"
[2] "{TV,Wifi,\"Air conditioning\",Kitchen,\"Pets live on this property\",Cat(s),\"Free street parking\",Heating,Washer,Dryer,\"Smoke detector\",Essentials,Shampoo,Hangers,\"Hair dryer\",Iron,\"Laptop friendly workspace\",\"Hot water\",\"Luggage dropoff allowed\",Other}"
[3] "{TV,\"Cable TV\",Wifi,\"Air conditioning\",Pool,Kitchen,\"Free parking on premises\",Breakfast,Elevator,\"Hot tub\",\"Buzzer/wireless intercom\",Heating,\"Family/kid friendly\",Washer,\"Smoke detector\",\"First aid kit\",Essentials,Shampoo,\"24-hour check-in\",Hangers,\"Hair dryer\",Iron,\"translation missing: en.hosting_amenity_50\"}"
[4] "{Internet,Wifi,Pool,Kitchen,\"Free street parking\",\"Buzzer/wireless intercom\",Heating,\"Smoke detector\",Essentials,Hangers,Iron,\"Hot water\",Microwave,\"Coffee maker\",Refrigerator,\"Dishes and silverware\",\"Cooking basics\",\"BBQ grill\",\"Garden or backyard\",\"Long term stays allowed\",\"Host greets you\"}"
[5] "{TV,Internet,Wifi,\"Air conditioning\",Kitchen,\"Paid parking off premises\",Elevator,\"Buzzer/wireless intercom\",Heating,Washer,Dryer,\"Smoke detector\",\"First aid kit\",\"Safety card\",Essentials,Shampoo,Hangers,\"Hair dryer\",Iron,\"Laptop friendly workspace\",\"Hot water\",Microwave,Refrigerator,Dishwasher,\"Dishes and silverware\",\"Cooking basics\",Oven,Stove,\"Long term stays allowed\",Other}"
我熟悉使用grep,gsub之类的东西,但是我对如何在每一行中计数感到困惑。我以为grep('[a-z]',airbnb $ amenities)可能会以某种方式计算每行中的模式,但仍然很困惑。
答案 0 :(得分:2)
一个选项是str_count
来计算定界符(,
),然后将其加1以获取n元语法词的数量
library(stringr)
airbnb$amenityCount <- str_count(airbnb$amenities, ",") + 1