info.csv
The data is as follows:
device_id,upload_time,latitude,longitude,mileage,other_vals,speed,upload_time_add_8hour,upload_time_year_month,car_id,car_type,car_num,marketer_name
1234567890123,2020-09-27 02:41:02+00:00,38.01946,114.425888,0,,0,2020/9/27 10:41,202009,,,12,
17100000001,2020-09-27 02:41:01+00:00,38.01946,114.425888,0,,0,2020/9/27 10:41,202009,,,12345,
17200000002,2020-09-25 13:46:38+00:00,38.01946,114.425888,0,,0,2020/9/25 21:46,202009,,,123456,
14111111111,2020-09-25 11:18:54+00:00,38.01946,114.425888,0,,0,2020/9/25 19:18,202009,,,12121212,
57c1e18249727a0b,2020-09-25 11:18:42+00:00,38.01946,114.425888,0,,0,2020/9/25 19:18,202009,,,,
57c1e18249727a0b,2020-09-23 10:16:55+00:00,38.01946,114.425888,0.055559317,,0,2020/9/23 18:16,202009,,,,
57c1e18249727a0b,2020-09-23 10:16:15+00:00,38.01946,114.425888,0.055559317,,0,2020/9/23 18:16,202009,,,,
57c1e18249727a0b,2020-09-23 10:15:35+00:00,38.01946,114.425888,0.055559317,,0,2020/9/23 18:15,202009,,,,
57c1e18249727a0b,2020-09-23 10:15:04+00:00,38.01946,114.425888,0.055559317,,0,2020/9/23 18:15,202009,,,,
57c1e18249727a0b,2020-09-23 10:14:55+00:00,38.01946,114.425888,0.055559317,,3.304916399,2020/9/23 18:14,202009,,,,
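To reproduce the problem without the file, a trimmed copy of the sample can be loaded from a string. This is a sketch only: `io.StringIO` stands in for info.csv, just the `device_id` and `car_num` columns are kept, and `dtype={"device_id": str}` is an optional extra (not in the question) that tells pandas to keep the IDs as text.

```python
import io

import pandas as pd

# Trimmed copy of the sample rows (assumption: only these two columns matter here)
csv_text = """device_id,car_num
1234567890123,12
17100000001,12345
17200000002,123456
14111111111,12121212
57c1e18249727a0b,
57c1e18249727a0b,
"""

# Reading device_id as str means no later astype(str) cast is needed
df = pd.read_csv(io.StringIO(csv_text), dtype={"device_id": str})
print(df["device_id"].tolist())
print(df["car_num"].isna().sum())  # rows with an empty car_num become NaN
```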
I use this code to split the DataFrame into sub-DataFrames:
import pandas as pd
df = pd.read_csv(r'info.csv', encoding='utf-8')
df_1 = df[df['device_id'].astype(str).map(len) !=11]
df_2 = df[df['device_id'].astype(str).map(len)==11 & df['device_id'].astype(str).startswith('17')]#device_id start with 17
df_3 = df[df['device_id'].astype(str).map(len)==11 & ~df['device_id'].astype(str).startswith('17')] #device_id doesn't start with 17
df = df[pd.notnull(df['car_num'])]
print(len(df_1))
print(len(df_2))
print(len(df_3))
But I get this error message:
AttributeError: 'Series' object has no attribute 'startswith'
How can I fix this?
Answer 0 (score: 2)
Use .str.startswith:
df_2 = df[(df['device_id'].astype(str).map(len)==11) & df['device_id'].astype(str).str.startswith('17')] # device_id starts with 17
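All together: to avoid casting to string for every condition, assign the intermediate results to helper variables. Each comparison must be wrapped in parentheses, because & has higher precedence than ==: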
s = df['device_id'].astype(str)
lens = s.str.len()
df_1 = df[(lens!=11)]
df_2 = df[(lens==11) & s.str.startswith('17')]#device_id start with 17
df_3 = df[(lens==11) & ~s.str.startswith('17')] #device_id doesn't start with 17
df = df[df['car_num'].notna()]
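The parentheses matter because Python parses `lens == 11 & starts_17` as `lens == (11 & starts_17)`. A minimal sketch on a few of the sample IDs (the variable names are illustrative) shows that the unparenthesized mask differs from the intended one:

```python
import pandas as pd

s = pd.Series(["1234567890123", "17100000001", "17200000002", "14111111111"])
lens = s.str.len()
starts_17 = s.str.startswith("17")

# & binds tighter than ==, so this compares lens with (11 & starts_17): not the intended mask
unparenthesized = lens == 11 & starts_17
# Parentheses make each comparison run before the element-wise AND
parenthesized = (lens == 11) & starts_17

print(unparenthesized.tolist())
print(parenthesized.tolist())  # [False, True, True, False]
```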
Answer 1 (score: 1)
A pandas Series has no attribute named startswith.
According to the pandas documentation, the method is pandas.Series.str.startswith.
Instead of .startswith('17'), use .str.startswith('17').
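Both points, and the reason for the astype(str) cast, can be checked on a tiny Series. This sketch uses hypothetical all-numeric IDs (not read from info.csv): the Series itself has no startswith, the .str accessor refuses non-string values, and casting first makes .str.startswith return a boolean mask.

```python
import pandas as pd

# Hypothetical numeric IDs; an all-numeric CSV column would be read as int64 like this
ids = pd.Series([17100000001, 14111111111])

# 1) A Series has no startswith attribute of its own
print(hasattr(ids, "startswith"))  # False

# 2) The .str accessor only works on string values
try:
    ids.str.startswith("17")
    str_on_ints_failed = False
except AttributeError:
    str_on_ints_failed = True
print(str_on_ints_failed)  # True

# 3) Cast to str first, then use the vectorized .str.startswith
mask = ids.astype(str).str.startswith("17")
print(mask.tolist())  # [True, False]
```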
import pandas as pd
df = pd.read_csv(r'info.csv', encoding = "utf-8")
df_1 = df[df['device_id'].astype(str).map(len) !=11]
df_2 = df[(df['device_id'].astype(str).map(len)==11) & df['device_id'].astype(str).str.startswith('17')] # device_id starts with 17
df_3 = df[(df['device_id'].astype(str).map(len)==11) & ~df['device_id'].astype(str).str.startswith('17')] # device_id doesn't start with 17
df = df[pd.notnull(df['car_num'])]
print(len(df_1))
print(len(df_2))
print(len(df_3))
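As a sanity check, the corrected conditions can be run on the ten device_id values from the sample data. This sketch builds the frame in memory instead of reading info.csv and keeps only the device_id column, so it is not the full script above.

```python
import pandas as pd

# The ten device_id values from the sample data, in order
df = pd.DataFrame({"device_id": [
    "1234567890123", "17100000001", "17200000002", "14111111111",
] + ["57c1e18249727a0b"] * 6})

s = df["device_id"].astype(str)
lens = s.str.len()
starts_17 = s.str.startswith("17")

df_1 = df[lens != 11]                # 13- and 16-character IDs
df_2 = df[(lens == 11) & starts_17]  # 11 characters, starting with 17
df_3 = df[(lens == 11) & ~starts_17] # 11 characters, not starting with 17

print(len(df_1), len(df_2), len(df_3))  # 7 2 1
```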