I am trying to create lists from ranges stored in a DataFrame column.
This is my string column:
df['ID'] =['' ,'2','4', '','8', '','16-18','25', '30-31']
# empty strings represent null values
I would like to produce output like this:
df['ID'] = ['', 'ID 2', 'ID 4', '', 'ID 8', '', ['ID 16', 'ID 17', 'ID 18'],
            'ID 25', ['ID 30', 'ID 31']]
Can anyone help?
Answer 0 (score: 0)
IIUC, split each value on '-' and expand the two-part entries into a range:
df.ID.str.split('-').apply(lambda x : x[0] if len(x)<=1 else list(range(int(x[0]),int(x[1])+1)))
Out[182]:
0
1               2
2               4
3
4               8
5
6    [16, 17, 18]
7              25
8        [30, 31]
Name: ID, dtype: object
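The output above holds bare numbers, while the question asks for an "ID " prefix on every value. A minimal sketch of one way to extend the same split-and-apply idea follows; the helper name `add_prefix` is hypothetical, and it assumes every non-empty entry is either a single non-negative integer or a range written as `start-stop`.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["", "2", "4", "", "8", "", "16-18", "25", "30-31"]})


def add_prefix(parts):
    # parts is the list produced by str.split("-"),
    # e.g. ["2"], [""] or ["16", "18"]
    if len(parts) == 2:
        start, stop = int(parts[0]), int(parts[1])
        # range() excludes its end point, so add 1 to include stop
        return [f"ID {n}" for n in range(start, stop + 1)]
    # Leave empty strings untouched, prefix everything else
    return f"ID {parts[0]}" if parts[0] else ""


result = df["ID"].str.split("-").apply(add_prefix)
print(result.tolist())
```

Using a named function instead of a lambda keeps the empty-string case and the range case readable side by side.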
Answer 1 (score: 0)
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['ID'] = [np.nan, 2, 4, np.nan, 8, np.nan, '16-18', 25, '30-31']
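First build the "ID" lists for the ranges (the non-string entries become NaN after .str.split, so only the ranges remain):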
s = df.ID.str.split("-")
s2 = s[s.notna()].apply(lambda x: ("ID "+pd.Series(list(range(int(x[0]), int(x[1])+1))).astype(str)).tolist())
Then handle the regular cases (everything except NaN and the ranges):
non_na = df.ID[df.ID.notna()]
non_na_range = non_na[~non_na.index.isin(s2.index)]
s3 = "ID " + non_na_range.astype(str)
Then assign both back:
df.loc[s2.index, "ID"] = s2
df.loc[s3.index, "ID"] = s3
Output:
                      ID
0                    NaN
1                   ID 2
2                   ID 4
3                    NaN
4                   ID 8
5                    NaN
6  [ID 16, ID 17, ID 18]
7                  ID 25
8         [ID 30, ID 31]
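The three steps above (ranges, regular values, assignment) can also be folded into a single per-value function. This is only a sketch of an alternative, not the answerer's method: `expand_id` is a hypothetical helper, and it assumes IDs are non-negative, so a "-" only ever marks a range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [np.nan, 2, 4, np.nan, 8, np.nan, "16-18", 25, "30-31"]})


def expand_id(value):
    # Hypothetical helper: keep NaN as is, expand "a-b" ranges,
    # and prefix every other value with "ID "
    if pd.isna(value):
        return value
    text = str(value)
    if "-" in text:
        start, stop = (int(p) for p in text.split("-"))
        return [f"ID {n}" for n in range(start, stop + 1)]
    return f"ID {text}"


df["ID"] = df["ID"].map(expand_id)
print(df)
```

This trades the index bookkeeping of the masked approach for a plain Python function call per row, which is usually fine for small columns.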