我有这样的数据框: DF1:
Id name Checksum
2001 A e882
2002 B 2884
2002 C ee12,ee84
2003 D ee23
2004 E ee42,ee43
2006 F 2884,2993,3884,3855
2006 G 344,122,288
我想要这样的输出:
Id name Checksum
2001 A e882
2002 B 2884
2002 C ee12
2002 C ee84
2003 D ee23
2004 E ee42
2004 E ee43
2006 F 2884
2006 F 2993
2006 F 3884
2006 F 3855
2006 G 344
2006 G 122
2006 G 288
我想像上面那样创建新的数据框
我如何在python pandas中做到这一点?
答案 0 :(得分:2)
您可以list
使用length
,然后DataFrame
获取constructor
。
上次使用str.split
和len
s = df['Checksum'].str.split(',')
print (s)
0 [e882]
1 [2884]
2 [ee12, ee84]
3 [ee23]
4 [ee42, ee43]
5 [2884, 2993, 3884, 3855]
6 [344, 122, 288]
Name: Checksum, dtype: object
l = s.str.len()
print (l)
0 1
1 1
2 2
3 1
4 2
5 4
6 3
Name: Checksum, dtype: int64
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = np.concatenate(s)
df = df.reindex_axis(df.columns, axis=1)
print (df)
Id name Checksum
0 2001 A e882
1 2002 B 2884
2 2002 C ee12
3 2002 C ee84
4 2003 D ee23
5 2004 E ee42
6 2004 E ee43
7 2006 F 2884
8 2006 F 2993
9 2006 F 3884
10 2006 F 3855
11 2006 G 344
12 2006 G 122
13 2006 G 288
创建新的chain.from_iterable
:
from itertools import chain
s = df['Checksum'].str.split(',')
l = s.str.len()
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = list(chain.from_iterable(s))
df = df.reindex_axis(df.columns, axis=1)
print (df)
Id name Checksum
0 2001 A e882
1 2002 B 2884
2 2002 C ee12
3 2002 C ee84
4 2003 D ee23
5 2004 E ee42
6 2004 E ee43
7 2006 F 2884
8 2006 F 2993
9 2006 F 3884
10 2006 F 3855
11 2006 G 344
12 2006 G 122
13 2006 G 288
maven clean install
与<dependency>
<groupId>api</groupId>
<artifactId>api</artifactId>
<version>1.0</version>
</dependency>
:
select * from t where REGEXP_LIKE (col , '(^|,)4(,|$)')