Merging two data frames of different sizes in python pandas

Time: 2017-06-26 13:19:53

Tags: python pandas

I have a data frame like this, DF1:

Id   name Checksum
2001  A   e882
2002  B   2884
2002  C   ee12,ee84
2003  D   ee23
2004  E   ee42,ee43
2006  F   2884,2993,3884,3855
2006  G   344,122,288

I want output like this:

Id   name Checksum
2001  A   e882
2002  B   2884
2002  C   ee12
2002  C   ee84
2003  D   ee23
2004  E   ee42
2004  E   ee43
2006  F   2884
2006  F   2993
2006  F   3884
2006  F   3855
2006  G   344
2006  G   122
2006  G   288

I want to create a new data frame like the one above.

How can I do this in python pandas?

1 answer:

Answer 0: (score: 2)

You can use str.split to turn each Checksum into a list and str.len to get each list's length, then build a new DataFrame with the constructor, repeating the other columns with numpy.repeat and flattening the lists with numpy.concatenate:

import numpy as np
import pandas as pd

s = df['Checksum'].str.split(',')
print (s)
0                      [e882]
1                      [2884]
2                [ee12, ee84]
3                      [ee23]
4                [ee42, ee43]
5    [2884, 2993, 3884, 3855]
6             [344, 122, 288]
Name: Checksum, dtype: object

l = s.str.len()
print (l)
0    1
1    1
2    2
3    1
4    2
5    4
6    3
Name: Checksum, dtype: int64

cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = np.concatenate(s)
# reindex_axis was removed in pandas 1.0; reindex(columns=...) fixes the column order
df = df.reindex(columns=cols + ['Checksum'])
print (df)
      Id name Checksum
0   2001    A     e882
1   2002    B     2884
2   2002    C     ee12
3   2002    C     ee84
4   2003    D     ee23
5   2004    E     ee42
6   2004    E     ee43
7   2006    F     2884
8   2006    F     2993
9   2006    F     3884
10  2006    F     3855
11  2006    G      344
12  2006    G      122
13  2006    G      288

Another solution builds the new column with chain.from_iterable:

from itertools import chain

import numpy as np
import pandas as pd

s = df['Checksum'].str.split(',')
l = s.str.len()
cols = ['Id','name']
df = pd.DataFrame({x : np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = list(chain.from_iterable(s))
# reindex_axis was removed in pandas 1.0; reindex(columns=...) fixes the column order
df = df.reindex(columns=cols + ['Checksum'])
print (df)
      Id name Checksum
0   2001    A     e882
1   2002    B     2884
2   2002    C     ee12
3   2002    C     ee84
4   2003    D     ee23
5   2004    E     ee42
6   2004    E     ee43
7   2006    F     2884
8   2006    F     2993
9   2006    F     3884
10  2006    F     3855
11  2006    G      344
12  2006    G      122
13  2006    G      288
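
For readers on a newer pandas, here is a minimal sketch of a different technique from the answer above: splitting the column and then calling DataFrame.explode. It assumes pandas 0.25 or later (where explode is available). The helper name split_checksums and its parameters are hypothetical and chosen only for illustration; the sample data is copied from DF1 in the question.

```python
import pandas as pd

# Sample data copied from DF1 in the question.
df1 = pd.DataFrame({
    "Id": [2001, 2002, 2002, 2003, 2004, 2006, 2006],
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "Checksum": ["e882", "2884", "ee12,ee84", "ee23", "ee42,ee43",
                 "2884,2993,3884,3855", "344,122,288"],
})


def split_checksums(frame, column="Checksum", sep=","):
    """Hypothetical helper: return a new frame with one row per value in `column`."""
    out = frame.copy()
    # Turn "ee12,ee84" into ["ee12", "ee84"] so each element can become a row.
    out[column] = out[column].str.split(sep)
    # explode repeats the other columns once per list element;
    # reset_index gives the result a fresh 0..n-1 index.
    return out.explode(column).reset_index(drop=True)


result = split_checksums(df1)
print(result)
```

The original df1 is left unchanged because the helper works on a copy. Whether this is preferable to the numpy.repeat approach depends on the pandas version in use.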