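I have a DataFrame with this structure [1], and I want to multiply a column of strings by a column of integers, i.e. repeat each string by the matching count.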
+----------------------+------------+-------------------------+-----------+
| url                  | date       | word                    | mentioned |
|----------------------+------------+-------------------------+-----------|
| newspaperarticle.com | 2018-12-22 | [canada,house,micheal]  | [2,2,1]   |
| articleUSA.com       | 2018-12-23 | [new york,murder,angry] | [2,3,1]   |
+----------------------+------------+-------------------------+-----------+
I want each word in the word column repeated by the corresponding number in the mentioned column:
+----------------------+------------+-------------------------------+-----------+
| url                  | date       | word                          | mentioned |
|----------------------+------------+-------------------------------+-----------|
| newspaperarticle.com | 2018-12-22 | [canada,canada,house,..]      | [2,2,1]   |
| articleUSA.com       | 2018-12-23 | [new york,new york,murder,..] | [2,3,1]   |
+----------------------+------------+-------------------------------+-----------+
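So far I have tried multiplying the columns with the multiply method, which does not work. I also tried a for loop that indexes the individual elements and multiplies them, but I keep getting a "string index out of range" error.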
Answer 0 (score: 3)
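You can explode both columns, use Series.repeat to repeat each word by its count, and then aggregate back into lists by grouping on level=0 of the index: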
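# Explode both list columns so each element gets its own row (the original index is kept)
s = [df[i].explode() for i in ['word','mentioned']]
# Repeat each word by its count (cast to int, since explode leaves object dtype), then regroup by original row
df['word'] = s[0].repeat(s[1].astype(int)).groupby(level=0).agg(list)
print(df)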
url date \
0 newspaperarticle.com 2018-12-22
1 articleUSA.com 2018-12-23
word mentioned
0 [canada, canada, house, house, micheal] [2, 2, 1]
1 [new york, new york, murder, murder, murder, a... [2, 3, 1]
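To try this end to end, here is a minimal self-contained sketch. The construction of df is an assumption (the question does not show how the frame was built), and the names words, counts and repeated are only illustrative. The counts are cast to int before repeating because explode returns an object-dtype Series.

```python
import pandas as pd

# Assumed reconstruction of the question's sample data;
# both 'word' and 'mentioned' hold real Python lists.
df = pd.DataFrame({
    "url": ["newspaperarticle.com", "articleUSA.com"],
    "date": ["2018-12-22", "2018-12-23"],
    "word": [["canada", "house", "micheal"], ["new york", "murder", "angry"]],
    "mentioned": [[2, 2, 1], [2, 3, 1]],
})

# One row per list element; the original row label is repeated in the index.
words = df["word"].explode()
counts = df["mentioned"].explode().astype(int)

# Repeat each word by its count, then collect the words back into one list per original row.
repeated = words.repeat(counts).groupby(level=0).agg(list)
df["word"] = repeated
```

Grouping on level=0 relies on the original index being unique, since that index is what ties the exploded elements back to their row.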
Note: this assumes the word and mentioned columns are Series of lists, not string representations of lists.
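If the columns do hold strings, one possible way to get real lists is ast.literal_eval. The sketch below assumes the strings are valid Python literals (quoted words, as in the hypothetical raw frame); unquoted text such as [canada,house,micheal] would not parse this way and would need custom splitting instead.

```python
import ast
import pandas as pd

# Hypothetical input: lists stored as text, e.g. after a round trip through CSV.
raw = pd.DataFrame({
    "word": ["['canada', 'house', 'micheal']"],
    "mentioned": ["[2, 2, 1]"],
})

parsed = raw.copy()
for col in ["word", "mentioned"]:
    # literal_eval safely evaluates a Python literal; it does not execute arbitrary code.
    parsed[col] = raw[col].apply(ast.literal_eval)
```

After this step, parsed holds actual lists and the explode/repeat approach can be applied to it.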