I want to unnest a data frame that contains a variable number of 'productID's nested in dictionaries in each row of a column. Example table:
    awardedProducts
0   []
1   [{'productID': 14306}]
2   []
3   []
4   []
5   []
6   []
7   [{'productID': 60974}, {'productID': 72961}]
8   [{'productID': 78818}, {'productID': 86765}]
9   [{'productID': 155707}]
10  [{'productID': 54405}, {'productID': 69562}, {...
I tried using
df = []
for row, index in activeTitles.iterrows():
    df.append(index[0])
to iterate through the df.
I would like to end up with a single-column data frame, or a list, with all productIDs listed, e.g. 14306, 60974, 72961, 78818, ...
Answer 0 (score: 1)
Since there is no flatmap operation in pandas, you can do the following:
import pandas as pd

data = pd.Series([[], [{'productID': 14306}], [], [], [], [], [],
                  [{'productID': 60974}, {'productID': 72961}],
                  [{'productID': 78818}, {'productID': 86765}],
                  [{'productID': 155707}],
                  [{'productID': 54405}, {'productID': 69562}]])

# Expand each list into columns, then stack() row by row so the products
# keep their original order (unstack() would walk column by column instead).
products = (data.apply(pd.Series).stack().dropna()
            .apply(lambda p: p['productID']).reset_index(drop=True))
print(products)
# 0 14306
# 1 60974
# 2 72961
# 3 78818
# 4 86765
# 5 155707
# 6 54405
# 7 69562
# dtype: int64
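The missing flatmap can also be approximated in plain Python before going back to pandas. The following is only a minimal sketch, assuming every cell is a list of dicts that each carry a 'productID' key; the shortened sample data is illustrative and not taken from the question.

```python
from itertools import chain

import pandas as pd

# Shortened, illustrative sample: each cell is a (possibly empty) list of dicts.
data = pd.Series([[],
                  [{'productID': 14306}],
                  [{'productID': 60974}, {'productID': 72961}]])

# chain.from_iterable flattens the lists one level; empty lists add nothing.
products = pd.Series([d['productID'] for d in chain.from_iterable(data)])
print(products.tolist())
```

This avoids building the wide intermediate frame that apply(pd.Series) creates, at the cost of discarding the original row labels.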
Answer 1 (score: 1)
Happy to share that pandas 0.25.0 introduces a new method, explode:
s=data.explode().str.get('productID').dropna()
s
Out[91]:
1 14306.0
7 60974.0
7 72961.0
8 78818.0
8 86765.0
9 155707.0
10 54405.0
10 69562.0
dtype: float64
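If integer IDs and a fresh 0..n-1 index are preferred over the float64 result above, one possible variant is to drop the missing values before extracting the key. This is a sketch under the assumption that pandas 0.25.0 or later is installed; the sample Series mirrors the one from Answer 0.

```python
import pandas as pd

data = pd.Series([[], [{'productID': 14306}], [], [], [], [], [],
                  [{'productID': 60974}, {'productID': 72961}],
                  [{'productID': 78818}, {'productID': 86765}],
                  [{'productID': 155707}],
                  [{'productID': 54405}, {'productID': 69562}]])

# explode() gives one row per list element; empty lists become NaN.
exploded = data.explode().dropna()

# Pull the ID out of each dict, cast to int64 and renumber the rows.
products = (exploded.map(lambda d: d['productID'])
            .astype('int64')
            .reset_index(drop=True))
print(products.tolist())
```

Dropping the NaN rows first means the extracted values never have to share a column with missing data, so the integer cast is safe.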
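For those who do not want to upgrade pandas, here is a helper function, unnesting (defined further down), that gives the same result: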
unnesting(data.to_frame('pid'),['pid'],1)['pid'].str.get('productID').dropna()
Out[18]:
1 14306
7 60974
7 72961
8 78818
8 86765
9 155707
10 54405
10 69562
Name: pid, dtype: int64
import numpy as np
import pandas as pd

def unnesting(df, explode, axis):
    if axis == 1:
        # Repeat each index label once per element of its list
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(explode, axis=1), how='left')
    else:
        # Spread each list across new columns named <col>0, <col>1, ...
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, axis=1), how='left')
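The core idea of the axis == 1 branch (repeat the index once per list element, concatenate the lists, then join the remaining columns back) can be seen in isolation in the sketch below. The frame, its 'user' column and the small IDs are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'pid' holds lists of dicts, 'user' is an extra column.
df = pd.DataFrame({
    'user': ['a', 'b', 'c'],
    'pid': [[{'productID': 1}], [], [{'productID': 2}, {'productID': 3}]],
})

# Repeat every index label once per list element (zero times for empty lists).
idx = df.index.repeat(df['pid'].map(len))

# Concatenate all lists into one flat column and attach the repeated index.
flat = pd.DataFrame({'pid': np.concatenate(df['pid'].values)}, index=idx)

# Join the untouched columns back on the repeated index labels.
result = flat.join(df.drop('pid', axis=1), how='left')
result['productID'] = result['pid'].map(lambda d: d['productID'])
print(result[['user', 'productID']])
```

Rows with an empty list disappear from the result because their index label is repeated zero times.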