Question

I have a dataframe that was read from a CSV file.

          time  node txrx  src  dest  txid  hops
0     34355146     2   TX    2     1     1   NaN
1     34373907     1   RX    2     1     1   1.0
2     44284813     2   TX    2     1     2   NaN
3     44302557     1   RX    2     1     2   1.0
4     44596500     3   TX    3     1     2   NaN
5     44630682     1   RX    3     1     2   2.0
6     50058251     2   TX    2     1     3   NaN
7     50075994     1   RX    2     1     3   1.0
8     51338658     3   TX    3     1     3   NaN
9     51382629     1   RX    3     1     3   2.0

I need to create a new dataframe that takes the values from the TX/RX rows and produces one row per pair:

(1) Take the time from the 'time' column. If the value in 'txrx' is "TX", put it in the 'tx_time' column; if the value is "RX", put it in the 'rx_time' column (within the same row of the new dataframe).
(2) The value for 'hops' is taken from the RX row.
(3) This is done for each ['src', 'dest', 'txid'] group.
(4) The 'node' column is ignored.

The resulting df should look like this:

      tx_time  rx_time  src  dest  txid  hops
0    34355146 34373907    2     1     1     1
1    44284813 44302557    2     1     2     1
2    44596500 44630682    3     1     2     2
3    50058251 50075994    2     1     3     1
4    51338658 51382629    3     1     3     2

I understand how to do step (3), but I'm a bit stuck on how to approach (1) and (2). Any suggestions?

Answer 1

Using pivot_table

df.bfill().pivot_table(index=['src','dest','txid','hops'],columns=['txrx'],values='time').reset_index()
Out[766]:
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

Or using unstack

df.bfill().set_index(['src','dest','txid','hops','txrx']).time.unstack(-1).reset_index()
Out[768]:
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

PS: I did not add the rename here, since that would make the code too long...
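For completeness, the rename that the PS leaves out could look roughly like the sketch below. It rebuilds the sample frame from the question so it runs on its own. Two things here are my assumptions rather than part of the answer: only the hops column is back-filled (instead of the whole frame), and aggfunc='first' is passed so the times are not averaged.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'time': [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    'node': [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    'txrx': ['TX', 'RX'] * 5,
    'src':  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    'dest': [1] * 10,
    'txid': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'hops': [np.nan, 1.0, np.nan, 1.0, np.nan, 2.0, np.nan, 1.0, np.nan, 2.0],
})

keys = ['src', 'dest', 'txid', 'hops']
out = (
    df.assign(hops=df['hops'].bfill())   # copy each RX hops value up to its TX row
      .pivot_table(index=keys, columns='txrx', values='time', aggfunc='first')
      .reset_index()
      .rename(columns={'TX': 'tx_time', 'RX': 'rx_time'})
)
out.columns.name = None                  # drop the leftover 'txrx' label
out = out[['tx_time', 'rx_time', 'src', 'dest', 'txid', 'hops']]
out = out.astype({'hops': int})          # hops was float only because of the NaNs
print(out)
```

The rows come out sorted by the index keys (src, dest, txid), so the order differs from the one in the question; sort by tx_time afterwards if that matters.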

Answer 2

defaultdict approach
For the OP this might actually be faster. If speed matters, check it on your own data. YMMV.

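from collections import defaultdict

import pandas as pd

# one inner dict per output column, keyed by the (src, dest, txid) tuple
d = defaultdict(lambda: defaultdict(dict))
cols = 'tx_time  rx_time  src  dest  txid  hops'.split()

for t in df.itertuples():
    i = (t.src, t.dest, t.txid)
    # 'TX' -> 'tx_time', 'RX' -> 'rx_time'
    d[t.txrx.lower() + '_time'][i] = t.time
    # only the RX rows carry a hops value
    if pd.notnull(t.hops):
        d['hops'][i] = int(t.hops)

# the tuple keys become a MultiIndex; name it, flatten it, order the columns
pd.DataFrame(d).rename_axis(['src', 'dest', 'txid']) \
  .reset_index().reindex(columns=cols)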

    tx_time   rx_time  src  dest  txid  hops
0  34355146  34373907    2     1     1     1
1  44284813  44302557    2     1     2     1
2  50058251  50075994    2     1     3     1
3  44596500  44630682    3     1     2     2
4  51338658  51382629    3     1     3     2
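To actually check the speed, a rough timing sketch along these lines could be used. The function names via_pivot and via_dict are made up for this example, and via_dict is a simplified variant of the loop above that collects one dict per output row, not the exact code from this answer. The ten-row sample is far too small for meaningful numbers, so swap in your real frame before drawing conclusions.

```python
import timeit
from collections import defaultdict

import numpy as np
import pandas as pd

# Sample frame copied from the question; replace with real data for timing.
df = pd.DataFrame({
    'time': [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    'node': [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    'txrx': ['TX', 'RX'] * 5,
    'src':  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    'dest': [1] * 10,
    'txid': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'hops': [np.nan, 1.0, np.nan, 1.0, np.nan, 2.0, np.nan, 1.0, np.nan, 2.0],
})

COLS = ['tx_time', 'rx_time', 'src', 'dest', 'txid', 'hops']


def via_pivot(frame):
    """Pivot-based version (hypothetical helper name)."""
    out = (
        frame.assign(hops=frame['hops'].bfill())
             .pivot_table(index=['src', 'dest', 'txid', 'hops'],
                          columns='txrx', values='time', aggfunc='first')
             .reset_index()
             .rename(columns={'TX': 'tx_time', 'RX': 'rx_time'})
    )
    out.columns.name = None
    return out[COLS]


def via_dict(frame):
    """Plain-Python loop version (hypothetical helper name)."""
    rows = defaultdict(dict)
    for t in frame.itertuples(index=False):
        key = (t.src, t.dest, t.txid)
        rows[key].update(src=t.src, dest=t.dest, txid=t.txid)
        rows[key][t.txrx.lower() + '_time'] = t.time
        if pd.notnull(t.hops):          # only RX rows carry hops
            rows[key]['hops'] = int(t.hops)
    return pd.DataFrame(list(rows.values()))[COLS]


# Time each approach over a few repetitions and print seconds per call.
n = 20
for fn in (via_pivot, via_dict):
    seconds = timeit.timeit(lambda: fn(df), number=n)
    print(fn.__name__, seconds / n)
```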

Answer 3

Using concat, though I think @Wen's solution using pivot would be more efficient

df_tx = df[::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'tx_time'})
df_rx = df[1::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'rx_time'})

pd.concat([df_tx, df_rx], axis = 1).T.drop_duplicates().T.dropna(axis = 1)

You get

    tx_time     src dest    txid    rx_time     hops
0   34355146.0  2.0 1.0     1.0     34373907.0  1.0
1   44284813.0  2.0 1.0     2.0     44302557.0  1.0
2   44596500.0  3.0 1.0     2.0     44630682.0  2.0
3   50058251.0  2.0 1.0     3.0     50075994.0  1.0
4   51338658.0  3.0 1.0     3.0     51382629.0  2.0
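Note that slicing with [::2] and [1::2] relies on TX and RX rows strictly alternating. If that is not guaranteed, one alternative (a sketch of a different technique, not the concat approach above) is to split on the txrx value and merge on the group keys. This also keeps the integer dtypes.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    'time': [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    'node': [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    'txrx': ['TX', 'RX'] * 5,
    'src':  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    'dest': [1] * 10,
    'txid': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'hops': [np.nan, 1.0, np.nan, 1.0, np.nan, 2.0, np.nan, 1.0, np.nan, 2.0],
})

keys = ['src', 'dest', 'txid']

# TX rows contribute only the send time.
tx = (df.loc[df['txrx'] == 'TX', keys + ['time']]
        .rename(columns={'time': 'tx_time'}))

# RX rows contribute the receive time and the hop count.
rx = (df.loc[df['txrx'] == 'RX', keys + ['time', 'hops']]
        .rename(columns={'time': 'rx_time'}))

# Pair each TX row with the RX row of the same (src, dest, txid) group.
out = tx.merge(rx, on=keys, how='inner')
out = out[['tx_time', 'rx_time', 'src', 'dest', 'txid', 'hops']]
out = out.astype({'hops': int})
print(out)
```

With how='inner', a TX row that has no matching RX row is dropped; use how='left' if unanswered transmissions should be kept (hops would then contain NaN and the final astype would need to go).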

Combining conditional row data into a new dataframe
