Question

Well, I know gnuplot is not a data-processing system but plotting software. But anyway...

In Python/pandas, I can select multiple columns by passing a regex to a DataFrame: df.filter(regex='\.x$') returns the columns named 'sw0.x', 'sw1.x', etc., which I can then sum and plot.
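For comparison, the pandas documentation describes filter(regex=...) as keeping the labels for which re.search(regex, label) is true. Below is a minimal stdlib-only sketch of the same select-and-sum step. It assumes comma-separated text with one header row, and sum_matching is a hypothetical helper, not a pandas function:

```python
import csv
import io
import re

def sum_matching(csv_text, pattern):
    """Sum, row by row, all columns whose header matches `pattern` (re.search semantics)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # indices of the columns whose name matches, like df.filter(regex=pattern, axis=1)
    idx = [i for i, name in enumerate(header) if re.search(pattern, name)]
    # skip empty rows (e.g. a trailing newline), sum the matching columns of the rest
    return [sum(float(row[i]) for i in idx) for row in reader if row]

data = "ID,sw0.x,sw0.y,sw1.x\n1,0.1,2.1,0.5\n2,0.2,2.2,0.6\n"
print(sum_matching(data, r"\.x$"))
```

Note the raw string r"\.x$": without the r prefix, "\." is an invalid escape sequence, which recent Python versions warn about.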

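Recently I switched to pgfplots (LaTeX), and I use gnuplot together with pgfplots extensively on large datasets. Quite often I need to plot the sum of many columns that match a given regular expression. I would like to do something like plot 'data.csv' SUM("\.x$") every 100 with line, where a function/macro/whatever SUM takes a regular expression and returns the sum of the matching columns.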
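The requested SUM("\.x$") every 100 can be stated precisely as: for every 100th data row (rows 0, 100, 200, ..., which is what gnuplot's every 100 selects), add up the columns whose header matches the regex. A small Python sketch of that semantics on already-parsed rows; regex_sum_every is a hypothetical name and not a gnuplot feature:

```python
import re

def regex_sum_every(header, rows, pattern, step):
    """Return (row_index, sum of matching columns) for every `step`-th row, starting at row 0."""
    cols = [i for i, name in enumerate(header) if re.search(pattern, name)]
    return [(n, sum(row[i] for i in cols))
            for n, row in enumerate(rows) if n % step == 0]

# made-up example data: 250 rows, two ".x" columns and one ".y" column
header = ["ID", "sw0.x", "sw1.x", "sw0.y"]
rows = [[n, float(n), 2.0 * n, 100.0] for n in range(250)]
print(regex_sum_every(header, rows, r"\.x$", 100))
# prints [(0, 0.0), (100, 300.0), (200, 600.0)]
```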

Answer 1

In this case it is probably best to "outsource" this processing step to pandas. For example, if you create a script filter.py such as:

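#!/usr/bin/env python
import sys
import pandas as pd

# read the CSV file given as first argument; the first row holds the column names
df = pd.read_csv(sys.argv[1], sep=',', header=0)
# keep the columns whose name ends in ".x" and sum them row by row
s = df.filter(regex=r'\.x$', axis=1).sum(axis=1)
# write "index<TAB>sum" without a header line, so gnuplot only sees numbers
s.to_csv(sys.stdout, sep='\t', header=False)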
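If pandas is not available, or you want to pass the regex (and a row step like gnuplot's every) from gnuplot instead of hard-coding it, a stdlib-only variant along these lines should work. This is a sketch under assumptions: comma-separated input with one header row, and filter2.py is a hypothetical file name:

```python
#!/usr/bin/env python
# hypothetical filter2.py: stdlib-only variant of filter.py with the regex as an argument
import csv
import re
import sys

def filter_sum(lines, pattern, step=1):
    """Yield 'row_index<TAB>sum' for every `step`-th data row (row index counts from 0)."""
    reader = csv.reader(lines)
    header = next(reader)
    cols = [i for i, name in enumerate(header) if re.search(pattern, name)]
    for n, row in enumerate(reader):
        if row and n % step == 0:
            yield f"{n}\t{sum(float(row[i]) for i in cols)}"

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: python filter2.py FILE REGEX [STEP]", file=sys.stderr)
    else:
        path, pattern = sys.argv[1], sys.argv[2]
        step = int(sys.argv[3]) if len(sys.argv) > 3 else 1
        with open(path, newline="") as f:
            for line in filter_sum(f, pattern, step):
                print(line)
```

From gnuplot it could then be called as plot '<python filter2.py data.csv "[.]x$" 100' w lp. Writing the dot as [.] avoids having to escape a backslash through both gnuplot and the shell.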

Then you can "reuse" it in gnuplot:

plot "<python filter.py data.csv" w lp

Answer 2

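gnuplot doesn't support regular expressions, but in some cases you can get similar functionality by defining suitable functions. @Dilawar, you didn't give many details about your data; I assume the separator is whitespace. As @ewcz wrote, you can always use external tools to (pre)process your data into a format gnuplot can plot. But if it's possible and doesn't get too complicated, why not use gnuplot itself?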

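In your case, you are asking how to sum up the columns whose column header ends with a certain string. For this, gnuplot's string functions are sufficient: strstrt() tells you whether (and where) a substring occurs in a header, and comparing the last characters of the header, as MatchEnd() does in the example below, checks for a match at the end. Check help strstrt and the example below, which can certainly be optimized further.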
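gnuplot's strstrt(s, m) returns the 1-based position of the first occurrence of m in s, or 0 if it is absent, so on its own it tests "contains" rather than "ends with". The difference matters for a header such as sw0.xy. A Python sketch mirroring both tests; the helper names are hypothetical, chosen to match the gnuplot functions:

```python
def strstrt(s, m):
    """Mimic gnuplot's strstrt(): 1-based index of the first occurrence of m in s, 0 if absent."""
    return s.find(m) + 1  # str.find returns -1 when not found, giving 0

def match_end(s, m):
    """Mimic MatchEnd(s,m) from the gnuplot script: 1 if s ends with m, else 0."""
    return 1 if s.endswith(m) else 0

for header in ["sw0.x", "sw0.xy", "ID"]:
    print(header, strstrt(header, ".x"), match_end(header, ".x"))
# prints:
# sw0.x 4 1
# sw0.xy 4 0
# ID 0 0
```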

Code:

### select columns by matching end of columnheader
reset session

$Data <<EOD
ID sw0.x sw0.y sw0.z sw1.x sw1.y sw1.z
1    0.1   2.1   6.1   0.5   2.5   6.5
2    0.2   2.2   6.2   0.6   2.6   6.6
3    0.3   2.3   6.3   0.7   2.7   6.7
4    0.4   2.4   6.4   0.8   2.8   6.8
5    0.5   2.5   6.5   0.9   2.9   6.9
EOD

stats $Data u 0 nooutput  # get maximum number of columns
colMax = STATS_columns 

# get headers into a string
set table $Dummy
    myHeaders = ''
    plot for [i=1:colMax] $Data u \
        (myHeaders = myHeaders.' '.strcol(i),'') every ::0::0 w table
unset table
myHeader(i) = word(myHeaders,i)      # get the ith item of the header line

# match end of string 1=match, 0=no match
MatchEnd(s,m) = s[strlen(s)-strlen(m)+1:strlen(s)] eq m ? 1 : 0
# sum up the columns which match
SumUp(m) = sum [col=1:colMax] ( MatchEnd(myHeader(col),m) ? column(col) : 0 )

set key top left
plot for [i=2:colMax] $Data u 1:i w lp pt 6 ti columnhead, \
     $Data u 1:(SumUp(".x")) skip 1 w lp pt 7 ps 2 lc "red"   title "Sum up '.x'", \
     $Data u 1:(SumUp(".y")) skip 1 w lp pt 7 ps 2 lc "green" title "Sum up '.y'", \
     $Data u 1:(SumUp(".z")) skip 1 w lp pt 7 ps 2 lc "blue"  title "Sum up '.z'"
### end of code
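To check the plotted sums independently, here is a short Python sketch that applies the same rule as SumUp(m) (sum every column whose header ends with m) to the $Data block. It assumes whitespace-separated data with a single header line, as in the example; sum_up is a hypothetical helper:

```python
# the $Data block from the gnuplot script: whitespace separated, header in the first line
DATA = """\
ID sw0.x sw0.y sw0.z sw1.x sw1.y sw1.z
1    0.1   2.1   6.1   0.5   2.5   6.5
2    0.2   2.2   6.2   0.6   2.6   6.6
3    0.3   2.3   6.3   0.7   2.7   6.7
4    0.4   2.4   6.4   0.8   2.8   6.8
5    0.5   2.5   6.5   0.9   2.9   6.9
"""

def sum_up(text, suffix):
    """Mirror SumUp(m): per data row, sum the columns whose header ends with `suffix`."""
    header, *rows = [line.split() for line in text.splitlines() if line.strip()]
    cols = [i for i, name in enumerate(header) if name.endswith(suffix)]
    # round only to hide binary floating-point noise in the printed output
    return [(int(r[0]), round(sum(float(r[i]) for i in cols), 10)) for r in rows]

for suffix in (".x", ".y", ".z"):
    print(suffix, sum_up(DATA, suffix))
```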

Result:

[Figure: sum of selected columns filtered by "regex" in gnuplot]
