Question

How can I get the word at the start of a line when it may be preceded by tabs and/or spaces?

#!/usr/bin/python
import subprocess as sp
args = ["awk", r'/^word/ { print $1}','anyfile.py' ]
p = sp.Popen(args, stdin = sp.PIPE, stdout = sp.PIPE, stderr = sp.PIPE )

Answer 1

You can simply use a regular expression, for example:

import re
re.match(r'^\s*word', line)

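Here,

^ matches the start of the string (re.match already anchors there, so it is optional)
\s* matches zero or more whitespace characters, including tabs
word is the actual word you are looking for.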
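
To go one step further and return the matched token, roughly what awk's `print $1` does, a capture group can be added to the pattern. The following is only a minimal sketch; the helper name `first_token_if_starts_with` is illustrative and not part of the original answer.

```python
import re

# Hypothetical helper (not from the original answer): returns the first
# whitespace-delimited token of a line when that token starts with `word`,
# ignoring any leading tabs or spaces; returns None otherwise.
def first_token_if_starts_with(line, word):
    # \s* skips leading whitespace; \S* extends the match to the end of the token.
    # re.escape keeps special characters in `word` from being read as regex syntax.
    m = re.match(r'\s*(' + re.escape(word) + r'\S*)', line)
    return m.group(1) if m else None
```

Note that, like the awk pattern in the question, this also accepts longer tokens that merely begin with `word`.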

Answer 2

How about calling strip and then startswith?

>>> '\n  \t  \r  asasasas ash'.strip().startswith('asa')
True
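
If the line has to begin with exactly the given word rather than with a prefix of a longer one (as `asa` is of `asasasas` above), one possible variant compares the first whitespace-separated token instead. This is a sketch under that assumption; the function name `starts_with_word` is made up for illustration.

```python
# Hypothetical helper: True only if the first token of the line equals `word`.
def starts_with_word(line, word):
    # split() with no arguments ignores leading whitespace and
    # splits on any run of spaces, tabs, or newlines.
    tokens = line.split()
    return bool(tokens) and tokens[0] == word
```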

Answer 3

You can adjust the search pattern that awk uses so that it accepts leading whitespace in the input file:

import subprocess as sp
# Note: \s is a GNU awk (gawk) extension; other awks may need [ \t]* instead
args = ["awk", r'/^\s*word/ { print $1 }', 'anyfile.py']
p = sp.Popen(args, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
print(p.stdout.read())

In this case, though, I don't see why you wouldn't just do the parsing directly in Python:

with open("anyfile.py") as f:
    for line in f:
        if line.lstrip().startswith("word"):
            print("found match!")
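
The loop above only reports that a match exists. To reproduce what the awk command prints, the first field of each matching line, a small generator along these lines might be used. It is a sketch, not the answerer's code: the name `first_fields` is illustrative, and `word` is assumed to be non-empty.

```python
# Hypothetical helper: yields the first whitespace-separated field of every
# line that starts with `word` after optional leading tabs/spaces.
# Assumes `word` is a non-empty string.
def first_fields(lines, word):
    for line in lines:
        if line.lstrip().startswith(word):
            # A matching line has at least one token, so [0] is safe here.
            yield line.split()[0]

# Example with in-memory lines; an open file object can be passed the same way.
sample = ["word a\n", "\t wordy b\n", "x word\n", "\n"]
matches = list(first_fields(sample, "word"))
```

Because a file object iterates line by line, `first_fields(f, "word")` inside a `with open("anyfile.py") as f:` block would process the file without loading it all into memory.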


Answer 4

Using only built-in string methods, it is as simple as:

p.stdout.read().decode().split()[0]

This should give you the first word in the output.

Answer 5

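It sounds like what you really want is to parse the Python file's abstract syntax tree with the ast module. That is a better approach than using regular expressions. Here is an example: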

import ast

class FunctionVisitor(ast.NodeVisitor):

    def __init__(self):
        self.second_arg_names = []

    def visit_FunctionDef(self, func):
        """collect function names and the second argument of functions that 
        have two or more arguments"""
        args = func.args.args
        if len(args) > 1:
            # In Python 3 each argument is an ast.arg node whose name is in .arg
            # (Python 2 used Name nodes with .id)
            self.second_arg_names.append((func.name, args[1].arg))
        self.generic_visit(func)

def find_func_args(filename=__file__):
    """defaults to looking at this file"""
    with open(filename) as f:
        source = f.read()
    tree = ast.parse(source)

    visitor = FunctionVisitor()
    visitor.visit(tree)

    print(visitor.second_arg_names)
    assert visitor.second_arg_names == [("visit_FunctionDef", "func")]

if __name__ == "__main__":
    find_func_args()

How do I check whether a string is at the start of a line, allowing for leading tabs or spaces?

5 answers: