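Reading columns with missing entries in Python

Time: 2015-12-12 15:50:00

Tags: python

I have a data file in which some lines look like this. The data is space-separated, but the number of spaces between fields varies...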

AAA  B      C    D E    F    G    H    I  J  
AAA  B      C    D E    F    G    H    I  J  
AAA  B      C    D E    F    G    H    I  J  

I used

AAA,B,C,D,E,F,G,H,I,J = line.split()

to read the data.

Recently I received new data in which columns D and/or I and/or J are sometimes missing.
The rows look like this:

AAA  B    C    D E    F    G    H    I  J  
AAA  B    C      E    F    G    H       J  
AAA  B    C      E    F    G    H            

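All the data that matters to me is in columns B, E, F and G. I can't use line.split() any more, because the number of fields per line changes and no longer matches the variables on the left-hand side. Can the script be rewritten so that it reads all of these cases? Any suggestions?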

4 answers:

Answer 0: (score: 1)

You can use the CSV-reading capabilities of pandas or numpy:

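import numpy as np
# Assumes missing entries are marked with '-' rather than left blank.
data = np.genfromtxt(
    'data.txt',
    dtype=None,  # infer each column's type instead of forcing float
    missing_values=['-', ],
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
)
print(data['AAA'])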

Or pandas:

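import pandas as pd
# Assumes missing entries are marked with '-' rather than left blank.
data = pd.read_csv(
    'data.txt',
    sep=r'\s+',  # split on runs of whitespace
    na_values='-',
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
)

print(data['AAA'])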

Answer 1: (score: 1)

If the amount of space between fields is fixed and a missing field is just a single space, you can do this:

>>> s="AAA    B    C         E    F    G    H         J  "
>>> s.split("    ")
['AAA', 'B', 'C', '', ' E', 'F', 'G', 'H', '', ' J  ']
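
The pieces above still carry stray spaces. A minimal sketch of cleaning them up and attaching column names follows; the helper name split_fixed_gap and the column names are assumptions made for illustration, and it only holds while the gap is exactly four spaces and a missing value is exactly one space wide.

```python
def split_fixed_gap(line, names, gap=4):
    """Split on a fixed-width gap and map stripped fields to column names."""
    fields = [field.strip() for field in line.split(" " * gap)]
    # zip() stops at the shorter sequence, so trailing missing columns
    # are simply absent from the result.
    return dict(zip(names, fields))


names = ["AAA", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
s = "AAA    B    C         E    F    G    H         J  "
row = split_fixed_gap(s, names)
print(row["B"], row["E"], row["F"], row["G"])
```

Missing columns come back as empty strings, so the caller can test for them with a plain truthiness check.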

Edit

Assuming the space between two consecutive fields is constant throughout the file, I offer you this.

Take this file as an example: missing.txt

AAA  B      C    D E    F    G    H    I  J  
AAA  B      C    D E    F    G    H    I  J  
AAA  B      C      E    F    G    H       J  
AAA  B      C      E    F    G    H 

100  2      3    4 5    6    7    8    9  10 
100  2      3      5    6    7    8    9  10 
100  2      3      5    6    7    8       10 
100  2      3      5    6    7    8        

100.1  2.1      3.1    4.1 5.1    6.1    7.1    8.1    9.1  10.1 
100.1  2.1      3.1      5.1    6.1    7.1    8.1    9.1  10.1 
100.1  2.1      3.1      5.1    6.1    7.1    8.1       10.1 
100.1  2.1      3.1      5.1    6.1    7.1    8.1         

hello  this      is    a example    of    a    normal    file  right?
hello  this      is      example    of    a    normal    file  right?
hello  this      is      example    of    a    normal       right?
hello  this      is      example    of    a    normal        

and use this function

def read_data_line(path_file, data_size=10, line_format=None, temp_char="@", ignore=True):
    """Generator to read data_size data from a file that may have some missing

       path_file:   path to the file
       line_format: list with the space between 2 consecutive data
       temp_char:   character that this function will use as placeholder for
                    the missing data during processing
       data_size:   amount of data expected per line of the file
       ignore:      in case that 'line_format' is not given, ignore all 
                    lines that don't have the correct format, otherwise 
                    is expected that the first line have the correct 
                    format to use it as a model for the rest of the file

       Expected format of the content of the file:
       A  B      C    D E    F    G    H    I  J

       with A,B,...,J strings without space or 'temp_char' or numbers

       This function assumes that the space between 2 consecutive
       data is constant in all the file

       usage

       >>> datos = list(read_data_line("/some_folder/some_file.txt"))

       or

       >>> for line in read_data_line("/some_folder/some_file.txt"):
               print(line)"""
    with open(path_file,"r") as data_raw: #this is the usual way of managing files
        for line in data_raw: #here you read each line of the file one by one
            datos = line.split()
            if not line_format and len(datos)==data_size: #I have all the data, and I assume this structure is the norm
                line = line.strip()
                for d in datos:
                    line = line.replace(d,temp_char,1)
                line_format = [ len(x) for x in line.split(temp_char)[1:-1] ]
            if len(datos) < data_size: #missing data
                if line_format:
                    for t in line_format:
                        line = line.replace(" "*t,temp_char,1)
                    datos = list(map(str.strip,line.split(temp_char)))
                else:
                    if ignore:
                        continue
                    raise RuntimeError("Impossible to determine the structure of the file")
            yield datos

Output

>>> for x in read_data_line("missing.txt"):
    print(x)


['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H', '', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H']
['']
['100', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '']
['']
['100.1', '2.1', '3.1', '4.1', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '']
['']
['hello', 'this', 'is', 'a', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', '']
>>> 
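Hope this solves your problem.

Answer 2: (score: 0)

If your data has a consistent number of spaces between fields, and the missing data is replaced by spaces (as it appears to be in your example), you can still do something very similar: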

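a,_,b,_,c,_,d,_,e = "A  B  C    E".split(' ')

Here you add a _ for each empty string that the extra spaces between letters produce (the fields above are two spaces apart, so there is one _ per gap). Or, if your missing data is not replaced by spaces, split on the number of spaces between each letter and do what you did before (this example is for 3 spaces between fields):

AAA,B,C,D,E,F,G,H,I,J = line.split('   ')

The missing letters will be filled with '', which is what you get when two separators sit side by side.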
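
Tuple unpacking raises ValueError when a line is cut short, as happens in the rows that lose their trailing columns. One way around that, sketched here with a hypothetical helper pad_fields and assuming a three-space separator, is to pad the list to the expected length before unpacking.

```python
def pad_fields(line, size=10, sep="   "):
    """Split on a fixed separator and pad with '' up to `size` fields."""
    fields = [field.strip() for field in line.rstrip().split(sep)]
    fields += [""] * (size - len(fields))  # adds nothing when no field is missing
    return fields[:size]


# D is missing in the middle; I and J are missing at the end.
line = "AAA   B   C      E   F   G   H"
AAA, B, C, D, E, F, G, H, I, J = pad_fields(line)
print(B, E, F, G)
```

The final slice also guards against lines that yield more pieces than expected, at the cost of silently dropping the extras.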

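Answer 3: (score: 0)

Thanks for your answers; I found a solution to my problem. Because the data is formatted in rigid fixed-width columns (e.g. %8.3f), I think the following code is the only way I can read input in which some fields are missing. I don't know whether there is a better solution.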

data_raw = (
    "AAA   B  C   D E   F     G     H     I J  \n"
    "AAA   B  C     E   F     G     H     I J  \n"
    "AAA   B  C     E   F     G     H        \n"
)
for line in data_raw.splitlines():
    # Fixed-width slices: each one covers the character range of a column.
    aaa = line[0:3].strip()
    b = line[4:7].strip()
    c = line[7:10].strip()
    d = line[11:14].strip()
    e = line[15:16].strip()
    f = line[17:20].strip()
    g = line[21:26].strip()
    h = line[27:32].strip()
    i = line[37:38].strip()
    j = line[39:40].strip()
    print(b, e, f, g)

Output:

B E F G  
B E F G
B E F G
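
Hard-coded offsets are easy to get wrong. As a sketch, the slices can instead be derived from one complete line; this assumes the first line has every field present and that each column runs from the end of the previous field to the end of its own (as with right-aligned %8.3f-style output). The names column_slices and parse_line are hypothetical.

```python
import re


def column_slices(reference_line):
    """Build one slice per field from a line in which no field is missing."""
    ends = [m.end() for m in re.finditer(r"\S+", reference_line)]
    starts = [0] + ends[:-1]
    return [slice(a, b) for a, b in zip(starts, ends)]


def parse_line(line, slices):
    """Cut a line at the given slices; absent fields come back as ''."""
    return [line[s].strip() for s in slices]


lines = [
    "AAA   B  C   D E   F     G     H     I J  ",
    "AAA   B  C     E   F     G     H       J  ",
    "AAA   B  C     E   F     G     H          ",
]
slices = column_slices(lines[0])
rows = [parse_line(line, slices) for line in lines]
for row in rows:
    # Columns B, E, F and G are at indexes 1, 4, 5 and 6.
    print(row[1], row[4], row[5], row[6])
```

Slicing past the end of a short line returns an empty string in Python, so truncated rows need no special handling.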