Question

I have a data file in which some lines look like this. The data is space-separated, but the number of spaces between columns varies...

AAA  B      C    D E    F    G    H    I  J  
AAA  B      C    D E    F    G    H    I  J  
AAA  B      C    D E    F    G    H    I  J

I used

AAA,B,C,D,E,F,G,H,I,J = line.split()

to read the data.

The new data I received recently is sometimes missing column D and/or I and/or J.
The columns look like this:

AAA  B    C    D E    F    G    H    I  J  
AAA  B    C      E    F    G    H       J  
AAA  B    C      E    F    G    H

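All the data that matters to me is in columns B, E, F and G. I cannot use line.split() because the number of variables on the left-hand side changes from line to line. Is it possible to rewrite the script so that it reads the input data in all of these cases? Any suggestions?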

Answer 1

You can use pandas' or numpy's CSV-reading capabilities:

import numpy as np
data = np.genfromtxt(
    'data.txt',
    missing_values=['-', ],  # assumes missing entries are marked with '-'
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
)
print(data['AAA'])

Or pandas:

import pandas as pd
data = pd.read_csv(
    'data.txt',
    sep=r'\s+',  # split on runs of whitespace
    na_values='-',
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
)

print(data['AAA'])
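
One caveat worth checking before relying on a whitespace separator: if missing values are left blank rather than marked with a placeholder such as '-', splitting on runs of whitespace cannot tell which column is absent. A minimal sketch with the standard re module, using a made-up row that lacks column D (the row and the variable names are only for illustration):

```python
import re

NAMES = ['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

# Hypothetical row in which column D is blank (not marked with '-').
row = "AAA  B    C      E    F    G    H    I  J"

# Split on runs of whitespace, the same rule a r'\s+' separator applies.
tokens = re.split(r'\s+', row.strip())

# Only 9 tokens come back, so pairing them with the 10 names shifts
# every value after C one column to the left.
paired = dict(zip(NAMES, tokens))
print(len(tokens))
print(paired['D'])  # holds the value that really belongs to column E
```

If the real file looks like this, a position-based (fixed-width) reader is probably the safer choice.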

Answer 2

If the amount of space between data items is fixed, and a missing item is just a blank, you can do this:

>>> s="AAA    B    C         E    F    G    H         J  "
>>> s.split("    ")
['AAA', 'B', 'C', '', ' E', 'F', 'G', 'H', '', ' J  ']

Edit

Assuming the space between two consecutive data items is constant throughout the file, I offer you this.

Take this file as an example: missing.txt

AAA B C D E F G H I J
AAA B C D E F G H I J
AAA B C  E F G H  J
AAA B C  E F G H

100 2 3 4 5 6 7 8 9 10
100 2 3  5 6 7 8 9 10
100 2 3  5 6 7 8  10
100 2 3  5 6 7 8  

100.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
100.1 2.1 3.1  5.1 6.1 7.1 8.1 9.1 10.1
100.1 2.1 3.1  5.1 6.1 7.1 8.1  10.1
100.1 2.1 3.1  5.1 6.1 7.1 8.1  

hello this is a example of a normal file right?
hello this is  example of a normal file right?
hello this is  example of a normal  right?
hello this is  example of a normal  

and use this function

def read_data_line(path_file, data_size=10, line_format=None, temp_char="@", ignore=True):
    """Generator to read data_size data from a file that may have some missing

    path_file: path to the file
    line_format: list with the space between 2 consecutive data
    temp_char: character that this function will use as placeholder for
               the missing data during processing
    data_size: amount of data expected per line of the file
    ignore: in case that 'line_format' is not given, ignore all lines that
            don't have the correct format, otherwise it is expected that the
            first line has the correct format to use it as a model for the
            rest of the file

    Expected format of the content of the file:
    A B C D E F G H I J
    with A,B,...,J strings without space or 'temp_char' or numbers

    This function assumes that the space between 2 consecutive data
    is constant in all the file

    usage
    >>> datos = list(read_data_line("/some_folder/some_file.txt"))
    or
    >>> for line in read_data_line("/some_folder/some_file.txt"):
            print(line)
    """
    with open(path_file, "r") as data_raw:  # this is the usual way of managing files
        for line in data_raw:  # here you read each line of the file one by one
            datos = line.split()
            if not line_format and len(datos) == data_size:
                # I have all the data, and I assume this structure is the norm
                line = line.strip()
                for d in datos:
                    line = line.replace(d, temp_char, 1)
                line_format = [len(x) for x in line.split(temp_char)[1:-1]]
            if len(datos) < data_size:  # missing data
                if line_format:
                    for t in line_format:
                        line = line.replace(" " * t, temp_char, 1)
                    datos = list(map(str.strip, line.split(temp_char)))
                else:
                    if ignore:
                        continue
                    raise RuntimeError("Impossible to determine the structure of the file")
            yield datos

Output

>>> for x in read_data_line("missing.txt"):
        print(x)

['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H', '', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H']
['']
['100', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '']
['']
['100.1', '2.1', '3.1', '4.1', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '']
['']
['hello', 'this', 'is', 'a', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', '']
>>>

Hope this solves your problem.
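
The core of the function above is the placeholder trick: each expected gap is replaced, left to right, by a marker character, so two gaps in a row leave an empty field between two markers. A minimal sketch of just that step, assuming single-space gaps; split_with_gaps is a hypothetical helper name, not part of the answer's code:

```python
def split_with_gaps(line, gaps, placeholder="@"):
    """Replace each expected gap with a placeholder, then split on it.

    gaps lists the width (in spaces) of every separator, left to right.
    The placeholder must not occur in the data itself.
    """
    for width in gaps:
        # Replace only the first remaining run of `width` spaces.
        line = line.replace(" " * width, placeholder, 1)
    return [field.strip() for field in line.split(placeholder)]


# Ten columns means nine one-space gaps; D and I are missing in this row.
fields = split_with_gaps("AAA B C  E F G H  J", [1] * 9)
print(fields)
```

The empty strings in the result mark the positions of the missing columns, so B, E, F and G stay at indices 1, 4, 5 and 6.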

Answer 3

If your data has a consistent number of spaces between items, and missing data is replaced by spaces (as in your example), you can still do something very similar:

a,_,b,_,c,_,d,_,e = "A  B  C    E".split(' ')

where you add a _ for each extra space between letters (two spaces between items here, so the missing d comes back as ''). Alternatively, if your missing data is not replaced by spaces, split on the number of spaces between letters and do what you did before (this example is for three spaces between each item):

AAA,B,C,D,E,F,G,H,I,J = line.split('   ')

Missing letters will be filled with '', which is the result of two separators sitting side by side.
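
To make the exact-separator idea concrete, here is a small sketch that splits on a fixed three-space separator and turns the resulting empty strings into None. It assumes every item is separated by exactly three spaces and that a missing item has zero width; NAMES and split_exact are illustrative names only:

```python
NAMES = ['A', 'B', 'C', 'D', 'E']


def split_exact(line, sep='   '):
    """Split on one exact separator so an absent field shows up as ''."""
    fields = line.rstrip('\n').split(sep)
    # Map empty strings to None so missing values are easy to test for.
    return {name: (value or None) for name, value in zip(NAMES, fields)}


# C and E are six spaces apart: two separators with nothing between them.
record = split_exact("A   B   C      E")
print(record)
```

This only works while the separator width really is constant; with variable column widths, slicing by character position is the more robust route.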

Answer 4

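Thanks for the answers, I found a way to solve my problem. Because the data is formatted in rigid columns (e.g. %8.3f), I think the following code is the only way I can read the variable input data. I don't know whether it is the best solution.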

data = """\
AAA   B  C   D E   F     G     H     I J
AAA   B  C     E   F     G     H     I J
AAA   B  C     E   F     G     H
"""
for line in data.splitlines():
    # Each field lives in a fixed character range (end index is exclusive);
    # slicing past the end of a short line simply gives ''.
    aaa = line[0:3].strip()
    b = line[4:7].strip()
    c = line[7:10].strip()
    d = line[11:14].strip()
    e = line[15:16].strip()
    f = line[17:20].strip()
    g = line[21:26].strip()
    h = line[27:32].strip()
    i = line[37:38].strip()
    j = line[39:40].strip()
    print(b, e, f, g)

Output:

B E F G
B E F G
B E F G
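
The slicing approach can be kept in one place by storing the column boundaries in a table instead of one variable per column. The sketch below is only an illustration: the (start, end) ranges are measured on the sample rows and would have to be adapted to the real file, and parse_fixed_width is a hypothetical helper name.

```python
# (start, end) character ranges, end-exclusive, measured on the sample rows;
# they are assumptions and must be adapted to the real file layout.
COLUMNS = {
    'AAA': (0, 3), 'B': (4, 7), 'C': (7, 10), 'D': (11, 14), 'E': (15, 16),
    'F': (17, 20), 'G': (21, 26), 'H': (27, 32), 'I': (33, 38), 'J': (39, 40),
}


def parse_fixed_width(line, columns=COLUMNS):
    """Return a dict of stripped fields; blank or absent fields become None."""
    record = {}
    for name, (start, end) in columns.items():
        # Slicing past the end of a short line yields '', never an error.
        value = line[start:end].strip()
        record[name] = value or None
    return record


rows = [
    "AAA   B  C   D E   F     G     H     I J",
    "AAA   B  C     E   F     G     H       J",
    "AAA   B  C     E   F     G     H",
]
for row in rows:
    rec = parse_fixed_width(row)
    print(rec['B'], rec['E'], rec['F'], rec['G'])
```

Because every field is read by position, the columns of interest (B, E, F, G) come out the same whether or not D, I or J are present.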

Reading columns with missing items in Python
