I have a data file in which some lines look like this. The data is space-separated, but the number of spaces between fields varies...
AAA B C D E F G H I J
AAA B C D E F G H I J
AAA B C D E F G H I J
I have been using
AAA,B,C,D,E,F,G,H,I,J = line.split()
to read the data.
The new data I received recently is sometimes missing columns D and/or I and/or J.
The columns look like this:
AAA B C D E F G H I J
AAA B C E F G H J
AAA B C E F G H
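All the data that matters to me is in columns B, E, F and G. I can no longer use line.split(), because the number of variables on the left-hand side keeps changing. Can the script be rewritten so that it reads all of these cases of input data? Any suggestions?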
Answer 0: (score: 1)
You can use the CSV-reading capabilities of pandas or numpy:
import numpy as np
# Assumes missing fields are marked with '-' in the file
data = np.genfromtxt(
    'data.txt',
    missing_values=['-'],
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
)
print(data['AAA'])
Or pandas:
import pandas as pd
data = pd.read_csv(
    'data.txt',
    sep=r'\s+',  # one or more whitespace characters
    na_values='-',
    names=['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
)
print(data['AAA'])
Answer 1: (score: 1)
If the amount of space between fields is fixed and a missing field is just a single space, then you can do this:
>>> s="AAA  B  C     E  F  G  H     J "
>>> s.split("  ")
['AAA', 'B', 'C', '', ' E', 'F', 'G', 'H', '', ' J ']
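The split above leaves stray spaces on some pieces, so a stripping step is useful. A minimal sketch, assuming a two-space separator and a missing field written as a single space; `split_fixed_sep` is a hypothetical helper name, not part of the original answer:

```python
def split_fixed_sep(line, sep="  "):
    """Split on an exact separator and strip each piece.

    Assumption: fields are always separated by exactly `sep`, and a
    missing field is written as a single space.
    """
    return [field.strip() for field in line.rstrip("\n").split(sep)]


row = split_fixed_sep("AAA  B  C     E  F  G  H     J ")
print(row)
# Columns B, E, F and G are then at fixed indices 1, 4, 5 and 6.
print(row[1], row[4], row[5], row[6])
```

Because every line yields the same number of pieces, the columns of interest can be picked by index instead of by unpacking into a fixed number of names.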
Edit
Assuming that the space between two consecutive fields is constant throughout the file, I offer you this.
Take this file as an example: missing.txt
AAA B C D E F G H I J
AAA B C D E F G H I J
AAA B C E F G H J
AAA B C E F G H
100 2 3 4 5 6 7 8 9 10
100 2 3 5 6 7 8 9 10
100 2 3 5 6 7 8 10
100 2 3 5 6 7 8
100.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
100.1 2.1 3.1 5.1 6.1 7.1 8.1 9.1 10.1
100.1 2.1 3.1 5.1 6.1 7.1 8.1 10.1
100.1 2.1 3.1 5.1 6.1 7.1 8.1
hello this is a example of a normal file right?
hello this is example of a normal file right?
hello this is example of a normal right?
hello this is example of a normal
and use this function
def read_data_line(path_file, data_size=10, line_format=None, temp_char="@", ignore=True):
    """Generator to read data_size fields per line from a file that may have some missing.

    path_file: path to the file
    line_format: list with the widths of the gaps between 2 consecutive fields
    temp_char: character that this function will use as a placeholder for
               the missing data during processing
    data_size: number of fields expected per line of the file
    ignore: in case 'line_format' is not given, ignore all
            lines that don't have the correct format; otherwise
            the first line is expected to have the correct
            format so it can be used as a model for the rest of the file

    Expected format of the content of the file:
        A B C D E F G H I J
    with A,B,...,J strings without spaces or 'temp_char', or numbers.

    This function assumes that the space between 2 consecutive
    fields is constant in the whole file.

    usage
    >>> datos = list(read_data_line("/some_folder/some_file.txt"))
    or
    >>> for line in read_data_line("/some_folder/some_file.txt"):
    ...     print(line)
    """
    with open(path_file, "r") as data_raw:  # this is the usual way of managing files
        for line in data_raw:  # here you read each line of the file one by one
            datos = line.split()
            if not line_format and len(datos) == data_size:
                # I have all the data, and I assume this structure is the norm
                line = line.strip()
                for d in datos:
                    line = line.replace(d, temp_char, 1)
                line_format = [len(x) for x in line.split(temp_char)[1:-1]]
            if len(datos) < data_size:  # missing data
                if line_format:
                    for t in line_format:
                        line = line.replace(" " * t, temp_char, 1)
                    datos = list(map(str.strip, line.split(temp_char)))
                else:
                    if ignore:
                        continue
                    raise RuntimeError("Impossible to determine the structure of the file")
            yield datos
Output
>>> for x in read_data_line("missing.txt"):
...     print(x)
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H', '', 'J']
['AAA', 'B', 'C', '', 'E', 'F', 'G', 'H']
['']
['100', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '9', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '10']
['100', '2', '3', '', '5', '6', '7', '8', '', '']
['']
['100.1', '2.1', '3.1', '4.1', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '9.1', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '10.1']
['100.1', '2.1', '3.1', '', '5.1', '6.1', '7.1', '8.1', '', '']
['']
['hello', 'this', 'is', 'a', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', 'file', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', 'right?']
['hello', 'this', 'is', '', 'example', 'of', 'a', 'normal', '', '']
>>>
Hope this solves your problem.
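A simpler variant of the same idea can be sketched by recording where each field starts in one complete line and cutting every other line at those offsets. This is not the function above, only an illustration under stated assumptions: fields are left-aligned at fixed offsets, no field is wider than its column, and the first line is complete. The helper names `column_starts` and `slice_by_starts` and the sample lines are made up.

```python
import re


def column_starts(model_line):
    """Return the start offset of every whitespace-separated field."""
    return [m.start() for m in re.finditer(r"\S+", model_line)]


def slice_by_starts(line, starts):
    """Cut `line` at the given offsets; blank columns become ''."""
    line = line.rstrip("\n")
    bounds = list(starts[1:]) + [None]
    return [line[a:b].strip() for a, b in zip(starts, bounds)]


# Made-up sample: the first line is complete and serves as the model.
sample = [
    "AAA  B  C  D  E  F  G  H  I  J",
    "AAA  B  C     E  F  G  H     J",
    "AAA  B  C     E  F  G  H",
]
starts = column_starts(sample[0])
rows = [slice_by_starts(line, starts) for line in sample]
for row in rows:
    # Columns B, E, F and G
    print(row[1], row[4], row[5], row[6])
```

Slicing past the end of a short line simply yields an empty string, so lines with trailing columns missing need no special case.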
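Answer 2: (score: 0)
If there is a consistent number of spaces between your fields and the missing data is replaced by spaces (as in your example), you can still do something very similar: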
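a,_,b,_,c,_,d,_,e = "A  B  C    E".split(' ')
This puts a _ in the position of each extra space between the letters (here the fields are two spaces apart and D is absent, so d ends up as ''). Or, if your missing data is not replaced by spaces, split on the exact number of spaces between fields and do what you did before (this example is for 3 spaces between each field):
AAA,B,C,D,E,F,G,H,I,J = line.split('   ')
The missing letters will be filled with '', which is the result of two sets of '   ' sitting side by side.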
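Answer 3: (score: 0)
Thanks for your answers; I found a solution to my problem. Because the data is formatted in rigid, fixed-width columns (e.g. %8.3f), I think the following code is the only way I can read the variable input data. I don't know whether it is the best solution.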
data = """AAA B C D E F G H I J
AAA B C E F G H I J
AAA B C E F G H """
for line in data.splitlines():
    # Slice bounds are the fixed column positions in the real file;
    # adjust them to your own layout.
    aaa = line[0:2].strip()
    b = line[4:6].strip()
    c = line[7:10].strip()
    d = line[11:14].strip()
    e = line[15:16].strip()
    f = line[17:20].strip()
    g = line[21:26].strip()
    h = line[27:32].strip()
    i = line[37:38].strip()
    j = line[39:40].strip()
    print(b, e, f, g)
Output:
B E F G
B E F G
B E F G
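The slicing approach can also be written with the column positions kept in one table, which makes the layout easier to adjust and lets blank fields be converted explicitly. A minimal sketch; the offsets in `LAYOUT`, the helper names `parse_fixed` and `to_float`, and the sample line are all hypothetical and would need to match the real file:

```python
# Hypothetical layout: column name -> (start, end) character offsets,
# end exclusive. Only the columns of interest are listed.
LAYOUT = {"b": (5, 8), "e": (14, 17), "f": (17, 20), "g": (20, 23)}


def parse_fixed(line, layout=LAYOUT):
    """Slice the wanted columns out of one fixed-width line."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def to_float(text):
    """Convert a field to float, mapping a blank field to None."""
    return float(text) if text else None


# Made-up line in which column D is blank.
record = parse_fixed("100  2  3     5  6  7  8")
values = {name: to_float(text) for name, text in record.items()}
print(values)
```

Keeping the offsets in a dictionary means a change in the file format touches one place only, and a missing field shows up as None rather than raising during conversion.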