Parsing values from a text file

Time: 2016-12-23 02:43:26

Tags: python regex parsing

My .txt file looks like this:

id        nm        lat        lon        countryCode
5555555  London    55.876456   99.546231   UK

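I need to parse each field and add it to an SQLite database. So far I have managed to transfer the id, name, and countryCode columns into my database, but I am struggling to find a way to parse the lat and lon of each record separately.

I tried using a regular expression, but with no luck. I also thought about having a parser check whether the last non-whitespace character is a letter, to determine that the string is lat rather than lon, but I don't know how to implement that correctly. Can I solve this with a regular expression, or should I write a custom parser? If so, how?

3 Answers:

Answer 0: (score: 5)

You can do this with pandas like this: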

import pandas as pd
import sqlite3

con = sqlite3.connect('path/new.db')
con.text_factory = str

df = pd.read_csv('file_path', sep='\t')
df.to_sql('table_01', con)

If the file contains malformed lines and you can afford to skip them, use this instead:

df = pd.read_csv('file_path', sep='\t', on_bad_lines='skip')  # pandas >= 1.3; older versions used error_bad_lines=False
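
The sample in the question appears to be separated by runs of spaces rather than tabs, so here is a minimal sketch of the same pandas approach with a whitespace separator. The in-memory database, the `load_into_sqlite` helper, and the `SAMPLE` string are assumptions for illustration, not part of the original answer. Note that a whitespace separator would break on names that contain spaces.

```python
import io
import sqlite3

import pandas as pd

# Sample data copied from the question; columns are separated by runs of spaces.
SAMPLE = """\
id        nm        lat        lon        countryCode
5555555  London    55.876456   99.546231   UK
"""


def load_into_sqlite(source, con, table="table_01"):
    """Read whitespace-delimited text into a DataFrame and write it to SQLite."""
    # sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
    df = pd.read_csv(source, sep=r"\s+")
    # index=False keeps the DataFrame index out of the table.
    df.to_sql(table, con, index=False, if_exists="replace")
    return df


con = sqlite3.connect(":memory:")  # in-memory database, for illustration only
df = load_into_sqlite(io.StringIO(SAMPLE), con)
rows = con.execute(
    "SELECT id, nm, lat, lon, countryCode FROM table_01"
).fetchall()
print(rows)
```

With this approach lat and lon arrive as separate numeric columns, so no per-field parsing is needed.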


Answer 1: (score: 3)

Looking at the text file, every line appears to follow the same format. In that case, why not simply split it like this:

split()

With split(), you don't have to worry about how many spaces separate the tokens in the string.
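
As a rough sketch of how this could feed the database, the code below splits each line, converts the fields, and inserts them with the standard `sqlite3` module. The table name `places`, the helper names, and the in-memory database are hypothetical; the column names follow the header in the question.

```python
import sqlite3


def parse_line(line):
    """Split one record on whitespace and convert each field to its type."""
    id_, nm, lat, lon, country_code = line.split()
    return int(id_), nm, float(lat), float(lon), country_code


def load_lines(lines, con):
    """Insert every data line (after the header) into a hypothetical table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS places "
        "(id INTEGER, nm TEXT, lat REAL, lon REAL, countryCode TEXT)"
    )
    it = iter(lines)
    next(it, None)  # skip the header row
    # Ignore blank lines so a trailing newline does not raise an error.
    records = [parse_line(line) for line in it if line.strip()]
    con.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?)", records)
    con.commit()
    return len(records)


# The two lines from the question; a real script would iterate over open(path).
lines = [
    "id        nm        lat        lon        countryCode\n",
    "5555555  London    55.876456   99.546231   UK\n",
]
con = sqlite3.connect(":memory:")
count = load_lines(lines, con)
print(con.execute("SELECT lat, lon FROM places").fetchone())
```

Because split() is called without arguments, it also discards the trailing newline, so no separate strip() is needed on the data lines.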

Answer 2: (score: 1)

Use str.split:

txt = '5555555  London    55.876456   99.546231   UK'
(id, nm, lat, lon, countryCode) = txt.split()
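
A plain split() assumes the name is a single token. If some names contain spaces (an assumption; the question shows only one record), one possible variation is to split the id off the left and the last three fields off the right. The `parse_record` helper and the "New York" line are hypothetical and used only for illustration.

```python
def parse_record(txt):
    """Parse one record whose name field may contain spaces."""
    # Take the id from the left; everything else stays in `rest`.
    id_, rest = txt.split(None, 1)
    # Take lat, lon, and countryCode from the right; what remains is the name.
    nm, lat, lon, country_code = rest.rsplit(None, 3)
    return int(id_), nm, float(lat), float(lon), country_code


# The record from the question.
single = parse_record("5555555  London    55.876456   99.546231   UK")

# A made-up record with a two-word name, for illustration only.
multi = parse_record("1234567  New York    40.5   -74.25   US")

print(single)
print(multi)
```

This keeps the parsing to two string calls and avoids a regular expression, at the cost of assuming the last three fields never contain spaces.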