My .txt file looks like this:
id nm lat lon countryCode
5555555 London 55.876456 99.546231 UK
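I need to parse each field and add it to an SQLite database. So far I have managed to transfer the id, name and countryCode columns into my database, but I am struggling to find a way to parse the lat and lon of each record separately.
I tried using a regular expression, but had no luck. I also thought about having the parser check whether the last non-whitespace character is a letter, to determine that the string is lat rather than lon, but I don't know how to implement that correctly. Can I solve this with a regular expression, or should I write a custom parser? If so, how?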
Answer 0: (score: 5)
You can do this with pandas, like so:
import pandas as pd
import sqlite3
con = sqlite3.connect('path/new.db')
con.text_factory = str
df = pd.read_csv('file_path', sep='\t')
df.to_sql('table_01', con)
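If some lines are malformed and you can afford to skip them, use this instead:
df = pd.read_csv('file_path', sep='\t', error_bad_lines=False)
# In pandas 1.3 and later, use on_bad_lines='skip'; error_bad_lines was removed in pandas 2.0.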
Answer 1: (score: 3)
Looking at the text file, every line appears to have the same format. So why not simply split each line with split()?
With split(), you don't have to worry about how many spaces there are between the tokens of the string.
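As a rough illustration of the split() approach, here is a minimal sketch that parses the sample from the question and loads it into SQLite. The table name `cities`, the helper names `parse_lines` and `load`, and the in-memory database are assumptions made for the example; they are not from the question.

```python
import sqlite3

# Sample data in the question's layout: a header row, then
# whitespace-separated records.
SAMPLE = """id nm lat lon countryCode
5555555 London 55.876456 99.546231 UK
"""


def parse_lines(text):
    """Yield (id, name, lat, lon, country) tuples, skipping the header and blank lines."""
    lines = text.splitlines()
    for line in lines[1:]:
        if not line.strip():
            continue
        # split() with no argument splits on any run of whitespace.
        rec_id, name, lat, lon, country = line.split()
        yield int(rec_id), name, float(lat), float(lon), country


def load(con, text):
    """Create the (hypothetical) cities table and insert every parsed record."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS cities "
        "(id INTEGER PRIMARY KEY, nm TEXT, lat REAL, lon REAL, countryCode TEXT)"
    )
    con.executemany("INSERT INTO cities VALUES (?, ?, ?, ?, ?)", parse_lines(text))
    con.commit()


con = sqlite3.connect(":memory:")  # assumption: swap in a real file path as needed
load(con, SAMPLE)
rows = con.execute("SELECT id, nm, lat, lon, countryCode FROM cities").fetchall()
print(rows)
```

Note that the five-way unpacking raises a ValueError for any line that does not have exactly five tokens, which is one way to notice malformed records early.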
Answer 2: (score: 1)
txt = '5555555 London 55.876456 99.546231 UK'
(id, nm, lat, lon, countryCode) = txt.split()
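One caveat with a plain split(): it breaks if a name contains a space. The sketch below is one possible workaround, assuming the id is always the first token and lat, lon and countryCode are always the last three. The function name `parse_record` and the "New York" row are hypothetical and do not come from the question's data.

```python
def parse_record(line):
    """Parse one record into (id, name, lat, lon, country)."""
    # Peel the id off the front, then the last three fields off the back;
    # whatever remains in the middle is treated as the name.
    rec_id, rest = line.split(None, 1)
    name, lat, lon, country = rest.rsplit(None, 3)
    return int(rec_id), name, float(lat), float(lon), country


print(parse_record('5555555 London 55.876456 99.546231 UK'))
print(parse_record('1 New York 40.7 -74.0 US'))  # hypothetical multi-word name
```

Converting lat and lon with float() at parse time also means they can be stored in REAL columns rather than as text.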