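I am trying to extract contact information from a client's digital notebook, where the entries look like this:
'\r\nContact Imported:\r\nBusinessPhone : 9547711900 Line1 : 2440
East Commercial Blvd.\r\n City : Ft. Lauderdale\r\n State : FL\r\n PostalCode : 33308\r\n\r\nArt Womack recommends Steve Paul Dentist on Commercial Blvd area.\r\nA_womack@me.com>\r\nBond? Crowns? Veneer?\r\n\r\n\r\n'
After splitting, my goal is a list whose elements hold the relevant data (most of them contain a ':' in the middle), so that I can later convert it into a Python dictionary.
I have tried breaking the string apart on the '\r' and '\n' characters, but I keep losing the Line1 : yadayada information.
I would like something like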
['BusinessPhone : 9547711900',
'Line1 : 2440 East Commercial Blvd.', 'City : Ft. Lauderdale',
'State : FL', 'PostalCode : 33308']
Answer 0: (score: 1)
You can try:
>>> from io import StringIO
>>> import pandas as pd
>>> data = """
... '\r\nContact Imported:\r\nBusinessPhone : 9547711900 Line1 : 2440
... East Commercial Blvd.\r\n City : Ft. Lauderdale\r\n State : FL\r\n PostalCode : 33308\r\n\r\nArt Womack recommends Steve Paul Dentist on Commercial Blvd area.\r\nA_womack@me.com>\r\nBond? Crowns? Veneer?\r\n\r\n\r\n'
... """
You can try reading it with pd.read_csv:
>>> df = pd.read_csv(StringIO(data))
>>> df
'
0 Contact Imported:
1 BusinessPhone : 9547711900 Line1 : 2440
2 East Commercial Blvd.
3 City : Ft. Lauderdale
4 State : FL
5 PostalCode : 33308
6 Art Womack recommends Steve Paul Dentist on Co...
7 A_womack@me.com>
8 Bond? Crowns? Veneer?
9 '
As @jezrael suggested, convert df to a list if necessary:
# df.values.tolist()
OR
>>> df.values
array([['Contact Imported:'],
['BusinessPhone : 9547711900 Line1 : 2440'],
['East Commercial Blvd.'],
[' City : Ft. Lauderdale'],
[' State : FL'],
[' PostalCode : 33308'],
['Art Womack recommends Steve Paul Dentist on Commercial Blvd area.'],
['A_womack@me.com>'],
['Bond? Crowns? Veneer?'],
["'"]], dtype=object)
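If pandas is not otherwise needed, a similar row split can be sketched with the standard library alone. This is only an illustration: `raw` is a shortened, hypothetical copy of the notebook entry (with the address kept on one line), and `rows` is a made-up name, neither taken from the original post.

```python
# Hypothetical, shortened sample mirroring the notebook entry from the question.
raw = (
    "\r\nContact Imported:\r\nBusinessPhone : 9547711900 Line1 : 2440 "
    "East Commercial Blvd.\r\n City : Ft. Lauderdale\r\n State : FL\r\n "
    "PostalCode : 33308\r\n\r\nBond? Crowns? Veneer?\r\n\r\n\r\n"
)

# str.splitlines() treats \r\n, \r and \n as line boundaries;
# strip each piece and drop the ones that end up empty.
rows = [line.strip() for line in raw.splitlines() if line.strip()]
print(rows)
```

Unlike df.values, this gives a flat list of stripped strings, so no extra flattening step is needed.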
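Answer 1: (score: 0)
How are you trying to clean the data? You can split the sample data you have using '\r\n' as the delimiter. After splitting, you can filter the list based on whether each string is empty. This can serve as a basic data-cleaning step. Which parts are relevant is for you to decide.
Basic cleaning code could be: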
mystr = '\r\nContact Imported:\r\nBusinessPhone : 9547711900 Line1 : 2440 East Commercial Blvd.\r\n City : Ft. Lauderdale\r\n State : FL\r\n PostalCode : 33308\r\n\r\nArt Womack recommends Steve Paul Dentist on Commercial Blvd area.\r\nA_womack@me.com>\r\nBond? Crowns? Veneer?\r\n\r\n\r\n'
data = mystr.split('\r\n')
data_filtered = list(filter(lambda x: x, data))
for d in data_filtered:
    print(d.strip())
This will output:
Contact Imported:
BusinessPhone : 9547711900 Line1 : 2440 East Commercial Blvd.
City : Ft. Lauderdale
State : FL
PostalCode : 33308
Art Womack recommends Steve Paul Dentist on Commercial Blvd area.
A_womack@me.com>
Bond? Crowns? Veneer?
You still need to figure out what is important.
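One possible way to decide what counts as important, sketched under the assumption that relevant entries look like `key : value` with a non-empty value. The names `entries` and `relevant` are illustrative and not part of the original answer.

```python
# Cleaned lines as printed above (hypothetical variable name).
entries = [
    "Contact Imported:",
    "BusinessPhone : 9547711900 Line1 : 2440 East Commercial Blvd.",
    "City : Ft. Lauderdale",
    "State : FL",
    "PostalCode : 33308",
    "Art Womack recommends Steve Paul Dentist on Commercial Blvd area.",
    "A_womack@me.com>",
    "Bond? Crowns? Veneer?",
]

relevant = []
for entry in entries:
    # partition() splits on the first ':' only; sep is '' when there is no ':'.
    key, sep, value = entry.partition(":")
    # Keep entries that have a colon AND some text after it.
    if sep and value.strip():
        relevant.append(entry)

print(relevant)
```

This drops the header and the free-text lines, but the BusinessPhone entry still carries the Line1 field and needs further splitting.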
Edit: Based on the given string, you can use the following code:
def convert(x):
    d = x.split(':')
    newlist = []
    if len(d) > 2:
        # Hack will work only in few cases, including this case
        vals = d[1].strip().split(' ')
        newlist.append(f'{d[0]}:{vals[0]}')
        newlist.append(f'{vals[1]}:{d[2]}')
        return newlist
    return [x]
mystr = '\r\nContact Imported:\r\nBusinessPhone : 9547711900 Line1 : 2440 East Commercial Blvd.\r\n City : Ft. Lauderdale\r\n State : FL\r\n PostalCode : 33308\r\n\r\nArt Womack recommends Steve Paul Dentist on Commercial Blvd area.\r\nA_womack@me.com>\r\nBond? Crowns? Veneer?\r\n\r\n\r\n'
data = mystr.split('\r\n')
data_filtered = list(filter(lambda x: x, data))
data_filtered_2 = list((map(lambda x: convert(x), data_filtered)))
data_combined = []
for i in data_filtered_2:
    data_combined += i
for d in data_combined:
    print(d.strip())
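A somewhat more general alternative to the convert hack, using a regular expression instead of counting colons. This is a different technique from the one in the answer, sketched under the assumption that every key is a single word directly followed by optional spaces and a colon. `KEY`, `split_pairs` and `contact` are hypothetical names.

```python
import re

# Assumption: a key is a single word followed by optional spaces and a
# colon, e.g. "Line1 :". Keys containing spaces are not handled.
KEY = re.compile(r"(\w+)\s*:\s*")

def split_pairs(line):
    """Return (key, value) tuples for every 'key : value' pair in line."""
    matches = list(KEY.finditer(line))
    pairs = []
    for i, m in enumerate(matches):
        # The value runs from the end of this key to the start of the next key.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        value = line[m.end():end].strip()
        if value:  # skip headers such as "Contact Imported:"
            pairs.append((m.group(1), value))
    return pairs

line = "BusinessPhone : 9547711900 Line1 : 2440 East Commercial Blvd."
contact = dict(split_pairs(line))
print(contact)
```

Because the function returns pairs rather than joined strings, the result can go straight into dict(), which is the end goal stated in the question. Values that themselves contain a word followed by a colon would be split incorrectly, so this remains a heuristic.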