python - Applying %LIKE%

Time: 2017-04-26 03:06:19

Tags: python pandas

I am new to Python, and I am trying to join two CSV files (delimited by ";").

CSV1
Sender;Recipient
Adam;123
Alex;234
John;123
Adam;888

CSV2
Name;Phone
Winnie;123,234,456
Celeste;777,888,999

Expected output:

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste
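The Phone column in CSV2 is comma-separated, so when I match I need some kind of search, or %LIKE%.

I know I can use join to do a vlookup-style lookup, but how do I implement %LIKE%?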

3 Answers:

Answer 0: (score: 3)

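  • Use str.split to convert the Phone column into lists
  • Use str.len() to find the length of each list. We will use this to expand 'Name'
  • Concatenate all of these lists into one. Make sure to filter out zero-length lists
  • Use repeat to explode 'Name'
  • Create a dictionary whose keys are phone numbers and whose values are names
  • Create a copy of d1 with a new column added using map and the dictionary we just built.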
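import numpy as np

# d1 and d2 are the DataFrames read from CSV1 and CSV2 (sep=';')
p = d2.Phone.str.split(',')                 # each Phone string -> list of numbers
p = p[p.astype(bool)]                       # drop zero-length lists
l = p.str.len()                             # number of phones per name
p2 = np.concatenate(p.values).astype(int)   # one flat array of phone numbers
nm = d2.Name.loc[p.index].repeat(l)         # repeat each name once per phone
m = dict(zip(p2, nm))                       # phone number -> name

df = d1.assign(RecipientName=d1.Recipient.map(m))
print(df)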

  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste

df.to_csv('out.csv', sep=';', index=False)

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste
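
On newer pandas (0.25 or later, which has DataFrame.explode) the same steps can be written more directly. This is only a sketch under that assumption, not a replacement for the code above: the frames are rebuilt inline from the question's sample data, and the `lookup` name is mine.

```python
import pandas as pd

# Sample frames mirroring CSV1 and CSV2 from the question
d1 = pd.DataFrame({"Sender": ["Adam", "Alex", "John", "Adam"],
                   "Recipient": [123, 234, 123, 888]})
d2 = pd.DataFrame({"Name": ["Winnie", "Celeste"],
                   "Phone": ["123,234,456", "777,888,999"]})

# One row per (Name, phone number): split the string, then explode the lists
lookup = (d2.assign(Phone=d2["Phone"].str.split(","))
            .explode("Phone")
            .dropna(subset=["Phone"]))
lookup["Phone"] = lookup["Phone"].astype(int)

# A left merge keeps every row of d1, matched or not
df = (d1.merge(lookup, how="left", left_on="Recipient", right_on="Phone")
        .drop(columns="Phone")
        .rename(columns={"Name": "RecipientName"}))
print(df)
```

Because each phone number becomes its own row, this is an exact match on the number rather than a substring search, so 23 would not match 123 the way a SQL %LIKE% would.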

Answer 1: (score: 1)

Solution using map with a Series:

import numpy as np
import pandas as pd
from itertools import chain

# split values by `,` into lists
df2['Phone'] = df2['Phone'].str.split(',')
# remove rows that have no phone list
df2 = df2.dropna(subset=['Phone'])
# length of each list
lens = df2['Phone'].str.len()

# repeat each Name by the length of its list; flatten the lists with chain.from_iterable
s = pd.Series(np.repeat(df2.Name.values, lens),
              index=list(chain.from_iterable(df2.Phone.values)))
# convert index to int so it matches df1['Recipient']
s.index = s.index.astype(int)
print (s)
123     Winnie
234     Winnie
456     Winnie
777    Celeste
888    Celeste
999    Celeste
dtype: object
#map values to new column
df1['RecipientName'] = df1['Recipient'].map(s)
print(df1)
  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste

#write to csv
df1.to_csv('out.csv', sep=';', index=False)

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste

The solution with join is similar:

df2['Phone'] = df2['Phone'].str.split(',')
df2 = df2.dropna(subset=['Phone'])

s = pd.Series(np.repeat(df2.Name.values, df2.Phone.str.len()), 
              index= list(chain.from_iterable(df2.Phone.values)))
s.index = s.index.astype(int)
s.name = 'RecipientName'
print (s)

df1 = df1.join(s, on='Recipient')
print(df1)
  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste
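
Note that map leaves NaN in RecipientName when a recipient number appears in no Phone list. A minimal sketch of checking for that, assuming a hypothetical sender Zoe with recipient 555 (neither is in the sample data) and an "unknown" placeholder of my own choosing:

```python
import pandas as pd

# Lookup Series like the one built above: index = phone number, value = name
s = pd.Series(["Winnie", "Winnie", "Celeste"], index=[123, 234, 888],
              name="RecipientName")

# 555 is a hypothetical recipient that appears in no Phone list
df1 = pd.DataFrame({"Sender": ["Adam", "Zoe"], "Recipient": [123, 555]})

df1["RecipientName"] = df1["Recipient"].map(s)

# Rows whose recipient was not found in any Phone list
unmatched = df1[df1["RecipientName"].isna()]
print(unmatched)

# Optionally replace the missing names with a placeholder
df1["RecipientName"] = df1["RecipientName"].fillna("unknown")
print(df1)
```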

Edit:

My sample data:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas

temp=u"""
Sender;Recipient
Adam;123
Alex;234
John;123
Adam;888"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df1 = pd.read_csv(StringIO(temp), sep=";")
print (df1)
  Sender  Recipient
0   Adam        123
1   Alex        234
2   John        123
3   Adam        888

temp=u"""
Name;Phone
Winnie;123,234,456
Celeste;777,888,999"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df2 = pd.read_csv(StringIO(temp), sep=";")
print (df2)
      Name        Phone
0   Winnie  123,234,456
1  Celeste  777,888,999

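Answer 2: (score: 0)

Here is some pseudocode and an idea of how to do this.

I would first parse the CSV2 file. Skip the first line, then parse the name and phone numbers from each following line, and maintain a dictionary that maps each phone number to its name.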

numbers_to_names = {}
with open("csv2", "r") as f:
    lines = f.read().splitlines()
for line in lines[1:]:  # skip the header line
    name, phone_numbers = line.split(";")
    for phone_number in phone_numbers.split(","):
        numbers_to_names[phone_number] = name

Then go through CSV1, again skipping the first line, parse the sender and recipient, and combine them with the dictionary built earlier.

with open("csv1", "r") as f:
    lines = f.read().splitlines()
for line in lines[1:]:  # skip the header line
    sender, recipient = line.split(";")
    print("%s;%s;%s" % (sender, recipient, numbers_to_names[recipient]))
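
The same idea can also be sketched with the standard csv module, which handles the header row and the ";" delimiter for you. This assumes the column names from the question; the function name `add_recipient_names` is hypothetical, and io.StringIO stands in for the real files so the snippet runs on its own.

```python
import csv
import io

def add_recipient_names(csv1, csv2):
    """Return the rows of csv1 with a RecipientName column appended."""
    # Map every individual phone number to the name that owns it
    numbers_to_names = {}
    for row in csv.DictReader(csv2, delimiter=";"):
        for number in row["Phone"].split(","):
            numbers_to_names[number.strip()] = row["Name"]

    rows = [["Sender", "Recipient", "RecipientName"]]
    for row in csv.DictReader(csv1, delimiter=";"):
        # .get() leaves an empty name instead of raising KeyError on a miss
        name = numbers_to_names.get(row["Recipient"], "")
        rows.append([row["Sender"], row["Recipient"], name])
    return rows

# In-memory stand-ins for the two files; replace with open(...) in real use
csv1 = io.StringIO("Sender;Recipient\nAdam;123\nAlex;234\nJohn;123\nAdam;888\n")
csv2 = io.StringIO("Name;Phone\nWinnie;123,234,456\nCeleste;777,888,999\n")
rows = add_recipient_names(csv1, csv2)

out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(rows)
print(out.getvalue())
```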