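I have a CSV file containing 22,000 rows of author names.
I want to split them and append the parts as new columns, as shown below.
Preview of the original dataset: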
+--------------------------------------+
| author_full_name                     |
+--------------------------------------+
| Kahana, M J; Adler, M                |
| Gautam, H; Potdar, G G; Vidya, T N C |
+--------------------------------------+
Expected output:
+--------------------------------------+--------------------+-----------------------+
| author_full_name                     | author_first_names | author_last_names     |
+--------------------------------------+--------------------+-----------------------+
| Kahana, M J; Adler, M                | M J; M             | Kahana; Adler         |
| Gautam, H; Potdar, G G; Vidya, T N C | H; G G; T N C      | Gautam; Potdar; Vidya |
+--------------------------------------+--------------------+-----------------------+
How can I do this with pandas?
Answer 0 (score: 1)
The logic here is essentially to split on ; first, then split each resulting value on , and take the first piece as the last name and the second piece as the first name:
>>> [x.split(",")[0] for x in "Gautam, H; Potdar, G G; Vidya, T N C".split(";")]
['Gautam', ' Potdar', ' Vidya']
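The same splitting logic can be wrapped in a small helper that also strips the leading spaces left over from the split. This is only a minimal sketch: the name `split_authors` is hypothetical, and it assumes every entry has the form `Last, First`, falling back to an empty first name when an entry contains no comma.

```python
def split_authors(full_names):
    """Split 'Last, First; Last, First' into (first_names, last_names) strings."""
    firsts, lasts = [], []
    for entry in full_names.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing ";"
        # partition splits on the first comma only; 'first' is "" if there is no comma
        last, _, first = entry.partition(",")
        lasts.append(last.strip())
        firsts.append(first.strip())
    return "; ".join(firsts), "; ".join(lasts)


print(split_authors("Gautam, H; Potdar, G G; Vidya, T N C"))
```

Using `partition` rather than `split(",")[1]` avoids an `IndexError` on entries that have no comma.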
Using apply in pandas:
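import pandas as pd

df = pd.DataFrame({"Name": ["Gautam, H; Potdar, G G; Vidya, T N C", "Kahana, M J; Adler, M "]})
# Split each cell on ";" into "Last, First" pairs, then split each pair on ",":
# index 1 holds the first-name initials, index 0 holds the last name.
# strip() removes the spaces left around each piece by the split.
df['author_first_names'] = df['Name'].apply(lambda x: "; ".join(ele.split(",")[1].strip() for ele in x.split(";")))
df['author_last_names'] = df['Name'].apply(lambda x: "; ".join(ele.split(",")[0].strip() for ele in x.split(";")))
df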
Output:
------------------------------------|-----------------|------------------------
Gautam, H; Potdar, G G; Vidya, T N C H; G G; T N C Gautam; Potdar; Vidya
Kahana, M J; Adler, M M J; M Kahana; Adler
------------------------------------|-----------------|------------------------
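To apply this to the CSV from the question rather than a hand-built DataFrame, something like the following sketch may work. It assumes the column is named `author_full_name` as in the question's preview and that no cell is empty; the in-memory `csv_text` and the helper name `split_names` are hypothetical stand-ins, so with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for the real file; the quoted fields keep the commas inside one column.
csv_text = (
    "author_full_name\n"
    '"Kahana, M J; Adler, M"\n'
    '"Gautam, H; Potdar, G G; Vidya, T N C"\n'
)
df = pd.read_csv(io.StringIO(csv_text))


def split_names(cell):
    """Return both new columns for one cell of 'Last, First; Last, First'."""
    # Each pair is a (last, separator, first) tuple from str.partition.
    pairs = [p.partition(",") for p in cell.split(";") if p.strip()]
    firsts = "; ".join(first.strip() for _, _, first in pairs)
    lasts = "; ".join(last.strip() for last, _, _ in pairs)
    return pd.Series({"author_first_names": firsts, "author_last_names": lasts})


# apply returns a DataFrame here (one column per Series key); join aligns it on the index.
df = df.join(df["author_full_name"].apply(split_names))
print(df)
```

The result can then be written back with `df.to_csv(...)` if a new file is needed.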