I have a DataFrame df, and I want to append '/' to the cast and genres columns so that each cell contains exactly 3 '/' characters.
id movie cast genres runtime
1 Furious a/b/c/d a/b 23
2 Minions a/b/c a/b/c 55
3 Mission a/b a 67
4 Kingsman a/b/c/d a/b/c/d 23
5 Star Wars a a/b/c 45
So the output should look like this:
id movie cast genres runtime
1 Furious a/b/c/d a/b// 23
2 Minions a/b/c/ a/b/c/ 55
3 Mission a/b// a/// 67
4 Kingsman a/b/c/d a/b/c/d 23
5 Star Wars a/// a/b/c/ 45
Answer 0 (score: 1)
Here is one way to do it, by defining a custom function:
def add_values(df, *cols):
    for col in cols:
        # number of "/" to add in each row
        c = df[col].str.count('/').rsub(3)
        # translate the above into as many "/" as required
        ap = [i * '/' for i in c.tolist()]
        # append them to the corresponding column
        df[col] = [i + j for i, j in zip(df[col], ap)]
    return df
add_values(df, 'cast', 'genres')
id movie cast genres runtime
0 1 Furious a/b/c/d a/b// 23
1 2 Minions a/b/c/ a/b/c/ 55
2 3 Mission a/b// a/// 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 StarWars a/// a/b/c/ 45
Answer 1 (score: 0)
Apply this function to every element of each column to update them.
def update_string(string):
    total_occ = 3  # total number of '/' characters required
    for element in string:  # for each character,
        if element == "/":  # if it is '/', decrease 'total_occ'
            total_occ = total_occ - 1
    for i in range(total_occ):  # append the remaining '/' at the end
        string += "/"
    return string
x = "a/b"
print(update_string(x))
The output is:
a/b//
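The counting loop above can also be written with str.count. The following is only a sketch of that idea; the name pad_slashes and its total and sep parameters are hypothetical and do not come from the answer.

```python
def pad_slashes(value, total=3, sep="/"):
    """Append `sep` until `value` contains at least `total` separators."""
    # max(0, ...) leaves cells that already have enough separators unchanged
    missing = max(0, total - value.count(sep))
    return value + sep * missing


print(pad_slashes("a/b"))      # a/b//
print(pad_slashes("a"))        # a///
print(pad_slashes("a/b/c/d"))  # a/b/c/d
```

Like update_string, a function of this shape takes a single string, so it can be passed to Series.apply for each column that needs padding.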
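Answer 2 (score: 0)
You can split on /, pad the resulting list with empty strings until it has 4 elements, and then join it again with /.
Use .apply to change the values in the whole column.
Try this: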
import pandas as pd
from io import StringIO
# columns are separated by two or more spaces so that "Star Wars" stays in one cell
df = pd.read_csv(StringIO("""id  movie  cast  genres  runtime
1  Furious  a/b/c/d  a/b  23
2  Minions  a/b/c  a/b/c  55
3  Mission  a/b  a  67
4  Kingsman  a/b/c/d  a/b/c/d  23
5  Star Wars  a  a/b/c  45"""), sep=r"\s\s+", engine="python")
def pad_cells(value):
    # split into items, pad with empty strings up to 4 items, then re-join
    parts = value.split("/")
    parts += [""] * (4 - len(parts))
    return "/".join(parts)
df["cast"] = df["cast"].apply(pad_cells)
df["genres"] = df["genres"].apply(pad_cells)
print(df)
Answer 3 (score: 0)
Here you go:
import pandas as pd
from io import StringIO
# create raw data
raw_data = StringIO("""
id movie cast genres runtime
1 Furious a/b/c/d a/b 23
2 Minions a/b/c a/b/c 55
3 Mission a/b a 67
4 Kingsman a/b/c/d a/b/c/d 23
5 Star_Wars a a/b/c 45
""")
# load data into data frame
df = pd.read_csv(raw_data, sep=' ')
# iterate over rows and append the missing '/' characters
for index, row in df.iterrows():
    count_character_cast = row['cast'].count('/')
    if count_character_cast < 3:
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        df.at[index, 'cast'] = row['cast'] + '/' * (3 - count_character_cast)
    count_character_genres = row['genres'].count('/')
    if count_character_genres < 3:
        df.at[index, 'genres'] = row['genres'] + '/' * (3 - count_character_genres)
Answer 4 (score: 0)
A short solution using itertools functions and DataFrame.applymap:
In [217]: df
Out[217]:
id movie cast genres runtime
0 1 Furious a/b/c/d a/b 23
1 2 Minions a/b/c a/b/c 55
2 3 Mission a/b a 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 Star Wars a a/b/c 45
In [218]: from itertools import chain, zip_longest
In [219]: def ensure_slashes(x):
     ...:     return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
...:
...:
In [220]: df[['cast','genres']] = df[['cast','genres']].applymap(ensure_slashes)
In [221]: df
Out[221]:
id movie cast genres runtime
0 1 Furious a/b/c/d a/b// 23
1 2 Minions a/b/c/ a/b/c/ 55
2 3 Mission a/b// a/// 67
3 4 Kingsman a/b/c/d a/b/c/d 23
4 5 Star Wars a/// a/b/c/ 45
The key function to apply is:
def ensure_slashes(x):
    return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
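To make the one-liner easier to follow, here is a step-by-step trace of the same expression for a single cell. The intermediate names parts, pairs, and flat are introduced only for illustration.

```python
from itertools import chain, zip_longest

cell = "a/b"

# 1. split the cell into its items
parts = cell.split("/")  # ['a', 'b']

# 2. pair each item with one of three slashes; the shorter side is
#    filled with '' so neither items nor slashes are dropped
pairs = list(zip_longest(parts, "///", fillvalue=""))
# [('a', '/'), ('b', '/'), ('', '/')]

# 3. flatten the pairs and join them back into one string
flat = list(chain.from_iterable(pairs))
result = "".join(flat)
print(result)  # a/b//
```

With four items, such as a/b/c/d, the fourth item is paired with the fill value '' instead of a slash, which is why those cells come out unchanged.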
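Answer 5 (score: 0)
OK, so the idea is to create a function that does the necessary work and apply it to the required columns.
The function replaces the existing slashes with empty strings, then zips the characters of the cell's string with a constant list of exactly 3 slashes.
The result is the concatenation of the elements of this zip, and hoppla, it works :)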
import pandas as pd
import re
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'movie': ['furious', 'Mininons', 'mission', 'Kingsman', 'star Wars'],
    'cast': ['a/b/c/d', 'a/b/c', 'a/b', 'a/b/c/d', 'a'],
    'genres': ['a/b', 'a/b/c', 'a', 'a/b/c/d', 'a/b/c'],
    'runtime': [23, 55, 67, 23, 45],
})
def slash_func(x):
    slash_list = ['/'] * 3
    # drop the existing slashes, then treat every remaining character as one item
    x = re.sub('/', '', str(x))
    list_ = list(x)
    # pad with empty strings up to 3 items
    for i in range(3 - len(list_)):
        list_.append('')
    # zip stops after 3 pairs, so a fourth item is dropped ('a/b/c/d' -> 'a/b/c/')
    output_list = [v[0] + v[1] for v in zip(list_, slash_list)]
    return ''.join(output_list)
df['cast'] = df['cast'].apply(slash_func)
df['genres'] = df['genres'].apply(slash_func)
Output:
id movie cast genres runtime
1 furious a/b/c/ a/b// 23
2 Mininons a/b/c/ a/b/c/ 55
3 mission a/b// a/// 67
4 Kingsman a/b/c/ a/b/c/ 23
5 star Wars a/// a/b/c/ 45