Get the count of rows containing a string in pandas

Date: 2017-12-05 19:41:26

Tags: python python-3.x pandas numpy

I have two DataFrames, df1 and df2.

df1:

PartNumber
0000D3447E
0000D3447E
0000D3447E12
0000D3447E
0000D3447E
0000D3447E
0000D3447E2345
0000F2892E
0000F2892E
0000F2892E
0000F2892E34
0000F2892E
0000F2892E
0000F2892E12

df2:

PartNumber
0000D3447E39S
0000D3447E39S
0000D3447E39S
0000D3447E39S
0000D3447E39S
0000D3447E39S
0000D3447E39S2245
0000F2892EDI1
0000F2892EDI1
0000F2892EDI1
0000F2892EDI124
0000F2892EDI1
0000F2892EDI1
0000F2892EDI1
0000D1617EAD6
0000D1617EAD6
0000D1617EAD6137
0000D1617EAD6
0000D1617EAD6
0000D1617EAD612
0000D1617EAD6
0000D3447EYG1
0000D3447EYG1
0000D3447EYG1
0000D3447EYG1
0000D3447EYG1
0000D3447EYG1
0000D3447EYG1

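I need to take a value such as '0000D3447E' from df1, count the rows in df2 that contain this string, and store that count in a new column in df1.

The suggested answer, df1['count_of_colors'] = df1['PartNumber'].map(df2['PartNumber'].str[:10].value_counts()), works when every part number is 10 characters long, but mine are not a constant length. Because str[:10] truncates each df2 value to its first 10 characters, I get wrong counts.

Thanks.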
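The limitation described above can be reproduced on a small scale. The sketch below uses shortened, hypothetical versions of the two frames (the value '0000D3447E12AB' is made up for illustration) and shows that a df1 key longer than 10 characters can never equal a 10-character prefix, so map() leaves it without a count.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the frames in the question.
df1 = pd.DataFrame({"PartNumber": ["0000D3447E", "0000D3447E12", "0000F2892E"]})
df2 = pd.DataFrame({"PartNumber": ["0000D3447E39S", "0000D3447E12AB", "0000F2892EDI1"]})

# Count how often each 10-character prefix occurs in df2.
prefix_counts = df2["PartNumber"].str[:10].value_counts()

# A 12-character key such as "0000D3447E12" is not among the prefixes,
# so map() returns NaN for that row.
df1["count_of_colors"] = df1["PartNumber"].map(prefix_counts)
print(df1)
```

The first and last rows receive a count, while the 12-character key ends up as NaN.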

3 answers:

Answer 0: (score: 2)

You can use map:

df1['count_of_colors'] = df1['PartNumber'].map(df2['PartNumber'].str[:10].value_counts())

df1:

    PartNumber  count_of_colors
0   0000D3447E  14
1   0000D3447E  14
2   0000D3447E  14
3   0000D3447E  14
4   0000D3447E  14
5   0000D3447E  14
6   0000D3447E  14
7   0000F2892E  7
8   0000F2892E  7
9   0000F2892E  7
10  0000F2892E  7
11  0000F2892E  7
12  0000F2892E  7
13  0000F2892E  7

Edit: use str.extract to pull the exact match out of df2, then apply the same solution

pat = '({})'.format('|'.join(df1['PartNumber'].unique()))

df2['PartMatch'] = df2['PartNumber'].str.extract(pat, expand = False)

df1['count_of_colors'] = df1['PartNumber'].map(df2['PartMatch'].value_counts())

You get the same output, without hard-coding the number of characters.
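Two details of the pattern may matter on real data. Regex alternation takes the first alternative that matches, so a short key can shadow a longer key that starts with it, and part numbers containing regex metacharacters would be misread. The sketch below is one possible hardening, not part of the original answer: it escapes each key with re.escape and sorts the keys longest first. The sample frames are hypothetical.

```python
import re

import pandas as pd

# Hypothetical, shortened stand-ins for the frames in the question.
df1 = pd.DataFrame({"PartNumber": ["0000D3447E", "0000D3447E12", "0000F2892E"]})
df2 = pd.DataFrame({"PartNumber": ["0000D3447E39S", "0000D3447E12AB", "0000F2892EDI1"]})

# Longest keys first, so "0000D3447E12" is tried before "0000D3447E".
keys = sorted(df1["PartNumber"].unique(), key=len, reverse=True)

# Escape each key so it is matched literally, then build one capture group.
pat = "({})".format("|".join(re.escape(k) for k in keys))

# Each df2 row is assigned to at most one key: the first alternative that matches.
df2["PartMatch"] = df2["PartNumber"].str.extract(pat, expand=False)
df1["count_of_colors"] = df1["PartNumber"].map(df2["PartMatch"].value_counts())
print(df1)
```

Note that with this approach a df2 row is counted for only one key, even if it contains several of them.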

Answer 1: (score: 0)

I think you just need this:

df1['count_of_colors'] = df1['PartNumber'].map(df2['PartNumber'].value_counts())
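Mapping a plain value_counts() only counts df2 values that are exactly equal to a df1 key. If "contains" is meant literally, one alternative is to test each distinct key as a substring with str.contains. The following is a minimal sketch under that assumption, using hypothetical sample frames; it makes one pass over df2 per distinct key, so it may be slow when df1 has many distinct part numbers.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the frames in the question.
df1 = pd.DataFrame({"PartNumber": ["0000D3447E", "0000D3447E12", "0000F2892E"]})
df2 = pd.DataFrame({"PartNumber": ["0000D3447E39S", "0000D3447E12AB", "0000F2892EDI1"]})

# For every distinct key, count the df2 rows that contain it.
# regex=False treats the key as a literal substring, not a pattern.
counts = {
    key: int(df2["PartNumber"].str.contains(key, regex=False).sum())
    for key in df1["PartNumber"].unique()
}

# Look up the count for each df1 row.
df1["count_of_colors"] = df1["PartNumber"].map(counts)
print(df1)
```

Here a df2 row such as "0000D3447E12AB" is counted for both "0000D3447E" and "0000D3447E12", because it contains both.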

Answer 2: (score: 0)

Here is my code. It works for me. I could not copy all of the data you provided, but here is a sample.

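import java.time.DayOfWeek;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Objects;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Slot {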

    private static Pattern textPattern = Pattern.compile("(\\w+) ([0-9:]+)-([0-9:]+)");
    private static DateTimeFormatter dayFormatter = DateTimeFormatter.ofPattern("EEE");

    public static Slot parse(String text) {
        Matcher textMatcher = textPattern.matcher(text);
        if (textMatcher.matches()) {
            DayOfWeek day = DayOfWeek.from(dayFormatter.parse(textMatcher.group(1)));
            LocalTime startTime = LocalTime.parse(textMatcher.group(2));
            LocalTime endTime = LocalTime.parse(textMatcher.group(3));
            return new Slot(day, startTime, endTime);
        } else {
            throw new IllegalArgumentException("Unparsable slot " + text + ", expected format Sun 12:00-14:00");
        }
    }

    private final DayOfWeek day;
    private final LocalTime startTime;
    private final LocalTime endTime;

    Slot(DayOfWeek day, LocalTime startTime, LocalTime endTime) {
        if (! endTime.isAfter(startTime)) {
            throw new IllegalArgumentException("End time must be after start time");
        }
        this.day = Objects.requireNonNull(day);
        this.startTime = startTime;
        this.endTime = endTime;
    }

    public DayOfWeek getDay() {
        return day;
    }

    public LocalTime getStartTime() {
        return startTime;
    }

    public LocalTime getEndTime() {
        return endTime;
    }

    /** 
     * @param nextSlot the slot that follows this one on the same day of the week
     * @return A new Slot object representing the vacant Slot between this Slot and nextSlot,
     *      or an empty Optional if no gap
     */
    public Optional<Slot> slotBetween(Slot nextSlot) {
        if (! nextSlot.getDay().equals(day)) {
            throw new IllegalArgumentException("Cannot compare slots on different days of week");
        }
        if (nextSlot.getStartTime().isBefore(endTime)) {
            throw new IllegalArgumentException("Error: overlap between slots " + this + " and " + nextSlot);
        }
        if (nextSlot.getStartTime().isAfter(endTime)) { // there is a gap
            return Optional.of(new Slot(day, endTime, nextSlot.getStartTime()));
        } else {
            return Optional.empty();
        }
    }

    @Override
    public String toString() {
        return dayFormatter.format(day) + ' ' + startTime + '-' + endTime;
    }

}