Question

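I have strings like the ones below in a column. I want to strip everything after the last _ in each string, and if a string contains no _, leave it as is. (My attempt below would simply exclude the strings that have no _.)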

So far I have tried the following, but it only strips everything after the first _:

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.split('_').str[0]

Here are some example strings from my SOURCE_NAME column.

Stackoverflow_1234
Stack_Over_Flow_1234
Stackoverflow
Stack_Overflow_1234

Expected:

Stackoverflow
Stack_Over_Flow
Stackoverflow
Stack_Overflow

Any help would be appreciated.

Answer 1

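Use str.rsplit together with str.get to get the desired result. str.rsplit splits a string starting from the end, and str.get retrieves the nth element of each iterable in the pd.Series object.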

Answer

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)

The n parameter of rsplit limits the number of splits performed, so with n=1 you keep everything before the last '_' as a single piece.
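To make the effect of the split limit concrete, here is a minimal sketch using Python's built-in str.rsplit, whose maxsplit argument plays the same role as n in the pandas method. The variable names are only illustrative.

```python
s = "Stack_Over_Flow_1234"

# Without a limit, every underscore is a split point.
all_parts = s.rsplit("_")
print(all_parts)      # ['Stack', 'Over', 'Flow', '1234']

# With a limit of 1, only the right-most underscore is used.
last_split = s.rsplit("_", 1)
print(last_split)     # ['Stack_Over_Flow', '1234']

# split() with the same limit works from the left instead,
# which is why the attempt in the question cut at the first '_'.
first_split = s.split("_", 1)
print(first_split)    # ['Stack', 'Over_Flow_1234']
```

Taking element 0 of the rsplit result therefore gives everything before the last underscore.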

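Although the solution using pd.Series.apply is roughly twice as fast, I prefer this one because its syntax is more expressive. If you want the (faster) pd.Series.apply solution, see the Timing section.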

See the pandas documentation for Series.str.rsplit and Series.str.get.

Example

import pandas as pd

strs = ['Stackoverflow_1234',
        'Stack_Over_Flow_1234',
        'Stackoverflow',
        'Stack_Overflow_1234']
df = pd.DataFrame(data={'SOURCE_NAME': strs})

This results in

print(df)
            SOURCE_NAME
0    Stackoverflow_1234
1  Stack_Over_Flow_1234
2         Stackoverflow
3   Stack_Overflow_1234

Using the suggested solution:

df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)

0      Stackoverflow
1    Stack_Over_Flow
2      Stackoverflow
3     Stack_Overflow
Name: SOURCE_NAME, dtype: object
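The same call can be wrapped in a small function for reuse. The sketch below is only an illustration: strip_last_suffix is a hypothetical helper name, and the None entry is an assumption added to show that the .str accessor passes missing values through rather than raising.

```python
import pandas as pd


def strip_last_suffix(series: pd.Series) -> pd.Series:
    """Hypothetical helper: drop everything after the last '_' in each string."""
    return series.str.rsplit('_', n=1).str.get(0)


# The None entry is an assumed missing value, not part of the question's data.
names = pd.Series(
    ['Stackoverflow_1234', 'Stack_Over_Flow_1234', 'Stackoverflow', None],
    name='SOURCE_NAME',
)
cleaned = strip_last_suffix(names)
print(cleaned)
```

Strings without an underscore come back unchanged, and the missing entry stays missing.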

Timing

Interestingly, using pd.Series.str is not necessarily faster than using pd.Series.apply:

import pandas as pd

df = pd.DataFrame(data={'SOURCE_NAME': ['stackoverflow_1234_abcd'] * 1000})

%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
497 µs ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
1.04 ms ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# increasing the number of rows x 100
df = pd.concat([df] * 100)

%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
31.7 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
84.1 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
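The %timeit magic above is only available in IPython or Jupyter. As a rough sketch, a similar comparison can be run in a plain script with the standard timeit module. The function names and the repeat count of 100 are assumptions chosen for illustration, and absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(data={'SOURCE_NAME': ['stackoverflow_1234_abcd'] * 1000})


def with_apply():
    # Plain Python rsplit called once per element.
    return df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])


def with_str_accessor():
    # pandas string methods.
    return df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)


# Both approaches should agree before their speed is compared.
assert with_apply().tolist() == with_str_accessor().tolist()

runs = 100  # assumed repeat count; raise it for more stable numbers
apply_seconds = timeit.timeit(with_apply, number=runs) / runs
str_seconds = timeit.timeit(with_str_accessor, number=runs) / runs

print(f"apply:        {apply_seconds * 1e6:.0f} us per call")
print(f"str accessor: {str_seconds * 1e6:.0f} us per call")
```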

Answer 2

You can try applying a lambda like this:

d6['SOURCE_NAME'] = d6['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])

Hope this helps!

Answer 3

Use rsplit() to achieve what you want; you can tell it how many times to split the string.

s = "Stack_Over_Flow_1234"
s.rsplit('_', 1)[0] # Split my string one time and get the first part of it

This returns 'Stack_Over_Flow'.
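The question also asks that strings without an underscore stay unchanged. The sketch below checks that for the question's sample strings. before_last_underscore is a hypothetical helper name, and the str.rpartition comparison is an added aside rather than part of this answer.

```python
def before_last_underscore(s: str) -> str:
    """Hypothetical helper: return s without its last '_'-separated part."""
    return s.rsplit('_', 1)[0]


samples = ['Stackoverflow_1234', 'Stack_Over_Flow_1234',
           'Stackoverflow', 'Stack_Overflow_1234']
results = [before_last_underscore(s) for s in samples]
print(results)

# Aside: rpartition also splits at the last '_', but when there is no
# separator it puts the whole string in the LAST slot, so index 0 is empty.
no_sep_rsplit = 'Stackoverflow'.rsplit('_', 1)[0]
no_sep_rpartition = 'Stackoverflow'.rpartition('_')[0]
print(repr(no_sep_rsplit), repr(no_sep_rpartition))
```

With rsplit, a string that has no separator comes back as a one-element list, so indexing with [0] returns the original string.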

Answer 4

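You can use the string.split('_') function to break the string into a list of substrings around each underscore, then join them back together without the last element. Here is a snippet using your examples: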

a = ["Stackoverflow_1234", "Stack_Over_Flow_1234", "Stackoverflow", "Stack_Overflow_1234"]

for e in a:

    # Split the string into a list, separated at '_'
    splitStr = e.split("_")

    # If there is only 1 element, we can use it directly
    if len(splitStr) == 1:
        print(splitStr[0])

    # Slice off the final substring and join the remaining 
    # substrings back together with underscores
    else:
        print("_".join(splitStr[:-1]))
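As a sketch, the same split-and-join logic can be written as a function that returns its result instead of printing it, which makes it usable with a list comprehension or pd.Series.apply. The name drop_last_part and the sep parameter are assumptions for illustration.

```python
def drop_last_part(text: str, sep: str = '_') -> str:
    """Hypothetical helper: remove the last sep-delimited piece of text."""
    parts = text.split(sep)
    # A single element means there was no separator; keep the string as is.
    # Without this check, joining parts[:-1] would return an empty string.
    if len(parts) == 1:
        return text
    return sep.join(parts[:-1])


a = ["Stackoverflow_1234", "Stack_Over_Flow_1234", "Stackoverflow", "Stack_Overflow_1234"]
trimmed = [drop_last_part(e) for e in a]
print(trimmed)

# For these inputs the result matches the shorter rsplit approach.
same_as_rsplit = trimmed == [e.rsplit('_', 1)[0] for e in a]
print(same_as_rsplit)
```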

Pandas: remove everything after the last '_'
