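I have strings of the following kind in the column shown below. I want to strip everything after the last _ in each string (including that _), and leave the string unchanged if it contains no _. (I mention this because my attempt below would just exclude strings with no _.)
So far I have tried the following, but it cuts at the first _ rather than the last: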
d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.split('_').str[0]
Here are some example strings from my SOURCE_NAME column.
Stackoverflow_1234
Stack_Over_Flow_1234
Stackoverflow
Stack_Overflow_1234
Expected:
Stackoverflow
Stack_Over_Flow
Stackoverflow
Stack_Overflow
Any help would be appreciated.
Answer 0 (score: 3)
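Combine str.rsplit and str.get to get the result you want. str.rsplit splits each string starting from the end, and str.get retrieves the element at position n from each entry of the pd.Series.
d6['SOURCE_NAME'] = d6['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
The n parameter of rsplit limits the number of splits, so with n=1 everything before the last '_' stays together as the first piece.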
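To see why the limit matters, here is a small plain-Python sketch. It uses the built-in str.rsplit on a single string rather than pandas (the limit is called maxsplit there), and the variable names are only for illustration.

```python
# Sample value taken from the question's examples.
name = "Stack_Over_Flow_1234"

# No limit: every underscore is a split point.
all_parts = name.rsplit("_")       # ['Stack', 'Over', 'Flow', '1234']

# Limit of 1, scanning from the right: only the last underscore is used.
from_right = name.rsplit("_", 1)   # ['Stack_Over_Flow', '1234']

# Limit of 1, scanning from the left: only the first underscore is used.
from_left = name.split("_", 1)     # ['Stack', 'Over_Flow_1234']

# A string without an underscore comes back as a one-element list,
# so taking element 0 leaves it unchanged.
no_underscore = "Stackoverflow".rsplit("_", 1)   # ['Stackoverflow']

print(from_right[0], from_left[0], no_underscore[0])
```

Taking element 0 of the right-limited split is what gives the expected output in the question.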
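The pd.Series.apply solution takes roughly half the time, but I prefer this one because its syntax is more expressive. If you want the faster pd.Series.apply solution, see the timings at the end of this answer.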
import pandas as pd

strs = ['Stackoverflow_1234',
        'Stack_Over_Flow_1234',
        'Stackoverflow',
        'Stack_Overflow_1234']
df = pd.DataFrame(data={'SOURCE_NAME': strs})
This gives:
print(df)
            SOURCE_NAME
0    Stackoverflow_1234
1  Stack_Over_Flow_1234
2         Stackoverflow
3   Stack_Overflow_1234
Applying the suggested solution:
df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
0      Stackoverflow
1    Stack_Over_Flow
2      Stackoverflow
3     Stack_Overflow
Name: SOURCE_NAME, dtype: object
Interestingly, using pd.Series.str is not necessarily faster than using pd.Series.apply:
import pandas as pd
df = pd.DataFrame(data={'SOURCE_NAME': ['stackoverflow_1234_abcd'] * 1000})
%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
497 µs ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
1.04 ms ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# increase the number of rows 100-fold
df = pd.concat([df] * 100)
%timeit df['SOURCE_NAME'].apply(lambda x: x.rsplit('_', 1)[0])
31.7 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit df['SOURCE_NAME'].str.rsplit('_', n=1).str.get(0)
84.1 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Answer 1 (score: 1)
You can try applying a lambda like this:
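A minimal sketch of what applying a lambda to this column could look like, assuming every value in SOURCE_NAME is a plain string. The frame below is rebuilt from the question's examples just so the snippet runs on its own; in the real code d6 already exists.

```python
import pandas as pd

# Illustrative stand-in for the question's d6 DataFrame.
d6 = pd.DataFrame({"SOURCE_NAME": ["Stackoverflow_1234",
                                   "Stack_Over_Flow_1234",
                                   "Stackoverflow",
                                   "Stack_Overflow_1234"]})

# Split once from the right and keep the part before the last underscore.
# Assumption: all values are str; a NaN (float) would raise AttributeError.
d6["SOURCE_NAME"] = d6["SOURCE_NAME"].apply(lambda x: x.rsplit("_", 1)[0])

print(d6["SOURCE_NAME"].tolist())
```

Strings without an underscore pass through unchanged, because rsplit then returns a one-element list.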
Hope this helps!
Answer 2 (score: 1)
Use rsplit() to get what you are after; you can tell it how many times to split the string.
s = "Stack_Over_Flow_1234"
s.rsplit('_', 1)[0] # Split my string one time and get the first part of it
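This returns 'Stack_Over_Flow'.
Answer 3 (score: 1)
You can use string.split('_') to break the string into a list of substrings around each underscore, then join them back together without the last element. Here is a snippet using your examples: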
a = ["Stackoverflow_1234", "Stack_Over_Flow_1234", "Stackoverflow", "Stack_Overflow_1234"]
for e in a:
    # Split the string into a list, separated at '_'
    splitStr = e.split("_")
    # If there is only 1 element, we can use it directly
    if len(splitStr) == 1:
        print(splitStr[0])
    else:
        # Slice off the final substring and join the remaining
        # substrings back together with underscores
        print("_".join(splitStr[:-1]))
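If you would rather get the values back than print them, the same split-and-join idea can be wrapped in a function. This is only a sketch: the name drop_last_segment and its sep parameter are made up for illustration, and the last lines check that it agrees with the shorter rsplit form on the question's examples.

```python
def drop_last_segment(text, sep="_"):
    """Return text without its final sep-delimited segment (hypothetical helper)."""
    parts = text.split(sep)
    # A single part means the separator never occurred: keep the string as is.
    if len(parts) == 1:
        return text
    # Otherwise rejoin everything except the last part.
    return sep.join(parts[:-1])


samples = ["Stackoverflow_1234", "Stack_Over_Flow_1234",
           "Stackoverflow", "Stack_Overflow_1234"]

cleaned = [drop_last_segment(s) for s in samples]
print(cleaned)

# Cross-check against splitting once from the right.
same_as_rsplit = cleaned == [s.rsplit("_", 1)[0] for s in samples]
print(same_as_rsplit)
```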