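I am reading the source code of an online store website, and on each product page I need to find a JSON string that lists the product SKUs and their quantities.
Here are 2 samples: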
'{"sku-SV023435_B_M":7,"sku-SV023435_BL_M":10,"sku-SV023435_PU_M":11}'
The sample above contains 3 SKUs.
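'{"sku-11430_B_S":"20","sku-11430_B_M":"17","sku-11430_B_L":"30","sku-11430_B_XS":"13","sku-11430_BL_S":"7","sku-11430_BL_M":"17","sku-11430_BL_L":"4","sku-11430_BL_XS":"16","sku-11430_O_S":"8","sku-11430_O_M":"6","sku-11430_O_L":"22","sku-11430_O_XS":"20","sku-11430_LBL_S":"27","sku-11430_LBL_M":"25","sku-11430_LBL_L":"22","sku-11430_LBL_XS":"10","sku-11430_Y_S":"24","sku-11430_Y_M":36,"sku-11430_Y_L":"20","sku-11430_Y_XS":"6","sku-11430_RR_S":"4","sku-11430_RR_M":"35","sku-11430_RR_L":"47","sku-11430_RR_XS":"6"}'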
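The sample above contains more SKUs.
The number of SKUs in the JSON string ranges from 1 upward, with no fixed maximum.
Now I need a regular expression pattern to extract this JSON string from each page. Once I have it, I can easily work with a string such as '{"sku-SV023435_B_M":7,"sku-SV023435_BL_M":10,"sku-SV023435_PU_M":11}'.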
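Update: I have found another problem, and I am sorry that my question was incomplete. There is another, similar JSON string that also starts with sku-. Please take a look at the source code at the link below and you will see what I mean. The only difference is that the values in that one are alphanumeric, while the one we want has numeric values. Also note that our final goal is to extract the SKUs together with their quantities, so maybe you have a more direct solution.
@chris85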
Second update:
Here is another strange problem, slightly off-topic.
When I fetch the URL content from code, the source contains no JSON string like this one:
'{"sku-11430_B_S":"20","sku-11430_B_M":"17","sku-11430_B_L":"30","sku-11430_B_XS":"13","sku-11430_BL_S":"7","sku-11430_BL_M":"17","sku-11430_BL_L":"4","sku-11430_BL_XS":"16","sku-11430_O_S":"8","sku-11430_O_M":"6","sku-11430_O_L":"22","sku-11430_O_XS":"20","sku-11430_LBL_S":"27","sku-11430_LBL_M":"25","sku-11430_LBL_L":"22","sku-11430_LBL_XS":"10","sku-11430_Y_S":"24","sku-11430_Y_M":36,"sku-11430_Y_L":"20","sku-11430_Y_XS":"6","sku-11430_RR_S":"4","sku-11430_RR_M":"35","sku-11430_RR_L":"47","sku-11430_RR_XS":"6"}',
But when I open the URL in my browser, the JSON is there! Really confused :(
Answer 0: (score: 0)
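You need to use preg_match_all() to perform the regular expression match (see its documentation).
The following should do it for you. It matches every substring that starts with "sku-" and runs through the next colon plus any digits that directly follow it, so a quoted quantity such as "20" is not captured.
preg_match_all('/sku-.+?:[0-9]*/', $input, $matches);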
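Alternatively, if you want to extract the whole string, you can use:
preg_match_all('/\{.sku-.*\}/', $input, $matches);
This grabs everything from the opening brace to the last closing brace on the same line, because .* is greedy.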
Note that $input represents the input string.
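As a minimal sketch of how a pattern along these lines could return SKU and quantity pairs directly, the example below assumes quantities may appear either bare (7) or quoted ("20"), as in the question's samples. The capture groups, the lookahead, and the variable names are illustrative additions, not part of the original answer.

```php
<?php
// Input: the 3-SKU sample string from the question.
$input = '{"sku-SV023435_B_M":7,"sku-SV023435_BL_M":10,"sku-SV023435_PU_M":11}';

// Group 1: the SKU key. Group 2: the digits, with optional surrounding quotes.
// The lookahead requires a comma or closing brace next, so a value such as
// "12ab" is rejected instead of being partially matched.
$pattern = '/"(sku-[^"]+)":"?(\d+)"?(?=[,}])/';

$quantities = [];
if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $quantities[$m[1]] = (int) $m[2];
    }
}

print_r($quantities);
```

Entries whose values are alphanumeric simply produce no match, which may help with the second JSON string mentioned in the update.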
Answer 1: (score: 0)
A simple /'(\{"[^\}]+\})'/ will match all of these JSON strings. Demo: https://regex101.com/r/wD5bO4/2
The first capture group of each match holds the JSON string to pass to json_decode:
preg_match_all ("/'(\{\"[^\}]+\})'/", $html, $matches);
$html is the HTML to parse. With the default PREG_PATTERN_ORDER flag, the JSON strings are in $matches[1][0], $matches[1][1], $matches[1][2], and so on.
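A minimal sketch of how this pattern might be combined with json_decode to pick out the quantity string and skip the similar string with alphanumeric values. The page fragment and the isQuantityMap helper are hypothetical, made up for illustration; only the regular expression comes from this answer.

```php
<?php
// Hypothetical page fragment: the first quoted object has alphanumeric
// values, the second one holds the quantities we are after.
$html = <<<'HTML'
DL.items_list.initItemAttr('{"sku-1_B_S":"a1b2"}', '{"sku-1_B_S":"20","sku-1_B_M":17}');
HTML;

// Hypothetical helper: true when every value is an integer or a string of digits.
function isQuantityMap(array $data): bool
{
    foreach ($data as $value) {
        if (!is_int($value) && !(is_string($value) && ctype_digit($value))) {
            return false;
        }
    }
    return $data !== [];
}

preg_match_all("/'(\{\"[^\}]+\})'/", $html, $matches);

$quantities = null;
foreach ($matches[1] as $json) {
    $data = json_decode($json, true);
    // Keep the first decoded object that looks like a SKU => quantity map.
    if (is_array($data) && isQuantityMap($data)) {
        $quantities = array_map('intval', $data);
        break;
    }
}

print_r($quantities);
```

Checking the decoded values rather than the raw text keeps the regular expression simple and leaves the numeric test to ordinary PHP code.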
Answer 2: (score: 0)
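Because of the way JSON is encoded, trying to extract specific data from JSON directly with a regexp is almost always a bad idea. The best approach is to use a regexp to grab the whole JSON data and then decode it with the PHP function json_decode.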
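To illustrate the point, here is a small sketch using an excerpt of the longer sample from the question, where quoted and bare quantities are mixed. It compares a digits-only regexp with json_decode; the variable names are illustrative.

```php
<?php
// Excerpt of the question's longer sample: "36" appears bare, the others quoted.
$json = '{"sku-11430_Y_S":"24","sku-11430_Y_M":36,"sku-11430_Y_L":"20"}';

// A digits-only pattern captures the bare value but stops at the colon
// whenever the quantity is wrapped in quotes.
preg_match_all('/sku-.+?:[0-9]*/', $json, $regexMatches);

// Decoding first and normalizing afterwards handles both encodings.
$decoded = json_decode($json, true);
$quantities = is_array($decoded) ? array_map('intval', $decoded) : [];

print_r($regexMatches[0]);
print_r($quantities);
```

Only the middle entry comes out of the regexp with its number attached, while the decoded array carries all three quantities as integers.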
The missing-data problem is caused by a missing required cookie. See my comments in the code below.
<?php
// Fetch a product page, sending the cookie the site needs before it emits the JSON.
function getHtmlFromDresslinkUrl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // You must send the currency cookie to the website for it to return the JSON you want to scrape
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Cookie: currencies_code=USD;',
    ));
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
$html = getHtmlFromDresslinkUrl("http://www.dresslink.com/womens-candy-color-basic-coat-slim-suit-jacket-blazer-p-8131.html");
// Get the arguments of this specific JS function call only
if (preg_match("/DL\.items_list\.initItemAttr\((.+)\);/", $html, $matches)) {
    $arguments = $matches[1];
    // Split by argument separator.
    // I know, this isn't great but it seems to work.
    $args_array = explode(", ", $arguments);
    // You need the 5th argument (index 4)
    if (isset($args_array[4])) {
        // Strip quotes
        $fifth_arg = trim($args_array[4], "'");
        // json_decode returns null on invalid input, so check before looping
        $qty_data = json_decode($fifth_arg, true);
        if (is_array($qty_data)) {
            // Then you can work with the PHP array
            foreach ($qty_data as $name => $qtty) {
                echo "Found " . $qtty . " of " . $name . "<br />";
            }
        }
    }
}
?>
Special thanks to @chris85 for making me read the question again. Sorry, but I cannot undo my downvote.