Is it possible to scrape Metacritic movie pages?

Time: 2015-09-15 21:00:46

Tags: c# html wpf web-scraping

Using this code:

// Requires: using System.IO; using System.Net;
public void MetaCriticScrap()
{
    var http = (HttpWebRequest)WebRequest.Create("http://www.metacritic.com/movie/boyhood");

    // GetResponse() throws a WebException for error status codes such as 429
    using (var response = http.GetResponse())
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        var content = sr.ReadToEnd();

        // Save the raw HTML to a file, then echo it to the console
        using (var wr = new StreamWriter("scrap.txt"))
        {
            wr.WriteLine(content);
        }
        System.Console.WriteLine(content);
    }
}

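It does not let me scrape this page. I don't think it's actually a problem with the code, because when I put in any other URL (a Google search, etc.) it works completely fine.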

Error message:

Exception thrown: 'System.Net.WebException' in System.dll

Full exception details:

 System.Net.WebException was unhandled
 HResult=-2146233079
 Message=The remote server returned an error: (429) Unknown.
 Source=System
 StackTrace:
   at System.Net.HttpWebRequest.GetResponse()
   at ShouldIWatch.DisplayPage.MetaCriticScrap() in DisplayPage.xaml.cs:line 80
   at ShouldIWatch.DisplayPage..ctor() in DisplayPage.xaml.cs:line 31
   at ShouldIWatch.MainWindow.DannyBrownButton(Object sender, RoutedEventArgs e) in MainWindow.xaml.cs:line 89
   at System.Windows.RoutedEventHandlerInfo.InvokeHandler(Object target, RoutedEventArgs routedEventArgs)
   at System.Windows.EventRoute.InvokeHandlersImpl(Object source, RoutedEventArgs args, Boolean reRaised)
   at System.Windows.UIElement.RaiseEventImpl(DependencyObject sender, RoutedEventArgs args)
   at System.Windows.UIElement.RaiseEvent(RoutedEventArgs e)
   at System.Windows.Controls.Primitives.ButtonBase.OnClick()
   at System.Windows.Controls.Button.OnClick()
   at System.Windows.Controls.Primitives.ButtonBase.OnAccessKey(AccessKeyEventArgs e)
   at System.Windows.Input.AccessKeyManager.ProcessKey(List`1 targets, String key, Boolean existsElsewhere, Boolean userInitiated)
   at System.Windows.Input.AccessKeyManager.ProcessKeyForSender(Object sender, String key, Boolean existsElsewhere, Boolean userInitiated)
   at System.Windows.Input.AccessKeyManager.OnKeyDown(KeyEventArgs e)
   at System.Windows.Input.AccessKeyManager.PostProcessInput(Object sender, ProcessInputEventArgs e)
   at System.Windows.Input.InputManager.RaiseProcessInputEventHandlers(ProcessInputEventHandler postProcessInput, ProcessInputEventArgs processInputEventArgs)
   at System.Windows.Input.InputManager.ProcessStagingArea()
   at System.Windows.Input.InputManager.ProcessInput(InputEventArgs input)
   at System.Windows.Input.InputProviderSite.ReportInput(InputReport inputReport)
   at System.Windows.Interop.HwndKeyboardInputProvider.ReportInput(IntPtr hwnd, InputMode mode, Int32 timestamp, RawKeyboardActions actions, Int32 scanCode, Boolean isExtendedKey, Boolean isSystemKey, Int32 virtualKey)
   at System.Windows.Interop.HwndKeyboardInputProvider.ProcessKeyAction(MSG& msg, Boolean& handled)
   at System.Windows.Interop.HwndSource.CriticalTranslateAccelerator(MSG& msg, ModifierKeys modifiers)
   at System.Windows.Interop.HwndSource.OnPreprocessMessage(Object param)
   at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
   at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
   at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(DispatcherPriority priority, TimeSpan timeout, Delegate method, Object args, Int32 numArgs)
   at System.Windows.Threading.Dispatcher.Invoke(DispatcherPriority priority, Delegate method, Object arg)
   at System.Windows.Interop.HwndSource.OnPreprocessMessageThunk(MSG& msg, Boolean& handled)
   at System.Windows.Interop.HwndSource.WeakEventPreprocessMessage.OnPreprocessMessage(MSG& msg, Boolean& handled)
   at System.Windows.Interop.ComponentDispatcherThread.RaiseThreadMessage(MSG& msg)
   at System.Windows.Threading.Dispatcher.PushFrameImpl(DispatcherFrame frame)
   at System.Windows.Threading.Dispatcher.PushFrame(DispatcherFrame frame)
   at System.Windows.Application.RunDispatcher(Object ignore)
   at System.Windows.Application.RunInternal(Window window)
   at System.Windows.Application.Run(Window window)
   at System.Windows.Application.Run()
   at ShouldIWatch.app.Main() in obj\Debug\App.g.cs:line 0
   at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
   InnerException: 
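The `(429)` in the message is HTTP status 429 Too Many Requests (RFC 6585), which a server sends to signal that a client has made too many requests; some sites also use it to turn away clients they treat as bots. The server may include a `Retry-After` header saying how long to wait. A minimal sketch of detecting this with `HttpClient`, assuming a hypothetical `RateLimit.GetRetryDelay` helper and a caller-chosen fallback delay (neither comes from the thread):

```csharp
using System;
using System.Net.Http;

// Hypothetical helper, not part of any library: inspects a response for HTTP 429.
public static class RateLimit
{
    // Compared numerically rather than via a named HttpStatusCode member,
    // since older frameworks may not define one for 429.
    private const int TooManyRequests = 429;

    // Returns null when the response is not a 429. Otherwise returns the wait
    // suggested by Retry-After, or the supplied fallback if the header is absent.
    public static TimeSpan? GetRetryDelay(HttpResponseMessage response, TimeSpan fallback)
    {
        if ((int)response.StatusCode != TooManyRequests)
            return null;

        var retryAfter = response.Headers.RetryAfter;

        // "Retry-After: 120" form: a number of seconds
        if (retryAfter?.Delta is TimeSpan delta)
            return delta;

        // "Retry-After: <HTTP-date>" form: wait until that moment (never negative)
        if (retryAfter?.Date is DateTimeOffset date)
        {
            var wait = date - DateTimeOffset.UtcNow;
            return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
        }

        return fallback;
    }
}
```

Per the second answer below, the trigger in this case appears to be the request's User-Agent rather than request volume, so waiting and retrying alone may not help.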

My end goal is to go through Metacritic pages and get the Metascore and the user score. Is there any way to do this, or do I have to give up? :(

Thanks for reading. New C# developer.

2 answers:

Answer 0 (score: 1)

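Why not use their API and skip the scraping work altogether?

By using their API you get a JSON response like the one below, which already contains the information you need. That is much easier: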

{
  "result": {
    "name": "Star Trek Into Darkness",
    "score": "72",
    "genre": [
      "Action",
      "Adventure",
      "Sci-Fi",
      "Thriller"
    ],
    "thumbnail": "http://static.metacritic.com/images/products/movies/4/c7350d7a54a3301ee5c3d218df59ad45-98.jpg",
    "userscore": 7.8,
    "summary": "After the crew of the Enterprise find an unstoppable force of terror from within their own organization, Captain Kirk leads a manhunt to a war-zone world to capture a one man weapon of mass destruction.",
    "runtime": "132 min",
    "director": "J.J. Abrams",
    "cast": "Benedict Cumberbatch, Chris Pine, Zachary Quinto, Zoe Saldana",
    "rating": "PG-13",
    "rlsdate": "2013-05-15",
    "url": "http://www.metacritic.com/movie/star-trek-into-darkness"
  }
}
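Extracting the two values the question asks for (Metascore and user score) from a response shaped like this is then a few lines. A minimal sketch using `System.Text.Json` (.NET Core 3.0+), assuming the response has exactly the `result.score` / `result.userscore` layout shown above; the answer does not give the API endpoint or authentication, so this only covers parsing, and `MetacriticJson.ParseScores` is a hypothetical name:

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper: pulls the Metascore and user score out of a JSON
// document shaped like the sample response above.
public static class MetacriticJson
{
    public static (int Metascore, double UserScore) ParseScores(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement result = doc.RootElement.GetProperty("result");

        // In the sample, "score" is a quoted string ("72"), so parse it as an int.
        string scoreText = result.GetProperty("score").GetString()
            ?? throw new FormatException("\"score\" is null");
        int metascore = int.Parse(scoreText, CultureInfo.InvariantCulture);

        // "userscore" is an unquoted JSON number (7.8).
        double userScore = result.GetProperty("userscore").GetDouble();

        return (metascore, userScore);
    }
}
```

Note the mixed types in the sample: the Metascore arrives as a string and the user score as a number, which is why the two fields are read differently.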

Answer 1 (score: 1)

I tested the code myself and got the same error. After I changed the UserAgent, they let me download the page.

// Requires: using System.Diagnostics; using System.IO; using System.Net;
var http = (HttpWebRequest)WebRequest.Create("http://www.metacritic.com/movie/boyhood");
http.UserAgent = "Mozilla.. Haha, not really.";
try
{
    using (var response = http.GetResponse())
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        var content = sr.ReadToEnd();
        using (var wr = new StreamWriter("scrap.txt"))
        {
            wr.WriteLine(content);
        }
        Debug.WriteLine(content);
    }
}
catch (WebException ex) when (ex.Response != null)
{
    // Read the body the server sent with the error to see what kind of error occurred
    using (var sr = new StreamReader(ex.Response.GetResponseStream()))
    {
        Debug.WriteLine(sr.ReadToEnd());
    }
}

which gave me:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns:og="http://opengraphprotocol.org/schema/"
      xmlns:fb="http://ogp.me/ns/fb#">
<head>
            <title>Boyhood Reviews - Metacritic</title>

... but really, you should consider using the API that the other answer points out. This does, however, answer your question "Why am I getting a 429 error?".
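The same User-Agent fix can be sketched with `HttpClient` (.NET 4.5+ / .NET Core) instead of `HttpWebRequest`. This is a minimal sketch under stated assumptions: `MetacriticFetcher`, `CreateClient` and `FetchPageAsync` are hypothetical names, and whether Metacritic still accepts such requests today is not something this sketch can confirm.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: HttpClient counterpart of setting HttpWebRequest.UserAgent.
public static class MetacriticFetcher
{
    // Builds a client that sends the given User-Agent header on every request.
    public static HttpClient CreateClient(string userAgent)
    {
        var client = new HttpClient();
        // TryAddWithoutValidation accepts free-form strings such as the
        // "Mozilla.. Haha, not really." value used in the answer, which
        // strict User-Agent parsing might reject.
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", userAgent);
        return client;
    }

    // Downloads a page. On a non-success status (e.g. 429) it throws with the
    // status code and the response body, much like the answer's catch block.
    public static async Task<string> FetchPageAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            string body = await response.Content.ReadAsStringAsync();
            if (!response.IsSuccessStatusCode)
                throw new HttpRequestException((int)response.StatusCode + ": " + body);
            return body;
        }
    }
}
```

Usage, with a hypothetical User-Agent named after the asker's `ShouldIWatch` project: `var client = MetacriticFetcher.CreateClient("ShouldIWatch/1.0");` then `string html = await MetacriticFetcher.FetchPageAsync(client, "http://www.metacritic.com/movie/boyhood");`. Creating the client once and reusing it for every page is the usual `HttpClient` pattern.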