How can a crawler get the HTML source after the page's JavaScript has run?

Kenney · posted 2014-12-9 11:06:09

[screenshot] The Tmall page itself already shows the data, but my crawler cannot fetch it. What should I do?

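I read online that this approach can get the data loaded by JS once the browser has finished loading the document, but it does not seem to work. Could someone take a look and tell me what is wrong?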
// Requires: using System; using System.Windows.Forms;
private void PrintHelpPage()
{
    // Create a WebBrowser instance.
    WebBrowser webBrowserForPrinting = new WebBrowser();

    // Add an event handler that runs after the document loads.
    webBrowserForPrinting.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintDocument);

    // Set the Url property to load the document.
    webBrowserForPrinting.Url = new Uri("http://rate.taobao.com/user-rate-UMCkWvCIyvCcy.htm?spm=a220o.1000855.d4918101.2.SvdDdK&qq-pf-to=pcqq.c2c");
}

private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;

    // MessageBox.Show("000");
    // MessageBox.Show(); // InnerItem cannot be used here
    // Printing and disposing stay disabled so the document can still be read:
    // browser.Print();
    // browser.Dispose();

    // Copy the HTML of the "relalist" element into the text box.
    // GetElementById returns null when the element is not in the document, so guard against that.
    HtmlElement list = browser.Document.GetElementById("relalist");
    richTextBox1.Text = list != null ? list.InnerHtml : string.Empty;
}
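
One possible explanation, offered as an assumption rather than a diagnosis: DocumentCompleted can fire before the page's scripts have finished filling in the target element. The following is a minimal sketch of polling the element for a while after the event. The class name RenderedHtmlPoller, the 500 ms interval, and the timeout value are all hypothetical choices, and it is not guaranteed to work on this particular page.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: polls a WebBrowser until an element has content or a timeout passes.
static class RenderedHtmlPoller
{
    public static void WaitForElement(WebBrowser browser, string elementId,
                                      int timeoutMs, Action<string> onDone)
    {
        // A WinForms timer raises Tick on the UI thread, so touching the WebBrowser here is safe.
        var timer = new System.Windows.Forms.Timer { Interval = 500 };
        int waited = 0;

        timer.Tick += (s, e) =>
        {
            waited += timer.Interval;

            HtmlElement el = browser.Document != null
                ? browser.Document.GetElementById(elementId)
                : null;
            bool ready = el != null && !string.IsNullOrEmpty(el.InnerHtml);

            if (ready || waited >= timeoutMs)
            {
                timer.Stop();
                timer.Dispose();
                // Pass null when the element never received any content.
                onDone(ready ? el.InnerHtml : null);
            }
        };

        timer.Start();
    }
}
```

Inside PrintDocument this could be called as `RenderedHtmlPoller.WaitForElement((WebBrowser)sender, "relalist", 10000, html => richTextBox1.Text = html ?? "(not loaded)");`.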

Kenney · posted 2014-12-9 11:06:49

Fei, please help!

Site admin Su Fei (苏飞) · posted 2014-12-9 11:37:08

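You cannot get it by fetching the page directly. The only way is to analyze the page's JS logic and generate the data yourself.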
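
As an illustration of what "analyzing the JS" can lead to: a script on a page often loads its data from a separate request that returns JSONP, meaning JSON wrapped in a callback call. Whether this page does so is an assumption, not something confirmed in this thread. Under that assumption, a minimal sketch of stripping the wrapper follows. The helper name JsonpHelper and the sample callback name are hypothetical.

```csharp
using System;

// Hypothetical helper: extracts the JSON payload from a JSONP response body.
static class JsonpHelper
{
    public static string UnwrapJsonp(string body)
    {
        if (body == null) throw new ArgumentNullException("body");

        string trimmed = body.Trim();

        // Already plain JSON (object or array): return it unchanged.
        if (trimmed.StartsWith("{") || trimmed.StartsWith("["))
            return trimmed;

        // Otherwise take the text between the first '(' and the last ')'.
        int start = trimmed.IndexOf('(');
        int end = trimmed.LastIndexOf(')');
        if (start < 0 || end <= start)
            return trimmed; // No wrapper found; leave the text as is.

        return trimmed.Substring(start + 1, end - start - 1).Trim();
    }
}
```

For example, `JsonpHelper.UnwrapJsonp("callback123({\"total\":2});")` yields the string `{"total":2}`, which can then be handed to a JSON parser.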
