I've been building a scraper recently and ran into what looks like a problem with this URL:
http://www.qdeic.gov.cn/newslist.aspx?id=20
Here is my source code:
/// <summary>
/// Fetches the HTML source for a URL.
/// </summary>
/// <param name="url">The URL to fetch.</param>
/// <param name="html">The returned HTML; on an exception, the error message.</param>
/// <returns>Constant.HttpSuccessCode, or the numeric HTTP status code if the response status is not OK.</returns>
public static int GetHtmlByUrl(string url, out string html)
{
    html = "";
    int success = Constant.HttpSuccessCode;
    try
    {
        HttpHelper http = new HttpHelper();
        HttpItem item = new HttpItem()
        {
            URL = url,
        };
        HttpResult result = http.GetHtml(item);
        if (result.StatusCode.ToString().ToUpper() != "OK") // Status is not OK: throw an exception
        {
            success = Convert.ToInt32(result.StatusCode);
            throw new Exception(result.StatusDescription);
        }
        string cookie = result.Cookie;
        item = new HttpItem()
        {
            URL = url,
            Cookie = cookie
        };
        result = http.GetHtml(item);
        html = result.Html;
    }
    catch (Exception ex)
    {
        html = ex.Message;
    }
    return success;
}
I changed the original code slightly to better fit my needs. In my tests, fetching the URL above returns garbled text. Please look into it.