苏飞论坛 (Sufei Forum)

[Other] Analyzing website logs with C#

Posted 2017-7-28 14:32:38
Bounty: 10 coins
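As the title says: I want to analyze website logs in a WinForms app and determine whether each visitor is a genuine spider.

The approach I found online just searches the log text for strings like BaiduSpider, but that is not accurate. Supposedly you have to run nslookup from cmd to do a reverse DNS lookup on the IP. I tested that and it does give correct results.

Matching spiders through nslookup is very slow, though. Is there a more efficient way? Any help is appreciated.

Also, what is a good way to split each line? I want to split it into: access time, IP address, HTTP status, spider, URL.

Below is the test data. It is part of a log split out by the BaoTa panel (宝塔面板) on CentOS. I don't have an IIS log yet.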

211.149.164.223 - - [22/Jul/2017:09:00:58 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
211.149.164.223 - - [22/Jul/2017:09:00:58 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
212.83.158.95 - - [22/Jul/2017:09:46:49 +0800] "GET /admin.php HTTP/1.0" 404 2624 "-" "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"
212.83.158.95 - - [22/Jul/2017:09:46:50 +0800] "GET /administrator/index.php HTTP/1.0" 404 2624 "-" "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"
212.83.158.95 - - [22/Jul/2017:09:46:50 +0800] "GET /wp-login.php HTTP/1.0" 404 2624 "-" "Opera/9.80 (Windows NT 6.1; U; ru) Presto/2.8.131 Version/11.10"
119.123.78.82 - - [22/Jul/2017:10:58:48 +0800] "GET /m HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
119.123.78.82 - - [22/Jul/2017:10:58:48 +0800] "GET /m/ HTTP/1.1" 200 79 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
119.123.78.82 - - [22/Jul/2017:10:58:48 +0800] "GET /favicon.ico HTTP/1.1" 404 1358 "http://zhengbanxt.win/m/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
180.163.2.118 - - [22/Jul/2017:10:58:58 +0800] "GET /m HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; rv:11.0) like Gecko"
119.123.78.82 - - [22/Jul/2017:10:58:58 +0800] "GET /m/index.php HTTP/1.1" 200 79 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
180.163.2.118 - - [22/Jul/2017:10:58:58 +0800] "GET /m/ HTTP/1.1" 200 79 "http://zhengbanxt.win/m" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; rv:11.0) like Gecko"
117.185.27.114 - - [22/Jul/2017:10:59:00 +0800] "GET /m HTTP/1.1" 301 178 "" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H143 Safari/600.1.4"
117.185.27.114 - - [22/Jul/2017:10:59:00 +0800] "GET /m/ HTTP/1.1" 200 79 "" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H143 Safari/600.1.4"
101.226.99.197 - - [22/Jul/2017:11:07:14 +0800] "GET /m/index.php HTTP/1.1" 200 79 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36"
101.226.66.187 - - [22/Jul/2017:11:07:26 +0800] "GET /m/index.php HTTP/1.1" 200 79 "" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_4 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H143 Safari/600.1.4"
219.129.216.239 - - [22/Jul/2017:11:44:38 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
222.186.3.229 - - [22/Jul/2017:11:44:38 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
222.186.3.229 - - [22/Jul/2017:11:44:38 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
49.4.142.15 - - [22/Jul/2017:11:44:52 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
49.4.142.15 - - [22/Jul/2017:11:44:55 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
119.123.78.82 - - [22/Jul/2017:15:46:47 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:46:47 +0800] "GET /index.html HTTP/1.1" 200 8168 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:46:57 +0800] "GET /favicon.ico HTTP/1.1" 404 2624 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:47:06 +0800] "GET /Win7xitong/ HTTP/1.1" 200 21839 "http://www.zhengbanxt.win/index.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:47:09 +0800] "GET /favicon.ico HTTP/1.1" 404 2624 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:47:21 +0800] "GET /Win8xitong/ HTTP/1.1" 200 24771 "http://www.zhengbanxt.win/Win7xitong/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
119.123.78.82 - - [22/Jul/2017:15:47:42 +0800] "GET /Win10xitong/ HTTP/1.1" 200 14697 "http://www.zhengbanxt.win/Win8xitong/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)"
219.129.216.239 - - [22/Jul/2017:16:55:28 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
183.2.247.134 - - [22/Jul/2017:16:55:29 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
183.2.247.134 - - [22/Jul/2017:16:55:29 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
219.129.216.239 - - [22/Jul/2017:16:55:55 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
104.131.254.50 - - [22/Jul/2017:18:41:42 +0800] "POST /xmlrpc.php HTTP/1.1" 404 2624 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
219.129.216.239 - - [23/Jul/2017:03:26:11 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
49.4.142.15 - - [23/Jul/2017:03:26:11 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
49.4.142.15 - - [23/Jul/2017:03:26:12 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
52.41.95.148 - - [23/Jul/2017:05:07:31 +0800] "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"
52.41.95.148 - - [23/Jul/2017:05:07:32 +0800] "GET /index.html HTTP/1.1" 200 2915 "-" "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"
Posted 2017-7-28 15:15:25
[C#]
        // Requires: using System; using System.IO; using System.Text;
        // Baidu crawler identifier:  Baiduspider
        // Google crawler identifier: Googlebot
        // Sogou crawler identifier:  Sogou web spider (IIS logs write spaces as '+': Sogou+web+spider)
        // Soso crawler identifier:   Sosospider
        private void button1_Click(object sender, EventArgs e)
        {
            int Baidubot = 0, Googlebot = 0, Sogoubot = 0, Sosobot = 0;

            // Path of the log file
            string path = textBox1.Text.Trim();

            #region Read the log line by line and count the hits of each crawler
            using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (StreamReader sr = new StreamReader(fs, System.Text.Encoding.Default))
            {
                string line;
                // ReadLine returns null only at end of file, so a blank line
                // in the middle of the log no longer stops the loop early.
                while ((line = sr.ReadLine()) != null)
                {
                    if (line.Contains("Baiduspider"))
                    {
                        ++Baidubot;
                    }
                    else if (line.Contains("Googlebot"))
                    {
                        ++Googlebot;
                    }
                    else if (line.Contains("Sogou web spider") || line.Contains("Sogou+web+spider"))
                    {
                        ++Sogoubot;
                    }
                    else if (line.Contains("Sosospider"))
                    {
                        ++Sosobot;
                    }
                }
            }
            #endregion

            // Environment.NewLine replaces the reversed "\n\r" sequences.
            string gap = Environment.NewLine + Environment.NewLine;
            label2.Text = "Search engine visits:" + gap;
            label2.Text += "Baidu: " + Baidubot + gap;
            label2.Text += "Google: " + Googlebot + gap;
            label2.Text += "Sogou: " + Sogoubot + gap;
            label2.Text += "Soso: " + Sosobot + gap;
        }


Or just use this and be done with it: http://www.microsoft.com/en-us/download/details.aspx?id=24659
Original poster, posted 2017-7-28 15:35:37
站长苏飞 wrote on 2017-7-28 15:15:
// Baidu crawler identifier: Baiduspider
// Google crawler identifier: Goog ...

The example above is not accurate: it never checks whether the IP actually belongs to that spider's servers.
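
On the speed concern, one possible direction is to do the same check nslookup does from inside the program and cache the answer per IP, since a log usually repeats a small set of addresses many times. The sketch below is an assumption-laden illustration, not a tested production approach: `SpiderVerifier` and its members are hypothetical names, the host-name suffixes are assumptions based on what Baidu and Google publicly describe for their crawlers and should be checked against their current documentation, and the lookups go through `System.Net.Dns` unless other delegates are supplied.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public sealed class SpiderVerifier
{
    // Assumed trusted host-name suffixes per crawler token (verify before relying on them).
    private static readonly Dictionary<string, string[]> TrustedSuffixes =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "Baiduspider", new[] { ".baidu.com", ".baidu.jp" } },
            { "Googlebot",   new[] { ".googlebot.com", ".google.com" } }
        };

    private readonly Func<string, string> reverseLookup;    // IP -> host name, or null
    private readonly Func<string, string[]> forwardLookup;  // host name -> IP strings
    private readonly ConcurrentDictionary<string, string> hostCache =
        new ConcurrentDictionary<string, string>();

    // Default: real DNS queries through System.Net.Dns.
    public SpiderVerifier() : this(SystemReverse, SystemForward) { }

    // Lookups are injectable so the logic can be exercised without network access.
    public SpiderVerifier(Func<string, string> reverseLookup,
                          Func<string, string[]> forwardLookup)
    {
        this.reverseLookup = reverseLookup;
        this.forwardLookup = forwardLookup;
    }

    // True when the IP reverse-resolves to a host under a trusted suffix for the
    // claimed crawler and that host resolves back to the same IP.
    public bool IsGenuine(string ip, string spiderToken)
    {
        string[] suffixes;
        if (!TrustedSuffixes.TryGetValue(spiderToken, out suffixes)) return false;

        // Each distinct IP is resolved once; repeated log lines hit the cache.
        string host = hostCache.GetOrAdd(ip, key => ResolveVerifiedHost(key));
        return suffixes.Any(s => host.EndsWith(s, StringComparison.OrdinalIgnoreCase));
    }

    // Returns the forward-confirmed host name, or an empty string on any failure.
    private string ResolveVerifiedHost(string ip)
    {
        string host = reverseLookup(ip);
        if (string.IsNullOrEmpty(host)) return string.Empty;
        return forwardLookup(host).Contains(ip) ? host : string.Empty;
    }

    private static string SystemReverse(string ip)
    {
        try { return Dns.GetHostEntry(IPAddress.Parse(ip)).HostName; }
        catch (SocketException) { return null; }
        catch (FormatException) { return null; }
    }

    private static string[] SystemForward(string host)
    {
        try { return Dns.GetHostAddresses(host).Select(a => a.ToString()).ToArray(); }
        catch (SocketException) { return new string[0]; }
    }
}
```

With something like this, the counting loop would call `IsGenuine(ip, "Baiduspider")` only for lines whose user agent claims to be a spider, so ordinary browser traffic never triggers a DNS query at all. The forward lookup matters because whoever controls an IP's reverse DNS record can make it return any host name they like.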
Posted 2017-7-28 16:43:53
Code that matches your requirements exactly is hard to find. I suggest adapting this one yourself.