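The robots.txt below contains this line: # Crawl-delay: 5
Does the # in front of Crawl-delay: 5 mean that the line is a comment and has no effect?
If so, does that mean a crawler does not have to wait 5 seconds between requests and may crawl as fast as it likes?
For example, would a 1 ms interval between requests, or no interval at all, still comply with the robots.txt protocol?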
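
One way to check how a parser treats the commented-out line is to feed the file to Python's standard `urllib.robotparser`. The snippet below is only a sketch: the rules are a trimmed excerpt of the file quoted in this post, and the user-agent name `MyCrawler` is a made-up placeholder. It shows what this one parser reports, not what the site operator considers an acceptable request rate.

```python
from urllib.robotparser import RobotFileParser

# Trimmed excerpt of the robots.txt quoted in this post (not the full file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /j/
Allow: /ads.txt
# Crawl-delay: 5

User-agent: Wandoujia Spider
Disallow: /
"""


def load(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The file as published: the Crawl-delay line starts with '#'.
commented = load(ROBOTS_TXT)
# Hypothetical variant with the '#' removed, for comparison only.
active = load(ROBOTS_TXT.replace("# Crawl-delay: 5", "Crawl-delay: 5"))

# "MyCrawler" is a placeholder user-agent; it falls under the "*" group.
print(commented.crawl_delay("MyCrawler"))  # None: the parser ignores the comment
print(active.crawl_delay("MyCrawler"))     # 5: the directive is read once uncommented
print(commented.can_fetch("MyCrawler", "https://www.douban.com/search"))  # False
```

With the `#` in place the parser sees no delay directive at all, so the file as written requests no particular interval from a generic crawler. The Disallow rules still apply either way.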
///////////////////////////////////////////////////////
User-agent: *
Disallow: /subject_search
Disallow: /amazon_search
Disallow: /search
Disallow: /group/search
Disallow: /event/search
Disallow: /celebrities/search
Disallow: /location/drama/search
Disallow: /forum/
Disallow: /new_subject
Disallow: /service/iframe
Disallow: /j/
Disallow: /link2/
Disallow: /recommend/
Disallow: /doubanapp/card
Disallow: /update/topic/
Disallow: /share/
Allow: /ads.txt
Sitemap: https://www.douban.com/sitemap_index.xml
Sitemap: https://www.douban.com/sitemap_updated_index.xml
# Crawl-delay: 5
User-agent: Wandoujia Spider
Disallow: /
User-agent: Mediapartners-Google
Disallow: /subject_search
Disallow: /amazon_search
Disallow: /search
Disallow: /group/search
Disallow: /event/search
Disallow: /celebrities/search
Disallow: /location/drama/search
Disallow: /j/
--
FROM 60.7.174.*