add page for scrapy from scrapy.org (#939)

2016-08-23 12:15:37 +00:00 · 9f5d5b571f
parent 111d8aae0f
commit 9f5d5b571f
1 changed files with 31 additions and 0 deletions
--- a/pages/common/scrapy.md
+++ b/pages/common/scrapy.md
@@ -0,0 +1,31 @@
+# scrapy
+
+> Web-crawling framework.
+
+- Create a project:
+
+`scrapy startproject {{project_name}}`
+
+- Create a spider (in project directory):
+
+`scrapy genspider {{spider_name}} {{website_domain}}`
+
+- Edit a spider (in project directory):
+
+`scrapy edit {{spider_name}}`
+
+- Run a spider (in project directory):
+
+`scrapy crawl {{spider_name}}`
+
+- Fetch a webpage as Scrapy sees it and print its source to stdout:
+
+`scrapy fetch {{url}}`
+
+- Open a webpage in the default browser as Scrapy sees it (disable JavaScript for extra fidelity):
+
+`scrapy view {{url}}`
+
+- Open the Scrapy shell for a URL, which allows interaction with the page source in a Python shell (or IPython if available):
+
+`scrapy shell {{url}}`