Reference
okscraper.base
class okscraper.base.BaseScraper(*args, **kwargs)
Abstract scraper class. It should be extended by concrete scraper classes.
You must declare the following:
    def __init__(self, *args, **kwargs):
        self.source = (an object derived from a class based on okscraper.sources.BaseSource)
        self.storage = (an object derived from a class based on okscraper.storages.BaseStorage)

    def _scrape(self):
        # here you do the actual scraping based on source and storing to storage
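The contract above can be illustrated with a small self-contained sketch. It does not import okscraper: `ListSource`, `ListStorage`, and their `fetch` and `store` methods are hypothetical stand-ins invented for this example, and the real `BaseSource` and `BaseStorage` interfaces may look different. A real scraper would also subclass `BaseScraper` rather than being a plain class.

```python
# Stand-ins for illustration only. The real base classes live in
# okscraper.sources and okscraper.storages, and their APIs may differ.
class ListSource:
    """Hypothetical source that hands back pre-loaded records."""

    def __init__(self, records):
        self._records = records

    def fetch(self):  # hypothetical method name
        return list(self._records)


class ListStorage:
    """Hypothetical storage that collects values in memory."""

    def __init__(self):
        self.items = []

    def store(self, value):  # hypothetical method name
        self.items.append(value)


class UpperCaseScraper:
    """Mimics the documented shape: __init__ sets source and storage,
    _scrape reads from the source and writes to the storage."""

    def __init__(self, records):
        self.source = ListSource(records)
        self.storage = ListStorage()

    def _scrape(self):
        for record in self.source.fetch():
            self.storage.store(record.upper())


scraper = UpperCaseScraper(["hello", "world"])
scraper._scrape()
print(scraper.storage.items)
```

Keeping the source and the storage as separate attributes means the parsing logic in `_scrape` can stay the same when either side is swapped out, for example for a file-based source in tests.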
class okscraper.base.ParsingFromFileTestCase(methodName='runTest')
Base class for testing scrapers with input from a file.
Minimal implementation sample:
    class MyScraperTestCase(ParsingFromFileTestCase):

        def _getScraperClass(self):
            return MyScraper

        def _getFilename(self):
            # this is a file containing test data
            return 'my_data_<<id>>.xml'

        def testParsing(self):
            self.assertScrape(
                args=(220,),  # trailing comma makes this a one-element tuple
                expectedData={'id': 220, 'name': 'Hello World'}
            )
okscraper.storages
okscraper.sources
okscraper.cli.runner
class okscraper.cli.runner.Runner(module_name, scraper_class_name=None, *args, **kwargs)
Provides functionality for running a scraper from the command line.
It takes a module_name and looks for a scrapers module under that module name.
For example, if module_name = lobbyists, the scrapers module is lobbyists.scrapers.
It then looks for a MainScraper class in that module and runs that scraper.
Alternatively, if scraper_class_name is provided, it uses that scraper class instead.
Any additional args and kwargs are passed through to the scraper.
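The lookup described above can be sketched with the standard library's `importlib`. This is not the Runner's actual implementation, only one plausible way to express the same rule. The helper name `resolve_scraper_class` is hypothetical, and the `lobbyists` package is faked in `sys.modules` so that the example runs on its own.

```python
import importlib
import sys
import types


def resolve_scraper_class(module_name, scraper_class_name=None):
    """Hypothetical helper: find <module_name>.scrapers and return the
    requested class, falling back to MainScraper when none is named."""
    scrapers_module = importlib.import_module(module_name + ".scrapers")
    return getattr(scrapers_module, scraper_class_name or "MainScraper")


# Demo setup: register a fake 'lobbyists' package so the lookup has a target.
class MainScraper:
    pass


class OtherScraper:
    pass


package = types.ModuleType("lobbyists")
package.__path__ = []  # an (empty) __path__ marks the module as a package
scrapers = types.ModuleType("lobbyists.scrapers")
scrapers.MainScraper = MainScraper
scrapers.OtherScraper = OtherScraper
sys.modules["lobbyists"] = package
sys.modules["lobbyists.scrapers"] = scrapers

default_class = resolve_scraper_class("lobbyists")
named_class = resolve_scraper_class("lobbyists", "OtherScraper")
print(default_class.__name__, named_class.__name__)
```

Under this sketch, the extra args and kwargs would then be handed to the resolved class or to its scrape call. Which of the two the real Runner does is not stated in the description above.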