A service (also known as a "daemon") is a process that performs tasks in the background and responds to system events.
Services can be written in any language. We use Python in these examples because it is versatile and well suited to small automation tasks.
For more information, be sure to read our blog post on the subject.
To run any of the examples, you will need Python 3. We also recommend using a virtual environment.
python3 -m venv venv

First, create a script that scrapes a website. The script uses the requests and beautifulsoup4 packages, so install them into the virtual environment as well. Make sure the script also handles OS signals so that it exits gracefully.
linux_scrape.py:
import json
import re
import signal
from pathlib import Path

import requests
from bs4 import BeautifulSoup


class SignalHandler:
    shutdown_requested = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.request_shutdown)
        signal.signal(signal.SIGTERM, self.request_shutdown)

    def request_shutdown(self, *args):
        print('Request to shutdown received, stopping')
        self.shutdown_requested = True

    def can_run(self):
        return not self.shutdown_requested


signal_handler = SignalHandler()
urls = [
    'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
    'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
    'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
]
index = 0
while signal_handler.can_run():
    url = urls[index % len(urls)]
    index += 1
    print('Scraping url', url)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    book_name = soup.select_one('.product_main').h1.text
    rows = soup.select('.table.table-striped tr')
    product_info = {row.th.text: row.td.text for row in rows}

    data_folder = Path('./data')
    data_folder.mkdir(parents=True, exist_ok=True)

    json_file_name = re.sub('[\': ]', '-', book_name)
    json_file_path = data_folder / f'{json_file_name}.json'
    with open(json_file_path, 'w') as book_file:
        json.dump(product_info, book_file)

Then, create a systemd configuration file.
/etc/systemd/system/book-scraper.service:
[Unit]
Description=A script for scraping the book information
After=syslog.target network.target

[Service]
WorkingDirectory=/home/oxylabs/python-script-service/src/systemd
ExecStart=/home/oxylabs/python-script-service/venv/bin/python3 main.py
Restart=always
RestartSec=120

[Install]
WantedBy=multi-user.target

Make sure to adjust the paths and the script name based on your actual script location.
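With the unit file in place, register and start the service with systemctl. These commands are a sketch assuming a systemd-based distribution and sudo privileges; the service name comes from the unit file name, book-scraper.service.

```shell
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
# Start the service now and enable it at boot
sudo systemctl start book-scraper
sudo systemctl enable book-scraper
# Verify that the service is running
systemctl status book-scraper
```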
A fully working example can be found here.
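The graceful-shutdown pattern from the script can be exercised without systemd. The sketch below reuses the same SignalHandler idea; the three-iteration loop and the os.kill call are ours, added to simulate `systemctl stop`, which delivers SIGTERM to the process.

```python
import os
import signal


class SignalHandler:
    """Sets a flag when SIGINT or SIGTERM arrives so the main loop can exit cleanly."""

    def __init__(self):
        self.shutdown_requested = False
        signal.signal(signal.SIGINT, self.request_shutdown)
        signal.signal(signal.SIGTERM, self.request_shutdown)

    def request_shutdown(self, *args):
        self.shutdown_requested = True

    def can_run(self):
        return not self.shutdown_requested


handler = SignalHandler()
iterations = 0
while handler.can_run():
    iterations += 1
    if iterations == 3:
        # Simulate `systemctl stop`, which sends SIGTERM to the service process
        os.kill(os.getpid(), signal.SIGTERM)

print('loop exited cleanly after', iterations, 'iterations')
```

Because the handler only sets a flag, any in-progress iteration finishes before the loop exits, so no half-written JSON file is left behind.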
To create a Windows service, you will need to implement methods such as SvcDoRun and SvcStop and handle events sent by the operating system.
windows_scrape.py:
import sys
import servicemanager
import win32event
import win32service
import win32serviceutil
import json
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup


class BookScraperService(win32serviceutil.ServiceFramework):
    _svc_name_ = 'BookScraperService'
    _svc_display_name_ = 'BookScraperService'
    _svc_description_ = 'Constantly updates the info about books'

    def __init__(self, args):
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.event = win32event.CreateEvent(None, 0, 0, None)

    def GetAcceptedControls(self):
        result = win32serviceutil.ServiceFramework.GetAcceptedControls(self)
        result |= win32service.SERVICE_ACCEPT_PRESHUTDOWN
        return result

    def SvcDoRun(self):
        urls = [
            'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
            'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
            'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
        ]
        index = 0
        while True:
            result = win32event.WaitForSingleObject(self.event, 5000)
            if result == win32event.WAIT_OBJECT_0:
                # Stop signal received, exit the loop
                break

            url = urls[index % len(urls)]
            index += 1
            print('Scraping url', url)
            response = requests.get(url)

            soup = BeautifulSoup(response.content, 'html.parser')
            book_name = soup.select_one('.product_main').h1.text
            rows = soup.select('.table.table-striped tr')
            product_info = {row.th.text: row.td.text for row in rows}

            data_folder = Path('C:\\Users\\User\\Scraper\\dist\\scrape\\data')
            data_folder.mkdir(parents=True, exist_ok=True)

            json_file_name = re.sub('[\': ]', '-', book_name)
            json_file_path = data_folder / f'{json_file_name}.json'
            with open(json_file_path, 'w') as book_file:
                json.dump(product_info, book_file)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.event)


if __name__ == '__main__':
    if len(sys.argv) == 1:
        servicemanager.Initialize()
        servicemanager.PrepareToHostSingle(BookScraperService)
        servicemanager.StartServiceCtrlDispatcher()
    else:
        win32serviceutil.HandleCommandLine(BookScraperService)

Next, install dependencies and run a post-install script.
PS C:\> cd C:\Users\User\Scraper
PS C:\Users\User\Scraper> .\venv\Scripts\pip install pypiwin32
PS C:\Users\User\Scraper> .\venv\Scripts\pywin32_postinstall.py -install

Bundle your script into an executable.
PS C:\Users\User\Scraper> venv\Scripts\pyinstaller --hiddenimport win32timezone -F scrape.py

And finally, install your newly-created service.
PS C:\Users\User\Scraper> .\dist\scrape.exe install
Installing service BookScraper
Changing service configuration
Service updated
PS C:\Users\User\Scraper> .\dist\scrape.exe start
Starting service BookScraper

A fully working example can be found here.
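The stop mechanism in SvcDoRun, waiting on an event with a 5-second timeout, can be sketched portably with threading.Event. This is our analogy for illustration, not part of pywin32: Event.wait(timeout) returning False corresponds to the timeout elapsing, and returning True corresponds to WAIT_OBJECT_0.

```python
import threading

stop_event = threading.Event()  # plays the role of the win32event handle
cycles = 0
# wait(timeout) returns False when the timeout elapses (keep scraping)
# and True once the event is set -- the analogue of WAIT_OBJECT_0.
while not stop_event.wait(timeout=0.01):
    cycles += 1
    if cycles == 3:
        stop_event.set()  # what SvcStop does via win32event.SetEvent
print('stopped after', cycles, 'cycles')
```

Waiting on the event, rather than calling time.sleep, is what lets the service respond to a stop request within milliseconds instead of sleeping through it.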
Instead of dealing with the Windows service layer directly, you can use NSSM (the Non-Sucking Service Manager).
Install NSSM by visiting the official website. Extract it to a folder of your choice and add the folder to your PATH environment variable for convenience.
Once you have NSSM installed, simplify your script by getting rid of all Windows-specific methods and definitions.
simple_scrape.py:
import json
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup

urls = [
    'https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
    'https://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
    'https://books.toscrape.com/catalogue/sharp-objects_997/index.html',
]
index = 0
while True:
    url = urls[index % len(urls)]
    index += 1
    print('Scraping url', url)
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    book_name = soup.select_one('.product_main').h1.text
    rows = soup.select('.table.table-striped tr')
    product_info = {row.th.text: row.td.text for row in rows}

    data_folder = Path('C:\\Users\\User\\Scraper\\data')
    data_folder.mkdir(parents=True, exist_ok=True)

    json_file_name = re.sub('[\': ]', '-', book_name)
    json_file_path = data_folder / f'{json_file_name}.json'
    with open(json_file_path, 'w') as book_file:
        json.dump(product_info, book_file)

Bundle your script into an executable.
PS C:\Users\User\Scraper> venv\Scripts\pyinstaller -F simple_scrape.py

And finally, install the script using NSSM.
PS C:\> nssm.exe install SimpleScrape C:\Users\User\Scraper\dist\simple_scrape.exe
PS C:\Users\User\Scraper> .\venv\Scripts\pip install pypiwin32

A fully working script can be found here.
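The scraping logic shared by all three scripts can be verified offline against a static HTML snippet, with no network access. The snippet below is our trimmed stand-in for a books.toscrape.com product page (the UPC value is invented for the example), not content fetched from the site:

```python
import re

from bs4 import BeautifulSoup

# Our minimal stand-in for a product page: a title block and an info table
html = '''
<div class="product_main"><h1>Sharp Objects</h1></div>
<table class="table table-striped">
  <tr><th>UPC</th><td>e00eb4fd7b871a48</td></tr>
  <tr><th>Price (incl. tax)</th><td>47.82</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
book_name = soup.select_one('.product_main').h1.text
rows = soup.select('.table.table-striped tr')
product_info = {row.th.text: row.td.text for row in rows}

# Same sanitization as the scripts: replace quotes, colons, and spaces
# so the book name is safe to use as a file name
json_file_name = re.sub('[\': ]', '-', book_name)
print(book_name, product_info, json_file_name)
```

Testing the parsing in isolation like this is useful before installing the script as a service, where failures only show up in service logs.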
