I'm soooo close - please help me, r/learnpython! I just want the power of Python to help me find a job. I'm just trying to automate my own work; I don't work for a company or anything like that. I've been trying any sort of example I can find, reading every scrap of code on GitHub, and all I can do is pull the titles and the links. I have created code that is capable of everything I need, except extracting the data the way I want. I have referenced the following while trying to build this script: the documentation for Scrapy.

OS: Windows 10. Python version: 2.7.11 (all dependencies resolved and installed). Scrapy version: 1.1.0.

Currently, my code clicks from the search page into the first result, which allows the next button to be unhidden; then it clicks the reply button to reveal the contact-information elements (this wasn't really necessary, I just thought it was cool to see). After this it's supposed to scrape the rest of the data, which is where my current error occurs, and then click through to the next ad. The only problem is getting the data loop to cooperate! If you comment out the def parse_items loop, everything goes swimmingly. I feel like I'm on the last leg of this journey and I just need a bit more help. Please see the newest code below:

    import time

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from scrapy.linkextractors import LinkExtractor
    from scrapy.selector import HtmlXPathSelector

    first_page_xpath = '/html/body/section/section/header/div/a'

    driver = webdriver.Chrome('C:\Python27\Chrome Driver\chromedriver_win32\chromedriver.exe')

    first = driver.find_element_by_xpath(first_page_xpath)
    first.click()  # Clicks link for first page

    Reply = driver.find_element_by_class_name('reply_button')  # Define link for reply button to open

    Item = titles.xpath("/html/body/section/section/h2/span/span").extract()
    Item = titles.xpath("/html/body/section/section/header/div/div/div/ul/li/div").extract()
    Item = titles.xpath("/html/body/section/section/section/section").extract()

    Next = driver.find_element_by_class_name('next')  # Define link to the next page

Edit 0.1: Nothing is working so far; it seems there are deprecated parts to my code, and even the best-written pieces I was given have their own issues.

Edit 0.2: Removed deprecated elements of the spider test; added an explanation of this Python module.

Edit 0.3: Discovering and exploring BeautifulSoup, which is looking like a much more precise and concise way of sorting through the mess of data.

(9.5.16 Update): If you are looking to use a scraper to do many different things, I recommend the module referred by /u/dante76, available here on GitHub. It will allow you to scrape for all kinds of data. Look in C:\Python27\Lib\site-packages\craigslist or a similar directory for the __init__.py file, which holds some extra settings. This is all good and fine, but it still doesn't solve my problem, as I intend to scrape the contact information and descriptions from the job data as well. However, if all you need is a straightforward scraping tool, this might be the project for you.

(9.6.16 Update): It's very late and I'm going to upload this code just in case I lose it. The code doesn't run the 'click to next page' part well yet, but I'm headed in the right direction. I'm using a Frankenstein of Scrapy, BeautifulSoup, XPath, and Selenium to get closer to my true objective. If I can get a running, fully functional scraper up, I will post it.
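As an illustration of the extraction step that keeps failing, here is a minimal BeautifulSoup sketch that pulls a title, link, contact line, and description out of one listing-style snippet. The HTML and its class names (`result-title`, `result-meta`, `description`) are invented for the example; the live site's markup will differ, so the selectors would need to be adapted.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet of one result row; real class names will differ.
html = """
<li class="result-row">
  <a class="result-title" href="https://example.org/job/123">Junior Python Developer</a>
  <span class="result-meta">Reply to: jobs-abc123@example.org</span>
  <p class="description">Scrape data, write reports, automate everything.</p>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

title_link = soup.find("a", class_="result-title")
item = {
    "title": title_link.get_text(strip=True),
    "link": title_link["href"],
    "contact": soup.find("span", class_="result-meta").get_text(strip=True),
    "description": soup.find("p", class_="description").get_text(strip=True),
}
print(item["title"])  # prints: Junior Python Developer
```

Class-based lookups like these tend to survive small layout changes better than absolute XPaths such as /html/body/section/..., which break as soon as the page nesting shifts.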
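The overall control flow described above - open the first ad, scrape it, click "next" until the link disappears - can be sketched without a browser at all, which makes the loop easier to debug in isolation. `NoNextPage` and the fake page list below are stand-ins for Selenium's `NoSuchElementException` and the live site; only the loop shape is the point.

```python
class NoNextPage(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""

def crawl(open_first, scrape, click_next):
    """Open the first ad, then scrape and advance until 'next' is gone.

    Returns the scraped items in visit order."""
    items = []
    open_first()
    while True:
        items.append(scrape())
        try:
            click_next()
        except NoNextPage:
            break
    return items

# Fake three-page "site" to exercise the loop without a browser.
pages = ["ad 1", "ad 2", "ad 3"]
state = {"i": 0}

def fake_click_next():
    if state["i"] + 1 >= len(pages):
        raise NoNextPage  # no 'next' link on the last page
    state["i"] += 1

result = crawl(lambda: None, lambda: pages[state["i"]], fake_click_next)
print(result)  # prints: ['ad 1', 'ad 2', 'ad 3']
```

With real Selenium, `open_first` would be the `first.click()` step, `scrape` the XPath extraction, and `click_next` a `find_element_by_class_name('next').click()` wrapped so a missing element ends the loop instead of crashing it.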