Hey all,
So I am scraping data from one website, but I have to run proxies. I load a proxy, check it with 100 different headers, and if one lets me in I get what I need and move on to the next one. Simples. Now for the issue: it's really slow, and because I can only use UK proxies my pool has dropped from 9000 to 250, so that's having a bit of an impact. I thought adding Concurrent to my script might help, and it does, but not enough. I am currently doing 4 pages a second and I have 100,000 to do :S Don't get me wrong, I have come from 1-2 pages every 2-3 seconds, so it's getting better, but I need some fresh ideas to speed it up.
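
For reference, the flow described above could be sketched roughly like this. It is only an illustration, assuming Python with `concurrent.futures` threads; `find_working_headers`, `scrape_all` and the `fetch` callback are hypothetical names, not the actual script, and `fetch` stands in for whatever HTTP call is really used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_working_headers(proxy, header_sets, fetch):
    """Try each header set through one proxy; return the first page that loads, else None."""
    for headers in header_sets:
        try:
            # fetch is a placeholder: it should return the page, or None if blocked
            page = fetch(proxy, headers)
        except Exception:
            # Dead or timed-out proxy: try the next header set
            continue
        if page is not None:
            return page
    return None


def scrape_all(proxies, header_sets, fetch, max_workers=50):
    """Check many proxies at once instead of one after another."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job back to the proxy it is testing
        futures = {
            pool.submit(find_working_headers, proxy, header_sets, fetch): proxy
            for proxy in proxies
        }
        for future in as_completed(futures):
            page = future.result()
            if page is not None:
                results[futures[future]] = page
    return results
```

Since the work is mostly waiting on the network, the number of threads (`max_workers`, an assumed value here) and the timeout used inside `fetch` are probably the first things to tune.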