
Node.JS Scraping and Automation

jason

Coder
Hey There!

Basically, I want to make a website that can scrape data from other websites without using an API, in the same way that cURL in PHP does. Is this kind of programming possible in Node.js? My final year project will be a Node.js project focused on automation. Could anyone please guide me in this regard?

Thanks.
 
Sure, I will give you a reference number for checking the functionality of the site.
02113160274355U is the reference number for testing purposes.
Go to the mentioned website and enter the reference number above; the website will show you some information about it, which you can use to check the website's functionality.
 
I don't have experience in Python, but I'm shocked by your output. How could you do that in such a short time? There's no doubt you're a professional programmer. It would be great if I could learn these high-end programming skills. I would appreciate it if you could suggest where I should begin. What are the prerequisites for learning Python, and which topic should I start with?

Which framework do you use for scraping this?
 
I was going to do a video, but I have been so busy lately. I might do one next week and upload it, explaining everything :D

In a nutshell, coding is about thinking outside the box. We know this data is coming from somewhere; all I did was request the page with the data, convert the returned page into a BS4 (BeautifulSoup) object, and pull the data out.

Here is my code :)
Python:
from bs4 import BeautifulSoup
import html
import requests

def getbill(reference, bill):
    """Fetch a bill by reference number and print its fields as a dict."""

    # Form fields copied from the browser's request.
    # 'srzEnonce' looks like a per-session nonce, so this hard-coded value may stop working.
    data = {
        'referenceId': reference,
        'billname': bill,
        'srzEnonce': '008ef72084',
    }

    # Request the page and convert it into BS4
    response = requests.post('https://www.pakistanbills.com/lesco-bill-online/', data=data)
    response.raise_for_status()
    HTML = html.unescape(response.text)
    soup = BeautifulSoup(HTML, features="lxml")

    # Get the reference number (the text after the colon)
    found = {}
    found['Reference No'] = soup.find("div", {"id": "refNum"}).text.split(":")[1].strip()

    # Get the bill table
    table = soup.find("table", {"class": "billData"})

    # Turn each row into a fieldname -> value entry
    for tr in table.find_all("tr"):
        values = tr.find_all("td")

        # Skip header or spacer rows that do not have two cells
        if len(values) < 2:
            continue

        fieldname = values[0].text.strip()
        value = values[1].text.strip()
        found[fieldname] = value

    print(found)


getbill("000000000000000", "lescobill")
 
That's very interesting, thanks. I had not heard about screen scraping tools like BeautifulSoup; I will look into it and have some fun. Just one question for now: how did you figure out the format of the POST data to get the specific bill?
 
Acchm...

Is this actually test data, or a real, live billing site in Pakistan?

Let's not forget that scraping is often used to iterate over "only accidentally" public data, with the intention of leaking or selling it later.

I don't want everyone on the thread to commit a potential crime accidentally... or maybe I've spent too long working in highly secure environments and I'm paranoid.

I mean... being able to scrape, import and re-present someone's bill on your own website. I wonder what that could be used for.
 
As far as I am aware, since you do not need to log in to the site, the information is public and fair game :D My script does not bypass any security features or use any vulnerabilities, so I see no issues :D

So to get the data I used the Chrome developer tools and watched the Network tab as I submitted the form on the website, then copied their request and used it as my own. I never knew anyone would be interested in what I do lol, maybe I should start explaining what I do more :D
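
For anyone wanting to try the same workflow, here is a minimal sketch of the "copy the request from the Network tab" step. It assumes the form carries hidden fields (such as a nonce) that have to be sent back along with the visible ones. The URL, the field names and the sample HTML are placeholders made up for illustration, not the real site's. It uses only the standard library and builds the request without sending it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            # Attributes without a value come back as None, so fall back to "".
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(page_html):
    """Return the hidden form fields found in a page as a dict."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    return collector.fields


def build_form_post(url, page_html, visible_fields):
    """Build (but do not send) a form POST like the one seen in the Network tab."""
    data = {**hidden_fields(page_html), **visible_fields}
    body = urlencode(data).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical form markup standing in for a page fetched earlier.
SAMPLE_FORM = """
<form method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="referenceId">
</form>
"""

request = build_form_post(
    "https://example.com/lookup",
    SAMPLE_FORM,
    {"referenceId": "000000000000000"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Reading the hidden fields from a freshly fetched page, rather than hard-coding them, is one way to cope with values that change between sessions. Whether a given site works that way is something you would have to confirm in the Network tab yourself.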
 
I guess we are all ethical hackers here 😄

Ah yes of course, easy as pie. I hadn't thought about that. Personally I'm always interested in learning some new stuff, as long as it is not too complicated and abstract.
 
It has been tested in court many times already. It depends entirely on what you are scraping and what you are doing with it.

Wasn't that the case of Andrew Auernheimer ("weev"), who was sentenced to prison for incrementing an ID in GET requests to the AT&T website, which exposed other customers' details (iPad users' email addresses)? His conviction was later overturned on appeal. (Not to be confused with Aaron Swartz, who took his own life while being prosecuted for bulk-downloading academic articles from JSTOR.)

What he did wrong was kick the hornets' nest by handing the scraped data to the press instead of contacting AT&T first. Can't recall exactly.

If you do a quick Google search around the topic, you'll find out how thin the ice you are standing on is.

I work in Tier one secure enterprise, with certifications, so I am NOT clicking any of the URLs in this thread.

Worst case scenario:
Our friend, or someone he knows, is part of a scam caller ring. They want to devise a scam whereby, given the accidentally public bills on this website, they can take a reference number, use it to look up a bill, get the contact info, call the person, screen-share the scraped, modified bill and invite them to pay it over the phone.

This is a classic screen-share scam used on old people around the world, and it is usually conducted via remote access, editing the HTML in the developer view while blanking the user's screen. With a valid, scraped and modified bill presented on the screen, even a phone screen, the victim could be duped, and it doesn't even require remote access!

I would advise a course in professional ethics and legalities in software as well. You will be surprised how many coders have gone to jail and how easy it is to end up there if you "kick the wrong hornets' nest". Rule No. 1: If it costs someone money, they will want at least your arm. If it costs a lot of people money, they will want your head. Nothing else matters.

In uni, our case study was about a senior developer at a robot factory. He wrote the code that moved the robots around. He was known to be arrogant and cavalier, and often took shortcuts in testing.
He allegedly formulated the limit zoning code at lunch on a napkin and discussed it with a colleague.
Later that month a test robot (the kind that welds and paints cars) flipped 359° left instead of 1° right and crushed an employee against a wall, killing him.**
Long story short, the engineer was found liable. The code he wrote was said to have matched the scrap code on the napkin that day, and it had several serious corner-case issues that could result in "out of bounds" operation of the robot.
He went to jail for quite a long time.

** In another thread about the % operator, someone said the point was moot. I disagree. Sometimes it's very, very important to understand the details, although we agree it's better to rewrite it in a different way that is less ambiguous.
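
To make the point about % concrete, here is a small sketch of how a turn can come out as 359° one way instead of 1° the other. It is only an illustration of the wrap-around pitfall, not the code from the case study, and the function names are made up.

```python
def naive_turn(current_deg, target_deg):
    """Turn angle in [0, 360): always goes the same way round, never the short way."""
    return (target_deg - current_deg) % 360


def shortest_turn(current_deg, target_deg):
    """Signed turn angle in [-180, 180): the sign picks the shorter direction."""
    return (target_deg - current_deg + 180) % 360 - 180


# Heading is 0 degrees and the target is 359 degrees, one degree "behind" us.
print(naive_turn(0, 359))     # 359: a near-full rotation
print(shortest_turn(0, 359))  # -1: one degree the other way
```

Note that this relies on Python's %, whose result takes the sign of the divisor. In languages such as C or Java the result takes the sign of the dividend, so the same expression can go negative there, which is exactly the kind of detail that is worth spelling out.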
 
Hi there,

One question: what would be the purpose of this scraping website? Just out of curiosity. Also, one thing to keep in mind is that not all websites share the same CSS styling or formatting, so a scraper written for one site will not automatically work on another.
 
Acchm...

Is this actually test data, or a real, live billing site in Pakistan?

Let's not forget that scraping is often used to iterate over "only accidentally" public data, with the intention of leaking or selling it later.

I don't want everyone on the thread to commit a potential crime accidentally... or maybe I've spent too long working in highly secure environments and I'm paranoid.

I mean... being able to scrape, import and re-present someone's bill on your own website. I wonder what that could be used for.
That is why I am here lol.
 
