Node.JS Pulling images from a website and putting it into my on

Charles3928

Hey! I need some help pulling images from this website: https://www.mindat.org/gm/4085, which includes images of certain rocks. I want to create an identification game for my website that chooses a random image and has the user try to guess it for points. Could I get some help? Thanks!
 
Hi there,
One solution would be to manually collect all the image links and hardcode them into an array; you can then pick a random index to display. Alternatively, if you would like to automate this, you could request the page HTML as a string, parse it to pull out all the image tags, and store those in the array.
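A minimal sketch of the first (hardcoded) approach, assuming you collect the links by hand. The URLs and mineral names below are hypothetical placeholders and `pickRandom` is an illustrative helper name; storing each link together with its mineral name also gives you the answer the player has to guess.

```javascript
// Hypothetical, hand-collected list: replace these placeholder URLs
// with real image links (and the matching mineral names) you gathered yourself.
const rockImages = [
    { name: 'Quartz', url: 'https://example.com/images/quartz.jpg' },
    { name: 'Calcite', url: 'https://example.com/images/calcite.jpg' },
    { name: 'Pyrite', url: 'https://example.com/images/pyrite.jpg' },
];

// Return a random element of a non-empty array.
// `rng` defaults to Math.random but can be swapped out for testing.
function pickRandom(items, rng = Math.random) {
    if (items.length === 0) {
        throw new Error('pickRandom: the list is empty');
    }
    // rng() is in [0, 1), so the index is an integer in 0..items.length - 1
    const index = Math.floor(rng() * items.length);
    return items[index];
}

const choice = pickRandom(rockImages);
console.log(choice.url);
```

Passing `rng` in is only there so the choice can be made predictable while testing; in the game itself the default `Math.random` is enough.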
 
Ha, another mindat.org fan! Great site, especially for having photos of each location as well as listing all the minerals found there.
But I don't know how to get a random image, or a list of all images, from this site, unless they offer a function specifically for listing all images. You may want to contact them and ask; maybe they have an API for it, although I doubt it. Note that your game can only work if you can also get the name of the mineral(s) shown in each image. The image file name offers no clue about that, which seems like an extra hurdle.
Having said that, there surely exists software that can trawl an entire site and extract everything you want. Google's bots do it all the time.
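To make the point about names concrete, here is a rough sketch of checking a guess, assuming each round stores the image URL together with its mineral name. `scoreGuess`, `normalize` and the 10-point reward are hypothetical choices, not anything mindat.org provides.

```javascript
// Hypothetical game helpers: each round pairs an image with the
// mineral name that counts as the correct answer.

// Normalize a guess so "  calcite " and "Calcite" are treated the same.
function normalize(text) {
    return text.trim().toLowerCase();
}

// Return the points earned for one guess (assumed scoring: all or nothing).
function scoreGuess(round, guess, pointsPerCorrect = 10) {
    return normalize(guess) === normalize(round.name) ? pointsPerCorrect : 0;
}

const round = { name: 'Calcite', url: 'https://example.com/images/calcite.jpg' };
let score = 0;
score += scoreGuess(round, '  calcite '); // correct, ignoring case and spaces
score += scoreGuess(round, 'quartz');     // wrong
console.log(score); // 10
```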
 
You could do something like this. You would need to add the site URL to the image paths. The site admin may not be happy about hotlinking the images.
JavaScript:
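import axios from 'axios';

// Fetch the page HTML; returns undefined if the request fails
async function getPage() {
    try {
        const response = await axios.get('https://www.mindat.org/gm/4085');
        return response.data;
    } catch (err) {
        console.log(err);
    }
}

// Matches each complete <img ...> tag
const re = /<img.*?>/g;
// Top-level await needs an ES module (a .mjs file or "type": "module" in package.json)
const page = (await getPage()) ?? '';
const imgs = [];
let img;

// exec() with the g flag returns the next match on each call, then null
while ((img = re.exec(page)) !== null) {
    imgs.push(img[0]);
}

for (const tag of imgs) {
    console.log(tag);
}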
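The loop above prints whole tags. If you only need the image paths, a small pure function is easier to reuse and to test offline. This is a sketch with a hypothetical name (`extractImageSrcs`) and a made-up sample string, and it keeps the regex approach used in this thread rather than a real HTML parser.

```javascript
// Collect the src attribute of every <img> tag in an HTML string.
// A regex is a rough tool for HTML (it misses single-quoted or unquoted
// attributes, for example); an HTML parser would be more robust.
function extractImageSrcs(html) {
    const srcs = [];
    for (const tag of html.matchAll(/<img\b[^>]*>/gi)) {
        const src = tag[0].match(/\bsrc="([^"]*)"/i);
        if (src !== null) {
            srcs.push(src[1]); // the captured path without quotes
        }
    }
    return srcs;
}

const sample = '<p>Hi</p><img src="/a.jpg" alt="A"><img alt="no src"><IMG SRC="/b.png">';
console.log(extractImageSrcs(sample)); // [ '/a.jpg', '/b.png' ]
```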
 
Gets all images on the page and rewrites the relative src in each <img src="..." /> tag to a full image URL, then prints the list of images.
JavaScript:
// Import axios (import syntax and top-level await both need an ES module)
import axios from 'axios';

// Function for getting the webpage HTML (returns undefined if the request fails)
async function getPage() {
    try {
        const response = await axios.get('https://www.mindat.org/gm/4085');
        return response.data;
    } catch (err) {
        console.log(err);
    }
}

// Regexes: one for the src="..." attribute, one for whole <img> tags
var add_url = new RegExp(/src="(.*?)"/g)
var re = new RegExp(/<img.*?>/g);

// Fetch the page; use an empty string if the request failed
var page = (await getPage()) ?? '';

// Set empty list for images and empty img variable
var imgs = [];
var img = '';

// Prefix for relative image paths / set empty url variable
var addurl = 'src = "https://www.minedat.org';
var url = ''

// Loop over every <img> tag; if its src is a relative path (starts with "/"),
// prepend the site URL, then append the tag to the image list
while(null != (img=re.exec(page))){
    url = img[0].match(add_url);
    if (url !== null && url[0].startsWith('src="/')) {
        img[0] = img[0].replace(url[0], addurl+url[0].slice(5))
    }
    imgs.push(img[0]);
}

// Print images list to console
console.log(imgs)
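An alternative to prefixing the src string by hand is the standard `URL` constructor (built into Node.js and browsers), which resolves a relative path against the page URL and leaves absolute URLs alone. A sketch; `toAbsoluteUrl` and the `/imagecache/example.jpg` path are made up for illustration.

```javascript
// Turn a (possibly relative) src value into an absolute URL.
// new URL(input, base) resolves `input` against `base`; if `input` is
// already absolute, the base is ignored.
function toAbsoluteUrl(src, pageUrl) {
    return new URL(src, pageUrl).href;
}

const pageUrl = 'https://www.mindat.org/gm/4085';
console.log(toAbsoluteUrl('/imagecache/example.jpg', pageUrl));
// https://www.mindat.org/imagecache/example.jpg
console.log(toAbsoluteUrl('https://example.com/x.png', pageUrl));
// https://example.com/x.png
```

Combined with a src-extracting regex, this avoids building the `src = "..."` string yourself.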
 
It was my understanding that the OP wanted to grab a random image from this site, not specifically one from this particular page; that was just an example. There are dozens of web scraping tools that might be able to retrieve a list of all images on the site.

But it seems the OP is not much interested in our replies...
 
True. I don't think I would want to download or even scrape all the pages. The 4085 is just a category or the like, and there are over 4000 pages. It would probably be better to just grab a random image from one of the random cats. You would still need to build a list of images, though.
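Picking a random cat comes down to putting a random page number into the URL. A sketch, assuming the ids run from 1 up to some maximum with no gaps (unverified); `randomPageUrl` is a hypothetical helper, and 4456 is the upper bound used in the code later in this thread.

```javascript
// Build the URL of a random /gm/ page.
// Assumption: valid ids are the integers 1..maxId.
function randomPageUrl(maxId = 4456, rng = Math.random) {
    const id = 1 + Math.floor(rng() * maxId); // integer in 1..maxId
    return 'https://www.mindat.org/gm/' + id;
}

console.log(randomPageUrl());
```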
 
These are all the cats I could find. I updated the code to pick a random cat and choose a random image from its list. Some pages do not have images, and I don't know whether that will throw an error. It also pulls the mineral name from the page.

JavaScript:
// Import axios (import syntax and top-level await both need an ES module)
import axios from 'axios';
// Function for getting a random /gm/ page (returns undefined if the request fails)
async function getPage() {
    try {
        var num = Math.floor(Math.random() * 4456)
        const response = await axios.get('https://www.mindat.org/gm/' + num);
        return response.data;
    } catch (err) {
        console.log(err);
    }
}
// Regexes for the src="..." attribute, <img> tags and the <h1> heading
var add_url = new RegExp(/src="(.*?)"/g);
var re = new RegExp(/<img.*?>/g);
var mine = new RegExp(/<h1>(.*?)<\/h1>/g);
// Fetch a random page; use an empty string if the request failed
var page = (await getPage()) ?? '';
// Set empty list for images and empty img variable
var imgs = [];
var img = '';
// Prefix for relative image paths / set empty url variable
var addurl = 'src = "https://www.mindat.org';
var url = ''
// Loop over every <img> tag; prepend the site URL to relative src paths
// and append the tag to the image list
while (null != (img = re.exec(page))) {
    url = img[0].match(add_url);
    if (url !== null && url[0].startsWith('src="/')) {
        img[0] = img[0].replace(url[0], addurl + url[0].slice(5))
    }
    imgs.push(img[0]);
}
// Find the mineral name (first <h1> on the page), if there is one
var name = page.match(mine);
// Remove html tags
var tags = /(<([^>]+)>)/ig
name = name ? name[0].replace(tags, '') : 'Unknown mineral'
// Pull a random image from the list, unless the page had none
if (imgs.length > 0) {
    var randnum = Math.floor(Math.random() * imgs.length)
    console.log(name);
    console.log(imgs[randnum])
} else {
    console.log('No images found on this page');
}
// Loop through imgs list
// for(let i=0; i<imgs.length; i++){
//     console.log(imgs[i])
// }
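On the question of pages without images: instead of giving up, one option is to retry with another random page a few times. This is a hypothetical sketch; the URL generator, fetch function and extraction function are passed in (in real use, something like an axios-based fetch and the regex loop above), so the retry logic can be tried on fake pages without any network requests.

```javascript
// Try up to `maxTries` pages and return the first one that yields
// at least one image, or null if none did.
async function findPageWithImages(nextUrl, fetchHtml, extractImages, maxTries = 5) {
    for (let attempt = 1; attempt <= maxTries; attempt++) {
        const url = nextUrl();
        let html;
        try {
            html = await fetchHtml(url);
        } catch (err) {
            // A failed request (e.g. a 403) just moves on to another page
            console.log(`Attempt ${attempt}: request to ${url} failed (${err.message})`);
            continue;
        }
        const images = extractImages(html ?? '');
        if (images.length > 0) {
            return { url, images };
        }
    }
    return null;
}

// Collect whole <img> tags from an HTML string
const findImgTags = (html) => [...html.matchAll(/<img\b[^>]*>/gi)].map((m) => m[0]);

// Fake pages standing in for real ones (hypothetical data)
const fakePages = {
    'page-1': '<h1>Empty</h1>',
    'page-2': '<h1>Quartz</h1><img src="/q.jpg">',
};
const queue = ['page-1', 'page-2'];

findPageWithImages(() => queue.shift(), async (url) => fakePages[url], findImgTags)
    .then((result) => console.log(result ? result.url : 'no images found')); // page-2
```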
 
I'm not sure how you find cats on a minerals site 😄 But it's an interesting idea just to try all pages with that name pattern. It's a fair assumption that you'll get many of the images that way. I'll play around with your code, thanks.
 
@menator01: When I execute your code (using the original URL https://www.mindat.org/gm/4085) in the Node.js REPL, I get this error:

AxiosError: Request failed with status code 403

Axios also dumps the response data, which, after I clean up all the quotes and line feeds and paste it into Chrome, looks like this:

a.jpg

The same URL works fine in a browser, of course. What could be the problem here? Does this Node.js code have to run in a browser?
 
I haven't figured out how to run it in a browser; I've been running it from the command line. I can't seem to get axios to play well with a browser.
 

Attachment: Screenshot from 2022-10-11 14-20-56.png
Hmmmm... the axios.get call works fine when I replace the URL with https://www.google.com. So it's not my setup that is wrong; it's definitely something peculiar with mindat.org. But why aren't you getting the same error?!
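One possible explanation, not confirmed in this thread: some sites reject requests whose User-Agent header doesn't look like a browser's, and axios running in Node.js sends its own default one. Both `axios.get(url, options)` and the built-in `fetch(url, options)` (Node.js 18+) accept a `headers` object, so sending a custom User-Agent could look like this sketch. `buildRequestOptions`, `getPageWithHeaders` and the User-Agent string are hypothetical, and it is untested whether any value gets past mindat.org's 403.

```javascript
// Build request options with explicit headers; the same object works as
// the second argument to axios.get(url, options) or fetch(url, options).
function buildRequestOptions(userAgent) {
    return {
        headers: {
            'User-Agent': userAgent,
            'Accept': 'text/html,application/xhtml+xml',
        },
    };
}

// Hypothetical User-Agent string; replace it with your own.
const options = buildRequestOptions('RockQuizBot/0.1 (contact: you@example.com)');

// Uses the global fetch available in Node.js 18+ (not called here).
async function getPageWithHeaders(url) {
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.text();
}
// With axios instead: const { data } = await axios.get(url, options);
```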
 
If you are using the original code, I had an error in the image URL in the regex part: I had minedat instead of mindat. Check that; it's in the addurl variable.
Yes, I know. That was the first thing that caught my eye, and I had already corrected it. But it did not even get that far; it was failing on the axios.get call.
 
I can only get it to run in VS Code; no success in a shell using the nodejs command.
So in a shell with the nodejs command, what exactly happens for you? Do you also get the 403 error?

The plot thickens... I tried it from Visual Studio Code but got the same error:

g.jpg
 
