Trying to Scrape an Array

I'm trying to scrape the following:


HTML:
<tr>
        <td style="font-family:eurof;font-size:14px;padding-top:0px;padding-bottom:5px;"><a style="font-family:eurof;font-size:14px;" href="jewelry">JEWELRY</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet">ANKLET</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet/fashion">FASHION</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet/fashion/"></a>
        </td>
      </tr>
I use:

Code:
var categories = [];
const cheerio = require("cheerio");
const $ = await cheerio.load(content);
categories.push($('a[style="font-family:eurof;font-size:14px;"]').text());

console.log(categories);

The results of my console.log are:

[ 'JEWELRYANKLETFASHION' ]

I want to get

[ 'JEWELRY','ANKLET','FASHION' ]
 
Hello,

Please use BBCode; the notice is at the top of every forum as a helpful reminder.

Also, can you please share the solution followed by the source? Some coders may not feel comfortable leaving the site, and having everything here makes it quick and easy.
 
That's strange that you are getting a return like that.
I replicated it in Vanilla JS and had no issues:
JavaScript:
<script>
window.onload = function(){
    var categories = [];
    // No Cheerio here: require() and cheerio.load() are Node-side calls, so this browser test queries the page's DOM directly.
    document.querySelectorAll("a[style='font-family:eurof;font-size:14px;']").forEach(function(link){
        if(typeof link.innerText.toLowerCase() == "string" && link.innerText.trim().length > 0){
            categories.push(link.innerText);
        }

    });
    console.log(categories);
}

</script>

To test it working without Cheerio code:
HTML:
        <td style="font-family:eurof;font-size:14px;padding-top:0px;padding-bottom:5px;"><a style="font-family:eurof;font-size:14px;" href="jewelry">JEWELRY</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet">ANKLET</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet/fashion">FASHION</a> &nbsp;&gt;&nbsp; <a style="font-family:eurof;font-size:14px;" href="jewelry/anklet/fashion/"></a>

JavaScript:
<script>
window.onload = function(){
    var categories = [];
    document.querySelectorAll("a[style='font-family:eurof;font-size:14px;']").forEach(function(link){
        if(typeof link.innerText.toLowerCase() == "string" && link.innerText.trim().length > 0){
            categories.push(link.innerText);
        }
    });
    console.log(categories);
}
</script>
 
Thanks, Ghost.
When I use your code, I get the error:
ReferenceError: document is not defined

I tried removing the document keyword and got

ReferenceError: querySelectorAll is not defined
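
Both errors are what Node.js reports when browser-only code runs outside a page: document is a global that exists only in browsers, which is presumably why a script that loads Cheerio with require() cannot see it. A minimal sketch of checking for this at runtime; hasBrowserDom is a hypothetical helper name, not part of the original code:

```javascript
// Hypothetical helper for illustration: reports whether a browser DOM is available.
function hasBrowserDom() {
  // typeof is safe on undeclared globals, so this never throws a ReferenceError.
  return typeof document !== "undefined" &&
    typeof document.querySelectorAll === "function";
}

if (hasBrowserDom()) {
  console.log("Browser DOM found: document.querySelectorAll can be used.");
} else {
  console.log("No browser DOM: parse the HTML with a library such as Cheerio.");
}
```

Dropping the document prefix does not help, because querySelectorAll is a method of document (and of elements), not a standalone function.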
 
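That's odd, but it must be due to the environment you are working in: outside a browser (in Node.js, for example) there is no document object, so the DOM methods aren't available.
Try this instead. It uses Cheerio's jQuery-style API and loops through each result it finds.
Your code is clumping them together because calling .text() on the whole selection returns the combined text of ALL the matched elements as one string, so that single string is the only thing pushed into categories. The fix is to go through each found element on its own.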
JavaScript:
var categories = [];
const cheerio = require("cheerio");
// cheerio.load() is synchronous, so no await is needed
const $ = cheerio.load(content);
// Visit each matching link on its own instead of reading the whole selection at once
$('a[style="font-family:eurof;font-size:14px;"]').each(function(){
    var text = $(this).text().trim();
    // Skip the empty trailing link
    if(text.length > 0){
        categories.push(text);
    }
});
console.log(categories);
 
Thank you Ghost. That worked but now I need to get the elements of the array separately to put in the return statement. I tried:
return {categories[0],categories[1], categories[2]}
but I got the error Unexpected Token
Do you know how to do this?
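
The Unexpected Token error comes from JavaScript's shorthand property syntax: { a, b } only works when a and b are plain variable names. categories[0] is an expression, so it needs an explicit key. A minimal sketch, assuming the sample data from this thread; buildResult and the key names first, second and third are hypothetical:

```javascript
// Sample data matching the output shown earlier in the thread.
const categories = ["JEWELRY", "ANKLET", "FASHION"];

// Hypothetical wrapper function; the key names are illustrative assumptions.
function buildResult(list) {
  // Each array element gets an explicit key, which shorthand syntax cannot supply.
  return { first: list[0], second: list[1], third: list[2] };
}

const result = buildResult(categories);
console.log(result);
```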
 
I circumvented the error by putting the return statement right after the block so I didn't get the error

ReferenceError: singleObject is not defined

return {singleObject};

The problem now is that I get the entire object in one cell of my spreadsheet like this:

{"JEWELRY":"some value","ANKLET":"some value","FASHION":"some value","":"some value"}

I want each property of the object to have its own cell.
 
I finally got it to work with a method that I had tried before I moved the return statement. It didn't work previously but it does now.

let category1 = categories[0];
let category2 = categories[1];
let category3 = categories[2];

return {category1, category2, category3};
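
For a list whose length is not known in advance, the same shape can be built without declaring one variable per entry. A minimal sketch, assuming the category1, category2, ... naming used in the post above; toNamedFields is a hypothetical helper, not part of the original code:

```javascript
// Sample data matching the output shown earlier in the thread.
const categories = ["JEWELRY", "ANKLET", "FASHION"];

// Hypothetical helper: turns ["A", "B"] into { category1: "A", category2: "B" }.
function toNamedFields(list) {
  // Pair each entry with a numbered key, then assemble the pairs into one object.
  return Object.fromEntries(
    list.map((name, index) => [`category${index + 1}`, name])
  );
}

const fields = toNamedFields(categories);
console.log(fields);
```

Because every category becomes its own top-level property, this should behave like the hand-written version with three variables while also handling shorter or longer lists.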
 
