Python: how to group/merge rows into a single row

88arvin

New Coder
Greetings!

I converted a PDF file containing tables into a Pandas dataframe and then exported it to Excel. Some cells in the PDF contain multiline text.
I've done this with other PDFs before. In those, the cells with multiline text had a \n at the end of each line, so I was able to join the multiline text into a single line/cell. In this PDF there is no \n, and each line of a cell ends up in its own row.

I want the text in one line/cell, but I am not able to do it. Can anybody please help me with this?

I hope my question is clear.

I am also attaching pictures of what I have in my Pandas dataframe and what I want, for your reference.

This is what I get after exporting the dataframe to Excel:

INPUT1.png



And this is what I want:
OUTPUT1.png


Thanks in advance
 
So there are 3 stages of data here: PDF, Pandas data frame, and Excel sheet. If I understand correctly, the problem is already apparent in your Pandas data frame, and if you were to correct that, the Excel sheet would also be fine? Please confirm, or else explain more.
So exactly how do you convert the PDF to a Pandas data frame? Can we see the PDF and the code for that?
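If the split rows are already present in the data frame, one possible way to fix it there is to treat every row with an empty key column as a continuation of the row above and join the pieces. This is only a sketch under assumptions: the column names `PostDate` and `Description`, the sample values, and the helper name `merge_continuation_rows` are made up for illustration, and it assumes all columns hold text.

```python
import pandas as pd


def merge_continuation_rows(df, key_col, sep=" "):
    """Join rows whose key column is empty into the preceding row.

    Assumption: a new record starts wherever `key_col` holds a value;
    rows with an empty `key_col` only carry overflow text.
    """
    # True on rows that start a new record (key is present and not blank).
    is_start = df[key_col].notna() & df[key_col].astype(str).str.strip().ne("")

    # Running count of record starts gives every row a record number.
    record_id = is_start.cumsum().rename("record")

    def join_text(values):
        # Keep only non-empty cells and join them with the separator.
        parts = [str(v).strip() for v in values if pd.notna(v) and str(v).strip()]
        return sep.join(parts)

    merged = df.groupby(record_id).agg(join_text)
    return merged.reset_index(drop=True)


# Made-up sample that mimics a multiline cell spread over several rows.
sample = pd.DataFrame(
    {
        "PostDate": ["01-04-2012", None, "03-04-2012", None, None],
        "Description": ["NEFT transfer", "to savings", "ATM", "cash", "withdrawal"],
    }
)
merged = merge_continuation_rows(sample, key_col="PostDate")
```

Numeric columns would come back as text with this approach, so they may need converting again afterwards. Whether an empty `PostDate` is a reliable sign of a continuation row depends on the actual statement layout.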
 
I converted the PDF into a Pandas dataframe using Tabula. I'm sorry, but because it's a bank statement (confidential), I can't give you access to the PDF.
 
Can you please tell me how to add a new column before the PostDate column that holds a serial number wherever the PostDate column contains a value, and stays empty wherever the PostDate column contains nothing?


Code:
   NewColumn                PostDate
      1                    01-04-2012
      2                    03-04-2012
      3                    05-04-2012



      4                    10-04-2012
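A possible way to build such a column is a running count over the rows where PostDate has a value, blanked out on the other rows. This is a sketch, not a tested solution for the actual statement: the helper name `add_serial_column`, the `Amount` column and the sample values are made up, and it assumes empty PostDate cells are either missing (NaN/None) or blank strings.

```python
import pandas as pd


def add_serial_column(df, date_col="PostDate", new_col="NewColumn"):
    """Insert a serial-number column directly before `date_col`."""
    out = df.copy()

    # True where the date cell actually contains something.
    has_value = out[date_col].notna() & out[date_col].astype(str).str.strip().ne("")

    # Running count of filled rows, kept only on those rows.
    serial = has_value.cumsum().where(has_value)

    # Nullable integer dtype keeps 1, 2, 3 instead of 1.0, 2.0, 3.0.
    serial = serial.astype("Int64")

    out.insert(out.columns.get_loc(date_col), new_col, serial)
    return out


# Made-up sample with two rows that have no PostDate.
sample = pd.DataFrame(
    {
        "PostDate": ["01-04-2012", "03-04-2012", "05-04-2012", None, "", "10-04-2012"],
        "Amount": ["100", "200", "300", "", "", "400"],
    }
)
numbered = add_serial_column(sample)
```

The missing serial numbers are stored as `<NA>`, which pandas writes as empty cells when the frame is exported with `to_excel`.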
 