I'm having trouble with text shifting during PDF extraction. I'm using the pdf-parse library, and when I extract text from certain PDFs, the content shifts or becomes misaligned: text that belongs in one column ends up in another, or lines overlap. I've attached a couple of screenshots to illustrate the problem.
Has anyone else encountered this issue? Are there any known solutions or workarounds? I'm also interested in hearing about other PDF extraction tools that handle complex layouts well.
I also tried a Python library, but had no luck with that either. It extracts the text like this:
Z WEI DRIT TEL
ALLER FÜHRENDEN FAHRZEUGHERSTELLER
ENTSCHEIDEN SICH FÜR
GETRIEBEÖLE VON CASTROL*
WÄHLEN SIE CASTROL TRANSMAX.
VERLÄNGERT DIE GETRIEBELEBENSDAUER.
* Basierend auf LMCA-Daten für die OEMs mit den
meisten Verkäufen (gesamte Neuwagenverkäufe)
im Jahr 2019. Verwendung im Rahmen der
OEM-Werksbefüllung.
Cyan
CR16761_H267503_P605434 Castrol Magenta
Yellow
Germany 594.00 x 420.00 mm Black
600.00 x 426.00 mm
Hogarth Worldwide
01/06/2023 18:46
If anyone has any insights into this issue, your help would be greatly appreciated. Thanks in advance!