C# Couldn't get equations in HTML when converting a Word .docx file to an HTML file in C#

coductexam

I have a .NET C# application that extracts data from a Word file and saves it to a database. To store all of the Word data exactly as it is, I first convert the whole Word file to HTML and then go through the HTML paragraphs one by one.

When the Word file is converted to HTML, the equations in the Word document are converted into images.
C#:
// Assumes: using System.IO; using Microsoft.Office.Interop.Word;
// plus the HtmlAgilityPack NuGet package.
Microsoft.Office.Interop.Word.Document doc = Globals.ThisAddIn.Application.ActiveDocument;

string tempDir = Path.GetTempPath();
string tempHtml = Path.Combine(tempDir, "temp.html");
string newTempHtml = Path.Combine(tempDir, "newtemp.html");
string finalTempHtml = Path.Combine(tempDir, "finaltemp.html");

// Remember the original path: after SaveAs, the open document points at temp.html.
string originalFileName = doc.FullName;

doc.SaveEncoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUSASCII;
// File.Delete does nothing if the file does not exist.
File.Delete(tempHtml);
doc.SaveAs(tempHtml, WdSaveFormat.wdFormatFilteredHTML);
doc.Close(WdSaveOptions.wdDoNotSaveChanges);

// Load the exported HTML and re-save it through HtmlAgilityPack
// (Save overwrites an existing file).
var mangledHTML = new HtmlAgilityPack.HtmlDocument();
mangledHTML.Load(tempHtml);
mangledHTML.Save(newTempHtml);

// Remove standalone CRLFs: protect double CRLFs with a placeholder token,
// turn the remaining single CRLFs into spaces, then restore the double breaks.
string badHTML = File.ReadAllText(newTempHtml);
badHTML = badHTML.Replace("\r\n\r\n", "ackThbbtt ");
badHTML = badHTML.Replace("\r\n", " ");
badHTML = badHTML.Replace("ackThbbtt ", "\r\n");
// Replace U+FFFD replacement characters (bytes that could not be decoded) with spaces.
badHTML = badHTML.Replace('\uFFFD', ' ');
// WriteAllText overwrites an existing file.
File.WriteAllText(finalTempHtml, badHTML);

// Clean up the intermediate files; finaltemp.html holds the result.
File.Delete(tempHtml);
File.Delete(newTempHtml);

// Reopen the original document in Word.
Microsoft.Office.Interop.Word.Document originalDoc =
    Globals.ThisAddIn.Application.Documents.Open(originalFileName);
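The snippet stops after writing finaltemp.html; the paragraph-by-paragraph step described at the start is not shown. A minimal sketch of that step with HtmlAgilityPack (already used above), assuming paragraphs appear as `<p>` elements in Word's filtered HTML. The class and method names are hypothetical. Direct formatting usually ends up in inline `style` attributes, but class-based styles such as `MsoNormal` live in the page's `<style>` block, so storing only each paragraph's HTML may lose some of the formatting.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: splits exported HTML into one string per paragraph.
public static class ParagraphExtractor
{
    // Returns the OuterHtml of each <p> element, in document order.
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var paragraphs = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
            return paragraphs;

        foreach (HtmlNode p in nodes)
            paragraphs.Add(p.OuterHtml);
        return paragraphs;
    }
}
```

One way to use it is to read finaltemp.html with `File.ReadAllText` and pass the contents in, storing each returned string as one database row. Headings are exported as `h1` to `h6` rather than `p`, so the XPath may need extending.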

Basically, I want to store every paragraph of the Word document separately in the database, along with its formatting properties such as font size, font width, font name and font style, so that my application can display it exactly as it was written in the Word document.
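If the font properties are read directly through the Word object model (`Paragraph.Range.Font`) instead of from the HTML, one detail matters: when a paragraph mixes formatting, Word returns the sentinel `wdUndefined` (9999999) for `Size`, `Bold` and `Italic`, and, as far as I know, an empty string for `Name`. A minimal sketch of a storage type that maps those sentinels to `null`; `ParagraphFormat` and `FromWordValues` are hypothetical names, and the mapping is an assumption to check against your Word version. It is written without records or nullable reference types so it compiles on the older C# versions typical of VSTO projects.

```csharp
using System;

// Hypothetical storage type for one paragraph's formatting.
// A null property means the paragraph mixes values (Word reported wdUndefined).
public sealed class ParagraphFormat
{
    // Word's wdUndefined constant, returned for mixed formatting.
    public const int WdUndefined = 9999999;

    public string FontName { get; }
    public float? FontSize { get; }
    public bool? Bold { get; }
    public bool? Italic { get; }

    private ParagraphFormat(string fontName, float? fontSize, bool? bold, bool? italic)
    {
        FontName = fontName;
        FontSize = fontSize;
        Bold = bold;
        Italic = italic;
    }

    // Arguments are the raw values from Range.Font.Name, Range.Font.Size,
    // Range.Font.Bold and Range.Font.Italic.
    public static ParagraphFormat FromWordValues(string name, float size, int bold, int italic)
    {
        return new ParagraphFormat(
            string.IsNullOrEmpty(name) ? null : name,      // empty name = mixed fonts
            size == WdUndefined ? (float?)null : size,     // 9999999 = mixed sizes
            ToTriState(bold),
            ToTriState(italic));
    }

    // 0 = off, wdUndefined = mixed, any other value = on.
    private static bool? ToTriState(int value)
    {
        return value == WdUndefined ? (bool?)null : value != 0;
    }
}
```

Inside the add-in this could be called as `ParagraphFormat.FromWordValues(p.Range.Font.Name, p.Range.Font.Size, p.Range.Font.Bold, p.Range.Font.Italic)` for each paragraph `p` in `doc.Paragraphs`.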

To reproduce the content exactly, I convert it to HTML and then separate the paragraphs so I can store each one in the database. But when a paragraph in my Word document contains equations, the code above converts all of those equations into images, and because they are images I can't display the equations properly in my application.

So I tried to convert these equations to MathML, but I couldn't get it to work.
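Word stores equations internally as OMML (Office Math Markup Language), and Office installs an XSLT stylesheet, OMML2MML.XSL, that converts OMML to MathML; the filtered-HTML export, as described above, only emits images. A minimal sketch of the conversion, assuming you can obtain the equation XML as a string (for example from a range's `WordOpenXML` property inside the add-in, or from word/document.xml in the saved .docx) and that OMML2MML.XSL exists in your Office installation folder (the exact path varies by Office version and install type, e.g. somewhere under ...\Office16\). The class name `OmmlToMathML` is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper: converts every OMML equation in a Word XML string to MathML.
public static class OmmlToMathML
{
    // Namespace of OMML elements such as <m:oMath>.
    private static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // omml2mml is expected to be loaded from Office's OMML2MML.XSL.
    public static List<string> Convert(string wordXml, XslCompiledTransform omml2mml)
    {
        XDocument source = XDocument.Parse(wordXml);
        var results = new List<string>();

        // OutputSettings is read-only, so clone it before dropping the XML declaration.
        XmlWriterSettings settings = omml2mml.OutputSettings.Clone();
        settings.OmitXmlDeclaration = true;

        // Take only outermost <m:oMath> elements; display equations sit inside <m:oMathPara>.
        var equations = source.Descendants(M + "oMath")
                              .Where(e => !e.Ancestors(M + "oMath").Any());

        foreach (XElement oMath in equations)
        {
            var output = new StringWriter();
            // CreateReader feeds the element (with its namespace URIs) to the XSLT.
            using (XmlReader reader = oMath.CreateReader())
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                omml2mml.Transform(reader, writer);
            }
            results.Add(output.ToString());
        }
        return results;
    }
}
```

In the add-in, one possible flow is to load the stylesheet once with `xslt.Load(pathToOmml2MmlXsl)` and, before saving to HTML, call `OmmlToMathML.Convert(eq.Range.WordOpenXML, xslt)` for each `OMath eq` in `doc.OMaths`, storing the MathML next to the paragraph it belongs to.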
 
