Richie C
New Coder
Hi,
I have an HTML page where users upload an .srt (subtitle) file, and I want to clean it up by removing the unnecessary text.
Example uploaded .srt file:
9
00:00:31,690 --> 00:00:35,550
have taken years to complete. Due to this, no
10
00:00:35,550 --> 00:00:38,100
single donkey could have possibly completed the
11
00:00:38,100 --> 00:00:41,790
journey, so this resulted in unplanned matings,
The result that I want is:
have taken years to complete. Due to this, no single donkey could have possibly completed the journey, so this resulted in unplanned matings,
The code I use at the moment is:
JavaScript:
reader.onload = (e) => {
  //console.log(files[i].name, e.target.result);
  var fileName = files[i].name;
  var text = e.target.result;
  text = text.replace(/WEBVTT[\r\n]/,"");
  text = text.replace(/NOTE duration:.*[\r\n]/,"");
  text = text.replace(/NOTE language:.*[\r\n]/,"");
  text = text.replace(/NOTE Confidence:.+\d/g,"");
  text = text.replace(/NOTE recognizability.+\d/g,"");
  text = text.replace(/[\r\n].+-.+-.+-.+-.+/g,"");
  text = text.replace(/[\r\n].+ --> .+[\r\n]/g,"");
  text = text.replace(/.[\r\n]. --> .+[\r\n]/g,"");
  text = text.replace(/[\n](.)/g," $1");
  text = text.replace(/[\r\n]+/g,"");
  text = text.replace(/^ /,"");
  var heading = document.createElement('h3');
  document.body.appendChild(heading);
  heading.innerHTML = "Transcript for '" + files[i].name + "'";
  var copyButton = document.createElement('button');
  document.body.appendChild(copyButton);
  copyButton.onclick = function() {copyToClip(text,fileName); };
  copyButton.innerHTML = "Copy transcript";
  copyButton.className = "copyButton";
  var div = document.createElement('div');
  document.body.appendChild(div);
  div.className = "cleanVTTText";
  div.innerHTML = text;
};
But this leaves the text like this:
9 have taken years to complete. Due to this, no 10 single donkey could have possibly completed the 11 journey, so this resulted in unplanned matings,
What's the best way to remove the ascending cue numbers? (They can be 1 to 3 digits long.)
Thanks
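One possible approach, sketched here rather than offered as the definitive fix, is to go through the file line by line instead of chaining regexes. The function name cleanSrt is hypothetical, and the sketch assumes that a cue number is always a digits-only line directly followed by a timestamp line containing "-->". Because of that check, a subtitle line that happens to consist only of digits is kept.

```javascript
// Hypothetical helper (not from the original post): removes SRT cue numbers
// and timestamp lines, then joins the remaining subtitle text with spaces.
function cleanSrt(text) {
  const lines = text.split(/\r?\n/);
  const kept = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue;            // blank separator between cues
    if (line.includes("-->")) continue;   // timestamp line
    // Assumption: a cue number is a digits-only line that is
    // immediately followed by a timestamp line.
    const next = (lines[i + 1] || "").trim();
    if (/^\d+$/.test(line) && next.includes("-->")) continue;
    kept.push(line);
  }
  return kept.join(" ");
}
```

Inside the onload handler, the SRT-specific replace calls could then be swapped for something like var text = cleanSrt(e.target.result); while the WEBVTT and NOTE replacements stay in place if .vtt files also need handling.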