JavaScript word tokenizer library with support for multiple languages (as many as possible)

Ehzuiq

New Coder
I am looking for a word tokenizer library for Node.js that supports as many languages as possible. I'd like to pass in a string and a language code, for example tokenize('Hello, world!', 'en'), and have it return ['Hello', 'world']. The number of supported languages is more important than precision.
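For reference, the call shape described above can be sketched with the built-in Intl.Segmenter, assuming a Node.js build (16 or later) that ships full ICU data. The tokenize helper here is hypothetical and only mirrors the signature from the question; which languages are segmented well depends on the ICU data in the runtime, so treat this as a starting point, not a guarantee of coverage.

```javascript
// Hypothetical helper: the name and signature mirror the call in the question.
// Assumes a runtime that provides Intl.Segmenter (Node.js 16+ with full ICU).
function tokenize(text, locale) {
  // 'word' granularity splits the text into words, spaces and punctuation.
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (isWordLike) words.push(segment);
  }
  return words;
}

console.log(tokenize('Hello, world!', 'en')); // [ 'Hello', 'world' ]
```

Because segmentation is locale-aware, the same function can be called with other language codes without adding a dependency.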
 
String tokenizing and language translation are two very distinct functions, and I don't think you will find a library that combines the two. A library supporting "as many languages as possible" might also be very large. Why not first run your string through the Google Translate API or an alternative, and then tokenize the result with JavaScript's built-in split function?
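The split step suggested above could look roughly like this. It is a minimal sketch: splitWords is a hypothetical name, the translation call is left out entirely, and the regular expression is one possible choice of delimiter (anything that is not a Unicode letter, combining mark, or digit).

```javascript
// Hypothetical helper illustrating the split-based approach.
// The translation step mentioned above is intentionally omitted.
function splitWords(text) {
  // Split on runs of characters that are not letters, marks, or digits.
  // The 'u' flag enables Unicode property escapes such as \p{L}.
  return text
    .split(/[^\p{L}\p{M}\p{N}]+/u)
    .filter(Boolean); // drop empty strings left at the start or end
}

console.log(splitWords('Hello, world!')); // [ 'Hello', 'world' ]
```

Note the limits of this approach: it splits contractions such as "don't" at the apostrophe, and it cannot separate words in text written without spaces between them.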
 
