Thai natural language processing library in Rust,
with Python and Node bindings. Formerly oxidized-thainlp.
To use as a library in a Rust project:
cargo add nlpo3
To use as a library in a Python project:
pip install nlpo3
For the Node.js binding, see nlpo3-nodejs.
Example in Python:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
See more at nlpo3-python.
To use as a library in a Rust project:
cargo add nlpo3
This adds nlpo3 to Cargo.toml:
[dependencies]
# ...
nlpo3 = "1.4.0"
Create a tokenizer using a dictionary from file,
then use it to tokenize a string (safe mode = true, and parallel mode = false):
use nlpo3::tokenizer::newmm::NewmmTokenizer;
use nlpo3::tokenizer::tokenizer_trait::Tokenizer;
let tokenizer = NewmmTokenizer::new("path/to/dict.file");
let tokens = tokenizer.segment("ห้องสมุดประชาชน", true, false).unwrap();
Create a tokenizer using a dictionary from a vector of Strings:
let words = vec!["ปาลิเมนต์".to_string(), "คอนสติติวชั่น".to_string()];
let tokenizer = NewmmTokenizer::from_word_list(words);
Add words to an existing tokenizer:
tokenizer.add_word(&["มิวเซียม"]);
Remove words from an existing tokenizer:
tokenizer.remove_word(&["กระเพรา", "ชานชลา"]);
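To make the effect of adding and removing dictionary words concrete, here is a minimal, self-contained sketch of a dictionary-based tokenizer. It is not nlpo3's newmm algorithm and does not use the nlpo3 crate: `ToyTokenizer` is a hypothetical type that applies simple greedy longest matching, and its method names only mirror the ones above for readability. It has no safe or parallel mode.

```rust
use std::collections::HashSet;

/// Hypothetical toy tokenizer for illustration only.
/// This is NOT nlpo3's NewmmTokenizer and NOT the newmm algorithm.
struct ToyTokenizer {
    dict: HashSet<String>,
}

impl ToyTokenizer {
    /// Build the dictionary from a vector of Strings.
    fn from_word_list(words: Vec<String>) -> Self {
        ToyTokenizer {
            dict: words.into_iter().collect(),
        }
    }

    /// Add words to the dictionary.
    fn add_word(&mut self, words: &[&str]) {
        for w in words {
            self.dict.insert(w.to_string());
        }
    }

    /// Remove words from the dictionary.
    fn remove_word(&mut self, words: &[&str]) {
        for w in words {
            self.dict.remove(*w);
        }
    }

    /// Greedy longest match: at each position take the longest dictionary
    /// word that starts there; if none matches, emit a single character.
    fn segment(&self, text: &str) -> Vec<String> {
        let chars: Vec<char> = text.chars().collect();
        // Longest dictionary entry, measured in characters.
        let max_len = self
            .dict
            .iter()
            .map(|w| w.chars().count())
            .max()
            .unwrap_or(1);
        let mut tokens: Vec<String> = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let longest = (chars.len() - start).min(max_len);
            let mut end = start + 1; // fallback: one character
            for len in (1..=longest).rev() {
                let candidate: String = chars[start..start + len].iter().collect();
                if self.dict.contains(&candidate) {
                    end = start + len;
                    break;
                }
            }
            tokens.push(chars[start..end].iter().collect());
            start = end;
        }
        tokens
    }
}

fn main() {
    let words = vec!["ห้องสมุด".to_string(), "ประชาชน".to_string()];
    let mut tokenizer = ToyTokenizer::from_word_list(words);
    // prints "ห้องสมุด|ประชาชน"
    println!("{}", tokenizer.segment("ห้องสมุดประชาชน").join("|"));

    // Changing the dictionary changes where the text is split.
    tokenizer.remove_word(&["ห้องสมุด"]);
    tokenizer.add_word(&["ห้อง", "สมุด"]);
    // prints "ห้อง|สมุด|ประชาชน"
    println!("{}", tokenizer.segment("ห้องสมุดประชาชน").join("|"));
}
```

The sketch shows why the dictionary matters: the same input is split into two tokens or three depending on which words are present. The real tokenizer resolves ambiguous boundaries more carefully than greedy matching does.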
Command-line example:
echo "ฉันกินข้าว" | nlpo3 segment
See more at nlpo3-cli.
Generic test:
cargo test
Build API document and open it to check:
cargo doc --open
Build (remove --release to keep debug information):
cargo build --release
Check target/ for build artifacts.
nlpO3 is copyrighted by its authors
and licensed under the terms of the Apache License, Version 2.0 (Apache-2.0).
See the LICENSE file for details.