Instructions
Add the Tiktoken element.
Fields
🔹Text to Chunk: The input document or section of text you want to chunk.
🔹Chunk Size: Maximum number of tokens per chunk. Recommended: 300–800 tokens for embeddings.
🔹Chunk Overlap: Number of overlapping tokens between chunks. Ensures context continuity. Recommended: 30–100.
Exposed states
🔹Chunks: List of chunked text pieces.
🔹Token Count: Number of tokens in the encoded text.
🔹Token IDs: List of integer token IDs produced by the tokenizer.
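How Chunk Size and Chunk Overlap interact can be sketched in a few lines of Python. This is only an illustration of the sliding-window idea, not the plugin's actual implementation: the function name `chunk_tokens` is hypothetical, and it works on a plain list of token IDs so it runs without extra dependencies. With the `tiktoken` library installed, the IDs would typically come from `tiktoken.get_encoding("cl100k_base").encode(text)`, and each chunk could be turned back into text with `.decode(chunk)`.

```python
from typing import List


def chunk_tokens(token_ids: List[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split token IDs into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens. Hypothetical helper,
    assumed for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + chunk_size])
        # Stop once the window has reached the end of the input,
        # so no trailing chunk consists only of overlap.
        if start + chunk_size >= len(token_ids):
            break
    return chunks


# Stand-in token IDs; real IDs would come from a tokenizer such as tiktoken.
example_ids = list(range(10))
example_chunks = chunk_tokens(example_ids, chunk_size=4, overlap=1)
token_count = len(example_ids)
```

With a chunk size of 4 and an overlap of 1, each chunk starts 3 tokens after the previous one, so the last token of one chunk is repeated as the first token of the next.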
✅ Best Practices
✔️ Aim for 500–800 tokens per chunk for embeddings, the upper part of the recommended 300–800 range (keeps semantic coherence).
✔️ Keep overlap between 30 and 100 tokens to avoid losing context across chunks.
✔️ Pre-chunk large documents before uploading to your Vector DB.
✔️ For PDFs, first extract clean text and remove page headers/footers before chunking.
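One possible way to clean PDF text before chunking is to drop lines that repeat at the top or bottom of most pages. The sketch below assumes the text has already been extracted into one string per page by some other tool. The function name `strip_repeated_edges` and the `min_fraction` threshold are hypothetical choices for illustration, and real documents may need a more careful heuristic.

```python
import re
from collections import Counter
from typing import List


def _key(line: str) -> str:
    """Normalize digits so 'Page 1' and 'Page 2' count as the same line."""
    return re.sub(r"\d+", "#", line)


def strip_repeated_edges(pages: List[str], min_fraction: float = 0.6) -> str:
    """Remove first/last lines that recur on most pages, then join the pages.

    Hypothetical helper: a simple heuristic, not a general PDF cleaner.
    """
    # Split each page into non-empty, stripped lines.
    split_pages = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in pages]

    # Count how often each normalized line opens or closes a page.
    firsts = Counter(_key(p[0]) for p in split_pages if p)
    lasts = Counter(_key(p[-1]) for p in split_pages if p)

    # A line must appear on at least two pages to count as a header/footer.
    threshold = max(2, int(min_fraction * len(split_pages)))
    headers = {k for k, n in firsts.items() if n >= threshold}
    footers = {k for k, n in lasts.items() if n >= threshold}

    cleaned: List[str] = []
    for lines in split_pages:
        if lines and _key(lines[0]) in headers:
            lines = lines[1:]
        if lines and _key(lines[-1]) in footers:
            lines = lines[:-1]
        cleaned.extend(lines)
    return "\n".join(cleaned)


sample_pages = [
    "ACME Report\nIntro text.\nPage 1",
    "ACME Report\nMore text.\nPage 2",
    "ACME Report\nFinal text.\nPage 3",
]
clean_text = strip_repeated_edges(sample_pages)
```

The cleaned string can then be passed to Text to Chunk, so that header and footer lines do not end up repeated inside the chunks.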