Question 1 — “Which vector DB do you use, and how do you reduce noise in technical docs embeddings?”
You’re testing whether they understand RAG architecture beyond tool names. Good answers mention:
- Embedding model choice (domain-specific vs. general-purpose; multilingual support; update cadence, since switching models means re-embedding the corpus).
- Normalization & cleaning: removing boilerplate, navigation text, and repeated footers; repairing or dropping broken OCR output.
- Metadata strategy: tagging chunks with product line, region, version, audience, and content type so retrieval can filter on them.
- Evaluation: retrieval metrics such as recall@k, MRR, and nDCG, measured against human-labeled query sets.
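To probe whether a candidate has actually run such an evaluation, it helps to know what the metrics compute. Below is a minimal sketch, assuming each labeled query maps to a set of relevant chunk IDs and that relevance is binary (graded nDCG would need per-chunk gains). The function names are hypothetical illustrations, not any particular library's API.

```python
import math
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in ranked[:k] if chunk_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant result over the full list, or 0.0 if none."""
    for rank, chunk_id in enumerate(ranked, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains: a hit at rank r contributes 1/log2(r + 1)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, chunk_id in enumerate(ranked[:k], start=1)
        if chunk_id in relevant
    )
    # The ideal ranking places every relevant chunk first (capped at k).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(
    results: Dict[str, Sequence[str]],
    labels: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average each metric over the human-labeled query set."""
    if not labels:
        raise ValueError("need at least one labeled query")
    totals = {f"recall@{k}": 0.0, "MRR": 0.0, f"nDCG@{k}": 0.0}
    for query_id, relevant in labels.items():
        ranked = results.get(query_id, [])  # a missing query scores zero
        totals[f"recall@{k}"] += recall_at_k(ranked, relevant, k)
        totals["MRR"] += reciprocal_rank(ranked, relevant)
        totals[f"nDCG@{k}"] += ndcg_at_k(ranked, relevant, k)
    return {name: total / len(labels) for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical labels and retriever output, for illustration only.
    labels = {"q1": {"c1", "c3"}, "q2": {"c7"}}
    results = {"q1": ["c2", "c1", "c3"], "q2": ["c5", "c6", "c7"]}
    print(evaluate(results, labels, k=3))
```

Recall@k says whether the right chunks were retrieved at all, while MRR and nDCG also reward ranking them near the top, which matters when only the first few chunks fit in the prompt.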
Red flag: “We use Pinecone/FAISS—so it’s solved.” Tool selection doesn’t fix low-quality chunks or messy corpora.
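A strong answer usually describes cleaning that happens before any vector DB is involved. One common heuristic is to drop lines that repeat across many pages, since navigation bars and footers do. Below is a minimal sketch, assuming the corpus is already extracted to plain text per page. The 0.5 threshold and the function names are illustrative choices, not a standard.

```python
import re
from collections import Counter
from typing import List


def normalize_line(line: str) -> str:
    """Collapse runs of whitespace and trim the line."""
    return re.sub(r"\s+", " ", line).strip()


def strip_repeated_lines(docs: List[str], max_doc_fraction: float = 0.5) -> List[str]:
    """Drop lines found in more than max_doc_fraction of docs (likely nav/footers).

    A line must also occur in at least two documents to be dropped, so a
    single-document corpus is returned with whitespace normalized only.
    """
    per_doc_lines = [[normalize_line(l) for l in d.splitlines()] for d in docs]

    # Document frequency: count each distinct (case-folded) line once per doc.
    doc_freq: Counter = Counter()
    for lines in per_doc_lines:
        doc_freq.update({l.lower() for l in lines if l})

    threshold = max_doc_fraction * len(docs)

    def is_boilerplate(line: str) -> bool:
        freq = doc_freq[line.lower()]
        return freq > 1 and freq > threshold

    cleaned = []
    for lines in per_doc_lines:
        kept = [l for l in lines if l and not is_boilerplate(l)]
        cleaned.append("\n".join(kept))
    return cleaned


if __name__ == "__main__":
    # Hypothetical pages that share a nav bar and a footer.
    pages = [
        "Home | Products | Support\nInstall the agent with the CLI.\n© Example Corp",
        "Home | Products | Support\nRotate API keys every 90 days.\n© Example Corp",
        "Home | Products | Support\nSet region to eu-west-1 for EU data.\n© Example Corp",
    ]
    for text in strip_repeated_lines(pages):
        print(text)
```

The threshold needs tuning per corpus: set too low, it can also remove legitimately repeated content such as standard safety warnings.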











