The following is a summary, with pasted excerpts, of https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v1/
More updates to follow.
 
Approximate implementation of GraphRAG
 
Two steps:
  1. Graph Generation - creates the graph, then builds communities and their summaries over the given document.
    1. Source Documents to Text Chunks: source documents are divided into smaller text chunks for easier processing. —> SentenceSplitter with a chunk size of 1024 and a chunk overlap of 20 tokens
    2. Text Chunks to Element Instances: each text chunk is analyzed to identify and extract entities and relationships, resulting in a list of tuples that represent these elements (i.e. building triples). (How to do this locally? The cookbook uses OpenAI here; I'm still looking for a local Ollama alternative.)
    3. Element Instances to Element Summaries: the extracted entities and relationships are summarized into descriptive text blocks for each element using the LLM. —> GraphRAGExtractor
    4. Element Summaries to Graph Communities: these entities, relationships and summaries form a graph, which is then partitioned into communities using Hierarchical Leiden (community detection over the relationship network: community clustering with recursive convergence, similar in spirit to k-means) to establish a hierarchical structure.
    5. Graph Communities to Community Summaries: the LLM generates summaries for each community, providing insights into the dataset's overall topical structure and semantics. —> GraphRAGStore
  2. Answer the Query - use the community summaries created in step 1 to answer the query.
    1. Community Summaries to Global Answers: the community summaries are used to respond to user queries. This involves generating intermediate answers, which are then consolidated into a comprehensive global answer. —> GraphRAGQueryEngine
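Step 1.1 (fixed-size chunks with a small overlap) can be illustrated with a minimal stdlib-only chunker. This is only a sketch of the size/overlap idea: LlamaIndex's SentenceSplitter counts tokens and respects sentence boundaries, while this stand-in just counts whitespace-separated words.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=20):
    """Split text into word-based chunks with a fixed overlap.

    A rough stand-in for LlamaIndex's SentenceSplitter: the real
    splitter counts tokens and respects sentence boundaries; this
    illustration counts whitespace-separated words.
    Requires chunk_size > chunk_overlap.
    """
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap means the tail of each chunk reappears at the head of the next, so entities that straddle a chunk boundary are still seen in one piece by the extractor.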
 
The walkthrough below assumes an OpenAI API key is available.
 
 
  1. Load the CSV with columns ['title', 'date', 'text'] —>
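Assuming a CSV with the three columns above, a minimal stdlib-only loader might look like this (the cookbook itself uses pandas and builds llama_index Document objects; the dict shape here is just illustrative):

```python
import csv

def load_documents(csv_lines):
    """Read rows with 'title', 'date', 'text' columns from any
    iterable of CSV lines (e.g. an open file) and return one
    document dict per row, keeping title/date as metadata."""
    docs = []
    for row in csv.DictReader(csv_lines):
        docs.append({
            "text": row["text"],
            "metadata": {"title": row["title"], "date": row["date"]},
        })
    return docs

# e.g. docs = load_documents(open("news.csv", newline="", encoding="utf-8"))
```

With LlamaIndex installed, each dict maps directly onto `Document(text=..., metadata=...)`.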
 
  2. Extraction Process:
For each input node (chunk of text):
  1. Send the text to the LLM along with the extraction prompt.
  2. Parse the LLM's response to extract entities, relationships, and descriptions for both entities and relations.
  3. Entities are converted into EntityNode objects. (Storing the entity description in metadata is not implemented yet; currently only the relationship description is stored.)
  4. Relationships are converted into Relation objects. The relationship description is stored in metadata.
  5. These are added to the node's metadata under KG_NODES_KEY and KG_RELATIONS_KEY.
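The parsing in sub-step 2 is done in the cookbook with a regex-based parse_fn over the LLM's formatted response. A sketch of the idea, assuming a GraphRAG-style delimiter format (the exact delimiters and field order here are an assumption, not necessarily the cookbook's):

```python
import re

# Illustrative response format (the cookbook's actual prompt/regex may differ):
#   ("entity"$$$$"NAME"$$$$"TYPE"$$$$"DESCRIPTION")
#   ("relationship"$$$$"SOURCE"$$$$"TARGET"$$$$"RELATION"$$$$"DESCRIPTION")
ENTITY_RE = re.compile(r'\("entity"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)')
RELATION_RE = re.compile(
    r'\("relationship"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)'
)

def parse_llm_response(text):
    """Pull (name, type, description) entity tuples and
    (source, target, relation, description) relation tuples
    out of a formatted LLM response."""
    entities = ENTITY_RE.findall(text)
    relations = RELATION_RE.findall(text)
    return entities, relations
```

The resulting tuples are what then get wrapped into EntityNode and Relation objects and attached to the node's metadata.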
 
 
The full implementation follows from here; I won't paste it, it's on the original page.
 
 
OK, so without an OpenAI API key, let's now take a look at Triplex + R2R + neo4j + llamaindex??
“A high quality dedicated model for triples extraction is a significant step towards making it possible to build a knowledge graph locally - as I have personally seen that right now even frontier models struggle with the task of triples extraction.” — https://www.reddit.com/r/LocalLLaMA/comments/1e77yqy/build_a_knowledge_graph_from_your_laptop/
 
"The triple extraction model achieves results comparable to GPT-4, but at a fraction of the cost. This significant cost reduction is made possible by Triplex's smaller model size and its ability to operate without the need for few-shot context.
Building upon the SFT model, we generated an additional preference-based dataset using majority voting and topological sorting to further train Triplex using DPO and KTO. These additional training steps yielded substantial improvements in model performance.
[Triplex] leverages proprietary datasets generated from authoritative sources such as DBPedia and Wikidata, as well as web-based text sources and synthetically generated datasets."
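A minimal sketch of driving Triplex locally, paraphrasing the prompt shape from the sciphi/triplex model card (treat the exact instruction wording, and any local runtime mentioned in the comments, as assumptions):

```python
import json

def build_triplex_prompt(text, entity_types, predicates):
    """Build the instruction prompt Triplex expects (paraphrased from the
    sciphi/triplex model card; the exact wording is an assumption).
    The returned string is then sent to a local runtime such as
    transformers or Ollama, and the model answers with JSON triples."""
    return (
        "Perform Named Entity Recognition (NER) and extract knowledge graph "
        "triplets from the text. NER identifies named entities of given entity "
        "types, and triple extraction identifies relationships between entities "
        "using specified predicates.\n\n"
        f"**Entity Types:**\n{json.dumps({'entity_types': entity_types})}\n\n"
        f"**Predicates:**\n{json.dumps({'predicates': predicates})}\n\n"
        f"**Text:**\n{text}\n"
    )

# e.g. build_triplex_prompt("Alice works at Acme.", ["PERSON", "ORG"], ["WORKS_AT"])
```

Note Triplex works without few-shot examples (per the quote above), so the prompt stays short: just the entity types, the predicates, and the text.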
 
 
The support here is remarkably comprehensive; there's even a frontend dashboard.
 
Sharing a song:
 
Well I do
 
 
 