The notes below are a pasted summary (or a summarizing paste) of https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v1/
More updates will follow.
Approximate implementation of GraphRAG
Two steps:
- Graph Generation: creates the graph, then builds communities and their summaries over the given document.
  - Source Documents to Text Chunks: Source documents are divided into smaller text chunks for easier processing. → SentenceSplitter with a chunk size of 1024 and a chunk overlap of 20 tokens
  - Text Chunks to Element Instances: Each text chunk is analyzed to identify and extract entities and relationships, resulting in a list of tuples that represent these elements (i.e. building triples). (How to do this locally? From a quick look this step uses OpenAI; I am looking for a way to do it with a local Ollama model.)
  - Element Instances to Element Summaries: The extracted entities and relationships are summarized into descriptive text blocks for each element using the LLM. → GraphRAGExtractor
  - Element Summaries to Graph Communities: These entities, relationships and summaries form a graph, which is then partitioned into communities using the Hierarchical Leiden algorithm (community detection on the relationship graph: clustering into communities with recursive convergence, similar in spirit to k-means) to establish a hierarchical structure.
  - Graph Communities to Community Summaries: The LLM generates summaries for each community, providing insights into the dataset's overall topical structure and semantics. → GraphRAGStore
- Answer to the Query: uses the community summaries created in step 1 to answer the query.
  - Community Summaries to Global Answers: The summaries of the communities are used to respond to user queries. This involves generating intermediate answers, which are then consolidated into a comprehensive global answer. → GraphQueryEngine
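The "Community Summaries to Global Answers" step is essentially a map-reduce over the community summaries. Here is a minimal sketch of that idea, not the cookbook's actual GraphQueryEngine code: the function name `answer_from_communities`, the prompt wording, and the stub `echo_llm` are all my own assumptions, and `llm` is just any callable that maps a prompt string to a response string.

```python
from typing import Callable, Dict, List


def answer_from_communities(
    query: str,
    community_summaries: Dict[int, str],
    llm: Callable[[str], str],
) -> str:
    """Map: one intermediate answer per community. Reduce: merge them."""
    intermediate: List[str] = []
    for _community_id, summary in sorted(community_summaries.items()):
        # Hypothetical prompt wording, for illustration only.
        prompt = (
            f"Community summary:\n{summary}\n\n"
            f"Using only this summary, answer the query: {query}"
        )
        intermediate.append(llm(prompt))

    # Reduce step: ask the LLM to consolidate the intermediate answers.
    combined = "\n".join(intermediate)
    return llm(
        "Combine these intermediate answers into one global answer:\n"
        + combined
    )


# Stub LLM so the sketch runs without any API key; it records every prompt.
calls: List[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer#{len(calls)}"


summaries = {
    0: "Articles about central bank interest rate decisions.",
    1: "Articles about electric vehicle manufacturers.",
}
final = answer_from_communities("What are the main topics?", summaries, echo_llm)
```

With two communities the stub is called three times: two map calls and one reduce call. With a real LLM the cost grows linearly with the number of communities, which is why the summaries are built once at indexing time.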
The workflow below assumes, for now, that an OpenAI API key is available.
- load csv ['title', 'date', 'text'] →
- Extraction Process:
For each input node (chunk of text):
- It sends the text to the LLM along with the extraction prompt.
- The LLM's response is parsed to extract entities, relationships, and descriptions for both entities and relations.
- Entities are converted into EntityNode objects. The entity description is stored in metadata (not implemented yet; for now only the relationship description is stored).
- Relationships are converted into Relation objects. The relationship description is stored in metadata.
- These are added to the node's metadata under KG_NODES_KEY and KG_RELATIONS_KEY.
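To make the parsing step concrete, here is a minimal sketch under loud assumptions: the pipe-separated response format is invented for illustration (the cookbook defines its own prompt and output format), `Entity` and `Relation` are plain dataclasses standing in for LlamaIndex's EntityNode and Relation, and the `"kg_nodes"` / `"kg_relations"` strings are placeholders for KG_NODES_KEY and KG_RELATIONS_KEY.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:  # stand-in for EntityNode, not the real class
    name: str
    label: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:  # stand-in for LlamaIndex's Relation, not the real class
    source: str
    target: str
    label: str
    properties: Dict[str, str] = field(default_factory=dict)


def parse_extraction(response: str) -> Tuple[List[Entity], List[Relation]]:
    """Parse a hypothetical 'kind|field|field|...' response, one item per line."""
    entities: List[Entity] = []
    relations: List[Relation] = []
    for line in response.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if parts[0] == "entity" and len(parts) == 4:
            _, name, label, desc = parts
            entities.append(Entity(name, label, {"entity_description": desc}))
        elif parts[0] == "relation" and len(parts) == 5:
            _, source, target, label, desc = parts
            relations.append(
                Relation(source, target, label, {"relationship_description": desc})
            )
        # Anything else (blank or malformed lines) is silently skipped.
    return entities, relations


sample_response = """entity|Ada Lovelace|PERSON|English mathematician
entity|Analytical Engine|MACHINE|Proposed mechanical computer
relation|Ada Lovelace|Analytical Engine|WROTE_ABOUT|She published notes on the engine"""

entities, relations = parse_extraction(sample_response)

# Attach the results to the chunk's metadata; the key names are placeholders.
chunk_metadata = {"kg_nodes": entities, "kg_relations": relations}
```

Storing the description in each element's `properties` keeps it attached to the node or edge, so a later summarization step can read it straight from the graph.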
The full implementation continues from here… I won't reproduce it in this post; it is on the original page.
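As a rough picture of what the later graph steps do with the extracted relationships, here is a small sketch that groups entities into communities and collects the relationship text for each community, ready to be handed to an LLM for summarization. Important assumption: it uses plain connected components as a stand-in for Hierarchical Leiden, which is a much cruder partition and produces no hierarchy. The function names and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, target, relation, description)
Triple = Tuple[str, str, str, str]


def assign_communities(triples: List[Triple]) -> Dict[str, int]:
    """Label each entity with a community id (connected components, NOT Leiden)."""
    adjacency: Dict[str, set] = defaultdict(set)
    for source, target, _relation, _desc in triples:
        adjacency[source].add(target)
        adjacency[target].add(source)

    community_of: Dict[str, int] = {}
    next_id = 0
    for start in sorted(adjacency):  # sorted for deterministic ids
        if start in community_of:
            continue
        community_of[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over the undirected graph
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in community_of:
                    community_of[neighbor] = next_id
                    stack.append(neighbor)
        next_id += 1
    return community_of


def community_texts(
    triples: List[Triple], community_of: Dict[str, int]
) -> Dict[int, List[str]]:
    """Collect one text line per relationship, grouped by community."""
    texts: Dict[int, List[str]] = defaultdict(list)
    for source, target, relation, desc in triples:
        texts[community_of[source]].append(
            f"{source} -> {target} -> {relation} -> {desc}"
        )
    return dict(texts)


triples = [
    ("Alice", "Acme", "WORKS_AT", "Alice is an engineer at Acme"),
    ("Acme", "Berlin", "LOCATED_IN", "Acme is headquartered in Berlin"),
    ("Bob", "Chess", "PLAYS", "Bob plays chess"),
]
community_of = assign_communities(triples)
texts = community_texts(triples, community_of)
```

Each list in `texts` is what would be joined into a prompt for the per-community summary. A real Leiden run would also split large connected regions into smaller, denser communities.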
OK, without an OpenAI API key, let's now take a look at Triplex + R2R + neo4j + llamaindex??
https://ollama.com/sciphi/triplex → ollama
“A high quality dedicated model for triples extraction is a significant step towards making it possible to build a knowledge graph locally - as I have personally seen that right now even frontier models struggle with the task of triples extraction.” — https://www.reddit.com/r/LocalLLaMA/comments/1e77yqy/build_a_knowledge_graph_from_your_laptop/
“The triple extraction model achieves results comparable to GPT-4, but at a fraction of the cost. This significant cost reduction is made possible by Triplex's smaller model size and its ability to operate without the need for few-shot context.
Building upon the SFT model, we generated additional preference-based dataset using majority voting and topological sorting to further train Triplex using DPO and KTO. These additional training steps yielded substantial improvements in model performance.”
Triplex's training leverages proprietary datasets generated from authoritative sources such as DBPedia and Wikidata, as well as web-based text sources and synthetically generated datasets.
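For the local route, here is a hedged sketch of calling the Triplex model through Ollama's local HTTP API (`POST /api/generate` on the default port 11434). The prompt wording is my own assumption and is not the prompt format from the Triplex model card, which should be checked before relying on the output. The helper names `build_triplex_request` and `extract_triples` are hypothetical. Only the payload is built when the snippet runs; the network call is defined but not executed.

```python
import json
import urllib.request
from typing import Dict, List

# Ollama's default local endpoint; assumes `ollama serve` is running
# and the model has been pulled (e.g. `ollama pull sciphi/triplex`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_triplex_request(
    text: str,
    entity_types: List[str],
    predicates: List[str],
    model: str = "sciphi/triplex",
) -> Dict[str, object]:
    """Build the JSON payload; the prompt wording is an illustrative guess."""
    prompt = (
        "Extract knowledge-graph triples from the text.\n"
        f"Entity types: {json.dumps(entity_types)}\n"
        f"Predicates: {json.dumps(predicates)}\n"
        f"Text: {text}"
    )
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def extract_triples(payload: Dict[str, object], url: str = OLLAMA_URL) -> str:
    """Send the payload to a running Ollama server and return the raw text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


payload = build_triplex_request(
    "Ada Lovelace wrote notes on the Analytical Engine.",
    entity_types=["PERSON", "MACHINE"],
    predicates=["WROTE_ABOUT"],
)
# raw_output = extract_triples(payload)  # uncomment with Ollama running locally
```

The returned text would still need to be parsed into entities and relations before it can be fed into the graph-building steps.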
OK, so let's start by looking at R2R: https://github.com/SciPhi-AI/R2R?tab=readme-ov-file
It supports so much; there is even a front-end dashboard.
Sharing a song:
Well I do
- Author:ran2323
- URL:https://www.blueif.me//article/15271a79-6e22-8094-ade8-d5960fe99967
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!