Triplex is a model from SciPhi built for knowledge graph construction. It extracts triplets (subject, predicate, object) from unstructured text, and SciPhi reports that it does so at a much lower cost than general-purpose models like GPT-4 while improving extraction performance.
Key Features of Triplex
Cost Efficiency
SciPhi reports a 98% reduction in the cost of creating knowledge graphs, with results that outperform models like GPT-4.
High Performance
Trained on diverse datasets, ensuring robustness and versatility across various applications.
Open Source
Available on platforms like Hugging Face, making it accessible for developers and researchers.
Advanced Training Techniques
Uses Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), two preference-based fine-tuning methods, for improved accuracy and efficiency.
Download and Install Triplex
Step 1: Install the Required Packages
Run the following command to install the necessary libraries:
pip install transformers torch
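Before moving on, it can help to confirm that both libraries are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two libraries installed in Step 1.
    missing = missing_packages(["transformers", "torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If a package is reported missing, the usual cause is that pip installed into a different environment than the one running the script.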
Step 2: Clone the R2R Repository
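Clone the repository from GitHub and navigate to the directory. This step is only needed if you want the source code locally; the pip command in Step 3 installs the published package: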
git clone https://github.com/SciPhi-AI/R2R.git
cd R2R
Step 3: Install R2R
Install R2R using pip:
pip install r2r
Once R2R is installed, you can optionally start the server in Docker for a more streamlined setup:
r2r --config-name=default serve --docker
How to Use Triplex
Loading the Model and Tokenizer
Use the following code to load the model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("SciPhi/Triplex")
tokenizer = AutoTokenizer.from_pretrained("SciPhi/Triplex")
Extracting Triplets
Define a function that builds the prompt, runs the model, and returns the generated text:
import json
def triplextract(model, tokenizer, text, entity_types, predicates):
    # Prompt template; entity types and predicates are inserted as JSON.
    input_format = """Perform Named Entity Recognition (NER) and extract knowledge graph triplets from the text.
NER identifies named entities of given entity types, and triple extraction identifies relationships
between entities using specified predicates.
**Entity Types:**
{entity_types}
**Predicates:**
{predicates}
**Text:**
{text}
"""
    message = input_format.format(
        entity_types=json.dumps({"entity_types": entity_types}),
        predicates=json.dumps({"predicates": predicates}),
        text=text,
    )
    messages = [{'role': 'user', 'content': message}]
    # Chat messages must go through the chat template; calling the tokenizer
    # directly on a list of message dicts fails.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    )["input_ids"].to(model.device)
    output = model.generate(input_ids=input_ids, max_length=2048)
    # The decoded string contains the prompt followed by the model's answer.
    return tokenizer.decode(output[0], skip_special_tokens=True)
Example Usage
Extract triplets from a sample text:
text = "Paris is the capital of France."
entity_types = ["CITY", "COUNTRY"]
predicates = ["CAPITAL_OF", "LOCATED_IN"]
triplets = triplextract(model, tokenizer, text, entity_types, predicates)
print(triplets)
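The function returns raw text, not structured data, so a parsing step is usually needed before the triplets can go into a graph. The sketch below shows one way to do that. It assumes the output contains a JSON object with an `entities_and_triples` list, where entities look like `[1], CITY:Paris` and relations look like `[1] CAPITAL_OF [2]`. That format and the helper name `parse_triplets` are assumptions for illustration, so compare them with what your model version actually emits.

```python
import json
import re

# Assumed item shapes (verify against real model output):
#   entity:   "[1], CITY:Paris"
#   relation: "[1] CAPITAL_OF [2]"
ENTITY_RE = re.compile(r"^\[(\d+)\],\s*([^:]+):(.+)$")
RELATION_RE = re.compile(r"^\[(\d+)\]\s+(.+?)\s+\[(\d+)\]$")


def parse_triplets(raw_output):
    """Hypothetical helper: return (subject, predicate, object) tuples."""
    # The decoded text may include the prompt, so locate the last JSON
    # object that holds the assumed "entities_and_triples" key.
    key_pos = raw_output.rfind('"entities_and_triples"')
    if key_pos == -1:
        return []
    start = raw_output.rfind("{", 0, key_pos)
    if start == -1:
        return []
    try:
        payload, _ = json.JSONDecoder().raw_decode(raw_output, start)
    except json.JSONDecodeError:
        return []

    entities = {}   # entity index -> entity name
    relations = []  # (subject index, predicate, object index)
    for item in payload.get("entities_and_triples", []):
        if not isinstance(item, str):
            continue
        item = item.strip()
        entity = ENTITY_RE.match(item)
        if entity:
            entities[entity.group(1)] = entity.group(3).strip()
            continue
        relation = RELATION_RE.match(item)
        if relation:
            relations.append(relation.groups())

    # Resolve indices to names, skipping relations with unknown entities.
    return [
        (entities[s], predicate, entities[o])
        for s, predicate, o in relations
        if s in entities and o in entities
    ]


sample = (
    '{"entities_and_triples": '
    '["[1], CITY:Paris", "[2], COUNTRY:France", "[1] CAPITAL_OF [2]"]}'
)
print(parse_triplets(sample))  # [('Paris', 'CAPITAL_OF', 'France')]
```

Returning an empty list on malformed output keeps the caller simple. A stricter pipeline might log or raise instead.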
Additional Tips for Triplex
Optimizing Performance
- Use a temperature setting of 0.3 for optimal results.
- Ensure your hardware meets the requirements for running large models efficiently.
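The extraction function shown earlier does not pass a temperature to `model.generate`, so decoding falls back to whatever the model's generation config specifies. In the Hugging Face `generate` API, `temperature` only takes effect when sampling is enabled with `do_sample=True`. The following is a minimal sketch of the extra arguments; the helper name `generation_kwargs` is hypothetical.

```python
def generation_kwargs(temperature=0.3):
    """Hypothetical helper: extra keyword arguments for model.generate()."""
    if temperature <= 0:
        # A temperature of 0 is not valid for sampling; use greedy decoding.
        return {"do_sample": False}
    # Temperature is only applied when sampling is switched on.
    return {"do_sample": True, "temperature": temperature}


kwargs = generation_kwargs()
print(kwargs)  # {'do_sample': True, 'temperature': 0.3}

# Inside triplextract, the generate call would then become (not run here):
# output = model.generate(input_ids=input_ids, max_length=2048, **kwargs)
```

With sampling enabled, repeated runs on the same text can return different triplets, so fix a random seed if you need reproducible output.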