Doccano Tutorial

1. What is Doccano?
Doccano is an open-source, web-based annotation tool designed for natural language processing tasks like named entity recognition (NER), text classification, and sequence-to-sequence tasks such as translation or summarization. It allows you to label text data through an easy-to-use interface.

2. Key Features of Doccano
- Web-based UI for annotation
- Supports multiple projects and users
- Supports multilingual data
- Export and import in various formats (JSON, CSV)
- Admin tools for managing annotation workflows

3. Use Cases in NLP Projects
Doccano is widely used in real-world NLP projects for tasks like building custom datasets for chatbots, classifying customer reviews, translating documents, extracting named entities (like names, locations), and training machine learning models.

4. Comparison with Other Annotation Tools
Compared with tools like Prodigy, brat, or Label Studio, Doccano stands out for combining an open-source license with a lightweight setup and a user-friendly browser interface. Unlike Prodigy, Doccano doesn’t require a paid license. It also supports sequence-to-sequence tasks, which many alternatives do not.

5. Supported Annotation Types
- Text Classification
- Sequence Labeling (NER, POS tagging)
- Sequence-to-Sequence (Translation, Summarization)

6. Doccano Licensing and Community
Doccano is released under the MIT License, meaning it can be freely used, modified, and distributed. It has an active GitHub community with frequent updates and feature additions.

7. System Requirements
- Python 3.8+
- Node.js (for frontend builds)
- Docker (optional but recommended for quick setup)
- Modern browser for accessing the UI

8. Understanding the Doccano UI
The Doccano interface is simple and intuitive. It contains dashboards for creating projects, assigning annotators, viewing progress, and starting annotations. You can switch between different project types with just a few clicks.

9. Language Support Overview
Doccano supports multilingual text annotation. It works with right-to-left languages like Arabic and Hebrew, as well as Asian scripts such as Chinese, Japanese, and Korean.

10. Roadmap and Future Vision
Doccano’s roadmap includes improved team collaboration tools, better support for document-level annotation, deeper integration with machine learning pipelines, and features such as active learning, model-in-the-loop annotation, and real-time annotation suggestions.

1. Creating Admin Users
Admin users can create, manage, and assign projects. They are created from the command line or through the admin dashboard during installation.

2. User Registration Process
Doccano allows users to register through the web interface (if enabled) or be added manually by an admin.

3. Managing User Roles
Users can be assigned roles such as admin, project manager, or annotator. Each role has different levels of access.

4. Permissions and Access Control
Access is controlled at the project level. Annotators only access assigned projects, while admins see everything.

5. Password Management
Users can change their own passwords, and admins can reset them via the dashboard or command line.

6. Disabling User Accounts
Accounts can be disabled to prevent login without deleting the user. This is useful for managing temporary access.

7. Multi-user Collaboration
Doccano supports multiple users annotating the same or different documents within a project, with progress tracking.

8. User Interface Preferences
Users can customize preferences like dark/light mode and language settings from their profile page.

9. Deleting Users
Admins can delete users permanently via the admin interface or command line. All related data may be affected.

10. Bulk User Upload via Scripts
Admins can batch-create users using scripts and CSV files for larger teams or integrations with HR systems.
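A minimal sketch of such a script, assuming the `/v1/users/` endpoint described in the API chapter and a CSV with username and password columns (the exact payload fields vary by Doccano version):
# Hypothetical bulk-creation script
import csv
import requests

API = "http://localhost:8000/v1/users/"          # endpoint from the API chapter
HEADERS = {"Authorization": "Token YOUR_TOKEN"}  # admin API token

with open("users.csv", newline="") as f:
    for row in csv.DictReader(f):  # expects columns: username,password
        r = requests.post(API, json=row, headers=HEADERS)
        print(row["username"], "->", r.status_code)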

1. Sequence Labeling Projects
These projects let annotators tag sequences of tokens, useful for NER or POS tagging. Each token can be assigned a label like "Person" or "Location".

2. Sequence to Sequence Annotation
Ideal for translation and summarization tasks. Annotators provide output text that corresponds to the input sequence.

3. Text Classification Projects
Annotators classify whole documents or sentences into predefined categories like sentiment (positive/negative/neutral).

4. Named Entity Recognition (NER)
This is a subtype of sequence labeling used to extract proper names, organizations, dates, etc. Doccano highlights selected tokens.

5. Translation Projects
In these, annotators provide translations for source sentences into target languages. Useful for building multilingual datasets.

6. Speech/Text Alignment Projects
Though not natively supported, Doccano can be adapted for aligning text with spoken words using timestamp-like labeling.

7. OCR Annotation Projects
Optical Character Recognition data can be annotated as text blocks, sometimes post-processed from image data.

8. Relation Annotation Mode
Used to annotate relationships between two entities (e.g., Person → WorksAt → Company). Requires both entity tagging and relation drawing.

9. Zero-shot Classification Projects
Doccano can be used to create datasets for zero-shot models where labels are described at inference time rather than training time.

10. Custom Annotation Schemas
Users can create their own labels, colors, and categories to fit unique project needs. Doccano supports full customization of labels.

Starting a New Project
You can create a new project by clicking "Create Project" and selecting a type (classification, sequence labeling, or translation). Each project includes its own data, labels, and settings. This is the foundation for organizing all tasks in Doccano.

Configuring Project Settings
Settings define behavior—like allowing multiple labels, collaborative annotation, and visibility. Configure them based on your team and task. These settings control how annotation functions and how users interact.

Defining Labels and Tags
Labels are the backbone of annotation. Tags add extra context or category grouping. Clearly defined labels ensure consistency. You can assign shortcut keys to improve efficiency during annotation.

Project Description and Metadata
Descriptions give annotators context and purpose. Metadata includes task type, version, language, etc. This ensures clarity when multiple users or projects are involved. It's visible on the project dashboard.

Adding Guidelines for Annotators
Good annotation guidelines help annotators work consistently and avoid confusion. Include examples, edge cases, and dos/don’ts. These appear inside the annotation UI for reference.

Assigning Users to Projects
You can add team members to each project and assign roles like admin or annotator. This controls access, tracks progress, and enables multi-user collaboration.

Cloning Projects
Clone existing projects to reuse labels and configurations. This is helpful for multiple phases or language variations. Annotation data can optionally be excluded.

Archiving Projects
Archive projects to remove them from the active dashboard without deleting them. This keeps things clean while preserving data for future use.

Deleting Projects
Deleting is permanent. Use with caution and only after exporting valuable data. There’s no undo, so back up before deletion.

Managing Project Visibility
You can make a project visible only to assigned users or public within the system. This is useful for managing access in shared environments.

Adding Labels
Labels define what you’re tagging—like entities or classes. You can add labels manually through the interface. Give each a clear name and purpose for accurate annotations.

Color-Coding Labels
Each label can have a custom color. Colors help visually distinguish different labels during annotation, reducing mistakes and improving speed.

Grouping Labels into Categories
You can assign categories or prefixes to labels, which helps organize complex labeling schemes (e.g., PERSON vs ORG). This supports better readability and filtering.

Editing Existing Labels
Labels can be renamed, re-colored, or updated without losing previously annotated data. This helps adapt to project changes mid-way.

Reordering Labels
You can change the order of labels in the UI to group commonly used ones together. This speeds up selection during labeling.

Importing Labels from JSON
Large sets of labels can be imported in JSON format. This is efficient when labels are predefined or reused from past projects.
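A small example of what such a file might look like (field names follow recent Doccano label exports but may vary by version):
# labels.json
[
  { "text": "Person", "suffix_key": "p", "background_color": "#209cee", "text_color": "#ffffff" },
  { "text": "Location", "suffix_key": "l", "background_color": "#ff3860", "text_color": "#ffffff" }
]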

Exporting Labels
You can export your current labels for backup or reuse in other projects. Exported labels are also useful for team documentation.

Label Shortcuts and Hotkeys
Assign keyboard shortcuts to labels to allow fast tagging without clicking. This improves annotator speed, especially in high-volume tasks.

Labeling Best Practices
Keep labels minimal, specific, and non-overlapping. Use clear naming and consistent colors. Always test with sample annotations before scaling.

Hierarchical Labels Support
Although limited, labels can be grouped to simulate hierarchy using prefixes (like `Animal → Dog → Poodle`). True nested support is expected in future versions.

Acceptable Data Formats
Doccano supports JSONL, CSV, and plain text formats depending on the task type. Data should be clean and well-structured to ensure smooth import.

JSONL File Format Explanation
JSON Lines (JSONL) is the standard input format. Each line is a separate JSON object. This format is efficient for large-scale, line-based annotation tasks.
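For example, two classification samples in JSONL (key names depend on project type and version):
# Simulated JSONL lines
{ "text": "I love this phone.", "label": ["Positive"] }
{ "text": "The battery died after a day.", "label": ["Negative"] }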

Uploading Text Files
You can drag-and-drop text files or upload via file selector. Files must match the project type. For translation tasks, include both source and target text fields.

Batch Data Upload
Multiple files or large files can be uploaded at once. Doccano processes them line-by-line and maps to your schema automatically. Monitor progress via the UI.

Handling Large Datasets
For large datasets, it’s recommended to chunk data into smaller files or use the API. This prevents memory issues and speeds up processing.
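A minimal chunking sketch, assuming a large JSONL file and hypothetical output file names:
# Split big.jsonl into 10,000-line parts before upload
chunk_size = 10_000
part, chunk = 0, []
with open("big.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk.append(line)
        if len(chunk) == chunk_size:
            with open(f"part_{part}.jsonl", "w", encoding="utf-8") as out:
                out.writelines(chunk)
            part, chunk = part + 1, []
if chunk:  # write the final partial chunk
    with open(f"part_{part}.jsonl", "w", encoding="utf-8") as out:
        out.writelines(chunk)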

Preprocessing Before Upload
Clean, tokenize, and verify your data before uploading. Common steps include removing duplicates, checking for special characters, and validating encoding.
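A minimal preprocessing sketch covering deduplication and encoding validation (file names are placeholders):
# Deduplicate lines and fail fast on bad encoding
seen, clean = set(), []
with open("raw.txt", encoding="utf-8") as f:  # raises UnicodeDecodeError on invalid UTF-8
    for line in f:
        text = line.strip()
        if text and text not in seen:
            seen.add(text)
            clean.append(text)
with open("clean.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(clean))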

Label Mapping on Import
If importing labeled data, make sure the label names match exactly with your project’s labels. Mismatches will result in import errors or missing annotations.

Troubleshooting Import Errors
Errors usually relate to formatting, invalid JSON, missing fields, or incompatible types. Use the error message shown in Doccano to debug line-by-line.
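A quick way to find the offending line before re-importing (a sketch, not a Doccano feature):
# Validate a JSONL file line by line
import json
with open("import.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print(f"Line {i} is invalid: {e}")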

Metadata in Import Files
Metadata like document ID, timestamp, or annotator can be included in your JSONL objects. These fields help organize and audit annotations later.

Importing Multilingual Data
Doccano supports Unicode and right-to-left languages. Ensure files are UTF-8 encoded. You can label mixed-language documents or build language-specific projects.

Layout of Annotation Screen
The screen includes text to be labeled, a label panel, navigation buttons, and a sidebar with stats. Depending on task type, you'll see checkboxes, buttons, or span-based text highlighting options.

Navigating Between Samples
Annotators can use “Next” and “Previous” buttons to move through text entries. This allows for efficient sequential review or skipping incomplete entries.

Keyboard Shortcuts
Label shortcuts enable fast annotation using keys. These are defined in the label settings.
# Sample shortcut mapping
shortcuts = {'1': 'Positive', '2': 'Neutral', '3': 'Negative'}
print("Press 1/2/3 to classify sentiment")

Search and Filter Options
Use the search bar to find specific keywords. Filters allow viewing completed, incomplete, or rejected samples to review progress and data coverage.

Annotation Auto-Save
Annotations are automatically saved as you label data. This ensures no manual save is needed, reducing the risk of losing changes.

Viewing and Editing Previous Annotations
Users can revisit samples to modify or review previous annotations, allowing iterative refinement.

Inline Suggestions (if enabled)
With model integration, Doccano can show suggested annotations inline. These can be accepted or ignored to speed up labeling.

Highlighting Entities
For sequence labeling, selected spans appear highlighted. Each label type has a distinct color to reduce confusion.

Jump to Specific Entries
Enter a sample ID or position number to quickly navigate to a specific entry. This helps QA teams audit specific annotations.

Annotation Statistics Sidebar
Shows overall progress, label distribution, and annotation count. It's useful for tracking individual or team performance.

Multi-Class Classification
In this setup, only one label can be selected per sample. For example, tagging a review as either Positive, Neutral, or Negative.
# Only one label is allowed
text = "Great product!"
label = "Positive"

Multi-Label Classification
Multiple relevant tags can be applied simultaneously. For instance, a tweet may be both “Sarcastic” and “Political”.
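Mirroring the multi-class snippet above, a multi-label sample simply carries a list of labels:
# Several labels may apply at once
text = "Sure, because politicians always keep their promises."
labels = ["Sarcastic", "Political"]
print("Assigned labels:", labels)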

Using Checkboxes vs Buttons
Checkboxes are used for multi-label tasks, while buttons are used for mutually exclusive classification tasks. The UI adapts to the project type.

Handling Long Texts
Doccano supports large text bodies. Use scrolling or zooming as needed. It's useful for legal or medical documents.

Predefined Label Selection
Labels are shown in a panel and can be selected with a click or shortcut key. Clear label naming is essential for accuracy.

Classification Confidence Score
Some Doccano setups allow adding model-predicted confidence scores during review. This helps prioritize uncertain cases.

Hotkeys for Label Assignment
Speed up annotation by assigning shortcut keys to labels. This is done in the label config section.
# Example
shortcut = {'1': 'Spam', '2': 'Ham'}
print("Press 1 or 2 to annotate")

Exporting Labeled Data
Annotations can be exported in JSONL, CSV, or CoNLL formats. Exported files contain both the original text and associated labels.

Editing Annotations
Annotations can be changed anytime by revisiting the sample and updating the label selection. All changes are saved automatically.

Best Practices for Quality
Use clear guidelines, conduct inter-annotator checks, and review edge cases. Keep label sets minimal and mutually exclusive when possible.

Span-Based Annotation Basics
Users label spans of text (words or phrases) by highlighting and assigning an entity label like PERSON or LOCATION.

Drag and Highlight Interface
Annotation is done by clicking and dragging across text, which opens a popup for label selection.

Dealing with Overlapping Entities
Doccano currently does not support overlapping spans directly. Annotators must choose the most relevant label or split the sentence for dual tagging.

Editing Spans
Click an annotated span to re-label or delete it. This allows corrections or improvements without restarting annotation.

Adding Nested Annotations
While true nested annotation isn't supported natively, you can simulate it using entity suffixes or label conventions like `ORG-IN-PERSON`.

Entity Types and Color Coding
Each entity type is color-coded for clarity. Custom colors can be assigned during label setup for better visibility.

Using Keyboard Shortcuts
Shortcuts help confirm a label without using the mouse. For instance, highlight and press a mapped key to apply a label.
# Example
keymap = {'a': 'PERSON', 'b': 'ORG'}
print(\"Press A or B after highlighting to assign label\")

Exporting Labeled Spans
Export includes each span, label, and character index in JSONL. This is used for training NER or token-level models.
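A representative export line might look like this (exact key names vary by Doccano version):
# Simulated span export line
{ "text": "Alice visited Paris.", "label": [[0, 5, "PERSON"], [14, 19, "LOCATION"]] }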

Undo and Redo Options
Use the on-screen undo button or Ctrl+Z/Ctrl+Y to reverse or redo annotation changes. This feature prevents accidental mislabeling.

Validating Consistency
Ensure consistent span boundaries and entity types. Use review workflows or scripts to check labeling uniformity across the dataset.
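A minimal validation sketch that flags unknown labels and spans whose boundaries split a word (the label set and file name are assumptions):
# Check exported spans for common inconsistencies
import json
allowed = {"PERSON", "LOCATION", "ORG"}  # replace with your project's labels
with open("export.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        text = record["text"]
        for start, end, label in record.get("label", []):
            if label not in allowed:
                print(f"Line {i}: unknown label {label!r}")
            if (start > 0 and text[start - 1].isalnum()) or (end < len(text) and text[end].isalnum()):
                print(f"Line {i}: span ({start}, {end}) splits a word")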

Introduction to Entity Relations
Relation annotation allows connecting entities (like PERSON → ORG) to express relationships. It’s useful for building knowledge graphs and complex NLP datasets.

Enabling Relation Mode
In project settings, toggle “Relation Mode” to enable relational links between spans. This changes the interface to allow arrows between entities.

Connecting Entities via Arrows
Hold shift and click on two spans to draw a relation arrow. This visually represents the relationship direction.

Defining Relation Labels
Relation labels (e.g., “works_for”, “married_to”) are defined separately from entity labels. These must be configured before annotating.

Bidirectional vs Unidirectional
Relations can be set as one-way or two-way depending on meaning. For example, “parent_of” is directional while “married_to” may be bidirectional.

Relation UI Design
Arrows are color-coded and labeled in the UI. You can hover to see relation type and involved entities for clarity.

Editing and Deleting Relations
Click on an arrow to change or remove a relation. This allows fine-tuning of annotation without redoing entity spans.

Use Cases in Knowledge Graphs
Relations help build knowledge bases from unstructured data by capturing who-does-what-to-whom relationships in documents.

Export Format for Relations
Exports include source/target entity IDs and relation types. JSONL includes relation metadata, usable in model training.
# Sample format:
{
  "text": "Alice works at OpenAI.",
  "entities": [[0, 5, "PERSON"], [15, 21, "ORG"]],
  "relations": [[0, 1, "works_for"]]
}

Common Pitfalls and Fixes
Avoid missing or mismatched entity indices. Use clear and limited relation types. Validate using export previews and QA checks.

What is Seq2Seq Annotation?
Sequence-to-sequence annotation maps input text to a generated output (e.g., translation, summarization). Annotators type or select the output sequence manually.

Setting Up Projects
Create a new project of type “Seq2Seq”. You’ll need a source field (input) and a target field (desired output).

Input and Output Fields
Input is displayed, and the user fills the output field with the transformed version. It's ideal for translation or paraphrasing tasks.

Handling Multiple Translations
Multiple outputs can be collected by assigning several annotators per sample or using separator tokens between variants.

Typing vs Selecting Output
You can either manually type the output or select from a predefined list (via custom interfaces or scripts).

Keyboard Navigation
Use Tab to switch between fields and Enter to save or move forward. Shortcuts speed up high-volume text entry.

Reviewing Output Texts
Annotators and reviewers can revisit and refine text outputs to improve consistency and fluency.

Export Formats for Seq2Seq
Each sample includes original text and output. Common format is:
{
  "text": "Hello.",
  "label": ["Bonjour."]
}

Language Translation Use Case
Seq2Seq is used to build parallel corpora for neural machine translation. Quality and fluency are key criteria.

Quality Checks for Seq2Seq
Use peer review, BLEU score testing, or linguistic rules to assess translated/generated outputs before deployment.
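For example, NLTK's sentence-level BLEU can score a candidate output against references (a rough sketch; real QA typically uses corpus-level BLEU over many samples):
# Sentence-level BLEU with NLTK
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
reference = [["bonjour", "tout", "le", "monde"]]
candidate = ["bonjour", "le", "monde"]
score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print("BLEU:", round(score, 3))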

Reviewer Role Overview
Reviewers verify annotations made by others. They can approve, reject, or comment on entries. This role ensures high data quality.

Setting up Review Workflow
Enable the review feature in project settings. Assign specific users as reviewers. Annotations will then require approval.

Approving and Rejecting Annotations
Reviewers use interface buttons to accept or reject annotations. Rejected samples return to annotators for corrections.

Commenting on Annotations
Reviewers can leave comments explaining why an annotation was rejected or how to improve it. This promotes learning and consistency.

Reviewer Dashboard
A dashboard shows pending reviews, approved entries, and performance metrics. It helps manage reviewer workload.

Filtering by Annotation Status
Filter samples by approved, rejected, or unreviewed status to streamline QA and task tracking.

Escalation or Re-annotation
Complex or disputed samples can be escalated to project admins or sent for re-annotation with reviewer comments attached.

Batch Reviewing Tips
Use keyboard shortcuts and batch navigation to quickly move through samples. Focus on edge cases and disagreement hotspots.

Exporting Review Results
Exported files include review status, reviewer ID, and any comments. This supports audits and training refinement.

Review Quality Metrics
Track annotation agreement rates, reviewer consistency, and time-per-sample. These metrics help improve training and project quality.

1. Supported Export Formats
Doccano supports exporting data as JSONL, CSV, and plain text. Format depends on project type (e.g., classification or NER).

2. Exporting JSONL Files
Each line in a JSONL file represents a data object (document + labels).
# Simulated JSONL line
{ "text": "Canada is a country.", "labels": [[0, 6, "Location"]] }

3. CSV Export Options
CSV exports are useful for spreadsheets. Each row represents a labeled item.
# Example CSV row
Text,Label
"Canada is a country.","Location"

4. Export with Metadata
You can choose to include user, timestamp, or agreement info during export.
{ "text": "Hello", "labels": [], "created_by": "admin", "timestamp": "2023-09-01" }

5. Filtering before Export
Filters like date range, approved-only, or by label type can be applied before exporting.
# Pseudo filter logic
if approved and date > "2023-01-01":
    export(data)

6. Exporting Only Approved Annotations
You can export only samples reviewed and approved.
# Pseudo filter
if annotation["status"] == "approved":
    export(annotation)

7. Exporting Label Schemes
The label configuration (label names/colors) can be exported as JSON.
{ "labels": ["Person", "Location", "Date"] }

8. Post-processing Exported Files
After export, you can clean, reformat, or use scripts to convert into model-ready formats.
import json
with open("export.jsonl") as f:
data = [json.loads(line) for line in f]
print("Cleaned records:", len(data))

9. API-Based Export
Use Doccano’s API to export programmatically.
import requests
response = requests.get("http://localhost:8000/v1/projects/1/docs/download", headers={"Authorization": "Token YOUR_TOKEN"})
with open("project1.jsonl", "wb") as f:
f.write(response.content)

10. Export Errors and Fixes
Common issues include encoding problems, incomplete annotations, or bad formats.
Always validate your exported data using scripts or JSON validators.

1. Accessing Admin Panel
Admins can access the dashboard via `/admin` URL. It provides control over users, projects, and backend settings.

2. System Overview Stats
View total documents, projects, users, and annotations. Useful for project managers.
{ "users": 10, "projects": 3, "docs": 1200, "annotations": 800 }

3. User Activity Logs
Admins can monitor logins, project access, and label activity.
{ "user": "alice", "action": "annotated", "timestamp": "2023-01-01T12:00" }

4. Annotation Progress Reports
Track how many documents are completed by each user in each project.
{ "project": "NER", "user": "bob", "completed": 85 }

5. System Health Checks
Status indicators show database connection, API availability, and disk usage.
{ "db": "OK", "disk_space": "70% used", "api": "Running" }

6. Resource Usage Monitoring
Admins may monitor server CPU, memory, and storage through external tools or Docker stats.

7. Managing Multiple Projects
Create, clone, delete, or rename projects via the admin dashboard.

8. Config File Settings
Advanced settings like login limits, export options, and integrations are handled through `.env` and YAML files.
# Example .env
DOCCANO_ALLOW_SIGNUP=False
DOCCANO_TIME_ZONE=UTC

9. User Account Recovery
Admins can reset user passwords or re-enable locked accounts.

10. Admin Security Best Practices
Use strong passwords, HTTPS, disable public sign-up, and update Doccano regularly.

1. Custom Logo and Branding
You can replace the Doccano logo in frontend source files and rebuild the UI using `npm run build`.

2. UI Theming with CSS
Edit `src/assets` for styles. Customize colors, spacing, or dark mode preferences.
/* Example */
body { background-color: #f2f2f2; }

3. Editing Frontend Text
Change labels and tooltips in Vue components (like `Project.vue`).
// Example label change
<label>Custom Text</label>

4. Adding Custom Shortcuts
Modify key bindings in the annotation keyboard shortcuts JS file.
window.addEventListener('keydown', function(e) {
  if (e.key === 'n') { /* custom action */ }
});

5. Modifying Label Palette
Colors for each label can be edited in the label creation interface or backend JSON.
{ "label": "Person", "color": "#00ff00" }

6. Custom Widgets
Advanced users can add widgets to the UI by extending the Vue components.

7. Third-Party Plugin Integration
You can integrate analytics or feedback plugins into the frontend Vue app.
import SomePlugin from 'plugin-name';
Vue.use(SomePlugin);

8. Feature Requests and Custom Builds
You can fork the GitHub repo and build your own features. Doccano encourages open-source contributions.

9. Advanced Label Filtering
Filter by label type, annotation status, or text using the advanced filter UI.

10. API Custom Endpoints
You can extend the backend by adding Django REST Framework views.
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['GET'])
def custom_export(request):
    return Response({"message": "Hello from custom API!"})

1. Overview of Doccano API
Doccano exposes a RESTful API for managing projects, uploading data, annotating text, and exporting datasets. It allows full automation and integration with your NLP workflows.

2. Authentication via API
The API uses token-based authentication. After login, you receive an API token to access endpoints.

3. API for Project Creation
You can create a new project by sending a POST request to `/v1/projects/` with required fields like name, description, and project type.
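A sketch using the requests library (the payload field names, e.g. project_type, are assumptions and may differ across Doccano versions):
import requests
payload = {
    "name": "My NER Project",
    "description": "Demo project created via the API",
    "project_type": "SequenceLabeling",  # assumed field name
}
r = requests.post("http://localhost:8000/v1/projects/",
                  json=payload,
                  headers={"Authorization": "Token YOUR_TOKEN"})
print(r.status_code, r.json())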

4. API for Data Import
Use `/v1/projects/{id}/docs/upload/` to import text data into a project. It accepts plain text, CSV, or JSON depending on the project type.
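A hedged upload sketch against the endpoint named above (the format parameter and multipart field names are assumptions):
import requests
with open("data.jsonl", "rb") as f:
    r = requests.post("http://localhost:8000/v1/projects/1/docs/upload/",
                      files={"file": f},
                      data={"format": "jsonl"},  # assumed parameter
                      headers={"Authorization": "Token YOUR_TOKEN"})
print(r.status_code)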

5. API for Annotation Submission
Annotations are submitted via POST requests to project-specific annotation endpoints, where you send label positions and values.

6. Export via API
Export your project’s annotations by calling `/v1/projects/{id}/docs/download/`. This returns a file in your preferred format (JSONL, CSV, etc.).

7. Managing Users through API
Admins can manage users by using endpoints like `/v1/users/`, assigning roles and granting access to specific projects.

8. Pagination and Limits
Large datasets are paginated. Use parameters like `limit` and `offset` to navigate responses. Example: `/v1/projects/?limit=50&offset=100`.
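A simple pagination loop, assuming a Django-REST-style response shape with results and next keys:
import requests
HEADERS = {"Authorization": "Token YOUR_TOKEN"}
url = "http://localhost:8000/v1/projects/?limit=50&offset=0"
while url:
    page = requests.get(url, headers=HEADERS).json()
    for project in page["results"]:   # assumed response shape
        print(project["name"])
    url = page.get("next")            # follow next-page links until None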

9. API Tokens and Permissions
Tokens are generated per user and define their access level. Only admins can perform sensitive actions like deleting data or adding users.

10. Automating Workflows with API
Combine all API features to automate the full lifecycle: create project → upload data → annotate → export → archive.

1. Real-Time Syncing
Changes made by annotators are reflected across the team instantly, ensuring up-to-date annotations and conflict avoidance.

2. Assigning Tasks to Users
Admins can assign subsets of documents to users to balance workloads or specialize tasks.

3. Viewing Team Progress
Doccano dashboards show annotation progress per user and project, enabling better project tracking.

4. Team-Based Review Cycles
After initial annotations, other team members can review and approve/reject them. This promotes quality assurance.

5. Chat and Notes Inside Doccano
Collaborators can leave notes or comments on specific data samples for clarification or discussion.

6. Annotation Dispute Resolution
When multiple annotators disagree, admins can resolve disputes by reviewing conflicting labels and selecting the correct one.

7. Feedback from Annotators
Users can submit feedback about the tool or data via comment boxes or shared documentation fields.

8. Documenting Decisions
Annotation guides and decision logs help annotators align on labeling standards and avoid inconsistency.

9. Version Control in Annotations
Changes in annotations are tracked over time, and previous versions can be reviewed to understand changes.

10. Training Annotators Collaboratively
Admins can run mock annotations and collaborative workshops using shared datasets to onboard new annotators effectively.

1. Handling Large Datasets
Doccano handles millions of records. To scale effectively, upload data in batches and use pagination to keep the UI and API responsive.

2. Indexing and Caching
Indexing the database and caching frequent queries can significantly improve UI and API response time.

3. Optimizing Docker Settings
Increase memory limits, add swap space, and allocate additional CPU cores to the Docker containers for faster performance.

4. Scaling with Kubernetes
Use Kubernetes to run Doccano in a distributed, auto-scalable way that handles peak loads during annotations.

5. Reducing Latency
Serve Doccano using reverse proxies (like NGINX), use CDN for static files, and deploy closer to your users geographically.

6. Optimizing Label Rendering
Reduce DOM complexity by limiting the number of labels rendered per page. Use lazy loading where possible.

7. Disabling Unused Features
Turn off features like auto-save, tag suggestions, or annotation history if they are not needed for your workflow.

8. Server Load Balancing
Balance load across multiple Doccano instances to avoid overload and improve response time for large teams.

9. Performance Monitoring Tools
Use Prometheus, Grafana, or Sentry to monitor performance metrics, memory usage, and detect bottlenecks.

10. Database Optimization Tips
Use PostgreSQL indexes, vacuum tables regularly, and separate annotation from user metadata for large-scale deployments.

1. Docker Compose Setup
Docker Compose allows defining and running multi-container Docker applications easily, useful for deploying Doccano with databases and backend services.

2. Nginx Reverse Proxy
Nginx can be configured as a reverse proxy to route incoming traffic to your Doccano application securely and efficiently.

3. SSL Configuration
Setting up SSL certificates (e.g., via Let's Encrypt) ensures encrypted connections between users and your deployed Doccano instance.

4. AWS Deployment
Amazon Web Services offers scalable infrastructure to host Doccano using EC2, RDS, and other managed services.

5. GCP Deployment
Google Cloud Platform supports deployment using Compute Engine, Cloud SQL, and Kubernetes Engine for container orchestration.

6. Heroku Hosting
Heroku provides a PaaS option for simpler deployment with automatic scaling and easy management.

7. Azure Cloud Setup
Microsoft Azure offers virtual machines, managed databases, and Kubernetes services to deploy Doccano securely.

8. On-Premise Setup
Organizations that require full control and data privacy can run Doccano on their own local servers.

9. CI/CD Integration
Continuous Integration/Continuous Deployment pipelines automate testing and deployment to speed up release cycles.

10. Backup and Restore Procedures
Regular backups of databases and application data ensure recovery from failures or data loss.

1. Legal Document Classification
Legal document classification is the process of categorizing legal texts like contracts, court rulings, or compliance forms into predefined classes. Labels can include confidentiality level, contract type, or jurisdiction. Doccano allows annotators to mark segments and tag them with specific categories, helping legal professionals sort and retrieve documents easily. This accelerates legal research and enables automation of repetitive document analysis tasks in law firms and regulatory bodies.
document = "This is a non-disclosure agreement (NDA) under US law."
label = "Confidential - NDA"
print("Classified as:", label)

2. Customer Feedback Analysis
Analyzing customer feedback involves reviewing survey comments, support tickets, or reviews to understand user sentiment, product issues, or feature requests. In Doccano, such texts are annotated by sentiment, topic, or urgency. This classification aids companies in making data-driven decisions, improving product features, and identifying negative experiences early. It also enables automated triaging of user complaints.
feedback = "The app keeps crashing on startup."
sentiment = "Negative"
category = "Bug Report"
print("Feedback Type:", sentiment, "| Category:", category)

3. Sentiment Labeling
Sentiment labeling assigns emotional tone to texts—positive, negative, or neutral. It's common in social media monitoring and customer service. Annotators mark sentences in Doccano with emotion tags to train models that detect opinions and public sentiment trends. Sentiment analysis is critical in brand monitoring, elections, and market research to assess people's views.
sentence = "I love this product!"
sentiment = "Positive"
print(f"Sentence: '{sentence}' labeled as {sentiment}")

4. Medical Named Entity Recognition
This involves tagging medical entities like diseases, drugs, or procedures in clinical notes. Annotators in Doccano highlight terms like "diabetes" or "aspirin" and label them with appropriate categories. It’s crucial for building systems that extract structured data from unstructured medical records to aid diagnostics or research. NER models trained on this data improve clinical decision support.
text = "Patient was prescribed ibuprofen for headache."
entities = [{"entity": "ibuprofen", "label": "Drug"}, {"entity": "headache", "label": "Symptom"}]
print("Extracted entities:", entities)

5. Social Media Classification
Posts from Twitter, Facebook, or Reddit can be classified by topic, sentiment, or user intent. This is especially useful for brands or researchers trying to monitor public discourse. In Doccano, annotators assign labels such as "complaint", "praise", or "product inquiry" to help train models that automate this task at scale.
tweet = "Why is my order delayed again?"
label = "Complaint"
print("Tweet Label:", label)

6. Chatbot Intent Tagging
Intent tagging defines what a user wants when they input a query to a chatbot (e.g., "Book flight", "Cancel reservation"). Annotators use Doccano to tag these sentences with intents so AI chatbots can learn to map phrases to backend actions. It forms the foundation of NLP-based conversational agents.
user_input = "I want to cancel my booking."
intent = "CancelBooking"
print("Detected intent:", intent)

7. Toxic Comment Detection
Detecting toxic content involves labeling offensive, abusive, or threatening language online. Annotators tag such instances in Doccano with categories like "hate speech", "harassment", or "spam". This annotated data is critical to developing content moderation tools used on platforms like YouTube, Twitter, or Reddit.
comment = "You are so dumb and useless!"
label = "Toxic - Personal Insult"
print("Flagged comment:", label)

8. Product Review Analysis
Annotators tag product reviews with aspects such as "battery life", "camera", or "shipping" to extract fine-grained sentiment. This helps e-commerce platforms understand detailed feedback, like customers liking the camera but disliking battery performance. The annotations can power aspect-based sentiment analysis systems.
review = "Camera quality is good but battery drains quickly."
tags = [{"aspect": "Camera", "sentiment": "Positive"}, {"aspect": "Battery", "sentiment": "Negative"}]
print("Review Analysis:", tags)

9. Contract Term Extraction
Legal agreements often require extracting terms like expiration date, governing law, or liabilities. Using Doccano, these key clauses can be annotated and later extracted automatically by models. It's valuable for automating compliance checks, contract analysis, and due diligence.
clause = "This agreement terminates on Dec 31, 2025."
term = {"label": "End Date", "value": "Dec 31, 2025"}
print("Extracted term:", term)

10. OCR Error Correction
OCR (Optical Character Recognition) often introduces text errors. Annotators can use Doccano to correct these by labeling incorrect segments and writing corrections. This helps improve OCR engines and build training datasets for error correction in scanned document pipelines.
ocr_text = "Th1s 1s a t3st."
corrections = {"Th1s": "This", "t3st": "test"}
print("Corrected Output:", corrections)

1. QA Workflow Setup
A QA workflow in annotation ensures high-quality labeled data by defining structured steps: annotation, review, feedback, and correction. In Doccano, this could involve roles like annotators, reviewers, and project managers. Setting up this workflow includes configuring permissions, defining review checkpoints, and establishing versioning. Without QA, annotations may suffer from inconsistency, which can degrade model performance.
workflow = ["Annotate", "Review", "Feedback", "Finalize"]
print("QA Workflow steps:", workflow)

2. Inter-Annotator Agreement
Inter-Annotator Agreement (IAA) measures how consistently multiple annotators label the same text. High agreement indicates reliability, while low agreement suggests ambiguity or poor guidelines. Metrics like Cohen’s Kappa or F1-score are commonly used. Doccano exports can be used to calculate IAA across overlapping tasks.
annotator1 = ["Positive", "Negative", "Neutral"]
annotator2 = ["Positive", "Negative", "Negative"]
matches = sum([a1 == a2 for a1, a2 in zip(annotator1, annotator2)])
print("Agreement Ratio:", matches/len(annotator1))

3. Precision/Recall Metrics
Precision and recall are standard evaluation metrics. Precision measures how many selected labels were correct, while recall measures how many correct labels were selected. These metrics guide annotator performance and help in QA reports.
true_positives = 8
false_positives = 2
false_negatives = 3
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
print("Precision:", precision, "Recall:", recall)

4. Manual Spot Checks
Spot checking involves manually reviewing random samples to detect annotation errors. It’s cost-effective and especially useful in large datasets. Doccano allows exporting annotations for sampling externally. Annotators are then given targeted feedback for improvement.
import random
annotations = ["A", "B", "C", "D", "E"]
spot_check = random.choice(annotations)
print("Sample chosen for spot check:", spot_check)

5. Reviewer Comments
Reviewers can leave comments during or after checking annotations. These are essential for continuous learning and correcting annotation errors. Some tools support threaded comments or in-line suggestions that annotators must review and acknowledge.
reviewer_comment = "Ensure correct sentiment tag for sarcasm cases."
print("Reviewer Note:", reviewer_comment)

6. Annotator Training
Before annotation starts, annotators must undergo training sessions with sample texts and guidelines. Doccano allows using practice projects to prepare them. This ensures consistency and prevents early-stage errors that may cascade.
training_dataset = ["Text A", "Text B", "Text C"]
print("Training examples provided:", training_dataset)

7. Audit Trails
Audit trails track who did what and when. Doccano logs annotations, edits, and reviewer actions. This information is critical when debugging large projects or handling compliance audits.
audit = [{"user": "Ann", "action": "label", "time": "12:00"}, {"user": "Rev", "action": "approve", "time": "12:30"}]
print("Audit Log:", audit)

8. Automated QA Scripts
Scripts can flag missing labels, inconsistent formats, or schema violations. They are crucial in large datasets where manual review isn’t scalable. These can be run before model training to clean data.
annotations = ["Positive", "", "Negative"]
flagged = [i for i, a in enumerate(annotations) if a == ""]
print("Flagged entries:", flagged)

9. Comparing Annotation Versions
Comparing old and new annotation versions helps track changes or model-driven improvements. It’s especially useful when refining guidelines or retraining teams.
version1 = ["A", "B", "C"]
version2 = ["A", "B", "D"]
changes = [(i, v1, v2) for i, (v1, v2) in enumerate(zip(version1, version2)) if v1 != v2]
print("Differences:", changes)

10. Common QA Issues
Common problems include missing annotations, inconsistent labeling, bias, and ambiguous definitions. These can often be solved by clearer guidelines, better training, and using automated tools to validate consistency before exporting datasets for training models.
issues = ["Inconsistent tag", "Missed label", "Overlapping span"]
print("Frequent QA Issues:", issues)

1. GDPR and CCPA Overview
Doccano should comply with privacy laws such as GDPR (EU) and CCPA (California), which include rights to access, delete, and restrict processing of user data.

2. Deleting User Data
You can remove user data from the database with admin controls or script-based deletion.
# Deleting a user (Django ORM)
from django.contrib.auth.models import User
User.objects.filter(username="john").delete()

3. Data Anonymization Techniques
Before sharing/exporting, personal identifiers can be masked.
def anonymize_text(text):
    return text.replace("John", "[REDACTED]")
print(anonymize_text("John lives in New York"))

4. Access Logs and Audits
Maintain logs of user actions for audit trails. Use Django’s logging or integrate external log tools.
{ "user": "admin", "action": "deleted project", "time": "2025-07-01T14:30" }

5. Role-Based Access Control
Doccano supports assigning roles to users (admin, annotator, manager) with permission scopes.
user_role = {"username": "editor", "role": "annotator"}

6. Encryption Practices
All traffic should use HTTPS. Doccano data at rest should be encrypted on disk (Docker volume encryption, etc.).

7. Consent Collection (If Needed)
If collecting external data, include consent fields or external consent systems.
consent_form = {"user": "alice", "agreed": True}

8. Sharing Data Securely
Share datasets via secure URLs or password-protected archives (e.g., ZIP with encryption).

9. Retention Policy
Define how long data is stored. Auto-delete options can be added via cron jobs.
# Pseudocode: delete records older than the retention window
if record_age_days > 365:
    delete_record()

10. Data Breach Handling
Set up monitoring and incident response plans. Immediately revoke tokens and notify stakeholders.

1. Exporting to Training Pipelines
Export data in JSONL or CSV from Doccano and feed into model training pipelines (e.g., spaCy or PyTorch).
# Load exported JSONL
import json
with open("data.jsonl") as f:
data = [json.loads(line) for line in f]

2. Using Doccano with spaCy
Transform JSONL to spaCy format and train custom models.
# Sample NER tuple
("John lives in Canada", {"entities": [(0, 4, "PERSON"), (14, 20, "LOCATION")]})

3. Hugging Face Transformers Integration
You can prepare Doccano-labeled datasets for Hugging Face using `datasets` library.
from datasets import load_dataset
dataset = load_dataset("json", data_files="data.jsonl")

4. Custom Model Feedback Loop
Use model outputs to re-import predictions into Doccano for correction.
# Append model suggestions
{"text": "Berlin", "predicted_labels": [[0, 6, "LOCATION"]]}

5. Zero-shot Learning with Doccano
Prepare label-rich context for zero-shot models like `bart-large-mnli`.
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
result = classifier("I love Paris", candidate_labels=["Location", "Person"])

6. Data Augmentation Tools
Use libraries like `nlpaug` to generate synthetic data.
import nlpaug.augmenter.word as naw
aug = naw.SynonymAug(aug_src="wordnet")
print(aug.augment("He is happy."))

7. Auto-Annotation Possibilities
Use scripts to pre-label texts using a model and upload into Doccano.
# Auto-tag locations
if "Canada" in text:
labels.append((0, 6, "LOCATION"))

8. Training Loop Automation
Build pipelines with exported data, model training, and feedback uploading.
def loop_train():
    download_data()        # placeholder: export annotations via the API
    train_model()          # placeholder: retrain on the new data
    evaluate_and_upload()  # placeholder: push metrics and predictions back

9. Evaluating Model Performance
Compare model predictions vs. Doccano labels using F1, precision, recall.
from sklearn.metrics import classification_report
print(classification_report(true, predicted))  # true = Doccano labels, predicted = model output

10. Continual Learning Setup
Export corrected annotations regularly and retrain your model to keep it improving over time.

1. Common UI Errors
Issues may arise from frontend build problems or JS misconfigurations. Try refreshing with `Ctrl+Shift+R` or rebuild UI.

2. Import Failures
Occurs if data has bad JSON format or unsupported structure.
# Use a JSON validator to locate the bad line
import json
try:
    json.loads('{"bad": "syntax}')  # missing closing quote
except json.JSONDecodeError as e:
    print("Invalid JSON:", e)

3. Export File Corruption
Check encoding. Always export in UTF-8 and avoid Excel auto-formatting.

4. Docker Container Logs
View logs using:
docker logs -f doccano_backend

5. Missing Labels in Export
Check if the label is saved and if filtering is excluding unapproved annotations.

6. User Login Issues
Ensure backend is running, user exists, and browser has cookies enabled.

7. Annotation Not Saving
This may happen if the database is disconnected or the session times out. Try logging out and back in.

8. Project Not Loading
Verify project status in the database or via API.

9. Database Connection Errors
Check PostgreSQL credentials and ensure Docker DB is reachable.
docker exec -it doccano_backend ping db

10. Getting Community Help
Visit GitHub Issues page or join discussions at:
https://github.com/doccano/doccano/discussions

1. Upcoming Features
Expected improvements include better document import UX, richer metadata export, and enhanced bulk actions.

2. Roadmap Review
Doccano's roadmap is open. Check GitHub milestones for progress and goals.

3. Suggested Improvements
Popular suggestions include keyboard-only annotation, more analytics, and plug-and-play ML models.

4. AI-Assisted Annotation
Auto-annotation using pretrained models (Hugging Face, spaCy) is being explored for tighter integration.

5. Deeper ML Integration
Feedback loops and continual learning pipelines are on the radar for enterprise users.

6. WebSocket-Based Sync
Live collaboration features with real-time syncing could be added using WebSockets.

7. Cross-platform Mobile Use
Doccano works on mobile browsers but future versions might offer dedicated PWA/mobile apps.

8. Offline Annotation Mode
Allowing local annotation syncing once online could benefit users in low-connectivity environments.

9. Semantic Search in UI
Advanced search (embedding/vector-based) is being considered to help find specific texts.

10. Vision for Next-Gen Doccano
A fully modular, plugin-based platform for both beginners and researchers with cloud-native deployment support.