The SCRIBE Project
The Stanford Converter into Braille & E-Text (SCRIBE) is an online document conversion system that transforms text and image-based file types into different formats. Individuals upload files through a Web interface and select from a variety of output options, including audio, Braille, or e-text formats. The SCRIBE platform converts text files into alternate text file types and converts image-based files into text-formatted files. MP3 audio versions are created using high-quality text-to-speech voices.
Notification of a completed conversion is sent via e-mail. Currently, the SCRIBE platform supports requests only from individuals using a valid Stanford University e-mail address. The e-mail notification will contain the converted file or a hyperlink to download the specific file. Please note: the SCRIBE platform removes processed files after several days.
To begin converting documents, go to the Convert a File web form to upload and choose the output format as well as conversion options. A full list of the supported file types and output selections may be found at Conversion Options.
The quality of the conversion is dependent upon the quality of the original document. For instance, if you are converting a well-structured MS Word document into audio, DAISY, or ePub format, the resulting file will be properly rendered into the desired output. If, however, you upload a poorly scanned image file for conversion to a text format, the resulting text document may include recognition errors in the output file.
Welcome to the SCRIBE (SensusAccess) e-learning course! The SensusAccess e-learning course is intended for students, staff, faculty and others who are converting material into alternate formats such as audio books, e-books, digital large-print and Braille, either for themselves or on behalf of others. The course also covers how SensusAccess can be used to improve the accessibility of documents and to make documents easier to work with.
The SensusAccess e-learning course comprises the following nine modules:
- Module 1: Introduction to the e-learning course
- Module 2: Overview of the SensusAccess service
- Module 3: Producing simple MP3 files
- Module 4: Converting inaccessible and tricky documents
- Module 5: Producing simple e-books
- Module 6: Designing and creating accessible documents
- Module 7: Producing advanced e-books
- Module 8: Producing DAISY books
- Module 9: Producing Braille
SCRIBE is now available within Canvas!
- Follow this step-by-step tutorial on how to use SCRIBE in Canvas.
- The copyright law of the United States (Title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. The person using this equipment is liable for any infringement.
Convert a File
Conversion Options
The Stanford Converter into Braille & E-Text (SCRIBE) system offers a number of different conversion options to transform image-based and text-based documents into alternate and accessible formats. In general, text-based files may be converted to other text-based file formats or MP3 audio formats. Image-based file formats will first be converted into text-based versions using optical character recognition (OCR) before being converted to the desired format type.
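The two-stage flow described above can be sketched roughly as follows. This is only an illustration, not SCRIBE's actual implementation: the function name `conversion_steps` and the step labels are hypothetical. The image extensions are taken from the table in this section, and `.pdf` is included because the best practices state that PDF files are also processed with OCR.

```python
import os

# Extensions treated as image-based in this sketch (assumption: taken from
# the formats listed on this page; the real service may recognize more).
OCR_EXTENSIONS = {".pdf", ".tiff", ".jpeg", ".gif", ".bmp", ".djv", ".j2k"}


def conversion_steps(filename: str, target: str) -> list[str]:
    """Return the hypothetical processing steps for one upload."""
    extension = os.path.splitext(filename)[1].lower()
    steps = []
    if extension in OCR_EXTENSIONS:
        # Image-based input must become text before anything else happens.
        steps.append("OCR to text")
    steps.append(f"convert text to {target}")
    return steps


print(conversion_steps("scan.TIFF", "MP3"))
print(conversion_steps("notes.docx", "ePub"))
```

The point of the sketch is that every output format is produced from text, so OCR quality limits the quality of every later step.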
The following table provides specific information about the output options available for each original file type.
Original File Formats and Output Options
| Original File Format | Audio (MP3) | Braille | PEF (Portable Embosser Format) | DAISY (Full & Text-Only) | MS Word | RTF | Text | ePub | MOBI | Tagged PDF |
|---|---|---|---|---|---|---|---|---|---|---|
| .doc, .docx | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes |
| .rtf | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No |
| .txt | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No |
| .htm, .html | Yes | Yes (ASCII) | Yes | No | No | No | Yes | Yes | Yes | Yes |
| ePub | No | No | No | No | No | No | No | NA | Yes | No |
| MOBI | No | No | No | No | No | No | No | Yes | NA | No |
| .pdf | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
| .tiff, .jpeg, .gif, .bmp, .djv, .j2k | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
Please note that the above information is not a complete list, but represents the most commonly used files for upload and conversion. You can view all the supported files on the Convert a File page.
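For anyone scripting around the service, part of the table above could be encoded as a simple lookup. The sketch below is an assumption-laden illustration: the names `SUPPORTED_OUTPUTS`, `can_convert`, and the shortened output labels are hypothetical, and only the text-based rows (.docx, .rtf, .txt, .html) are transcribed.

```python
# Outputs available to every text-based row transcribed here.
COMMON = {"Audio (MP3)", "Braille", "PEF", "Text", "ePub", "MOBI"}

# Hypothetical lookup built from four rows of the table above.
SUPPORTED_OUTPUTS = {
    ".docx": COMMON | {"DAISY", "Tagged PDF"},
    ".rtf": COMMON | {"DAISY"},
    ".txt": COMMON | {"DAISY"},
    ".html": COMMON | {"Tagged PDF"},
}


def can_convert(extension: str, output: str) -> bool:
    """Return True if the table lists this output for the given extension."""
    return output in SUPPORTED_OUTPUTS.get(extension.lower(), set())


print(can_convert(".docx", "Tagged PDF"))
print(can_convert(".rtf", "Tagged PDF"))
```

Unknown extensions return False here; in practice, the Convert a File page remains the authoritative list.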
Conversion Best Practices
The quality of a conversion is dependent upon the quality of the original document. Additionally, the resulting output format may include enhancements for navigation if the original file contains the appropriate semantic markup. For instance, a MS Word document containing the heading style markup for chapters (e.g., Heading 1, Heading 2, etc.) will convert into a more usable DAISY or ePub format with the relevant chapter navigation elements. The following best practices identify simple methods to prepare the file before converting in order to achieve a high-quality output.
PDF and image-based files will be processed using optical character recognition (OCR) to create a text-based version of the document.
- If scanning the document, ensure the scanned image is free from smudges, dark marks, highlighted text, or artifacts in the image. These will affect the accuracy of the OCR process.
- Minimize any effects from skewing. If the image is presented at an "off-angle", the OCR process will be less accurate, resulting in a lower-quality text version.
- If you are starting with an image-based format and want a text format, you can convert directly from the image file to a text file with SCRIBE. For some image documents, however, you may achieve better results by first converting to Tagged PDF and then copying and pasting the text into a MS Word document (see the "Converting to MS Word and Text Files" section).
SCRIBE will convert image-based documents into MS Word, RTF, and text files. You may also find it useful with some image-based documents to convert initially to Tagged PDF and then copy and paste the text from the Tagged PDF into MS Word. This may result in a better reading experience and may remove non-essential content.
With the MS Word version of the document, you can more accurately "clean" the content for conversion into MP3 audio or for use with assistive technologies. Most cleanups take just a few seconds within MS Word and involve the use of the Find and Replace tools. For more information on using the Find and Replace tools and removing special characters in a document, see Using the Find and Replace in MS Word.
- Note: In the Find and Replace examples below, replace the <space> value with one spacebar and do not include the quotes.
- Submit the image-based document to SCRIBE and select Tagged PDF as the output option.
- Open the Tagged PDF and select all the text. Copy and paste this into a MS Word document (Open Office may also be used).
- Using Find and Replace:
  - Search for ".<space>^p" and replace with ".^p^p".
  - Search for "<space>^p" and replace with "<space>".
  - Search for "<space>•<space>" and replace with "^p•<space>".
  - Search for "-<space>" and replace with no value.
- Save the document in your preferred text format.
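If the pasted text is saved as plain text instead, the same four Find and Replace steps can be approximated outside MS Word. The sketch below is a hypothetical equivalent, not part of SCRIBE: it assumes Word's `^p` paragraph mark corresponds to a newline character, and the function name `clean_pasted_text` is illustrative.

```python
def clean_pasted_text(text: str) -> str:
    """Apply the four Find and Replace steps, in order, to plain text."""
    # 1. A period, a space, then a line end marks a real paragraph end.
    text = text.replace(". \n", ".\n\n")
    # 2. Any remaining space + line end is a soft wrap, so join the lines.
    text = text.replace(" \n", " ")
    # 3. Put each bullet on its own line.
    text = text.replace(" • ", "\n• ")
    # 4. Remove hyphen + space left where a word was split across lines.
    text = text.replace("- ", "")
    return text


sample = (
    "This is a sen- \n"
    "tence that wraps \n"
    "across lines. \n"
    "Items: • first • second"
)
print(clean_pasted_text(sample))
```

The order matters: step 1 must run before step 2, or paragraph ends would be joined like ordinary wrapped lines. Step 4 also removes any intentional hyphen followed by a space, so the result is worth a quick review.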
To clean up a MS Word file for use with assistive technology or for creating MP3 files, perform a "search and replace" to remove optional hyphens and section breaks. Enter the special character you wish to find in the "Find:" box and leave the "Replace with:" box empty. See Using the Find and Replace in MS Word for additional information on removing special characters in a document.
- Submit the image-based document to SCRIBE and select Microsoft Word as the output option.
- Open the converted Microsoft Word document (Open Office may also be used).
- Using Find and Replace:
  - Search for "Optional Hyphen" under Special Formatting and replace with no value.
  - Search for "Section Breaks" under Special Formatting and replace with "^p^p".
  - Search for "Manual Page Breaks" and replace with "^p^p".
- Save the document in your preferred text format.
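A comparable cleanup can be sketched for text that has already been exported from the MS Word file. The sketch rests on two assumptions that may not hold for every export path: that optional hyphens appear as the Unicode soft hyphen (U+00AD), and that section and manual page breaks appear as form feed characters (U+000C). The function name `clean_word_text` is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # assumed plain-text form of Word's optional hyphen
FORM_FEED = "\f"        # assumed plain-text form of section/page breaks


def clean_word_text(text: str) -> str:
    """Drop optional hyphens and turn break characters into blank lines."""
    # Optional hyphens are replaced with no value.
    text = text.replace(SOFT_HYPHEN, "")
    # Section and page breaks become two paragraph marks.
    text = text.replace(FORM_FEED, "\n\n")
    return text


print(repr(clean_word_text("con\u00adver\u00adsion\fNext section")))
```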
- Use Word styles to specify document headings. For example, the style "Heading 1" could be used to identify the title of the document and the style "Heading 2" could be used to identify chapter information. It is best to use only one "Heading 1" to facilitate accurate conversions into other document formats (e.g., DAISY, ePub, Braille, etc.).
- Provide short descriptions for content-related images in your MS Word document.
- Avoid using text boxes in your document. If you want to customize the layout, use a Column Tool or a Section Break.
- When converting to DAISY, page numbers will be based on MS Word pagination. To obtain custom pagination, use the PageNumber style from the Save As DAISY plug-in for Microsoft Office for your custom page numbers.
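The "only one Heading 1" recommendation can be checked mechanically once the paragraph style names of a document are available. The sketch below takes those names as a plain list, and how they are extracted from the MS Word file is left open. The function name `heading_warnings` is hypothetical.

```python
def heading_warnings(style_names: list[str]) -> list[str]:
    """Warn when a document uses the "Heading 1" style more than once."""
    count = sum(1 for name in style_names if name == "Heading 1")
    warnings = []
    if count > 1:
        warnings.append(
            f'Found {count} "Heading 1" paragraphs; use one for the title '
            'and "Heading 2" for chapters.'
        )
    return warnings


print(heading_warnings(["Heading 1", "Heading 2", "Normal", "Heading 2"]))
print(heading_warnings(["Heading 1", "Normal", "Heading 1"]))
```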