How should we understand file format issues in translation?

朱小二 - Product manager at 大辞科技; looking forward to discussing translation technology with everyone

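The first step in any translation project is handling file formats.
In the broad sense, "file format" covers source document formats, localization file formats, project package formats, translation data exchange formats, and the formats that CAT tools use to exchange files with one another (possibly also the file formats used to store rule settings).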

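A suitable CAT tool must meet at least two requirements for format handling:
1. It has built-in parsers for mainstream document formats and is compatible with common formats.
2. It lets users handle special file formats inside the software.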
 
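"Special file formats" fall into two groups:
A. Special handling of common formats
Examples: for Excel files, filtering out cells with a colored background; for Adobe InDesign files, choosing whether to import master pages; for Word documents, choosing whether to import comments for translation; handling multilingual Excel files and multilingual XML files; and handling XML documents with embedded HTML, or HTML documents with embedded XML.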
 
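B. General handling of special formats
Clients (the party commissioning the translation) often supply files generated by authoring tools, and these formats tend to be complex. The CAT tool therefore needs to support custom filters (a filter being the set of import rules applied when a file is imported) that can be saved as "templates" for special but frequently used file formats, or for connecting to the client's content management system.
The work involved includes:
  • Mapping out the structure of the text content
  • Defining import rules (applied before and after import)
  • Saving the rule set as a text filter template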
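
The three steps above can be sketched in Python. This is only a minimal illustration under stated assumptions, not how any particular CAT tool stores its filters: the `FilterTemplate` class, its `extract` and `protect` fields, and the INI-style sample are all hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FilterTemplate:
    """Hypothetical reusable set of import rules for one file format."""
    name: str
    # Import rule: a regex whose "text" group captures the translatable
    # part of a line; lines that do not match are treated as structure.
    extract: re.Pattern
    # Post-import rules: regexes for inline codes to protect as tags.
    protect: list = field(default_factory=list)

    def import_lines(self, lines):
        """Return (line_index, text, protected_codes) per translatable line."""
        segments = []
        for index, line in enumerate(lines):
            match = self.extract.match(line)
            if match is None:
                continue  # section header, comment, etc.: not imported
            text = match.group("text")
            codes = [m.group(0) for rule in self.protect
                     for m in rule.finditer(text)]
            segments.append((index, text, codes))
        return segments


# A template for "key = value" files whose values contain placeholders.
ini_template = FilterTemplate(
    name="ini-with-placeholders",
    extract=re.compile(r"^[A-Za-z_.]+\s*=\s*(?P<text>.+)$"),
    protect=[re.compile(r"%[sd]"), re.compile(r"\{\d+\}")],
)

sample = ["[dialog]", "title = Save file", "; comment",
          "msg = Saved %s in {0} seconds"]
segments = ini_template.import_lines(sample)
```

Keeping the line index with each segment is what would let an export step write the translation back to the right place in the original file.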

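What this achieves:
  • The text is separated from the structural markup, so that the text can be translated or converted between languages and character sets
  • At the same time, the tags, attributes and structure are preserved intact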
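
To make the idea of separating text from markup concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is an assumption-laden illustration: `translate_xml` is a hypothetical helper, `str.upper` merely stands in for a real translation step, and each text node is handled on its own, whereas a real CAT tool would typically keep inline tags such as `<b>` inside one segment.

```python
import xml.etree.ElementTree as ET


def translate_xml(source, translate):
    """Apply `translate` to every text node; leave tags and attributes alone."""
    root = ET.fromstring(source)
    for element in root.iter():
        # Text directly inside the element, before its first child.
        if element.text and element.text.strip():
            element.text = translate(element.text)
        # Text that follows the element's closing tag.
        if element.tail and element.tail.strip():
            element.tail = translate(element.tail)
    return ET.tostring(root, encoding="unicode")


source = ('<doc id="42"><title lang="en">Hello</title>'
          '<p class="note">Save <b>now</b> or later</p></doc>')

# str.upper is a placeholder for the actual translation function.
result = translate_xml(source, str.upper)
print(result)
```

Only the character data changes; the element names, the `id`, `lang` and `class` attributes, and the nesting come out exactly as they went in.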


For reference, here are the file formats that memoQ supports:

Source documents:
Adobe Framemaker™ (.MIF)
Adobe InCopy™
Adobe InDesign™ (.INDD - a free Language Terminal account is needed)
Adobe InDesign™ (.INX)
Adobe InDesign™ Markup Language (.IDML)
Adobe PDF files (.PDF)
Adobe PhotoShop™ (.PSD) - you need to open the file in Photoshop for the translation to be visible
AuthorIT (.XML)
DITA (.DITA, .XML)
FreeMind mind maps (.MM)
HTML (.HTML, .HTM, .SHT), including HTML5
Microsoft Word™ 2003 (.DOC, .RTF, .BAK, .DOT)
Microsoft Word™ 2007-2013 (.DOCX)
Microsoft Excel™ 2003 (.XLS, .XML, .XLT)
Microsoft Excel™ 2007-2013 (.XLSX, .XLSM)
Microsoft PowerPoint™ 2003 (.PPT, .PPS, .POT)
Microsoft PowerPoint™ 2007-2013 (.PPTX, .PPSX, .POTX, .SLDX)
Microsoft Visio™ (.VDX)
MS Help™ Workshop (.HHC, .HHK)
OpenDocument text documents (.ODT, .ODF)
Plain text (.TXT, .INF, .INI, .REG)
Rich Text Format™ (.RTF, two-column .RTF)
Scalable Vector Graphics (.SVG)
TMX filter
Typo3 pages (.XML)
XML (.XML) and SGML (.SGML) files
YAML (.YAML)
Any text based document can be imported for translation through the "regex text filter"
Complex documents (for example HTML embedded into Excel files) can be imported in a translator-friendly way using cascading filters which link several filters one after the other
Microsoft Office documents embedded into other documents can be imported using the composite filters
TMX files can be imported also as translation documents
memoQ can extract images from documents and offers a translation workflow for images
 
Software localization formats:
.NET resource files (.RESX)
HTML (.HTML, .HTM, .SHT), including HTML5
Java properties
JSON
Multilingual XML formats (XML, using xPath)
Multilingual Excel/CSV formats (CSV, TSV, XLS, XLSX)
PO Gettext files (.PO)
Regex text filter
XML
YAML
Any text based document can be imported for translation through the "regex text filter"
Cascading filters can be used to process for example HTML embedded in TSV files
Multilingual Excel files are documents with multiple columns, each representing a different language, context or other information
Through structural alignment, available in LiveDocs™, you can automatically create perfect quality translation memories from existing documents that have structural information like resource ID, or Microsoft Excel files
memoQ can process image files and offers a translation workflow for images
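
The cascading idea mentioned above (for example HTML embedded in TSV files) can be sketched as two filters chained one after the other. This is a rough Python illustration using only the standard library, not memoQ's implementation: `tsv_filter`, `html_filter`, `cascade` and the sample data are hypothetical, and the first column is assumed to hold a resource ID.

```python
import csv
import io
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Second-stage filter: collects text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


def tsv_filter(content, column):
    """First-stage filter: yield (resource_id, cell) from tab-separated text."""
    reader = csv.reader(io.StringIO(content), delimiter="\t")
    for row in reader:
        yield row[0], row[column]


def html_filter(fragment):
    """Return the translatable text found in an HTML fragment."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return collector.texts


def cascade(content, column=1):
    """Chain the filters: the TSV cells become the HTML filter's input."""
    return {key: html_filter(cell) for key, cell in tsv_filter(content, column)}


sample = "btn.save\t<b>Save</b> the file\nmsg.done\t<p>All done.</p>\n"
segments = cascade(sample)
print(segments)
```

Each stage only needs to understand one format, which is why chaining simple filters tends to be easier to maintain than writing one parser for the combined format.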
 
Bilingual Documents:
memoQ Bilingual Document (.MBD) - import only since memoQ 6.0
memoQ XLIFF (.MQXLZ)
SDL TradosTag (.TTX)
SDL Trados / Wordfast Classic bilingual RTF (.RTF or .DOC, pre-segmented)
SDL Worldserver XLIFF (.XLZ)
SDLXLIFF (.SDLXLIFF)
SDL WSXZ (.WSXZ) 
Two-column .RTF export (.RTF)
Wordfast Professional TXML (.TXML)
XLIFF (.XLF, .XLIF, .XLIFF)
Interoperability Now! XLIFF:doc
 
Project Files:
Handoff packages (.MQOUT)
Handback packages (.MQBACK)
SDL Studio package (.SDLPPX)
STAR Transit project (.PXF, .PPF)
TIPP package

Translation Memories:
Translation Memory eXchange (.TMX) 1.1-1.4
memoQ translation memories
 
Term bases:
Delimited files (.CSV, .TSV, .TXT)
Microsoft Excel term bases (.XLS)
SDL Multiterm XML term bases (.XML)
TBX files (.TBX)
Translation Memory eXchange (.TMX) 1.1-1.4
memoQ term bases
 
Segmentation rules:
Segmentation Rule eXchange (.SRX)
 
