Unihan Unification Principles·Unihan統合原則

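  • This text is an excerpted translation of pages 299-302 of the PDF of "Unicode Standard 4.0 -- East Asian Scripts".
  • Copyright in the original (English) document, including its illustrations, is held by [www.unicode.org]. The translator offers this translation as an academic reference and as popular educational reading. It may not be reproduced or used for other purposes without authorization.
  • To the copyright holder of the original: if publishing this text in this form is inappropriate, please notify the translator, who will delete or amend it as you request.
  • Owing to the translator's limited ability, the understanding and rendering of the original may be flawed in places; the translator makes no guarantee as to the accuracy of the content.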

Principles·原則

Three-Dimensional Conceptual Model·三維概念模型

原文 譯文

【Three-Dimensional Conceptual Model.】 To develop the explicit rules for unification, a conceptual framework was developed to model the nature of Han ideographic characters. This model expresses written elements in terms of three primary attributes: semantic (meaning, function), abstract shape (general form), and actual shape (instantiated, typeface form). These attributes are graphically represented in three dimensions according to the X, Y, and Z axes (see Figure 11-3).


【三維概念模型】爲了確立歸併的準則,我們構建了一個概念框架以之模型化地展示漢字的本質屬性。此模型將漢字字樣分解爲以下三方面的特質:語義(意義,功能)、抽象形(普通形態)和實用形(實體化的,印刷字樣的形態)。這些特質可以圖形化地以相對於X、Y和Z三條坐標軸的維度(dimension)分別表示(見下圖)。

Figure 11-3: 3D_hHSJqRR7Rs0f.jpg


原文 譯文

The semantic attribute (represented along the X axis) distinguishes characters by meaning and usage. Distinctions are made between entirely unrelated characters such as 澤 (marsh) and 機 (machine) as well as extensions or borrowings beyond the original semantic cluster such as 机1 (a phonetic borrowing used as a simplified form of 機) and 机2 (table, the original meaning).


The abstract shape attribute (the Y axis) distinguishes the variant forms of a single character with a single semantic attribute (that is, a character with a single position on the X axis).

The actual shape (typeface) attribute (the Z axis) is for differences of type design (the actual shape used in imaging) of each variant form.

Only characters that have the same abstract shape (that is, occupy a single point on the X and Y axes) are potential candidates for unification. Z axis typeface and stylistic differences are generally ignored.

語義特質(X軸方向)以字義和用途區分不同的字。完全無關的字,比如「澤」(沼澤)和「機」(機械)以及與本來義羣無關的字形借用所造成的對立,比如机1(同音假借作「機」的簡化形)和机2(桌子,本義)都可以從語義上加以區分。

抽象形特質(Y軸)將語義特質一致的(就是說,X軸的坐標相同)某字的若干變體(異體字)加以區分。

實用形(字樣)特質(Z軸)表示的是某一字的某一變體在具體字樣(字型)中形態上的差異。

只有那些抽象形相同的字符(在X軸和Y軸所構成的坐標平面中佔據相同的點)纔被看成是潛在的候選統合對象。Z軸方向字樣上和風格上的差異一般忽略不計。
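
The three-axis model can be sketched as a small data structure. The following is only an illustration of the model as described above, not of how the IRG actually processed characters: the class name `Glyph`, the function `unification_candidates`, the semantic labels, and the typeface labels ("Mincho", "Gothic") are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    semantic: str        # X axis: meaning/usage (hypothetical label)
    abstract_shape: str  # Y axis: variant form (hypothetical label)
    actual_shape: str    # Z axis: typeface rendering (hypothetical label)


def unification_candidates(glyphs):
    """Group glyphs that occupy the same (X, Y) point; the Z axis is ignored."""
    groups = defaultdict(list)
    for g in glyphs:
        groups[(g.semantic, g.abstract_shape)].append(g)
    return dict(groups)


glyphs = [
    Glyph("machine", "機", "Mincho"),
    Glyph("machine", "機", "Gothic"),  # differs from the first only on the Z axis
    Glyph("machine", "机", "Mincho"),  # 机1: borrowed simplified form of 機
    Glyph("table", "机", "Mincho"),    # 机2: original meaning, different X position
    Glyph("marsh", "澤", "Mincho"),
]

groups = unification_candidates(glyphs)
for (semantic, shape), members in groups.items():
    print(semantic, shape, len(members))
```

Only the two renderings of 機 land in the same group, because they differ in typeface alone. The grouping reflects the conceptual model only; it says nothing about which code points were finally assigned.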

Unification Rules·歸併準則

原文 譯文

【Unification Rules】. The following rules were applied during the process of merging Han characters from the different source character sets:

【歸併準則】將不同字集來源的漢字進行歸併的過程,遵照下述準則。

Source Separation Rule·來源區分規則

原文 譯文

R1:〖Source Separation Rule〗. If two ideographs are distinct in a primary source standard, then they are not unified.

  • This rule is sometimes called the round-trip rule because its goal is to facilitate a round-trip conversion of character data between an IRG source standard and the Unicode Standard without loss of information.


  • This rule was applied only for the work on the original CJK Unified Ideographs block (also known as the Unified Repertoire and Ordering or URO). The IRG dropped this rule in 1992 and will not use it in future work.

Each of the six variants in Figure 11-4 is separately encoded in one of the primary source standards (in this case J0, that is, JIS X 0208-1990), as shown in Table 11-3.


R1,〖來源區分規則〗。如果兩箇字符被任何一種主要的來源標準認定爲不同的字,則不能將它們統而爲一。



  • 此準則有時也被稱作「往返準則」,因為它的目的就是要實現在IRG(表意字符採錄小組)來源字集標準與unicode標準之間無損的相互轉換。
  • 本準則僅適用於構建CJK 統一表意字符基本區。IRG 於1992年廢止了此準則,在未來項目中將不復採用。
Figure 11-4. Source Separation·例圖11-4 來源區别

例圖11-4中的這六個字在某一來源字集中(J0--JIS X 0208-1990)都被分配了編碼,如表11-3所示。

Table 11-3. Source Encoding for Sword Variants
Unicode JIS
U+5263 J0-3775
U+528D J0-5178
U+5271 J0-517B
U+5294 J0-5179
U+5292 J0-517A
U+91FC J0-6E5F
原文 譯文

Because the six sword characters are historically related, they are not subject to disunification by the Noncognate Rule (R2 below), and thus would ordinarily have been considered for possible abstract shape-based unification by R3 below. Under that rule, the fourth and fifth variants would probably have been unified for encoding. However, the Source Separation Rule required that all six variants be separately encoded, precluding them from any consideration of shape-based unification. Note that further variants of the "sword" ideograph, U+5251 and U+528E, are also separately encoded, because of application of the Source Separation Rule, in that case applied to one or more Chinese primary source standards, rather than to the J0 Japanese primary source standard.


這六箇「劍」字是歷史相關的,因此它們不是依照非同源準則(詳下)保持的獨立,那麼就應該遵照下面的準則3對它們的抽象形進行考量,看看是否有可行的統合存在。單就準則3而言,第4、5兩字是應該統合的。然而來源準則要求這六個字皆應有不同的編碼,這就將依基於抽象形而統合的考量排除在外了。注意,另外兩個劍字的異體U+5251(剑)和 U+528E(劎)也被分別配予了編碼,同樣是出於來源準則的要求。只不過,在此案例中應用的是中國的來源字集標準而不是前例中日本的來源字集標準J0。
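
The round-trip goal behind the Source Separation Rule can be checked mechanically. The sketch below uses the six pairs from Table 11-3 as data; the function name `round_trips_losslessly` and the "merged" mapping are hypothetical and exist only to show what would be lost if the fourth and fifth variants had been unified.

```python
# Table 11-3 as data: J0 (JIS X 0208-1990) code -> Unicode code point.
ACTUAL = {
    "J0-3775": 0x5263,
    "J0-5178": 0x528D,
    "J0-517B": 0x5271,
    "J0-5179": 0x5294,
    "J0-517A": 0x5292,
    "J0-6E5F": 0x91FC,
}

# Hypothetical outcome if the fourth and fifth variants (U+5294, U+5292)
# had been unified under R3 instead of kept apart by R1.
HYPOTHETICAL_MERGED = {**ACTUAL, "J0-517A": 0x5294}


def round_trips_losslessly(to_unicode):
    """A source -> Unicode mapping survives a round trip only if no two
    source codes share a code point (that is, the mapping is injective)."""
    return len(set(to_unicode.values())) == len(to_unicode)


print(round_trips_losslessly(ACTUAL))
print(round_trips_losslessly(HYPOTHETICAL_MERGED))
```

With the merged mapping, both J0-5179 and J0-517A would convert to U+5294, so converting back to J0 could not tell them apart.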

Noncognate Rule·非同源規則

原文 譯文

R2:〖Noncognate Rule〗. In general, if two ideographs are unrelated in historical derivation (noncognate characters), then they are not unified.


For example, the ideographs in Figure 11-5, although visually quite similar, are nevertheless not unified because they are historically unrelated and have distinct meanings.


R2〖非同源規則〗。一般來講,如果兩箇字符的歷史來源不同,則不能將它們統而爲一。


比如例圖11-5中的兩個字,儘管視覺上它們給人的感覺是非常相似的,但是由於沒有歷史關聯且字義也迥然不同,故而不能將它們統合爲一。

Figure 11-5. Not Cognate, Not Unified·例圖11-5 非同源關係,不歸併
土≠士
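
That the two look-alike ideographs in Figure 11-5 are encoded separately can be confirmed with Python's standard `unicodedata` module. This is a minimal check; the variable names `EARTH` and `SCHOLAR` are chosen for this example, and the characters are written as escapes so the code points are unambiguous.

```python
import unicodedata

EARTH = "\u571F"    # 土
SCHOLAR = "\u58EB"  # 士

for ch in (EARTH, SCHOLAR):
    # Print the code point and the formal Unicode character name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```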

(Same Abstract Shape·相同抽象形)

原文 譯文

R3: By means of a two-level classification (described next), the abstract shape of each ideograph is determined. Any two ideographs that possess the same abstract shape are then unified provided that their unification is not disallowed by either the Source Separation Rule or the Noncognate Rule.

To determine differences in abstract shape and actual shape, the structure and features of each component of an ideograph are analyzed as follows.

Ideograph Component Structure. The component structure of each ideograph is examined. A component is a geometrical combination of primitive elements. Various ideographs can be configured with these components used in conjunction with other components. Some components can be combined to make a component more complicated in its structure. Therefore, an ideograph can be defined as a component tree with the entire ideograph as the root node and with the bottom nodes consisting of primitive elements.


R3:每個(參與比較的)字符,其抽象形將通過兩平面歸類*來確定。任何兩箇擁有相同抽象形的字符,只要不與來源準則和非同源準則的要求相抵觸,即可以被統合爲一。

我們將用以構成漢字的各個部件的結構和特徵以如下形式加以分析,以之判定抽象形和實用形之間的差別。

漢字部件結構。我們對漢字部件的結構進行了考查。一個部件是一至若干個基本元素的幾何組合。部件與部件相結合可構成各種字形。有的部件可作爲更複雜部件的構成材料。所以,我們可以將漢字分解以部件樹的形式加以定義,樹的根節點就是整個字,底層節點則由各類基本元素組成。
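
The component tree described above maps naturally onto a recursive data structure. The sketch below is illustrative only: the class `Component` and its methods are invented for this example, and the decomposition of 峰 into 山 and 夆 (itself 夂 over 丰), with those three treated as primitive elements, is an assumption rather than an analysis taken from the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    label: str
    children: List["Component"] = field(default_factory=list)

    def is_primitive(self):
        """A node with no children is treated as a primitive element."""
        return not self.children

    def leaves(self):
        """Return the primitive elements at the bottom of the tree, in order."""
        if self.is_primitive():
            return [self.label]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result

    def depth(self):
        """Number of levels below this node (0 for a primitive element)."""
        if self.is_primitive():
            return 0
        return 1 + max(child.depth() for child in self.children)


# Root node is the whole ideograph; bottom nodes are primitive elements.
tree = Component("峰", [
    Component("山"),
    Component("夆", [Component("夂"), Component("丰")]),
])

print(tree.leaves(), tree.depth())
```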

tree_LqsU04GCI5OU.jpg

原文 譯文

Ideograph Features. The following features of each ideograph to be compared are examined:

  • Number of components
  • Relative position of components in each complete ideograph
  • Structure of a corresponding component
  • Treatment in a source character set
  • Radical contained in a component

Uniqueness. If one or more of these features are different between the ideographs compared, the ideographs are considered to have different abstract shapes and therefore are considered unique characters and are not unified.

Unification. If all of these features are identical between the ideographs, the ideographs are considered to have the same abstract shape and are therefore unified.

The examples in Table 11-4 represent some typical differences in abstract character shape.


漢字特徵。我們將對每箇漢字的下列特徵進行考查:

  • 部件總數
  • 部件在完整字中的相對位置
  • 對應部件的內部結構
  • 來源字集的處理方式
  • 部件所包含的部首

獨立。如果參與比較漢字有一或多項特徵不同,那末這些字即被認為具有不同的抽象形,因而將其視爲獨立的字符,不合併。

統合。如果參與比較的漢字所有的特徵都相同,那末就這些字看成是具有相同的抽象形,加以合併。

表11-4所列舉的例子展示了一些典型的抽象形差異。

The ideographs in Table 11-4 are therefore not unified.

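Table 11-4. Ideographs Not Unified·表11-4 不統合的字符
Characters  Reason
崖≠厓  Different number of components
峰≠峯  Same number of components, different relative positions
拡≠擴  Same number of components in the same positions, but the corresponding components differ in structure
区≠區  The two characters are treated differently in a source character set
祕≠秘  The corresponding components are different radicals
爲≠為  Same abstract shape, different actual shape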
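
The feature comparison behind R3 can be sketched as a field-by-field check over the five listed features. Everything named here is an assumption for illustration: the class `Features`, the two helper functions, and especially the feature values, which are made-up labels rather than data from any source standard.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Features:
    component_count: int       # number of components
    relative_positions: str    # relative position of components
    component_structure: str   # structure of corresponding components
    source_treatment: str      # treatment in a source character set
    radical: str               # radical contained in a component


def differing_features(a, b):
    """Names of the features on which two ideographs differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def same_abstract_shape(a, b):
    """Same abstract shape only if all five features are identical."""
    return not differing_features(a, b)


# Hypothetical feature values for the pair 峰 / 峯.
peak_side = Features(2, "left-right", "山+夆", "same", "山")
peak_top = Features(2, "top-bottom", "山+夆", "same", "山")

print(differing_features(peak_side, peak_top))
print(same_abstract_shape(peak_side, peak_top))
```

A single differing feature is enough to make the abstract shapes distinct. Even when this check passes, unification is still subject to the Source Separation Rule and the Noncognate Rule.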
原文 譯文

Differences in the actual shapes of ideographs that have been unified are illustrated in Table 11-5.


表11-5中列舉了只有實用形差別的若干組被統合的字

Table 11-5. Ideographs Unified·表11-5 統合字符
unified_o1MqD8vQZF69.jpg
