Temple University College of Liberal Arts, Center for Vietnamese Philosophy, Culture & Society
   N Ô M · S T U D I E S
A research project in the Vietnamese Nôm cultural heritage

A Look at the Status of Vietnamese Nôm Studies

Editorial, Vietnamese Public Library of Knowledge
Published online April 30, 2007

The word Nôm today provokes a variety of reactions from Vietnamese in many parts of the world. One type of reaction attributes Nôm 喃 to the ancient Vietnamese mythological writing that was lost centuries ago. This is a sacred script, having the power of a 符 bùa "a charm, an amulet, or a talisman". Thus, for example, people flock to temples every New Year's Day to get a golden character on a red piece of paper for luck. A Vietnamese American soldier in Iraq looked for a prayer in Nôm to post in his tent in order to protect his friends. The second type of reaction attributes Nôm to feudalism, backwardness and ignorance of the dark ages in Vietnam, which should be consigned to the rubbish heap of history. This seems to be the attitude of the majority of Vietnamese. The third type of reaction considers Nôm, 喃 那 nôm na, to be “counterfeit”, or 𠼽 𠺺 mách qué, compared to the legitimate knowledge, written in 𡨸 漢 chữ Hán. This is the attitude of the majority of Vietnamese intellectuals. Most seriously, even a Nôm scholar may suggest that, in order to learn chữ Nôm, students first need to learn Chinese, or chữ Hán. He or she would say that ideographic scripts like chữ Nôm are very difficult to learn and should not be promoted, unaware that over 90% of the people in China, Taiwan, Japan, and Korea can read and write. These attitudes are the major obstacles to attempts to promote Nôm studies.

A review of some important periods in Vietnamese history shows a different reality. Vietnam finally won its independence from China in 939 AD after more than a thousand years of being dominated and assimilated. Chữ Hán Nôm was called the “national script”, or 國 語 quốc ngữ, for the next thousand years. The last national examination using chữ Nôm was in 1919. In the 1920’s, the French colonial authorities replaced chữ Nôm with the Latin script called chữ “quốc ngữ”. Chữ Nôm was totally “forgotten” for the next 50 years. Documents and knowledge of chữ Nôm continued to be destroyed during the century-long wars. In the 30 years since the country’s reunification, academic and research exchanges between the two parts of Vietnam, north and south, have flowered, and the seed of Nôm studies has begun to be planted. In the 1980’s, the Unicode Consortium and the ISO/IEC 10646 formulated international multi-lingual standards for all human scripts. The first submission of 2,357 chữ Nôm ideograms to Unicode was in 1992. Subsequently, when multi-lingual operating systems emerged in 1995, chữ Nôm became part of the global computer and internet world, thus inheriting the processing power of modern technologies. One may say that the danger of losing this heritage was thwarted, but the knowledge inscribed in chữ Nôm documents has yet to be recovered. The danger of further destruction of valuable historical documents and the knowledge therein is still imminent. Chữ Nôm is present everywhere in the country, from steles and tombstones to historical sites and royal citadels. However, only a few Vietnamese can read it.

The changes in the last few years stunned even the pessimists. For example, Unicode version 5.0 contains 9,299 chữ Nôm in its multi-lingual standards.[1-3] Over 400 students applied for the 30 college seats available in Hán Nôm Studies at Hanoi National University in 2005, and the same happened in Huế. At the second International Nôm Conference held in Huế in June 2006, 90% of the papers submitted were in electronic form with Nôm fonts.

A question arises: what is going to become of Nôm studies?

A brief account of the fate of chữ Nôm

To a great extent, disaster indeed befell the thousand-year-old heritage of Vietnam recorded in chữ Nôm. There seemed to be a total “conspiracy” of social forces and historical circumstances working against the restoration of the importance of chữ Nôm in Vietnamese history, which may explain the general apathy towards chữ Nôm.

We know that hundreds of thousands of documents exist in chữ Nôm over a period of over a thousand years since the tenth century. These documents—in literature, medicine, drama, music, court records, philosophy, village records, and royal proclamations—are now in danger of further destruction after 125 years of warfare, and hundreds of years of monsoons, pillage and neglect. In addition, major Nôm documents have been found languishing, unidentified, in many European and East Asian libraries, museums and private holdings (in France, Italy, the Vatican, England, Spain, Belgium, Germany, China, Japan,… to name a few), as well as libraries, institutes and private homes in Vietnam. Most of these precious texts are in grave danger of becoming lost forever. The preservation of the Nôm heritage is a desperate race against time.

Since 1919, no Hán Nôm scholars have been systematically trained. Following the general acceptance of the romanized chữ quốc ngữ (or “national script”) in the 1920's, scholars who can read and understand chữ Nôm have become almost extinct. Surviving Nôm scholars are not authorized to teach in colleges and universities because they lack modern pedagogical training. In addition, because of the wars and the requirements of modern education, there are precious few Vietnamese teaching materials for Nôm. This is the greatest loss to Vietnamese culture in its history, second only to the loss of life in the wars.

Chữ Nôm, devised “ideographically” to represent Vietnamese speech, has never been standardized, or printed (except by woodblock) until recently. Unlike the romanized Vietnamese script, whose alphabet includes only 29 letters and 5 accent marks,[4,5] chữ Nôm has never had “an alphabet”. Traditionally, the only way to learn chữ Nôm has been to memorize all the necessary ideograms, one by one. A teacher starts teaching Hán Nôm with 三 千 字 Tam Thiên Tự (“Three thousand ideograms”) [6] written by Ngô Thì Nhậm around the end of the 18th century. It takes years to memorize the book. In fact, very few people can memorize 10,000 ideograms, much less 70,000, in one lifetime.

The above state of affairs is compounded by a series of misconceptions about the nature of chữ Nôm in learned circles.

Misconceptions about chữ Nôm

1. There are two main misconceptions about chữ Nôm as a script. The first has to do with the potentially unlimited number of ideograms. The second has to do with the procedure for forming or sorting ideograms. These problems rest in turn on confusions concerning ideographic scripts pertaining to two fundamental concepts— the ideogram and the character—that were coined by European missionaries and which had already been shown in 1838 by Peter S. Du Ponceau to be observationally inadequate.[7,8]

a. The term ideogram—representing an idea (or meaning) with a graphic symbol—exposes the internal arbitrariness of the relationship between meaning and graphic symbols. This position leads to the misconception of a “character” and to an inadequate representation of the Chinese-Japanese-Korean-Vietnamese [CJKV] Hán-based writing systems. An ideogram, 𡨸 chữ or 字 tự, is simply a graphic representation of a spoken syllable, 㗂 tiếng.

b. What is called a “character” in ideographic scripts is actually a written syllable (i.e., an ideogram, 𡨸 chữ or 字 tự [9]). Calling it a “character” incorrectly puts it on an equal conceptual footing with a letter of the Latin alphabet. A character (a letter of the alphabet, or an orthographic unit) should correspond to an ideographeme, the “most basic meaningful graphic unit of ideograms.”[9,10] The Unicode Consortium [11] defines a character as “the basic unit of encoding for the Unicode character encoding” and assigns each one a code point; a letter of a roman alphabet, for example, is one character. Unfortunately, this misnaming of ideograms as “characters” led Unicode to reserve over 70,000 code points (71,622 in UniHan standard 5.0) for the unified CJKV scripts. Hence, an ideogram (𡨸 chữ or 字 tự) represents a spoken syllable written in a box, and a character is equivalent to an ideographeme, the unit from which ideograms are formed.
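The point about code points can be illustrated with a minimal Python sketch. The helper names code_point_label and plane are hypothetical and introduced only for illustration; the sketch simply shows that a Latin letter and a whole ideogram each occupy exactly one code point, which is the equal footing criticized above.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


def plane(ch: str) -> int:
    """Return the Unicode plane (0 = Basic Multilingual Plane) of a character."""
    return ord(ch) >> 16


# A Latin letter, a Hán ideogram, and a Nôm ideogram: one code point each.
for ch in ["a", "字", "𡨸"]:
    print(ch, code_point_label(ch), "plane", plane(ch))
```

Each of the three samples is a single encoded unit, although the ideograms are written syllables built from several graphic components.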

There is ample evidence that each ideogram contains one or more ideographemes (or “characters”). One example is the concept of “radical” used in ideogram dictionaries (字 典 tự điển). The famous KangXi Dictionary (康 熙 字 典 Khang Hi tự điển) [12] orders each ideogram, 𡨸 chữ or 字 tự, according to its regular recurrent graphic component, called 部 bộ (“a part, division, section; bucket”, not “radical”, nor “root”) or bộ thủ 部 首 (“section head; head index”) and the number of its remaining strokes. The KangXi Dictionary sorts 47,035 unique Chinese ideograms into 214 部 bộ “buckets” and further sorts each bucket into smaller buckets of ideograms with the same number of remaining strokes. Many dictionaries use the same approach. For example, 粉 phấn “flour; powder; plaster” has bộ 米 mễ and 4 strokes. There are at least 30 other ideograms that have bộ 米 mễ and 4 strokes, such as 粑 or 𥸿 bả, 𥹀 tấm, 粃 tẻ, 𥸷 xôi, 㫧 gạo,… This description of 粉 as “米 plus 4 strokes” is thus completely inadequate.
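The bucket-sorting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table RADICAL_STROKE is a tiny hand-entered sample using three ideograms named in the text, and the function build_buckets is hypothetical, not taken from any dictionary software.

```python
from collections import defaultdict

# Hand-entered sample: ideogram -> (bộ, number of remaining strokes).
# The three entries are the 米 + 4 strokes examples mentioned in the text.
RADICAL_STROKE = {
    "粉": ("米", 4),
    "粑": ("米", 4),
    "粃": ("米", 4),
}


def build_buckets(table):
    """Group ideograms by their (bộ, remaining strokes) key."""
    buckets = defaultdict(list)
    for ideogram, key in table.items():
        buckets[key].append(ideogram)
    return dict(buckets)


buckets = build_buckets(RADICAL_STROKE)
print(buckets[("米", 4)])  # several ideograms share the same key
```

Because the key (米, 4) leads to a whole bucket rather than to one ideogram, it works as a sorting index but does not describe the composition of 粉.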

There are also inconsistencies in the number of “radicals” in each dictionary: 214 in the KangXi Dictionary, 540 in 説 文 解 字 Thuyết văn giải tự, 200 in 漢 語 大 字 典 Hán ngữ đại tự điển, 189 in 新 華 字 典 Tân Hoa tự điển, etc. The choice of which “radical” an ideogram belongs to, or how many strokes each ideogram has, etc. is also arbitrary.

The principled strategy to resolve the above problems is one that relies solely on internal graphic regularities of ideograms as a system: a Nôm ideogram is composed of a number of the most basic graphs, or ideographemes. For example, 粉 phấn “flour; powder; plaster” is composed exactly of three ideographemes 米 mễ, 八 bát and 刀 đao, successively composed by 八 bát over 刀 đao to form 分 phân, and 米 mễ before 分 phân to form 粉 phấn. Note that the sound of 粉 phấn /fʌn/ (with high rising tone) is derived from the sound of 分 phân (with high-level tone).
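The successive composition of 粉 can be written down as a small recursive sketch in Python. The COMPOSITION table, its layout labels ("above", "before") and the function ideographemes are assumptions made for illustration; only the two composition steps given in the text are encoded.

```python
# Each composed ideogram maps to (layout, first part, second part).
# Only the two steps described in the text are entered here.
COMPOSITION = {
    "分": ("above", "八", "刀"),   # 八 over 刀 forms 分
    "粉": ("before", "米", "分"),  # 米 before 分 forms 粉
}


def ideographemes(ideogram):
    """Recursively expand an ideogram into its most basic graphs."""
    if ideogram not in COMPOSITION:
        return [ideogram]  # not decomposable in this table: treat as basic
    _layout, first, second = COMPOSITION[ideogram]
    return ideographemes(first) + ideographemes(second)


print(ideographemes("粉"))
```

For 粉 the expansion yields the three ideographemes 米, 八 and 刀 named above, and the intermediate component 分 is the one that carries the sound.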

2. Nôm studies in Vietnam is currently called “Hán Nôm” studies and is institutionally placed in universities under the departments of literature. The name “Hán Nôm” incorrectly implies that chữ Nôm is secondary to chữ Hán, that is, that in order to learn Nôm, one must first learn Hán. On the contrary, Nôm studies is a research discipline covering all aspects of Vietnamese society in history as recorded in a script called chữ Nôm, which properly includes chữ Hán (or Hán Việt studies), just as spoken Vietnamese contains Hán Việt words, or English contains Latin words.

a. The term 漢 “Hán” in “Hán Nôm” is misconstrued to be “Chinese” rather than “Hán Việt”, a borrowing from Chinese during the 唐 T’ang Dynasty. In modern spoken Vietnamese, Hán Việt words are formed differently from native words. For example, the term dân số “population”, formed by Hán Việt word formation [民 dân “people” + 數 số “number”] has the equivalent, số dân “population”, formed by native formation. Hán Việt formation is used to form scientific or abstract words, similar to Latin words in English. Hán Việt is a borrowing from Chinese in the sixth century, when both Vietnamese and Chinese had no tones. This borrowing has been nativized for more than 1,400 years; the ideograms may look like modern Chinese ideograms, but their sounds and meanings have changed. For example, the term “gross domestic product [GDP]” in modern Chinese is 國 内 生 產 總 值 or 国 内 生 产 总 值 (in simplified form) quốc nội sinh sản tổng trị, and in modern Sino-Japanese, 国 内 総 生 産 quốc nội tổng sinh sản; however, in Vietnamese, it is tổng sản phẩm quốc nội [總 產 品 國 内], or tổng sản phẩm nội địa [總 產 品 内 地], using all Hán Việt stems, but a mix of Hán Việt and native formations.

Thus, “Hán” in the term “Hán Nôm studies” is used to represent “Hán Việt” Sino-Vietnamese. “Hán” is therefore an inherent part of the native script “Nôm”—the script that recorded spoken Vietnamese. One may say, the term “chữ Nôm” conceptually includes chữ Hán Việt. There are many advantages in taking this position—it is known that chữ Hán has been borrowed as script elements to represent the basic sounds of ideograms. The Vietnamese used these basic sounds to approximate Vietnamese sounds. For example, the Nôm ideogram 唎 lời “word” /lɤy/ with low level tone carries the segmental Hán Việt sound 利 lợi /lɤy/ (with low creaky tone), not the modern Mandarin li4 nor the Cantonese lei6.

b. The fact that chữ Nôm (Hán Việt included) has never been standardized can also be construed as an advantage for Nôm studies. First, chữ Nôm is known to reflect dialectal and idiolectal aspects of Vietnamese, synchronically. As a result of national standardization, the romanized quốc ngữ today [4,5,13] cannot show how people in Quảng Nam or in Huế pronounce the word tám “eight”. Chữ Nôm, on the contrary, recorded all dialectal and idiolectal phonetic variations, synchronically and diachronically. Secondly, since chữ Nôm was not standardized, it recorded how a sound in Vietnamese was pronounced at different periods in history. The study of past transliterations into chữ Nôm of Latin names in the Catholic Bible, or of Sanskrit or Pali names in Buddhist sutras, may hint at the sound patterns of Vietnamese in history. Furthermore, the study of syntax in Nôm documents from different historical times may also hint at how Vietnamese syntactic structures changed.

c. Since the number of Nôm documents (Hán Việt documents included) that Nôm scholars have seen and analyzed so far is small, probably less than 1% of the Nôm documents in existence, the basic study of Nôm ideograms is still urgently needed. At one time, in the 1990’s, many scholars believed that the number of chữ Nôm ideograms proper (i.e. those that do not have the same shape as ideograms in the Chinese, Japanese or Korean repertoires) was fewer than 3,000, as seen in existing dictionaries.[14-17] The number of chữ Nôm ideograms proper we have seen since then is now over 6,000 and still growing. This fact means that strategies to discover unseen chữ Nôm ideograms are important. One such endeavor is to digitize all Nôm books in libraries and post them on the web. Another is to organize regional or international Nôm conferences to draw Nôm scholars out of the woodwork and onto the national scene. This will give the field of Nôm studies basic tools and materials to advance.

3. Chữ Nôm is a writing system, representing spoken Vietnamese at different periods in history. It recorded not only literature, but also all aspects of Vietnamese life, including religions, philosophy, natural sciences, and social and economic activities of a civilization. Therefore Nôm studies deserves to be a part of Vietnamese studies, a multi-disciplinary field of study, making use of all modern scientific disciplines to shed light on what is called Vietnamese in history recorded by chữ Nôm.

a. One of the first steps in Nôm studies is to quickly bring the chữ Nôm script onto the same platform of technologies as western language scripts. Nôm ideograms need to be included in the Unicode multi-lingual computer standards so that they are distributed to every computer and webpage produced. Web dictionaries and knowledge bases of Nôm documents, words and characters [18-20] need to be developed and made widely accessible. In this way, studies of character and word frequencies, syntactic and morphologic co-occurrences, contexts, comparisons of different published versions of a Nôm text (such as the textual studies on the 5 different versions of The Tale of Kiều,[21] or the Hán Việt phonology and syntax of A Complete History of Đại Việt [22]), etc. can be done uniformly.
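Once Nôm ideograms are encoded in Unicode, a frequency count of the kind mentioned above needs nothing beyond standard tools. The following Python sketch is only an illustration: the function ideogram_frequencies is hypothetical, and the sample string is an arbitrary sequence of ideograms assembled for the demonstration, not a passage from a real Nôm text.

```python
from collections import Counter


def ideogram_frequencies(text):
    """Count every non-whitespace character (one ideogram = one code point)."""
    return Counter(ch for ch in text if not ch.isspace())


# Arbitrary demonstration string, not a quotation from any document.
sample = "喃 那 喃 𡨸 漢 𡨸 喃"
freq = ideogram_frequencies(sample)

for ideogram, count in freq.most_common():
    print(ideogram, count)
```

The same function applies unchanged to ideograms inside and outside the Basic Multilingual Plane, which is what allows different versions of a text to be compared uniformly.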

b. The romanized quốc ngữ alphabet has become a part of all operating systems and the web. An ideographeme project to identify the basic building blocks of existing Nôm ideograms, i.e. the “Nôm alphabet”, is the most urgent and basic step toward providing similar digital tools with which Nôm scholars and students can reproduce documents and perform textual studies. Libraries, galleries and museums all over the world with Nôm holdings, including those in Vietnam, will be able to post them for public access. The Vietnamese education system will be able to institute basic Nôm curricula [23] from high school through university, based on the Nôm ideographemes with the Nôm lookup tools, the dictionary searches, and the archives as valuable educational resources. The Nôm “alphabet” of fewer than 500 ideographemes not only requires less memorization (and thus greatly improves literacy), it also rationalizes the Nôm ideographic system for the first time. It lends scientific methodology to Nôm pedagogy. The Nôm alphabet consequently both speeds up and secures the preservation of Vietnamese culture by increasing Nôm literacy. The alphabet will make it easier for the Ministry of Education to include Nôm literacy in the educational system. A Nôm alphabet will cut at least 10 years off the constant process of “carving” and “encoding” newly found ideograms. In all these ways, the digital re-engineering of the Nôm alphabet is a logical and urgently needed step in the preservation of the endangered Vietnamese cultural heritage.

  1. The Unicode Standard (Chapter 12: East Asian Script). (The Unicode Consortium, 2006).
  2. Vietnamese Standard. Information Technology – The Nôm 16-bit character standard code set for information interchange — Chữ Nôm Việt. (TCVN 5573, Hanoi, Vietnam, 1993).
  3. Vietnamese Standard. Information Technology – The Nôm 16-bit character standard code set for information interchange — Chữ Nôm Hán (TCVN 6056, Hanoi, Vietnam, 1995).
  4. Ðỗ, J., Ngô, T.N. & Nguyễn, H. A proposal for standard Vietnamese character encodings in a unified text-processing framework. Computer Standards & Interfaces 14, 3-10 (1992).
  5. Vietnamese Standard. Information Technology – Vietnamese 8-bit standard code character set for information interchange (VSCII) (TCVN 5712, Hanoi, Vietnam, 1993).
  6. Đoàn, T.C. Tam Thiên Tự 三 千 字 [Three thousand ideograms] (Vietnam Culture and Information Publishing House, Hanoi, Vietnam, 1999).
  7. Mai, B.T. & Ngô, T.N. in The Second International Nôm Conference (2006).
  8. Du Ponceau, P.S. (ed. John Vaughan Esq.) (American Philosophical Society, Philadelphia, 1838).
  9. Ngô, T.N. (Vietnam Unicode/ISO 10646 Committee, Hanoi, 2001).
  10. Ngô, T.N. in The First Conference of Vietnamese Studies (The National University of Hanoi, 1998).
  11. ISO/IEC 10646 JTC 1/WG 1/IRG Ideographic Rapporteur Group and Unihan 3.1 Radical-Stroke Index. ( ).
  12. 康 熙 字 典 KangXi Dictionary (Trung Tân Library, Taiwan, 1981).
  13. Ngô, T.N. Some problems in the designing of Vietnamese on a computer. Vietnam Culture Journal 4, 210-217 (1985).
  14. Nguyễn, Q.X. & Vũ, V.K. Tự điển chữ Nôm [An ideogram dictionary of the Nôm script] (Trung tâm Học liệu, Sàigòn, 1971).
  15. Viện Ngôn ngữ học. Bảng tra chữ Nôm [A Nôm ideogram glossary] (Nhà xuất bản Khoa học Xã hội, Hà Nội, 1976).
  16. Thiều, C. Hán Việt tự điển [A Sino-Vietnamese ideogram dictionary] (Đuốc Tuệ Publishing House, Hà Nội, 1942).
  17. Trần, V.K. Giúp đọc Nôm và Hán Việt [A Guide to the Pronunciation of Nôm and Sino-Vietnamese ideograms] (Đà Nẵng Publishing House & The Vietnamese Nôm Preservation Foundation, Đà Nẵng, Vietnam, 2004).
  18. The Vietnamese Nôm Preservation Foundation. (
  19. Ngô, T.G., Tô, T.Đ., Ngô, T.N. & Ngô, T.V. in The Second International Nôm Conference (The Thừa Thiên-Huế Center for Information Technology, 2006).
  20. Tống, P.K. & Lê, A.M. in The First International Nôm Conference (The National Library of Vietnam. Hà Nội, 2004).
  21. Balaban, J. et al. in The Second International Nôm Conference (The Thừa Thiên-Huế Center for Information Technology, Huế, 2006).
  22. Ngô, T.N. & Ngô, T.V. in The Second International Nôm Conference (The Thừa Thiên-Huế Center for Information Technology, Huế, 2006).
  23. Ngô, T.N. & Ngô, T.V. in the 2006 Association for Asian Studies: Promoting Nôm Studies in the U.S. (Hyatt Hotel, San Francisco, 2006).

Center for Vietnamese Philosophy, Culture & Society, Gladfelter Hall 1016, 1115 Polett Walk, Temple University, Philadelphia, PA 19122
Tel. 215 204 9207. Nôm page queries: Ngô Thanh Nhàn