Useful links:
- http://www.fileformat.info/info/unicode/category/index.htm
- http://www.unicode.org/
- http://www.unicode.org/charts/PDF/U4E00.pdf
- http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_symbol_characters_web_page
- http://www.unicode.org/charts/symbols.html#CombiningDiacriticalMarks
Countries where Hindi is used: Myanmar, Thailand, Cambodia, Laos.
http://zh.wikipedia.org/zh-cn/%E4%BB%A5%E4%BA%BA%E5%8F%A3%E6%8E%92%E5%88%97%E7%9A%84%E8%AF%AD%E8%A8%80%E5%88%97%E8%A1%A8
Countries where Portuguese is used: Portugal, Brazil, Angola, Macau (China), Spain, Mozambique, and East Timor.
http://www.iso.org/iso/country_codes/iso_3166_code_lists/english_country_names_and_code_elements.htm
http://www.loc.gov/standards/iso639-2/langhome.html
http://msdn.microsoft.com/en-us/library/ms533052(VS.85,loband).aspx
Briefly, language codes consist of a primary code and a possibly empty series of subcodes:
language-code = primary-code ( "-" subcode )*
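The grammar above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation of any language-tag standard: the function name parse_language_code is hypothetical, and the allowed characters (ASCII letters for the primary code, ASCII letters and digits for subcodes) are an assumption, since the production itself does not say which characters are permitted.

```python
import re

# Assumed character sets: ASCII letters in the primary code,
# ASCII letters and digits in each subcode. The grammar in the
# notes does not specify this, so treat it as an assumption.
_LANGUAGE_CODE_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z0-9]+)*")


def parse_language_code(code):
    """Split a language code into (primary_code, [subcode, ...]).

    Hypothetical helper following the production
    language-code = primary-code ( "-" subcode )*
    """
    if _LANGUAGE_CODE_RE.fullmatch(code) is None:
        raise ValueError(f"not a well-formed language code: {code!r}")
    primary, *subcodes = code.split("-")
    return primary, subcodes


if __name__ == "__main__":
    for sample in ("en", "en-US", "x-klingon"):
        print(sample, parse_language_code(sample))
```

Under that assumption, "en-US" splits into the primary code "en" and the single subcode "US", while a code with an empty subcode such as "en-" is rejected with a ValueError.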
Here are some sample language codes:
"en": English.
"en-US": the U.S. version of English.
"en-cockney": the Cockney version of English.
"i-navajo": the Navajo language spoken by some Native Americans.
"x-klingon": the primary tag "x" indicates an experimental language tag.
Two-letter primary codes are reserved for [ISO639] language abbreviations. Two-letter codes include fr (French), de (German), it (Italian), nl (Dutch), el (Greek), es (Spanish), pt (Portuguese), ar (Arabic), he (Hebrew), ru (Russian), zh (Chinese), ja (Japanese), hi (Hindi), ur (Urdu), and sa (Sanskrit).
Any two-letter subcode is understood to be an [ISO3166] country code.
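The two rules above (two-letter primary codes are ISO 639 abbreviations, two-letter subcodes are ISO 3166 country codes), together with the "x" convention for experimental tags, can be sketched as a small classifier. The function name classify_language_code and the labels it returns are hypothetical, and the sketch only looks at the length and shape of each part; it does not check the codes against the actual ISO lists.

```python
def classify_language_code(code):
    """Label the parts of a language code using the length rules in the notes.

    Hypothetical helper: it inspects shape only and does not validate
    against the real ISO 639 or ISO 3166 registries.
    """
    primary, *subcodes = code.split("-")

    if len(primary) == 2:
        # Two-letter primary codes are reserved for ISO 639 abbreviations.
        primary_kind = "iso639"
    elif primary.lower() == "x":
        # The primary tag "x" marks an experimental language tag.
        primary_kind = "experimental"
    else:
        primary_kind = "other"

    # Any two-letter subcode is read as an ISO 3166 country code;
    # country codes are conventionally written in upper case.
    country_codes = [
        sub.upper()
        for sub in subcodes
        if len(sub) == 2 and sub.isascii() and sub.isalpha()
    ]

    return {
        "primary": primary.lower(),
        "primary_kind": primary_kind,
        "country_codes": country_codes,
    }


if __name__ == "__main__":
    for sample in ("en-US", "en-cockney", "i-navajo", "x-klingon"):
        print(sample, classify_language_code(sample))
```

The subcode "cockney" in "en-cockney" is longer than two letters, so it is not treated as a country code, and "i-navajo" falls into the "other" bucket because these notes give no rule for the primary tag "i".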


