[T106][ZXW-22]7520V3SCV2.01.01.02P42U09_VEC_V0.8_AP_VEC origin source commit Change-Id: Ic6e05d89ecd62fc34f82b23dcf306c93764aec4b

commit: 9ed821d7e5d875a3395740a9cc2545671fa429b7
author: lh <lh@exm.com> Fri Apr 07 01:36:19 2023 -0700
committer: lh <lh@exm.com> Fri Apr 07 01:36:19 2023 -0700
tree: 121b5ff9a43e30066e3b33d57065b04bcf9bbabf
parent: a10a76fcf09e2ec7e9055902f242dff467ab9654
diff --git a/ap/app/busybox/src/docs/unicode.txt b/ap/app/busybox/src/docs/unicode.txt
new file mode 100644
index 0000000..9c159ce
--- /dev/null
+++ b/ap/app/busybox/src/docs/unicode.txt

@@ -0,0 +1,71 @@
+	Unicode support in busybox
+
+There are several scenarios where we need to handle unicode
+correctly.
+
+	Shell input
+
+We want to correctly handle input of unicode characters.
+There are several problems with it. Handling input merely
+as a sequence of bytes would break any editing. This was fixed,
+and lineedit now operates on an array of wchar_t's.
+But we also need to handle the following problem cases:
+
+* It is unreasonable to expect that the output device supports
+  _any_ unicode chars. Perhaps we need to avoid printing
+  those chars which are not supported by the output device.
+  Examples: chars which are not present in the font,
+  chars which are not assigned in unicode,
+  combining chars (especially trying to combine bad pairs:
+  a_chinese_symbol + "combining grave accent" = ??!)
+
+* We need to account for the fact that unicode chars have
+  different widths: 0 for combining chars, 1 for usual,
+  2 for ideograms (are there 3+ wide chars?).
+
+* Bidirectional handling. If a user wants to echo a phrase
+  in Hebrew, he types: echo "srettel werbeH"
+
+	Editors (vi, ed)
+
+This case is a bit similar to "shell input", but unlike shell,
+editors may encounter many more unexpected unicode sequences
+(try to load a random binary file...), and they need to preserve
+them, unlike shell which can afford to drop bogus input.
+
+	more, less
+
+Need to correctly display any input file. Ideally, with
+ASCII/unicode/filtered_unicode option or keyboard switch.
+Note: need to handle tabs and backspaces specially
+(bksp is for manpage compat).
+
+	cut, fold, watch
+
+May need the ability to cut a unicode string to a specified number
+of wchars and/or to a specified screen width. Need to handle tabs specially.
+
+	sed, awk, grep
+
+Need to handle unicode-aware regexp matching.
+
+	ls (multi-column display)
+
+ls will fail to line up columnar output if it does not account
+for character widths (and maybe filter out some of them, see
+above). OTOH, non-columnar views (ls -1, ls -l, ls | cat)
+should NOT filter out bad unicode (but need to filter out
+control chars; coreutils does that). Note that unlike more/less,
+tabs and backspaces need no special handling.
+
+	top, ps
+
+Need to perform filtering similar to ls.
+
+	Filename display (in error messages and elsewhere)
+
+Need to perform filtering similar to ls.
+
+
+TODO: write an email to Asmus Freytag (asmus@unicode.org),
+author of http://unicode.org/reports/tr11/
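
Note on the width rule in the added file (0 for combining chars, 1 for
usual, 2 for ideograms) and on the "cut to specified screen width" need
listed for cut, fold, and watch. The sketch below is only an illustration
of that rule, not the busybox implementation. It is written in Python and
relies on the standard unicodedata module; the helper names char_width,
display_width, and truncate_to_width are hypothetical. It deliberately
ignores control characters, unassigned code points, and zero-width
characters that are not combining marks, all of which a real
implementation would have to treat separately.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate number of terminal columns used by one code point."""
    if unicodedata.combining(ch):
        return 0  # combining marks are drawn over the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters, e.g. CJK ideograms
    return 1  # everything else is assumed to take one column


def display_width(text: str) -> int:
    """Sum of the per-character column widths of text."""
    return sum(char_width(ch) for ch in text)


def truncate_to_width(text: str, max_cols: int) -> str:
    """Longest prefix of text that fits into max_cols columns."""
    out = []
    used = 0
    for ch in text:
        w = char_width(ch)
        if used + w > max_cols:
            break
        out.append(ch)
        used += w
    return "".join(out)
```

Counting columns rather than characters is what lets a multi-column
listing stay aligned: a wide character is never split, and a combining
mark stays attached to the character it modifies when a string is cut.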