Commit Graph

7 Commits

Author SHA1 Message Date
Linus Torvalds
060fc106b6 Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of unicode_load, avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
2024-11-22 20:50:55 -08:00
Gabriel Krisman Bertazi
5c26d2f1d3 unicode: Don't special case ignorable code points
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.
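
A minimal sketch of the idea, not the kernel's actual trie code: in a toy folding table, any code point without an entry is returned unchanged, so ignorable code points such as U+00AD (SOFT HYPHEN) need no dedicated branch. The names toy_folds and toy_casefold are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table: only code points with a real folding are listed. */
struct fold_entry {
	uint32_t from;
	uint32_t to;
};

static const struct fold_entry toy_folds[] = {
	{ 0x0041, 0x0061 },	/* U+0041 'A' folds to U+0061 'a' */
	{ 0x00C9, 0x00E9 },	/* U+00C9 folds to U+00E9 */
};

/*
 * Fold one code point. Anything without a table entry, including an
 * ignorable code point such as U+00AD, falls through to the final
 * return and maps to itself. There is no separate "ignorable" case.
 */
uint32_t toy_casefold(uint32_t cp)
{
	size_t i;

	for (i = 0; i < sizeof(toy_folds) / sizeof(toy_folds[0]); i++) {
		if (toy_folds[i].from == cp)
			return toy_folds[i].to;
	}
	return cp;
}
```

Under this assumption, toy_casefold(0x00AD) takes the same path as any other unmapped code point.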

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-10-09 13:34:01 -04:00
Gan Jie
66715f005b unicode: change the reference of database file
Commit 2b3d047870 ("unicode: Add utf8-data module") changed
the database file from 'utf8data.h' to 'utf8data.c' so that it can be
built as a separate module, but README.utf8data was not updated to
match, which may cause confusion. Update README.utf8data and
the default 'UTF8_NAME' in 'mkutf8data.c'.

Signed-off-by: Gan Jie <ganjie182@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240912031932.1161-1-ganjie182@gmail.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-09-13 11:23:01 -04:00
Thomas Weißschuh
43bf9d9755 unicode: constify utf8 data table
All users already handle the table as const data.
Move the table itself into .rodata to guard against accidental or
malicious modifications.
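
As a rough userspace illustration (the struct and function names below are hypothetical, not the kernel's utf8 data definitions), declaring the table object itself const, and not only the pointers handed to its users, is what lets a typical toolchain place it in a read-only section:

```c
#include <stddef.h>

/* Hypothetical stand-in for a generated data table. */
struct toy_table {
	unsigned int version;
	const unsigned char *bytes;
	size_t len;
};

static const unsigned char toy_bytes[] = { 1, 2, 3 };

/*
 * Both the byte array and the descriptor are const objects with static
 * storage, so a typical toolchain emits them into a read-only section.
 */
static const struct toy_table toy_data = {
	.version = 1,
	.bytes = toy_bytes,
	.len = sizeof(toy_bytes),
};

/* Callers only ever receive a pointer to const. */
const struct toy_table *toy_get_table(void)
{
	return &toy_data;
}

/* Read-only consumer: sums the bytes without modifying the table. */
unsigned int toy_sum(const struct toy_table *t)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < t->len; i++)
		sum += t->bytes[i];
	return sum;
}
```

Because every consumer already takes a pointer to const, adding the qualifier to the definition changes where the data lives without changing any caller.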

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240809-unicode-const-v1-1-69968a258092@weissschuh.net
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-08-13 15:21:50 -04:00
Jeff Johnson
68318904a7 unicode: add MODULE_DESCRIPTION() macros
Currently 'make W=1' reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8data.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8-selftest.o

Add a MODULE_DESCRIPTION() to utf8-selftest.c and utf8data.c_shipped,
and update mkutf8data.c to add a MODULE_DESCRIPTION() to any future
generated utf8data file.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240524-md-unicode-v1-1-e2727ce8574d@quicinc.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-06-20 19:30:02 -04:00
Christoph Hellwig
2b3d047870 unicode: Add utf8-data module
utf8data.h contains a large database table which is an auto-generated
decoding trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2021-10-12 11:41:39 -03:00
Masahiro Yamada
28ba53c076 unicode: refactor the rule for regenerating utf8data.h
scripts/mkutf8data is used only when regenerating utf8data.h,
which never happens in the normal kernel build. However, it is
built regardless whenever CONFIG_UNICODE is enabled.

Moreover, there is no good reason for it to reside in the scripts/
directory since it is only used in fs/unicode/.

Hence, move it from scripts/ to fs/unicode/.

In some cases, we skip regenerating build artifacts in the normal build.
The conventional way to do so is to surround the code with ifdef REGENERATE_*.

For example,

 - 7373f4f83c ("kbuild: add implicit rules for parser generation")
 - 6aaf49b495 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")

I rewrote the rule in a more kbuild'ish style.

In the normal build, utf8data.h is simply copied from the checked-in file.

$ make
  [ snip ]
  SHIPPED fs/unicode/utf8data.h
  CC      fs/unicode/utf8-norm.o
  CC      fs/unicode/utf8-core.o
  CC      fs/unicode/utf8-selftest.o
  AR      fs/unicode/built-in.a

If you want to generate utf8data.h based on UCD, put *.txt files into
fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
The mkutf8data tool will be automatically compiled to generate the
utf8data.h from the *.txt files.

$ make REGENERATE_UTF8DATA=1
  [ snip ]
  HOSTCC  fs/unicode/mkutf8data
  GEN     fs/unicode/utf8data.h
  CC      fs/unicode/utf8-norm.o
  CC      fs/unicode/utf8-core.o
  CC      fs/unicode/utf8-selftest.o
  AR      fs/unicode/built-in.a

I renamed the checked-in utf8data.h to utf8data.h_shipped so that this
will work for the out-of-tree build.

You can update it based on the latest UCD like this:

$ make REGENERATE_UTF8DATA=1 fs/unicode/
$ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped

Also, I added entries to .gitignore and dontdiff.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-28 13:45:36 -04:00