sys/man/1/tcs

.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(default standard input) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts the runes into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for both
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (6).
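.PP
The two stages can be illustrated with a minimal C sketch of what
.B tcs -f 8859-1
does, assuming a rune is an unsigned 32-bit code point.
ISO 8859-1 maps each byte to the code point of the same value, and each rune is then encoded as one to four bytes of UTF-8.
The type
.B rune_t
and the functions below are hypothetical names for this illustration and are not taken from
.BR /sys/src/cmd/tcs .
.EX
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t rune_t;  /* assumption: a rune holds one Unicode code point */

/* Stage 1 (ics -> rune): ISO 8859-1 maps each byte to the code point
   with the same value, so decoding is the identity. */
rune_t latin1_to_rune(unsigned char c)
{
    return c;
}

/* Stage 2 (rune -> ocs): encode r as UTF-8 into out and return the
   number of bytes written, 1 to 4. */
size_t rune_to_utf8(rune_t r, unsigned char *out)
{
    assert(r <= 0x10FFFF);            /* largest Unicode code point */
    if (r < 0x80) {                   /* 0xxxxxxx */
        out[0] = (unsigned char)r;
        return 1;
    }
    if (r < 0x800) {                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (r >> 6));
        out[1] = (unsigned char)(0x80 | (r & 0x3F));
        return 2;
    }
    if (r < 0x10000) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (r >> 12));
        out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (r & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (r >> 18));   /* 11110xxx + 3 more */
    out[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (r & 0x3F));
    return 4;
}

/* Both stages together, like "tcs -f 8859-1". out needs room for 2*n
   bytes: every Latin-1 character encodes to one or two UTF-8 bytes. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, len = 0;

    for (i = 0; i < n; i++)
        len += rune_to_utf8(latin1_to_rune(in[i]), out + len);
    return len;
}
```
.EE
For example, the Latin-1 bytes 0x41 0xE9 (A, e-acute) become the three UTF-8 bytes 0x41 0xC3 0xA9.
.PP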
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors; the
.B -s
option suppresses the reports of these errors.
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B Runeerror
(0xFFFD)
characters are substituted for
.SM UTF
encoding errors and unknown characters.
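.PP
The substitution rule can be sketched in C as follows.
The sketch writes its own UTF-8 decoder to stay self-contained; Plan 9 programs would normally use the routines in
.IR rune (2).
The flag
.B only_correct
stands in for
.BR -c ,
and skipping exactly one byte after an encoding error is an assumption, not documented
.I tcs
behavior.
.EX
```c
#include <stddef.h>
#include <stdint.h>

#define RUNEERROR 0xFFFDu  /* same value as Runeerror in rune(2) */

/* Decode one UTF-8 sequence from s (n > 0 bytes available) into *r.
   Returns the number of bytes used, or 0 if the bytes at s are not a
   valid, complete, shortest-form sequence. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *r)
{
    uint32_t c = s[0], min;
    size_t len, i;

    if (c < 0x80) {
        *r = c;
        return 1;
    }
    if ((c & 0xE0) == 0xC0) {
        len = 2; c &= 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        len = 3; c &= 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
        len = 4; c &= 0x07; min = 0x10000;
    } else {
        return 0;               /* stray continuation byte or 0xF8..0xFF */
    }
    if (len > n)
        return 0;               /* sequence truncated by end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* missing continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *r = c;
    return len;
}

/* Decode n bytes of UTF-8 into out (room for n runes). Each invalid
   byte becomes RUNEERROR, or is dropped when only_correct is set,
   which mimics what the -c option describes. */
size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *out,
                   int only_correct)
{
    size_t i = 0, nr = 0;

    while (i < n) {
        uint32_t r;
        size_t k = utf8_decode(s + i, n - i, &r);

        if (k == 0) {           /* encoding error: skip one byte */
            if (!only_correct)
                out[nr++] = RUNEERROR;
            i++;
            continue;
        }
        out[nr++] = r;
        i += k;
    }
    return nr;
}
```
.EE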
.PP
The
.B -v
option writes diagnostic and summary information to standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are:
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B html
Unicode as encoded by HTML
.TP
.B koi8
KOI-8 (GOST 19768-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JP: JIS X 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(input only) guesses between ISO 2022-JP, EUC, or Shift-JIS
.TP
.B gb
Chinese national standard (GB 2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
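.PP
This page does not say how
.B jis
makes its guess.
One plausible heuristic, sketched in C below and not necessarily the one
.I tcs
uses, relies on three facts about the encodings:
ISO 2022-JP is a 7-bit code that switches to JIS X 0208 with an escape sequence beginning ESC $;
EUC-JP uses bytes 0x80 to 0x9F only for the single-shift codes 0x8E and 0x8F;
and Shift-JIS uses 0x81 to 0x9F as lead bytes.
The enumeration and the function
.B guess_jis
are hypothetical.
.EX
```c
#include <stddef.h>

/* Hypothetical result codes for this sketch only. */
enum jis_guess { GUESS_ASCII, GUESS_ISO2022JP, GUESS_EUCJP, GUESS_SJIS };

enum jis_guess guess_jis(const unsigned char *s, size_t n)
{
    size_t i;
    int high = 0;               /* saw any byte >= 0x80 */

    for (i = 0; i < n; i++) {
        unsigned char c = s[i];

        /* ESC $ designates a two-byte set (e.g. JIS X 0208) in ISO 2022-JP. */
        if (c == 0x1B && i + 1 < n && s[i + 1] == 0x24)
            return GUESS_ISO2022JP;
        if (c >= 0x80) {
            high = 1;
            /* EUC-JP uses 0x80..0x9F only for SS2 (0x8E) and SS3 (0x8F);
               any other byte in that range points to Shift-JIS. */
            if (c <= 0x9F && c != 0x8E && c != 0x8F)
                return GUESS_SJIS;
        }
    }
    /* Only 0xA0..0xFF high bytes: both encodings remain possible,
       and this sketch breaks the tie in favour of EUC-JP. */
    return high ? GUESS_EUCJP : GUESS_ASCII;
}
```
.EE
Text whose non-ASCII bytes all lie between 0xA0 and 0xFF may be valid in both EUC-JP and Shift-JIS, so any tie-break is only a guess.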
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in any of several Japanese encodings
(ISO 2022-JP, EUC, or Shift-JIS) into
.SM UTF
format without reporting conversion errors.
Unknown Kanji are converted into
.B 0xFFFD
characters.
.TP
.B tcs -t html
Convert
.SM UTF
into character-set-independent HTML.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
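.PP
A minimal way to produce character-set-independent HTML, assumed here for illustration, is to copy ASCII through and write every other rune as a decimal numeric character reference, so that U+00E9 becomes &#233;.
.I Tcs
may also use named entities such as &eacute; for some characters, and this sketch does not escape markup characters such as < or &.
The functions
.B rune_to_html
and
.B runes_to_html
are hypothetical.
.EX
```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write the HTML form of rune r into buf, which must hold at least
   12 bytes ("&#1114111;" plus the terminating NUL fits). Returns the
   number of characters written, not counting the NUL. */
size_t rune_to_html(uint32_t r, char *buf)
{
    if (r < 0x80) {             /* ASCII is copied through unchanged */
        buf[0] = (char)r;
        buf[1] = 0;
        return 1;
    }
    /* any other rune: a decimal numeric character reference */
    return (size_t)snprintf(buf, 12, "&#%lu;", (unsigned long)r);
}

/* Write a whole array of runes to f in this form. */
void runes_to_html(const uint32_t *rs, size_t n, FILE *f)
{
    char buf[12];
    size_t i;

    for (i = 0; i < n; i++) {
        rune_to_html(rs[i], buf);
        fputs(buf, f);
    }
}
```
.EE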
.SH SOURCE
.B /sys/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (2),
.IR utf (6).