Unicode Technical Report #20 (Unicode in XML and other Markup Languages)
specifies that
Zero-width Joiners/ nonjoiners (ZWJ and ZWNJ) are suitable for use
within the markup. But when an XML file with the tags written in Malayalam
using ZWJs (in Malayalam, ZWJ is used to form certain characters) is processed, an
error is reported that the tag contained an invalid character.
Can anyone tell me what might be wrong?
Can you include or link to a short example? Note that ZWJ is legal in
the content of an XML document, but not in the names of elements.
simonmontagu | Mon, 05 May 2008 14:59:00 GMT |
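A minimal sketch of this distinction, using Python's standard xml.etree.ElementTree: ZWJ in element content is parsed and preserved, while the same sequence in an element name may be rejected. The Malayalam sequence and the element names below are illustrative assumptions, not taken from the original file. Whether the second case fails depends on the parser and the XML version it implements, so the code reports that outcome rather than assuming it.

```python
import xml.etree.ElementTree as ET

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# MALAYALAM LETTER NA + VIRAMA + ZWJ: one way of requesting a special
# letter form (illustrative sequence only).
SAMPLE = "\u0d28\u0d4d" + ZWJ


def try_parse(xml_text):
    """Return (True, root element) if xml_text parses, else (False, message)."""
    try:
        return True, ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)


# Case 1: ZWJ in character data, between an ASCII-named start and end tag.
content_ok, content_result = try_parse("<word>" + SAMPLE + "</word>")
print("ZWJ in content parsed:", content_ok)

# Case 2: ZWJ inside the element name itself (hypothetical name).
element_name = "\u0d2a" + SAMPLE
name_ok, name_result = try_parse("<" + element_name + "/>")
print("ZWJ in element name parsed:", name_ok)
if not name_ok:
    print("parser message:", name_result)
```

If the second case fails, the parser message is the kind of "invalid character" report described in the question.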
On Wed, 13 Sep 2006, Jose wrote:
Unicode Technical Report #20 (Unicode in XML and other Markup
Languages) specifies that
Zero-width Joiners/ nonjoiners (ZWJ and ZWNJ) are suitable for use
within the markup.
Yes, for affecting ligature and joining behavior. I mention this because
there is a popular word processor that uses ZWJ and ZWNJ quite
inappropriately for line break control.
Of course, the statement is of a general nature: those characters are in
principle suitable for use in marked-up text. It does not guarantee or
prescribe that a particular markup system allows them or that they will be
interpreted by their Unicode semantics.
But when an XML file with the tags written in Malayalam
using ZWJs (in Malayalam, ZWJ is used to form certain characters) is processed, an
error is reported that the tag contained an invalid character.
Reported by which program? I first suspected that you may have tried to
enter these characters but they do not appear correctly in the declared or
implied character encoding.
But reading again, I notice that you are referring to _tags_ and might
actually mean the use of characters in element or attribute names, as
opposed to their use in content between tags. UTR #20 discusses the
latter, i.e. what you can use in document content proper - together with
markup, not _inside_ markup (tags).
The use of characters in element and attribute names is governed by the
rules of each markup language, basically by its _identifier_ syntax.
Generally, and in XML 1.0, control characters are excluded in that syntax,
and ZWJ and ZWNJ are format control characters by definition (General
Category: Cf). Thus, an attempt to use them in element names would violate
well-formedness constraints, and an XML parser would report an error - not
about an invalid character per se but about a syntax error.
In XML 1.1, ZWJ and ZWNJ are allowed in identifiers, but this is probably
of little practical value.
jukkak_korpela | Mon, 05 May 2008 15:00:00 GMT |
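The two facts in this post can be checked with a short sketch: the General Category of ZWJ and ZWNJ, and whether a string matches the Name production of XML 1.1. The ranges below are transcribed from the NameStartChar and NameChar productions of the XML 1.1 Recommendation. The function names and the sample name are assumptions made for illustration, and this is only a name checker, not a parser.

```python
import unicodedata

# NameStartChar ranges as given in the XML 1.1 Recommendation.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D),    # ZWNJ and ZWJ
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional characters allowed after the first position (NameChar).
NAME_CHAR_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def in_ranges(ch, ranges):
    """True if the code point of ch falls inside any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_xml11_name(name):
    """Check a string against the XML 1.1 Name production."""
    if not name or not in_ranges(name[0], NAME_START_RANGES):
        return False
    return all(
        in_ranges(ch, NAME_START_RANGES) or in_ranges(ch, NAME_CHAR_EXTRA_RANGES)
        for ch in name[1:]
    )


# Both characters have General Category Cf (format).
for ch in ("\u200c", "\u200d"):
    print("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))

# Hypothetical Malayalam element name ending in VIRAMA + ZWJ.
sample_name = "\u0d2a\u0d28\u0d4d\u200d"
print("valid XML 1.1 name:", is_xml11_name(sample_name))
```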
Jukka K. Korpela scripsit:
In XML 1.1, ZWJ and ZWNJ are allowed in identifiers, but this is
probably of little practical value.
It has the merit that it allows identifiers to be spelled correctly
in the various languages that *require* ZWJ or ZWNJ or both; Persian
and several Indic languages come to mind.
If you meant simply that XML 1.1 is not widely adopted, and it is
therefore of little practical value to write documents in it, I
must sadly agree.
johncowan | Mon, 05 May 2008 15:01:00 GMT |
As I recall, the problem with XML 1.1 adoption was that XML 1.1 was
not fully backwards compatible with XML 1.0: there were XML 1.0
documents that were not valid XML 1.1. That being the case, people
just didn't see it worthwhile to have two incompatible parsers.
As for ZWJ/NJ - the original intent was for these to not make any
semantic difference. There is a UTC action to collect cases where they
are being used to make a clear semantic difference (eg XXX means "sea
gull" and XX<ZWNJ>X means "republican"), so any feedback on such cases
would be useful.
Mark
On 9/13/06, John Cowan <cowan (AT) ccil (DOT) org> wrote:
--
Jukka K. Korpela scripsit:
In XML 1.1, ZWJ and ZWNJ are allowed in identifiers, but this is
probably of little practical value.
It has the merit that it allows identifiers to be spelled correctly
in the various languages that *require* ZWJ or ZWNJ or both; Persian
and several Indic languages come to mind.
If you meant simply that XML 1.1 is not widely adopted, and it is
therefore of little practical value to write documents in it, I
must sadly agree.
markdavis | Mon, 05 May 2008 15:02:00 GMT |
Mark Davis scripsit:
As I recall, the problem with XML 1.1 adoption was that XML 1.1 was
not fully backwards compatible with XML 1.0: there were XML 1.0
documents that were not valid XML 1.1.
In the sense that "XML 1.0" names a countably infinite set of abstract
objects, true; in the sense that "XML 1.0" names a set
of texts physically fixed in a tangible medium, I venture to doubt it.
Specifically, I doubt that any Real World XML 1.0 documents contained
any instances of U+007F through U+009F not as character references.
In exactly the same sense, Unicode 2.0 was not backward compatible with
Unicode 1.1, a fact which does not seem to have seriously impeded its
adoption.
The issues with XML 1.1 were in fact political; I say no more.
As for ZWJ/NJ - the original intent was for these to not make any
semantic difference. There is a UTC action to collect cases where they
are being used to make a clear semantic difference (eg XXX means "sea
gull" and XX<ZWNJ>X means "republican"), so any feedback on such cases
would be useful.
IIRC the leading case is the plural ending in Persian. It's not just
a matter of a clear semantic difference: there is no semantic difference
between "they're" and "theyre" in English, but the latter is unambiguously
wrong in the standard orthography.
johncowan | Mon, 05 May 2008 15:03:00 GMT |
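The compatibility point about U+007F through U+009F in this post can be checked mechanically. The sketch below scans a document's text for those characters appearing literally rather than as character references. It assumes the text has already been decoded to a Python string, and it leaves out U+0085 because XML 1.1 treats that character as a line end rather than as a restricted character. The function name is hypothetical.

```python
import re

# Literal U+007F..U+009F, except U+0085 (NEL), which XML 1.1 handles
# as a line-end character.
RESTRICTED_RANGE = re.compile("[\u007f-\u0084\u0086-\u009f]")


def find_literal_restricted(text):
    """Return (offset, 'U+XXXX') for each literal character in the range."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in RESTRICTED_RANGE.finditer(text)
    ]


# A literal U+0080 is reported; the same character written as a
# character reference is plain ASCII text and is not.
print(find_literal_restricted("<a>x\u0080y</a>"))
print(find_literal_restricted("<a>x&#x80;y</a>"))
```

A document for which this returns an empty list is unaffected by that particular difference between XML 1.0 and XML 1.1.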