
Poll

Do you support this?

No: 5 (27.8%)
Yes, Version 1.1: 0 (0%)
Yes, Version 1.2: 1 (5.6%)
Yes, Version 1.3: 12 (66.7%)

Total Members Voted: 17

Author Topic: [Feature Request] Multipart Message (Draft: V 1.3)  (Read 20908 times)

AyrA

  • BM-Bc7Rspa4zxAPy9PK26vmcyoovftipStp
  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1262
  • Karma: +75/-7
  • bitmessage.ch and timeservice operator
    • View Profile
    • AyrAs Homepage
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #45 on: June 18, 2013, 04:59:22 PM »
No. Version 1.3 suggests using messageType 3 instead of 2, so the two can be told apart without actually looking at the content, but it does not change the protocol itself.
My Address: BM-Bc7Rspa4zxAPy9PK26vmcyoovftipStp
Bitmessage Time Service (Subscribe): BM-BcbRqcFFSQUUmXFKsPJgVQPSiFA3Xash
Support the Multipart Message Declaration Draft for Bitmessage: https://bitmessage.org/forum/index.php/topic,1553.0.html
Free Bitmessage to E-Mail Gateway: https://bitmessage.ch

srmojuze

  • Full Member
  • ***
  • Posts: 151
  • Karma: +6/-0
    • View Profile
    • BitChirp.org
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #46 on: June 19, 2013, 07:01:59 AM »
Hey guys, for Bitchirp.org I really need support for images in Bitmessage. Hope this MIME Type(?) stuff works out. Text content in the "blockchain" seems like a solid idea for Bitchirp; however, images are important too, particularly in high-censorship times.

How it is to be moderated is a question, but Bitmessage itself is designed that anything text can be sent (or Base64 etc). So, if the images are in the "blockchain" this is very, very important, because web hosters can kill image hosting but if the image binary itself is in the "blockchain" that means the image will never be "lost" (for the duration that messages are kept in the "blockchain").

Don't forget the API support for extracting binaries such as JPG, that is critical. Thanks...!

AyrA

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #47 on: June 19, 2013, 07:27:46 AM »
Quote from: srmojuze
How it is to be moderated is a question, but Bitmessage itself is designed that anything text can be sent (or Base64 etc). So, if the images are in the "blockchain" this is very, very important, because web hosters can kill image hosting but if the image binary itself is in the "blockchain" that means the image will never be "lost" (for the duration that messages are kept in the "blockchain").
Since the image is part of the message, it should not be lost for 2 days. This MIME implementation actually allows users to include data as raw 8 bit, so there is technically no special decoding algorithm needed.

srmojuze

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #48 on: June 19, 2013, 07:38:46 AM »
Cool. I would love to have image attachments in Bitmessage, hope we can work it out.

Mayly

  • Newbie
  • *
  • Posts: 10
  • Karma: +0/-0
    • View Profile
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #49 on: July 10, 2013, 02:39:31 PM »
First off, I'd suggest printing out a copy of the MIME standards,
and NOT read it.  Burn them, it's a great symbolic gesture.

.... ok, i've stolen that from https://www.kernel.org/doc/Documentation/CodingStyle ;)

i read through this thread and got different ideas and i am not sure which one to follow, so i comment on different things i have seen here.

  • there is no individual delimiter string required, as the serializer should make sure any \n----------\n found inside content parts is replaced by e.g. \n----------\n[space].
    a deserializer just removes the trailing space if \n----------\n[space] is found in parts to get the original back and even if the content used \n----------\n[space] anywhere inside this would not break it - side note: works with \r\n linebreaks of course too
  • Encoding types: Plain sounds like text/plain, so it was confusing for me. ftp uses the term binary, maybe raw would also be clear for binary data
  • Quote
    A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.
    utf8 characters can extend to multiple bytes. is this what you intended to specify: 1024 (multibyte+singlebyte) characters > 1024 byte?
  • Quote
    I propose the first part to have no headers and to be text-only. This would make reading in old/unsupported clients easier for its users.
    older clients would not stop displaying the entire message after the first delimiter behind the plain text. this will result in some mojibake and could break clients when binary/raw content is rendered. maybe ending the plain text with a zero (\x00) byte could trick an older client (well... we have just one to try atm) to stop rendering after it.
  • we should also add some subject/body specifications as in encoding type 2
    Quote
    messageToTransmit = 'Subject:' + subject + '\n' + 'Body:' + message
    but actually we could ignore this spec and build an entire new one for encoding type 3, the client would nevertheless ignore the content if it does not know type 3 - same problem with fallback. i rather suggest to break some backward compatibility when introducing the multipart feature: let the old client display emptiness and hope upgrades happen fast (and it is still a zero-point release where this is allowed).
  • i like ISibboI's idea but i would not use the var_int for the number of parts, but for the length of each one. so you can skip the delimiters and start directly with the attributes:

    as content of message (uchar[]):
Code:
1+           part_length   var_int  part length
part_length  part_content  uchar[]  the content consisting of:
  Attributes: Values (as AyrA defined)
  [empty line]
  [contentcontentcontent]
1+           part_length   var_int  part length (of next part)
part_length  part_content  uchar[]  the content consisting of:
  Attributes: Values (as AyrA defined)
  [empty line]
  [contentcontentcontent]
  .
  .
  .
  • Quote from: boondoggle
    As far as I know there wouldn't be any advantage to multi-part messages in terms of the proof of work.  Maybe there could be a security/anonymity case to be made, but if so I'm not sure what that would be.
    the client adds a constant 18000 bytes to any message to penalise small messages. but actually i cannot estimate which tradeoff is better
  • Quote from: nimda
    This name can also be used, when referencing to this part inside a message (for example a text/html part may reference to a png image inside the same message).

    Should probably require the part to have already been declared if this is happening. For example, you must have the png data before the html data. Otherwise we run the risk of infinite loops that will be difficult to prevent. Similarly, if at some point this spec supports compression containers, they should be separate from the multipart spec, lest we run the risk of zip files all the way down.
    your entire message is already in memory at the time of decryption. so it is not necessary to start rendering before deserialization of multiple contents is finished.
  • Quote from: nimda
    There are iFrames and other such things. Do you know the entire HTML spec? How about the entire MIME spec? The goal should be to prevent whole classes of attacks, not pick them off individually.
    hope i got your point right: as mail clients and webmail clients already do: not loading external resources without asking. i assume any html engine has some kind of zone model today, at least i expect them to have it in 2013 ;)
  • i am thinking of zyrox' and boondoggle's idea of splitting everything up. this may be a security benefit because the smaller the objects the harder it is to identify it (let's say the origin of multiple leaked voice recordings could be located by just looking for the large sized transmission).

waiting for version 1.4 ;)
« Last Edit: July 10, 2013, 02:46:10 PM by Mayly »
BM-Gu7n18AYojrcPFeGsW2uijMd7o5jVLHc

AyrA

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #50 on: July 10, 2013, 03:02:21 PM »
there is no individual delimiter string required, as the serializer should make sure any \n----------\n found inside content parts is replaced by e.g. \n----------\n[space].
a deserializer just removes the trailing space if \n----------\n[space] is found in parts to get the original back and even if the content used \n----------\n[space] anywhere inside this would not break it - side note: works with \r\n linebreaks of course too
The delimiter is actually defined in the header, so there is no possible misinterpretation of headers.

Encoding types: Plain sounds like text/plain, so it was confusing for me. ftp uses the term binary, maybe raw would also be clear for binary data
Quote
A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.
utf8 characters can extend to multiple bytes. is this what you intended to specify: 1024 (multibyte+singlebyte) characters > 1024 byte?
Yes, multibyte, it would be unfair for other languages if we choose single bytes because their subject may be up to 3 times shorter.
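The gap between counting characters and counting bytes is easy to check. A minimal Python sketch, assuming the 1024 limit from the draft is counted in code points; `MAX_VALUE_CHARS` and `value_fits` are names made up here for illustration and are not part of the draft:

```python
MAX_VALUE_CHARS = 1024  # limit quoted from the draft, assumed to mean code points


def value_fits(value: str) -> bool:
    """Return True if an attribute value respects the character limit."""
    return len(value) <= MAX_VALUE_CHARS


subject = "日本語" * 300            # 900 characters
encoded = subject.encode("utf-8")   # each of these characters needs 3 bytes
print(len(subject), len(encoded))   # 900 2700
print(value_fits(subject))          # True, although the value is larger than 1024 bytes
```

A limit counted in bytes would reject this value, which is the unfairness the reply above refers to.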

Quote
I propose the first part to have no headers and to be text-only. This would make reading in old/unsupported clients easier for its users.
older clients would not stop displaying the entire message after the first delimiter behind the plain text. this will result in some mojibake and could break clients when binary/raw content is rendered. maybe ending the plain text with a zero (\x00) byte could trick an older client (well... we have just one to try atm) to stop rendering after it.
The null terminator trick does not work, I tried already. Mojibake is not a problem, because in old/unsupported clients users can read the first part anyway, so it does not matter if the rest is messed up or not, since it is a display-only issue, not a storage issue.


we should also add some subject/body specifications as in encoding type 2
Quote
messageToTransmit = 'Subject:' + subject + '\n' + 'Body:' + message
but actually we could ignore this spec and build an entire new one for encoding type 3, the client would nevertheless ignore the content if it does not know type 3 - same problem with fallback. i rather suggest to break some backward compatibility when introducing the multipart feature: let the old client display emptiness and hope upgrades happen fast (and it is still a zero-point release where this is allowed).
Old clients will probably not display anything at all if they do not recognize the message type, but users will update quickly once they can no longer read many messages.

Mayly

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #51 on: July 10, 2013, 04:15:20 PM »
there is no individual delimiter string required, as the serializer should make sure any \n----------\n found inside content parts is replaced by e.g. \n----------\n[space].
a deserializer just removes the trailing space if \n----------\n[space] is found in parts to get the original back and even if the content used \n----------\n[space] anywhere inside this would not break it - side note: works with \r\n linebreaks of course too
The delimiter is actually defined in the header, so there is no possible misinterpretation of headers.
i had a mistake in this example, it actually does not work that way. i was trying to say you could just use hyphens and don't require a randomly generated string inside if you simply regex replace any occurrence of the same pattern while serializing and deserializing (adding or removing a specific pattern respectively). pgp does this in some way i cannot remember at the moment, but my previous example does break in at least one specific case.

but you would not require this at all if you just give the length of each part in the beginning (like in my proposal in the code box): counting the bytes and splitting after them is quite easy to implement in all languages.
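A rough Python sketch of the length-prefixed layout, for illustration only. It assumes the var_int encoding used elsewhere in the Bitmessage protocol (one byte below 0xfd, otherwise a 0xfd/0xfe/0xff marker followed by a big-endian 16/32/64-bit integer); the function names are invented here and nothing in the draft defines them:

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a var_int (big-endian, assumed layout)."""
    if n < 0:
        raise ValueError("var_int cannot be negative")
    if n < 0xFD:
        return struct.pack(">B", n)
    if n <= 0xFFFF:
        return struct.pack(">BH", 0xFD, n)
    if n <= 0xFFFFFFFF:
        return struct.pack(">BI", 0xFE, n)
    if n <= 0xFFFFFFFFFFFFFFFF:
        return struct.pack(">BQ", 0xFF, n)
    raise ValueError("value too large for var_int")


def decode_varint(data: bytes, pos: int):
    """Return (value, next_position) for the var_int starting at pos."""
    if pos >= len(data):
        raise ValueError("truncated var_int")
    first = data[pos]
    if first < 0xFD:
        return first, pos + 1
    fmt, size = {0xFD: (">H", 2), 0xFE: (">I", 4), 0xFF: (">Q", 8)}[first]
    if pos + 1 + size > len(data):
        raise ValueError("truncated var_int")
    (value,) = struct.unpack_from(fmt, data, pos + 1)
    return value, pos + 1 + size


def join_parts(parts):
    """Serialize parts as: var_int length, content, var_int length, content, ..."""
    return b"".join(encode_varint(len(p)) + p for p in parts)


def split_parts(data: bytes):
    """Split a serialized message back into its parts by counting bytes."""
    parts, pos = [], 0
    while pos < len(data):
        length, pos = decode_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("part length exceeds message size")
        parts.append(data[pos:pos + length])
        pos += length
    return parts
```

Because the reader never searches for a delimiter, part content may contain any byte sequence. The price is the validity check on every length field.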


berdario

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
    • View Profile
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #52 on: November 12, 2013, 01:22:25 PM »

The E-Mail MIME implementation also uses dashes and I want this implementation to be as identical as possible, so existing MIME parsers can be rewritten with little effort.


Adapting existing MIME parsers is really this important?

Already forking them to make them support unicode for the attribute values might not be so straightforward in statically typed languages with no type inference (like C or Java), and I assume that most parsers (or the most used ones) fall into this category...

for all the others it wouldn't be difficult, but exactly for this reason it might make more sense to just do the simplest thing possible, disregarding MIME altogether

A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.

I think we need to specify an encoding (UTF alone isn't one)... I assume you meant UTF-8?

I have made here a system that does not rely on integers at all and is fully compatible with the current clients and protocol spec. If you create a standard you have the full possibility to make it without any silly integers, that can cause problems between big endian and little endian systems. Technically an integer limits you to transporting a 4 GB file at most. Without any sort of integer at all you have no limits. Also your idea forces us to invent a part for each content type that can be transferred.

just use the network byte order... if clients don't use it, it's simply their fault for not being compliant

also: unless the 180MB message limit is lifted, I don't see the rationale for allowing files bigger than 4GB
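On the byte-order point: a fixed-width size field written in network byte order reads the same on big-endian and little-endian machines. A small Python sketch, assuming a hypothetical 4-byte unsigned size field; `pack_size` and `unpack_size` are illustrative names and not part of any draft:

```python
import struct

UINT32_MAX = 2**32 - 1  # largest size a 4-byte field can hold (just under 4 GiB)


def pack_size(size: int) -> bytes:
    """Encode a part size as 4 bytes in network (big-endian) byte order."""
    if not 0 <= size <= UINT32_MAX:
        raise ValueError("size does not fit into an unsigned 32-bit field")
    return struct.pack("!I", size)


def unpack_size(raw: bytes) -> int:
    """Decode a 4-byte network-order size field."""
    (size,) = struct.unpack("!I", raw)
    return size


print(pack_size(1))  # b'\x00\x00\x00\x01' regardless of the host CPU
```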

I rather standardize it to the Windows linebreak (\r\n), so it will work with Windows, Linux and Mac without any impacts, also It is compatible with the E-Mail standard.

IMHO, "it will work with Win, Lin, Mac without any impacts" doesn't mean anything...
This is a protocol and an application issue... the only place it matters, afaik, is when people try to debug it from a command-line tool that doesn't convert the newline

A compliant HTTP server won't necessarily accept your hand-made connections made with socat or telnet from a *nix box, they have to special case it (example: cherokee)


having a "\r\n" for newline seems like old cruft... I'd rather standardize on "\n", and possibly be lenient to accepting "\r\n"

Encoding: String, that specifies how the content is encoded. This may either be PLAIN, BASE64 or a user defined value. Minimal Implementation of a client is PLAIN and BASE64. If not specified, PLAIN is assumed. PLAIN is 8-bit raw data, where BASE64 is the Base64 encoding specified in rfc4648

Why BASE64? Only for MIME compatibility?

as you wrote:
old clients will probably not display anything at all, if they do not recognize the message type but they will update quickly, if they can no longer read many messages.

I'd rather just scrap the Encoding attribute, and make everything "PLAIN" by default (as you mentioned, we have the content-type if we need to further specify the encoding)

Example Message

Multipart: -----0-TROLOLOL-----
Hello this is an example Text you will not even see if you look at this with HTML
Multipart: -----2-TROLOLOL-----
Content-Type: application/octet-stream
Encoding: Plain
Name: multipart.fake
Some random Data
Multipart: -----0-TEST-----
more random data
Multipart: -----3-TROLOLOL-----
Content-Type: text/html
Encoding: Base64
Name: secret.html
WW91IGxvc3QgdGhlIGdhbWUh

In this example, the "TEST" Multipart is considered as part of the data, because "TEST" does not match "TROLOLOL" from the first line.
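The matching rule can be sketched in Python. This is only an illustration built from the example above: it assumes marker lines have the form `Multipart: -----<number>-<ID>-----`, that the ID is taken from the first line, and it treats the number as opaque because its meaning is not defined in this part of the thread. Headers are left unparsed; `split_multipart` is an invented name.

```python
import re

# Assumed marker shape, taken from the example message only.
MARKER = re.compile(r"^Multipart: -----(\d+)-(.+)-----$")

EXAMPLE = "\n".join([
    "Multipart: -----0-TROLOLOL-----",
    "Hello this is an example Text you will not even see if you look at this with HTML",
    "Multipart: -----2-TROLOLOL-----",
    "Content-Type: application/octet-stream",
    "Encoding: Plain",
    "Name: multipart.fake",
    "Some random Data",
    "Multipart: -----0-TEST-----",
    "more random data",
    "Multipart: -----3-TROLOLOL-----",
    "Content-Type: text/html",
    "Encoding: Base64",
    "Name: secret.html",
    "WW91IGxvc3QgdGhlIGdhbWUh",
])


def split_multipart(message: str):
    """Return a list of (number, raw_part_text), or None if not multipart."""
    lines = message.split("\n")
    first = MARKER.match(lines[0].rstrip("\r"))
    if first is None:
        return None  # first line is not a Multipart declaration
    part_id = first.group(2)
    parts = []
    for line in lines:
        m = MARKER.match(line.rstrip("\r"))
        if m and m.group(2) == part_id:
            parts.append((int(m.group(1)), []))  # marker with our ID: new part
        else:
            parts[-1][1].append(line)            # anything else is part data
    return [(number, "\n".join(body)) for number, body in parts]


for number, body in split_multipart(EXAMPLE):
    print(number, repr(body[:40]))
```

The `TEST` marker ends up inside the second part's data, as described above.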

This stems from my dislike of adapting MIME, but I don't really like that, to choose the ID_PART, you'd need to scan every file (possibly several MBs big) to check for the absence of
"Multipart: -----N-yourchosenstring-----"

(ok, to do the PoW I assume that the whole message will be already in memory... and so it should be reasonably fast, but still it seems suboptimal)

AyrA

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #53 on: November 12, 2013, 06:28:12 PM »
Adapting existing MIME parsers is really this important?
It's easier than rewriting everything from scratch. If users tend to use the various applications out there to read and write bitmessages with their e-mail applications already, we should try to make a standard which is similar to MIME.

Already forking them to make them support unicode for the attribute values might be not so straightforward in statically typed languages with no type inference (like C or Java) and I assume that most parser (or the most used) fall in this case...
Messages in bitmessage are UTF-8 encoded anyways. I recommend not using a programming language, that has UTF-8 issues.

for all the others it wouldn't be difficult, but exactly for this reason it might make more sense to just do the simplest thing possible, disregarding MIME altogether
MIME is a standard that is somewhat human readable and very well tested. Also I do not want the full MIME specs because we do not have the 7-bit issue.

A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.

I think we need to specify an encoding (UTF alone isn't one)... I assume you meant UTF-8?
Yes. Why should I change the existing encodings at all?


just use the network byte order... if clients don't use it, it's simply their fault for not being compliant
no, you have an integer again, that requires you to check first, if it is valid.
Since parts are terminated anyways by the end of the message or the next part, it is a stupid, unneeded addition.

also: unless the 180MB message limit is lifted, I don't see the rationale for allowing files bigger than 4GB
We probably want to be able to split files in the future. Also 4 GB is natural uint size. No need to strip it down, if it is there already. Makes it easier to check for a valid size, if you can simply check, if it fits into an uint field.

IMHO, "it will work with Win, Lin, Mac without any impacts" doesn't mean anything...
Try opening text files with different line breaks on different platforms, and you will see, that the file with \r\n is the most readable on all systems.

Why BASE64? Only for MIME compatibility?
No, for UTF-8 compatibility. You cannot send raw 8-bit bytes in a UTF-8 encoded message, or bytes get lost when converting back.
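The loss can be reproduced in a few lines of Python. This sketch assumes the receiving side decodes the body as UTF-8 and substitutes invalid sequences; how the actual client handles them is not shown in this thread:

```python
import base64

# Arbitrary binary data that is not valid UTF-8 (0x89 cannot start a sequence).
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00, 0xC3])

# Assumed behaviour: decode as UTF-8, replacing whatever is invalid.
as_text = raw.decode("utf-8", errors="replace")
print(as_text.encode("utf-8") == raw)    # False, the original bytes are gone

# Base64 output is pure ASCII, so it survives the UTF-8 layer unchanged.
wrapped = base64.b64encode(raw).decode("ascii")
print(base64.b64decode(wrapped) == raw)  # True
```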

I'd rather just scrap the Encoding attribute, and make everything "PLAIN" by default (as you mentioned, we have the content-type if we need to further specify the encoding)
Content-Type is not the same as Content-Encoding.

/*EXAMPLE*/
This stems from my disliking of adapting MIME, but I don't really like that to choose the ID_PART, you'd need to scan every file (possibly several MBs big) to check for the lack of
"Multipart: -----N-yourchosenstring-----"
You scan the whole file anyways, if you read it or when it arrives and is decrypted. The Multipart declaration is the first line, so there is not much to check, if it is a multipart message or not.

berdario

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #54 on: November 13, 2013, 02:59:16 AM »
Adapting existing MIME parsers is really this important?
its easier, than rewriting everything from scratch, if users tend to use the various applications out there to read and write bitmessage with their E-mail applications already, we should try to make a standard, which is similar to MIME.

Already forking them to make them support unicode for the attribute values might be not so straightforward in statically typed languages with no type inference (like C or Java) and I assume that most parser (or the most used) fall in this case...
Messages in bitmessage are UTF-8 encoded anyways. I recommend not using a programming language, that has UTF-8 issues.
"has UTF-8 issues" is a crude oversimplification

Do you realize that most E-mail applications that people use, that you claim to want to support, are written in languages that fit the above description? (Thunderbird, Evolution, Claws Mail, Mail.app, Win8's mail)

Or do you just want to give them rope, and eventually let some developers hang themselves with it?
Quote
A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.

I think we need to specify an encoding (UTF alone isn't one)... I assume you meant UTF-8?
Yes. Why should I change the existing encodings at all?
You shouldn't
Quote

just use the network byte order... if clients don't use it, it's simply their fault for not being compliant
no, you have an integer again, that requires you to check first, if it is valid.
Since parts are terminated anyways by the end of the message or the next part, it is a stupid, unneeded addition.
Yes, the idea would be to scrap MIME altogether

I don't share your sentiment of "people are using email clients, so we should make life easy for them"

So, we'll have to just agree to disagree

(but, aside from the MIME details, I support your proposal of Multipart messages... )
Quote
IMHO, "it will work with Win, Lin, Mac without any impacts" doesn't mean anything...
Try opening text files with different line breaks on different platforms, and you will see, that the file with \r\n is the most readable on all systems.
That's an application concern: you have applications on windows that can deal just fine with "\n" newlines...

We're talking about a protocol here, people aren't going to write their messages with their text editor of choice, they're going to write them in the BitMessage client (or a Mail client) that knows which newline is the correct one to use (instead of having to make the receiving end rely on guessing which newline is it)

Quote
Why BASE64? Only for MIME compatibility?
No, for UTF-8 compatibility. You cannot send raw 8-bit bytes in an utf-8 coded message or bytes get lost when converting back.
If it has messageType 3 (or you otherwise know that's a multipart) you already know that one of the parts could be PLAIN

then it would be obviously wrong (and wasteful) to decode the whole message with utf-8 (ok, in the MIME world maybe every attachment is sent as Base64, but then again, that's no good reason for borking a protocol, imho)

the correct approach would be to: first parse the multipart protocol, and then decode the parts that have no Content-type or Content-type text

"plain text" doesn't truly exist, there's no fundamental difference to the idea of ASCII-encoding a text or UTF8-encoding it (other than the obvious character set and the multi-byte characters), and there're protocols that mix-n-match "plain" text and binary all the time

just try to download a .pdf through HTTP
Quote
I'd rather just scrap the Encoding attribute, and make everything "PLAIN" by default (as you mentioned, we have the content-type if we need to further specify the encoding)
Content-Type is not the same as Content-Encoding.

Content-Type can contain the Encoding

In fact, if everything is PLAIN (and there's no reason it shouldn't be, as I showed above... unless you're striving to be as close to MIME as possible) you should just scrap the Encoding attribute

you could have
Code:
Content-Type: text/html; charset=UTF-8
Encoding: PLAIN

Code:
Content-Type: text/html
Encoding: UTF-16

but even:
Code:
Content-Type: text/html; charset=UTF-8
Encoding: UTF-16

and this last one is incorrect... you're forcing the parser to handle a lot of different combinations and such inconsistencies by looking at 2 different attributes, when one should be enough
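For comparison, reading the charset from a single `Content-Type` value takes only a few lines with Python's standard `email` package. This is a sketch of the one-attribute approach argued for here, not something the draft prescribes. `charset_of` is an invented helper, and the UTF-8 fallback is an assumption based on bitmessages being UTF-8 encoded:

```python
from email.message import Message


def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


print(charset_of("text/html; charset=UTF-8"))    # utf-8
print(charset_of("text/html"))                   # utf-8 (fallback)
print(charset_of("text/plain; charset=UTF-16"))  # utf-16
```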
Quote
/*EXAMPLE*/
This stems from my disliking of adapting MIME, but I don't really like that to choose the ID_PART, you'd need to scan every file (possibly several MBs big) to check for the lack of
"Multipart: -----N-yourchosenstring-----"
You scan the whole file anyways, if you read it or when it arrives and is decrypted. The Multipart declaration is the first line, so there is not much to check, if it is a multipart message or not.

Yes, the receiving end, if it wants, will scan it...

but you don't need to scan it...

if you know the size, you can just put the size in a part attribute and read() the file from disk into the message straight away

you're forcing the sender into doing some busy-work (which may be negligible, I have no idea... but still seems terribly wrong)

AyrA

Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #55 on: November 13, 2013, 06:40:40 PM »
"has UTF-8 issues" is a crude oversimplification

Do you realize that most E-mail applications that people use, that you claim to want to support, are written in languages that fit the above description? (Thunderbird, Evolution, Claws Mail, Mail.app, Win8's mail)

Or do you just want to give them rope, and eventually let some developers hang themselves with it?
They seem to handle UTF-8 encoded mails just fine.

A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.

I think we need to specify an encoding (UTF alone isn't one)... I assume you meant UTF-8?

Yes. Why should I change the existing encodings at all?
You shouldn't
Then why do you assume I probably would?

(but, aside from the MIME details, I support your proposal of Multipart messages... )
You are free to propose a multipart message system, that is completely different from this.
It probably will never get implemented anyways (and this proposal here too), because the bitmessage client we are using only serves as proof of concept.

We're talking about a protocol here, people aren't going to write their messages with their text editor of choice, they're going to write them in the BitMessage client (or a Mail client) that knows which newline is the correct one to use (instead of having to make the receiving end rely on guessing which newline is it)
E-mail clients use CRLF (\r\n), using only \n violates various protocols.
HTTP specification is the same: CRLF

If it has messageType 3 (or you otherwise know that's a multipart) you already know that one of the parts could be PLAIN
If an additional message type is approved at all. If not we need to stick with the default type and if you have a PLAIN section in the middle, which violates UTF-8 specs, you need to manually build decoders to rip apart the parts.

then it would be obviously wrong (and wasteful) to decode the whole message with utf-8 (ok, in the MIME world maybe every attachment is sent as Base64, but then again, that's no good reason for borking a protocol, imho)
Fast decoders can parse a gigabyte in under 10 seconds. Since we are limited to 180 MB anyways, you would not have longer than 2 seconds to decode the biggest possible message. Sending this big chunk from one node to another is slower anyways due to slow upload speeds.

the correct approach would be to: first parse the multipart protocol, and then decode the parts that have no Content-type or Content-type text
You need to seek through the complete message anyways to collect and analyze all parts. You might as well just extract all parts and store them in an efficient way in an already splitted form, so you do the parsing for the message just once, no matter how often you need it or parts from it.
But this has nothing to do with the proposal as this method works with any content, that holds multiple parts.

"plain text" doesn't truly exist, there's no fundamental difference to the idea of ASCII-encoding a text or UTF8-encoding it (other than the obvious character set and the multi-byte characters), and there're protocols that mix-n-match "plain" text and binary all the time
I see you have never really read the UTF-8 specs. If we assume plain text indicates a single-byte text system, you are limited to roughly 100 charsets. Most of them are identical in the first 127 chars, but the upper chars, where the 8th bit is on, are usually completely different.

just try to download a .pdf through HTTP
Browsers ignore encodings for non-text content and take the data stream "as is". There is no reason to decode an executable as UTF-8.

Content-Type can contain the Encoding
Now you talk about HTTP/MIME headers. I talk about the difference between a type (image, text, raw binary) and encoding in the bitmessage itself.

In fact, if everything is PLAIN (and there's no reason it shouldn't be, as I showed above... unless you're striving to be as close to MIME as possible) you should just scrap the Encoding attribute
Then how is a client supposed to display a text/plain part? It could be any codepage. If you want to send anything as-is, you need to get atheros to remove the UTF-8 parser from the bitmessage client. Just send a message with random binary garbage in it to yourself and you will see that the received message differs from the sent one. (You need to send via API)

/*Various header examples*/
The Encoding header has actually nothing to do with the charset.
The charset is interesting for all text content.
The Encoding just tells you how it is represented in the bitmessage itself.
So you can have a PLAIN encoded UTF-8 html file included (because a bitmessage is UTF-8), but not a PLAIN zip file, because it violates UTF-8, in this case you would switch to BASE64, ASCII85 or whatever you want.

I think you are comparing my encoding header with the MIME encoding header, but they differ. MIME operates on a TCP stream; my proposal operates on top of a UTF-8 formatted bitmessage. I am basically forced to live with the underlying UTF-8 message, while MIME can use the 8BITMIME extension (if available).
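
The PLAIN versus BASE64 choice described above could be sketched roughly like this. PLAIN and BASE64 are the names used in this thread; the function names are made up for the example and are not part of the draft.

```python
import base64

def choose_encoding(raw: bytes):
    """Return (encoding_name, body_text) for one part (hypothetical helper)."""
    try:
        # Content that is already valid UTF-8 can be embedded as-is.
        return "PLAIN", raw.decode("utf-8")
    except UnicodeDecodeError:
        # Anything else has to be wrapped in a text-safe encoding.
        return "BASE64", base64.b64encode(raw).decode("ascii")

def decode_part(encoding: str, body: str) -> bytes:
    """Reverse choose_encoding on the receiving side (hypothetical helper)."""
    if encoding == "PLAIN":
        return body.encode("utf-8")
    if encoding == "BASE64":
        return base64.b64decode(body)
    raise ValueError("unknown encoding: " + encoding)

html = "<p>gr\u00fc\u00dfe</p>".encode("utf-8")
zip_like = b"PK\x03\x04\xff\xfe\x80"

print(choose_encoding(html)[0])      # PLAIN
print(choose_encoding(zip_like)[0])  # BASE64
```

The charset of a text part is a separate matter; this only decides how the bytes are carried inside the UTF-8 message.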

Yes, the receiving end, if it wants, will scan it...
It must anyway, to verify the digital signature and to store the message. Cutting it into pieces won't cost you any noticeable time.

but you don't need to scan it...
It would be stupid not to do it if you need to read the whole thing from the network interface anyway.

if you know the size, you can just put the size in a part attribute and read() the file from disk into the message straight away
And just hope that the size is correct (or add a check for the size, which you would not need at all if no size were specified, because parts terminate each other).
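
A rough sketch of the two ways to find the end of a part. The marker string and the idea of passing the declared size as an argument are invented for this example; neither is the draft's actual syntax.

```python
TERMINATOR = "\r\n--part--\r\n"  # hypothetical marker, not the draft's syntax

def read_by_terminator(message: str, start: int):
    """Find the end of a part by scanning for the next marker."""
    end = message.find(TERMINATOR, start)
    if end == -1:
        raise ValueError("unterminated part")
    return message[start:end], end + len(TERMINATOR)

def read_by_size(message: str, start: int, declared_size: int):
    """Jump ahead by a declared size, then check that a marker really follows."""
    end = start + declared_size
    if not message.startswith(TERMINATOR, end):
        raise ValueError("declared size does not match the part boundary")
    return message[start:end], end + len(TERMINATOR)

message = "hello" + TERMINATOR + "world" + TERMINATOR

print(read_by_terminator(message, 0))
print(read_by_size(message, 0, 5))
```

With a declared size, the receiver still ends up checking the boundary, which is the extra check mentioned above.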

you're forcing the sender into doing some busy-work (which may be negligible, I have no idea... but still seems terribly wrong)
Sending a 100 KB bitmessage can take multiple minutes, so adding 10 milliseconds to join multiple parts into one large string won't hurt anybody.

berdario

  • Newbie
  • *
  • Posts: 12
  • Karma: +0/-0
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #56 on: November 14, 2013, 09:06:40 PM »
"has UTF-8 issues" is a crude oversimplification

Do you realize that most E-mail applications that people use, that you claim to want to support, are written in languages that fit the above description? (Thunderbird, Evolution, Claws Mail, Mail.app, Win8's mail)

Or do you just want to give them rope, and eventually let some developers hang themselves with it?
They seem to handle UTF-8 encoded mails just fine.
fair enough, but that's not my point
Quote
A Value can be made of any printable character of the UTF subset. It ends with a Line break. Maximum Length (without Line break) is 1024 Chars.

I think we need to specify an encoding (UTF alone isn't one)... I assume you meant UTF-8?

Yes. Why should I change the existing encodings at all?
You shouldn't
Then why do you assume I probably would?
I was under the impression that you thought I told you that you should

I just wrote "you shouldn't" to emphasize that we're agreeing
Quote

We're talking about a protocol here, people aren't going to write their messages with their text editor of choice, they're going to write them in the BitMessage client (or a Mail client) that knows which newline is the correct one to use (instead of having to make the receiving end rely on guessing which newline is it)
E-mail clients use CRLF (\r\n), using only \n violates various protocols.
HTTP specification is the same: CRLF

Yes, I'm arguing for breaking unneeded compatibility, and you're arguing for being compatible with MIME
Quote


"plain text" doesn't truly exist, there's no fundamental difference to the idea of ASCII-encoding a text or UTF8-encoding it (other than the obvious character set and the multi-byte characters), and there're protocols that mix-n-match "plain" text and binary all the time
I see you have never really read the UTF-8 specs. If we assume plain text means a single-byte text system, you are limited to roughly 100 charsets. Most of them are identical in the 7-bit range (the first 128 characters), but the upper characters, where the 8th bit is set, are usually completely different.
Quote
The Encoding header has actually nothing to do with the charset.
The charset is interesting for all text content.
The Encoding just tells you how it is represented in the bitmessage itself.
So you can have a PLAIN encoded UTF-8 html file included (because a bitmessage is UTF-8), but not a PLAIN zip file, because it violates UTF-8. In that case you would switch to BASE64, ASCII85 or whatever you want.

I think you are comparing my encoding header with the MIME encoding header, but they differ. MIME operates on a TCP stream; my proposal operates on top of a UTF-8 formatted bitmessage. I am basically forced to live with the underlying UTF-8 message, while MIME can use the 8BITMIME extension (if available).
I wrote that because I was mistakenly under the impression that you had some misconceptions about UTF-8 as well

My point was that you're not forced to operate on top of UTF-8, since there are almost no assumptions about the encoding in the protocol itself, from what I gather...

but still, I hadn't realized that since March the API has required message encoding == 2, and that apparently it has never been possible to send a message with it set to 0 from the JsonRPC API


Quote

Content-Type can contain the Encoding
Now you are talking about HTTP/MIME headers. I am talking about the difference between a type (image, text, raw binary) and the encoding in the bitmessage itself.
exactly, I'm arguing that there's no need to have a concept of encoding for bitmessage itself, IF we're going to have a multipart-like protocol on top of it
Quote

In fact, if everything is PLAIN (and there's no reason it shouldn't be, as I showed above... unless you're striving to be as close to MIME as possible) you should just scrap the Encoding attribute
Then how is a client supposed to display a text/plain part? It could be any codepage. If you want to send anything as-is, you need to get atheros to remove the UTF-8 parser from the bitmessage client. Just send a message with random binary garbage in it to yourself and you will see that the received message differs from the sent one. (You need to send it via the API.)
curiously, the API doesn't work from python3... and, you're right

the good news, btw... is that you just need to comment out one line to be able to send and receive binary data :)

this one

the client will break when trying to load/display that message... but the API will keep on working just fine

I'm confident that it shouldn't be difficult to adapt bitmessage to avoid parsing binary messages based on some flag specified in the protocol (but I won't do it any time soon)

Quote

if you know the size, you can just put the size in a part attribute and read() the file from disk into the message straight away
And just hope that the size is correct (or add a check for the size, which you would not need at all if no size were specified, because parts terminate each other).
Maybe I'm just being naïve or optimistic here, but is it a serious issue?

bitmessage is an authenticated protocol... if a bit flipped after the encryption/authentication it would get caught


Your actual estimates for the time it'd take to complete some operations are interesting... I agree that with such numbers those issues are negligible

ihack

  • Newbie
  • *
  • Posts: 1
  • Karma: +0/-0
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #57 on: April 06, 2017, 01:49:55 AM »
Why not use a chain of hashes as multipart identifier lines?

--multipart  sha256  initial hash (<some salt + total len of message>) ------
some content
--multipart  sha256  part hash <previous hash + previous part hash>  ------
another content
--multipart  sha256  part hash <previous hash + previous part hash>  ------
--multipart sha256  end hash <previous hash + another?check?hash>---


This would avoid the still-possible case of content accidentally matching a part marker,
and adds another level of security at the price of some extra hashing.
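
A minimal sketch of how such a marker chain might be computed, using hashlib from the Python standard library. Several details are assumptions, since the outline above leaves them open: "total len" is taken as the sum of the part lengths, hashes are chained as hex strings, and the separate end hash is left out because its input is not defined.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_markers(salt: bytes, parts: list) -> list:
    """Return one initial marker plus one chained marker per part.

    Assumptions: the initial hash covers salt + total part length,
    and each later marker is sha256(previous marker + hash of previous part).
    """
    total_len = sum(len(p) for p in parts)
    markers = [sha256_hex(salt + str(total_len).encode("ascii"))]
    for part in parts:
        previous = markers[-1].encode("ascii")
        part_hash = sha256_hex(part).encode("ascii")
        markers.append(sha256_hex(previous + part_hash))
    return markers

def verify_markers(salt: bytes, parts: list, markers: list) -> bool:
    """Recompute the chain on the receiving side and compare."""
    return build_markers(salt, parts) == markers

parts = [b"some content", b"another content"]
markers = build_markers(b"some salt", parts)
for marker in markers:
    print("--multipart sha256 " + marker + " ------")
```

Because every marker depends on all earlier parts, changing one part changes every marker after it, so verify_markers fails for a tampered message.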

our data is well protected
from our own access

BM-2cSuB69vnjBCPBrLB3YVnrXEDfUC8F3sN7

Peter Šurda

  • Full Member
  • ***
  • Posts: 137
  • Karma: +4/-0
Re: [Feature Request] Multipart Message (Draft: V 1.3)
« Reply #58 on: April 06, 2017, 06:58:41 AM »
Bitmessage 0.6.2 already contains extended encoding for people who want to experiment with it, so this whole multipart stuff is obsolete.