by Anil Jalela | Aug 19, 2024 | Linux
The format and structure of e-mail messages are crucial for several reasons. To properly handle an e-mail message, it’s essential to understand its structure and identify all its components, including message data (e.g., sender, recipients), delivery information (e.g., e-mail servers involved, dates sent and received), message text, and attachments.
Understanding these elements is important for various processes involving e-mail, from ensuring accurate delivery to interpreting and managing message content effectively.
Firstly, to archive a message, it is essential to determine its structure and identify all the elements that comprise it, including:
- Message data: Information such as the sender, recipients, etc.
- Delivery information: Details about the email servers that handled the message, the date it was sent, the date it was retrieved, etc.
- Message text: The content of the email.
- Attachments: Any files attached to the email.
Next, these elements should be extracted from the message to help decide, through a delicate and complex process, whether the message should be archived and how it should be classified.
Finally, a decision must be made on the format in which the message and/or its components should be preserved.
Message Structure
An Internet email message consists of two main sections:
- Header: A sequence of lines at the beginning of the message, generated by the sender’s email client and the email servers involved in the delivery process.
- Body: The rest of the message, containing the message text in plain ASCII characters, and/or text containing non-ASCII characters, as well as binary data in plain ASCII encoding.
In the simplest case, as defined in RFC 822, the message body contains only plain ASCII characters. These messages are straightforward to handle, can be archived in their native format, and can be read again without any need for decoding.
However, most messages today use extended ASCII or Unicode characters, include attachments, or are in HTML format. In these cases, the message must be in MIME format. Therefore, the following sections focus on the structure of MIME messages.
Message Header
The message header is a sequence of lines, called header lines or simply headers, produced by the sender’s email client and the email servers along the delivery path. The header ends with a blank line, after which the message body begins.
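The header/body split described above can be illustrated with Python's standard `email` package. This is only a minimal sketch: the sample message and its addresses are invented placeholders, not taken from a real mailbox.

```python
from email import message_from_string

# Hypothetical raw message; the addresses are placeholders.
RAW = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Hello\n"
    "\n"                      # the blank line that ends the header
    "This is the body.\n"
)

def split_message(raw):
    """Return (list of (name, value) header pairs, body text)."""
    msg = message_from_string(raw)
    return msg.items(), msg.get_payload()

headers, body = split_message(RAW)
for name, value in headers:
    print(f"{name}: {value}")
print("--- body ---")
print(body)
```

Everything before the first blank line is parsed as header lines; everything after it is returned untouched as the body.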
Only a small portion of the information in the message header is displayed by email clients. This is reasonable, as there is a wide variety of headers, many of which are optional, and most users would be confused by too much detail. However, email clients typically allow users to inspect the complete header if they wish to investigate the message’s origin and delivery process.
The most common headers are shown in Table 1. These can be divided into four main categories based on the email management processes to which the data refer, plus a miscellaneous group:
- Identity: These headers specify the sender and recipients of the message and add additional details. For instance, the message is usually assigned a unique Message-ID by the sender’s email server, which can be used to reference the message in other communications. Additionally, a Return-Path can be specified, which is different from the sender’s address, to receive bounce messages. The Sender header allows specifying the person or automated agent that is actually sending the message on behalf of the official sender, as listed in the From header.
- Delivery: These headers contain details about the delivery process. A Received record is added each time the message is handled by a server along the delivery path, starting with the sender’s email server and ending with the recipient’s server. A timestamp is associated with each step, giving the local date and time at which the message arrived at the receiving server, expressed in a standard format that includes the offset from GMT. Additional headers specify whether the sender requested a receipt and to which address it should be sent. Different email clients may handle receipt requests differently, so the absence of a return receipt should not be taken as definitive proof that the message was not delivered or read.
- Thread: These headers are used in messages sent in reply to other messages or forwarded messages, forming a thread. Some of the header information from the original message initiating the thread is included in the new message, notably the message identifier. Headers referring to threads are particularly important in email archiving as they allow for the extraction of metadata connecting a message to other messages.
- MIME: These headers specify the structure of the message body and the MIME version, which remains 1.0 despite the evolution of the standard. The Content-Type header specifies whether the message contains one or several parts, and if it contains multiple parts, a boundary is specified to separate them. If the message contains a single part, the Content-Type and Content-Transfer-Encoding are directly specified in the header.
- Miscellaneous: Additional headers may be added, referring to security applications, spam filtering, and other email management processes.
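As a rough illustration of how the Delivery and Thread headers can be turned into archival metadata, the sketch below reads the Received timestamps and the thread identifiers from a message using Python's standard library. The sample message, host names, and Message-IDs are hypothetical, and real Received lines vary more than this simple parsing assumes.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message: servers prepend Received lines, so the newest hop is on top.
RAW = """\
Received: from mx.example.net by mail.example.org with ESMTP; Mon, 31 May 2021 09:17:30 +0200
Received: from client.example.net by mx.example.net with ESMTP; Mon, 31 May 2021 09:17:27 +0200
Message-ID: <reply-2@example.net>
In-Reply-To: <original-1@example.org>
References: <original-1@example.org>
From: jane@example.net
To: john@example.org
Subject: Re: Meeting Notes

Body text.
"""

def delivery_times(msg):
    """Timestamps of each hop, oldest first (assumes the date follows the last ';')."""
    stamps = []
    for value in msg.get_all("Received", []):
        _, _, stamp = value.rpartition(";")
        stamps.append(parsedate_to_datetime(stamp.strip()))
    return list(reversed(stamps))

def thread_ids(msg):
    """Identifiers that link this message to others in the same thread."""
    return {
        "message_id": msg.get("Message-ID"),
        "in_reply_to": msg.get("In-Reply-To"),
        "references": (msg.get("References") or "").split(),
    }

msg = message_from_string(RAW)
path = delivery_times(msg)
ids = thread_ids(msg)
print(path)
print(ids)
```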
Table 1 – Common Headers (A = Always Present, F = Frequent, O = Optional)

| Category | Header | Description | Origin | Present |
| --- | --- | --- | --- | --- |
| Identity | Date: | Date/time sent | Sender client | A |
| | From: | Address of sender | Sender client | A |
| | Sender: | Address of sender’s assistant | Sender client | O |
| | Organization: | Organization of author | Sender client | O |
| | To: | Address of recipients (may be a list) | Sender client | O |
| | Cc: | Address of recipients in carbon copy | Sender client | F |
| | Bcc: | Address of recipients in blind carbon copy | Sender client | F |
| | Subject: | Message summary | Sender client | A |
| | Message-ID: | Unique identifier assigned by the sender | Sender server | F |
| | Return-Path: | Address for ‘bounce messages’ | Sender client | O |
| Delivery | User-Agent: | Sender email client software | Sender client | A |
| | Delivered-To: | Recipient mailbox (may be a list) | Recipient server | A |
| | Received: | One for each step in the delivery path | Server | A |
| | | from: Server which sent the message | Server | A |
| | | by: Server which received the message | Server | A |
| | | with: Server ESMTP identifier | Server | A |
| | | date: Date/time received | Server | A |
| | Return-Receipt-To: | Address to send a read receipt | Sender client | O |
| | Disposition-Notification-To: | Address to send a read receipt | Sender client | O |
| Thread | In-Reply-To: | Message ID to which the message replies | Sender client | O |
| | References: | Message ID to which the message refers | Sender client | O |
| | Resent-From: | Address of sender forwarding the message | Sender client | O |
| | Resent-To: | Address of the recipient of the forwarded message | Sender client | O |
| | Resent-Subject: | Subject of the forwarding message | Sender client | O |
| MIME | MIME-Version: | Always 1.0 | Sender client | A |
| | Content-Type: | Specifies content and structure of the body | Sender client | O |
| | | boundary: Separator in multipart messages | Sender client | O |
| | Content-Transfer-Encoding: | Encoding scheme | Sender client | A |
Message Body
A message in MIME format may contain one or several parts.
Single-Part Messages
A single-part message is a plain text message with no attachments. The corresponding Content-Type in the header is text/plain, which also specifies the character encoding. For messages containing only plain ASCII characters, the Content-Transfer-Encoding is 7bit. If the character set is other than plain ASCII, a different encoding is used, often quoted-printable, which represents plain ASCII characters directly and encodes each non-ASCII byte of ISO 8859 (extended ASCII) or Unicode text as three plain ASCII characters (an equals sign followed by two hexadecimal digits). Although this and other encodings are common, many users have seen characters misinterpreted when reading messages, particularly characters with diacritical marks, which is a common email client failure.
A similar encoding scheme, called Encoded-Word, is used for textual header information in character sets other than plain ASCII. The structure of a single-part message is represented in Figure 4. This message declares ISO 8859-1 (Western Europe) encoding for both the Subject header and the text.
Example: Single-Part Message Structure
Date: Fri, 28 May 2021 16:39:57 +0200
From: "John Doe" <[email protected]>
Subject: =?iso-8859-1?Q?Meeting_with_Mr._Smith?=
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hello Mr. Smith,
Please find attached the minutes of the meeting.
Best regards,
John Doe
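To see what quoted-printable and Encoded-Word decoding involve, here is a small sketch using Python's standard library. The subject and body strings are invented samples with accented ISO 8859-1 characters, chosen only to make the decoding visible.

```python
import quopri
from email.header import decode_header, make_header

# Hypothetical Encoded-Word subject: "=E9" is the ISO 8859-1 byte for "é",
# and "_" stands for a space in the Q encoding.
subject = "=?iso-8859-1?Q?R=E9union_avec_M._Smith?="
decoded_subject = str(make_header(decode_header(subject)))

# Hypothetical quoted-printable body: each non-ASCII byte appears as "=XX".
body_qp = b"Voici le proc=E8s-verbal de la r=E9union."
decoded_body = quopri.decodestring(body_qp).decode("iso-8859-1")

print(decoded_subject)
print(decoded_body)
```

Decoding with the wrong character set (for example, treating these bytes as UTF-8) is one way the misinterpreted diacritics mentioned above can arise.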
Multipart Messages
A multipart MIME message is used to combine several parts into a single message. Each part can have a different content type and/or encoding scheme. For example, a message with an attached image or file requires a multipart structure. Multipart messages are useful for combining different parts, such as text and HTML formats, or adding file attachments.
Figure 5 – Structure of a Multipart Message
- multipart/mixed: This subtype is used to combine different types of content into a single message, such as text with an attached image or file.
- multipart/alternative: This subtype contains multiple versions of the message body, for instance, plain text and HTML versions. This allows the recipient’s e-mail client to select the best format for display.
- multipart/digest: This subtype is similar to multipart/mixed, but the default Content-Type value for a body part is changed from text/plain to message/rfc822. This media type indicates that the body contains an encapsulated message, which follows the syntax of an RFC 822 message. The multipart/digest type is often used for sending collections of messages in a single email, such as in e-mail forwarding.
- multipart/related: This subtype provides a way to represent compound objects consisting of several interrelated parts. For example, an HTML message with embedded images would use this subtype, where the HTML document is the root part, and the images are referenced from it.
- multipart/report: This subtype is used for electronic mail reports of any kind, generally for message delivery reports. It usually consists of two parts, with an optional third part. The first part contains a human-readable message describing the condition that caused the report to be generated. The second part is machine-parsable and contains an account of the reported message handling event. The optional third part may include the original message or part of it, to assist in diagnosing problems.
- multipart/signed: This subtype is used to send digitally signed messages. It consists of two parts: a body part (the actual message) and a signature part. The digital signature authenticates the entire content of the first part. Multiple signature types are possible, though there is still a lack of standardization. Signed messages can also be sent using the multipart/mixed schema.
- multipart/encrypted: This subtype is used to send encrypted messages. It has two parts: the first part contains information needed to decrypt the second part, which is the encrypted message. Similar to signed messages, there are different implementations specified in the Content-Type of the first part, and there is still a lack of standardization.
Each part in a multipart message is introduced by a delimiter line made of two hyphens followed by the boundary string specified in the Content-Type header of the message; after the last part, the same delimiter appears with two more hyphens appended. Each part declares its own encoding scheme, such as 7bit, quoted-printable, or base64.
Example: Multipart Message Structure
Date: Mon, 31 May 2021 09:17:26 +0200
From: "Jane Smith" <[email protected]>
To: "John Doe" <[email protected]>
Subject: Meeting Notes
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="boundary1"

--boundary1
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi John,
Please find the meeting notes attached.
Best regards,
Jane

--boundary1
Content-Type: application/pdf; name="meeting_notes.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="meeting_notes.pdf"

JVBERi0xLjQKJcTl8uXrp/Og0MTGCjMgMCBvYmoKMSAwIG9iago8PCAvVHlwZSAvRXh0R3N0IC9TdWI …
--boundary1--
Note: The binary content is encoded in Base64 to ensure safe transmission over the network.
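A message with the same shape as the example can be built and then taken apart again with Python's standard `email` package. This is a sketch under stated assumptions: the addresses and the attachment bytes are placeholders, and the library picks its own boundary string rather than "boundary1".

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_message():
    """Build a text message with one attachment (becomes multipart/mixed)."""
    msg = EmailMessage()
    msg["From"] = "jane@example.org"      # placeholder address
    msg["To"] = "john@example.org"        # placeholder address
    msg["Subject"] = "Meeting Notes"
    msg.set_content("Hi John,\nPlease find the meeting notes attached.\n")
    msg.add_attachment(
        b"%PDF-1.4 placeholder bytes",    # stand-in for real PDF content
        maintype="application",
        subtype="pdf",
        filename="meeting_notes.pdf",
    )
    return msg

def list_parts(raw):
    """Return (content type, filename) for the container and each part."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [(part.get_content_type(), part.get_filename()) for part in msg.walk()]

raw = build_message().as_bytes()
parts = list_parts(raw)
for content_type, filename in parts:
    print(content_type, filename)
```

Walking the parsed message visits the multipart container first and then each part in order, which is the traversal an archiving tool would need in order to extract the text and the attachments separately.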
MIME Media Types
A MIME media type is an identifier used in a Content-Type header to specify the nature of the data in the body of a MIME entity, whether it is the body of a single-part message or a part of a multipart message. MIME media types are often referred to as Internet media types because they are also used in other Internet protocols, such as HTTP. Their purpose is to enable the correct interpretation of the message content by specifying the file format of its body and attachments.
The MIME media type mechanism is defined in RFC 2046 and is designed to be extensible, as the set of media types is expected to grow significantly over time. To ensure that Internet media types are developed in an orderly, well-specified, and public manner, a registration process has been devised, managed by the Internet Assigned Numbers Authority (IANA).
Media types are two-level identifiers, specifying a top-level type and a subtype, with optional additional parameters. RFC 2046 defines seven top-level media types. Five of them are discrete data types, specifying the format of a single file, and the remaining two are composite data types, specifying the structure of a MIME body composed of multiple parts.
The five top-level discrete media types are:
- text: Used for textual information. The subtype text/plain indicates plain text with no formatting and is intended to be displayed directly without special software, aside from supporting the character set specified by a charset parameter. For example, Content-Type: text/plain; charset=iso-8859-1 indicates text encoded in the ISO/IEC 8859-1 character set, commonly referred to as Latin 1 (Western European). Other subtypes include text/html for HTML files, text/xml for XML files, and text/css for CSS (Cascading Style Sheets) files.
- image: Used for image data, i.e., any information that requires a graphical display device to be rendered. Registered subtypes include widely used image types such as gif, tiff, jpeg, and png.
- audio: Used for audio data, i.e., any information that requires an audio device, such as a speaker, to be rendered. The general subtype is audio/mpeg, which refers to MP3 or MPEG audio. Other audio data subtypes refer to proprietary formats, such as audio/x-ms-wma for Windows Media Audio or audio/x-wav for Waveform Audio File Format (WAV).
- video: Used for time-varying picture images, possibly with color and coordinated sound. Standard (IANA-registered) subtypes include video/mpeg for MPEG-1 video with multiplexed audio, video/mp4 for MP4 video, and video/quicktime for QuickTime video. Other subtypes refer to proprietary formats, such as video/x-ms-wmv for Windows Media Video.
- application: Used for data that does not fit into any of the other media types. This type of data needs to be processed by an application program to be rendered. There is a very large variety of application subtypes, with IANA having registered about 700 subtypes, most of which are vendor-specific, with identifiers beginning with vnd. For example, the application/vnd.ms-excel subtype is used for Microsoft Excel files. Due to the enormous variety, it is impossible to enumerate even a small set of relevant application subtypes.
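The two-level structure of a media type (top-level type, subtype, optional parameters) can be made concrete with a short sketch. The helper name `parse_content_type` is an illustrative assumption; it relies on Python's standard `email.message.Message` to do the actual parsing.

```python
from email.message import Message

# Composite top-level types defined in RFC 2046; all others are discrete.
COMPOSITE_TYPES = {"multipart", "message"}

def parse_content_type(value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    holder = Message()
    holder["Content-Type"] = value
    # get_params() returns the media type first, then the parameters.
    params = dict(holder.get_params()[1:])
    return holder.get_content_maintype(), holder.get_content_subtype(), params

for header in (
    "text/plain; charset=iso-8859-1",
    'multipart/mixed; boundary="boundary1"',
    "application/vnd.ms-excel",
):
    top, sub, params = parse_content_type(header)
    kind = "composite" if top in COMPOSITE_TYPES else "discrete"
    print(top, sub, params, kind)
```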
Media Types and Dynamic Contents
The situation with media types is more complex than it might appear. Besides the IANA-registered media types, many subtypes are widely used and handled by most e-mail clients but are not yet registered with IANA. For instance:
Content-Type: application/msword; name="sample.doc"
Content-Description: sample.doc
Content-Disposition: attachment; filename="sample.doc"; size=99328; creation-date="Tue, 05 Aug 2008 10:08:40 GMT"; modification-date="Tue, 05 Aug 2008 10:08:40 GMT"
Content-Transfer-Encoding: base64
This indicates a Microsoft Word attachment, a common occurrence. Moreover, the Content-Type definition is often completed by several parameters specifying object metadata and encoding, and it is not always evident where to find the related documentation.
Dealing with media types poses several challenges when preserving and archiving e-mail, as we will discuss in more detail in section 5. The media type paradigm was designed to give e-mail users flexibility in attaching files to messages and in defining new types according to their needs. E-mail clients are not expected to handle all media types; if they cannot process a specific data type, they simply classify it as an “unknown application.”
In contrast, the archival preservation process requires the ability to render any part of an archived message at any time in the future. Therefore, it is essential to ensure that:
- All media types appearing in archived messages are registered in the archives, along with the necessary information to handle them, even if they are not registered with IANA.
- An application is available for each media type registered in the archives.
- A converted copy of the attachment is preserved in a format that guarantees it can be rendered at a later time.
Finally, issues arise from dynamic information that may be contained in a message. A common case involves external references (e.g., web links) or context-dependent information (e.g., date and time) in attached documents. Such messages are not self-contained and may not be properly rendered at a later time (or even at the time of arrival!). Therefore, when archiving these messages, appropriate policies should be established to either prevent dynamic content or “freeze” all dynamic references at arrival or archival time.
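The first requirement in the list above (every media type in an archived message must be known to the archive) could be checked along the following lines. This is a sketch only: `ARCHIVE_REGISTRY` and its entries are hypothetical, standing in for whatever registry an archive actually maintains.

```python
from email.message import EmailMessage

# Hypothetical archive registry: media type -> handling note.
ARCHIVE_REGISTRY = {
    "text/plain": "store as-is",
    "application/pdf": "store as-is; render with a PDF viewer",
}

def unregistered_types(msg):
    """Media types of the non-multipart parts that the registry does not know."""
    found = {part.get_content_type() for part in msg.walk() if not part.is_multipart()}
    return sorted(found - set(ARCHIVE_REGISTRY))

# Hypothetical incoming message with a Word attachment.
incoming = EmailMessage()
incoming["Subject"] = "Sample"
incoming.set_content("See the attached document.\n")
incoming.add_attachment(
    b"placeholder bytes",
    maintype="application",
    subtype="msword",
    filename="sample.doc",
)

missing = unregistered_types(incoming)
print(missing)
```

A non-empty result would signal that the archive needs a new registry entry, and a rendering or conversion path, before the message can be considered preservable.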
Archiving and Preserving Email Messages
When archiving and preserving email messages, it’s crucial to maintain the structure and format of the original message, including the message header, body, and any attachments. This also involves preserving metadata, such as the sender, recipients, and delivery information, to ensure the authenticity and integrity of the archived message.
How Email Works
Email is a store-and-forward method of exchanging messages on the Internet. This means a message sent by a user goes through an asynchronous process of delivery, typically involving a series of steps. In each step, the message is stored by an intermediate server on the network to be forwarded at a later time until it finally reaches its destination. The timing of delivery depends on the availability of network connections.
Figure 1 illustrates the delivery process, which involves a sender, Alice, and a recipient, Bob. Both Alice and Bob use specific applications called email clients, which run on their PCs to send and receive emails. These clients do not communicate directly but connect to email servers, which are specialized applications operated by Alice’s and Bob’s organizations or ISPs that manage the delivery process.
Figure 1 – Basic Email Infrastructure
The email delivery process involves the following steps:
- Alice composes the message using her email client.
- The message is formatted by Alice’s email client in a specific Internet email format and then sent to her local email server.
- Alice’s email server locates the address of Bob’s email server using the Domain Name System (DNS), the distributed directory of the Internet.
- The two email servers exchange the message, which may pass through a series of intermediate servers on the network, until it is finally stored in Bob’s personal mailbox on Bob’s email server.
- The message remains in Bob’s mailbox until he reads or downloads it using his email client.
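The steps above can be sketched as a toy store-and-forward simulation. It is not how real mail servers are implemented: the class and host names are invented, each server has a fixed next hop instead of a DNS lookup, and failures and retries are left out.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Envelope:
    sender: str
    recipient: str
    text: str
    trace: list = field(default_factory=list)  # one entry per hop, like Received lines

class Server:
    def __init__(self, name, local_domains=(), next_hop=None):
        self.name = name
        self.local_domains = set(local_domains)
        self.next_hop = next_hop     # assumption: fixed route instead of a DNS lookup
        self.queue = deque()         # messages stored, waiting to be forwarded
        self.mailboxes = {}          # recipient address -> delivered envelopes

    def accept(self, env, from_name):
        """Store the message and record the hop."""
        env.trace.append(f"from {from_name} by {self.name}")
        self.queue.append(env)

    def process_queue(self):
        """Deliver locally or forward each stored message."""
        while self.queue:
            env = self.queue.popleft()
            domain = env.recipient.rpartition("@")[2]
            if domain in self.local_domains:
                self.mailboxes.setdefault(env.recipient, []).append(env)
            elif self.next_hop is not None:
                self.next_hop.accept(env, self.name)
            # A real server would retry or send a bounce here.

bob_server = Server("mail.b.example", local_domains=["b.example"])
relay = Server("relay.example", next_hop=bob_server)
alice_server = Server("mail.a.example", next_hop=relay)

alice_server.accept(Envelope("alice@a.example", "bob@b.example", "Hi Bob"), "alice-pc")
for server in (alice_server, relay, bob_server):
    server.process_queue()       # each server forwards when it gets around to it

delivered = bob_server.mailboxes["bob@b.example"][0]
print(delivered.trace)
```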
The procedure is quite similar to the process Alice and Bob follow when exchanging letters. Local post offices play a role similar to that of local email servers, and letter delivery may go through additional post offices (intermediate servers). In both cases, delivery time and even delivery itself are not guaranteed.
The Internet is a best-effort network, meaning the message, like any other information crossing the network, must pass through several servers run by independent organizations that make no commitment to service availability or quality. Therefore, delivery time cannot be predicted, and the message may even get lost along the way.
However, as we will discuss later in more detail, all clients and servers involved in the delivery process follow a set of strict rules (protocols). This allows for the tracing of all relevant events and the recording of detailed information in a report appended to the message. Additionally, in case of delivery failure, the server may attempt delivery again, and the sender may request delivery reports and receipts to confirm that the message has been delivered and/or read by the recipient.
End-User Access to Email
End users can access the email system in several ways:
- Email Client: This method corresponds to the basic process discussed in the previous section, where the user runs a special application on their PC designed to interact with the email server. Email clients can be proprietary or open-source software, and a wide variety of them are available. Besides the basic functions of sending and retrieving messages from the email server, which are performed according to standard interaction protocols that ensure interoperability, they usually offer user-friendly interfaces and additional functions to classify and store messages, manage directories, and more. In this setup, messages are typically downloaded and stored on the user’s PC, which may not be convenient for users who need to access their mail from multiple devices.
- Webmail: This is the most common way users access email from their home PC, through a service offered by their ISPs or third-party organizations like Hotmail or Gmail. In this setup (see Figure 2), the client application running on the end user’s PC is an Internet browser (e.g., Explorer, Mozilla), which connects to a web server running a special webmail application. The web server acts as an intermediary and manages the connection with the email server. Additionally, messages are not downloaded to the user’s PC but are managed and stored directly on the web server. This provides a significant advantage for users who need to access their mail from multiple devices.
- Integrated Systems: This is the typical solution used by most corporations and large organizations. It integrates email access into a broader ‘collaborative’ environment that includes additional functions such as direct messaging, calendaring, contacts, and tasks, as well as support for mobile and web-based access to information. It also manages message storage on a central server. Popular products of this kind include Microsoft Exchange and IBM Lotus Domino. Users run proprietary client applications (e.g., Microsoft Outlook or Lotus Notes) on their PCs that connect to the corporate server, which in turn connects to the email server (see Figure 3). To assist mobile users, these systems often include an optional web interface, functionally equivalent to webmail, which allows access through a web browser. However, the primary interface is typically the proprietary one used on the organization’s intranet. Although this setup is specific and includes proprietary elements, it is essential to consider because it represents a significant portion of the market, especially for email archiving in corporations and large organizations.
Figure 2 – Webmail
Figure 3 – Corporate Mail with Integrated System
Interoperability of Email Systems
As discussed in previous sections, exchanging a message involves interaction among several agents (email clients and servers), which are generally heterogeneous systems based on different hardware and software platforms. Additionally, these systems are independently designed and implemented by different parties, potentially without any direct coordination.
One of the main challenges in the Internet email system is ensuring interoperability, i.e., correct and reliable communication among these heterogeneous systems. Interoperability is based on two main elements:
- Communication Protocols: These are sets of rules governing communication between agents, ensuring that agents can reliably and correctly interact using a common language and standard procedures.
- Message Format: This is a set of formal definitions specifying the structure of the message and how the message and its attachments are encoded, ensuring correct interpretation by different email clients and guaranteeing that the content of the message is correctly rendered to its recipient.
Another requirement is that interoperability must also be guaranteed over time. This means that when the definitions of protocols and message formats evolve, they should maintain backward compatibility, i.e., new rules should still be compatible with old ones. For example, a message formatted according to an older version of the message format standard should be presented correctly by an email client compliant with the new version. Unfortunately, this is not always the case, and it is a major concern in email archiving, where ensuring that archived messages remain readable over time, even as standards evolve, is crucial.
Internet Standards
The standardization process of the Internet is somewhat different from the usual ISO/IEC track, so it is worth explaining how these standards are developed and allowed to evolve.
Internet standards are developed and promoted by the Internet Engineering Task Force (IETF), which cooperates closely with major international standard bodies like ISO/IEC and the World Wide Web Consortium (W3C), the main international standards organization for the World Wide Web.
The standardization process, which dates back to the early days of the ARPAnet project, is highly cooperative and based on special documents called Request For Comments (RFC). RFCs are draft documents, mostly proposals for standards, published by the IETF and posted on the network as a ‘request for comments.’ Each RFC is assigned a unique number and is never rescinded or modified. If amendments are needed, a new RFC is issued with a different number, superseding the old one.
As stated in RFC 1796, which discusses the standardization process, “Not all RFCs are standards.” Some are just memoranda, remarks that people wish to share, research papers, or preliminary proposals on any matter concerning the Internet and Internet-based systems. The IETF assigns a status to each RFC.
‘Mature’ RFCs are rated Standard Track and are further divided into Proposed Standard, Draft Standard, and Internet Standard. Internet Standards (STD) each refer to an RFC (or a set of RFCs) and are given a unique number. Unlike the RFC number, when the standard evolves, the STD number does not change but simply refers to a new RFC that supersedes the original one.
Standardization of Email Transmission
Server-to-server and client-to-server interoperability are ensured by SMTP (Simple Mail Transfer Protocol), which is Internet Standard STD 10. SMTP dates back to August 1982 and is based on RFC 821. However, the protocol currently used by the majority of email applications is known as ESMTP (Extended SMTP) and is defined in RFC 2821, published in April 2001.
However, formally, the status of RFC 2821 is still a Proposed Standard, and the official standard is still the one defined by RFC 821. This situation of ‘going ahead of the official standard’ is typical of the Internet world, and it is of no use to argue whether it is right or wrong; we must simply cope with it.
SMTP specifies how the email client interacts with the email server to deliver the message and how email servers (often called SMTP servers) interact with each other to ensure the message passes through several agents and finally reaches its destination. The use of the SMTP protocol in the message delivery process is clearly shown in Figures 1 and 2.
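To give a feel for the client side of an SMTP exchange, the sketch below lists the commands a client would send for one message, using the basic command set from RFC 821 (HELO, MAIL FROM, RCPT TO, DATA, QUIT). The function name and addresses are illustrative, server replies are not modelled, and in practice Python's `smtplib` module carries out this dialogue for you.

```python
def smtp_client_commands(client_host, sender, recipients, message_text):
    """Return the lines a client sends to submit one message (replies omitted)."""
    commands = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    commands.append("DATA")
    for line in message_text.splitlines():
        # A line starting with "." gets an extra "." so that it is not
        # mistaken for the end-of-data marker.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")      # a lone "." ends the message data
    commands.append("QUIT")
    return commands

dialogue = smtp_client_commands(
    "alice-pc.a.example",                 # placeholder host name
    "alice@a.example",
    ["bob@b.example"],
    "Subject: Hi\n\nHello Bob\n.hidden line\n",
)
print("\n".join(dialogue))
```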
Regarding the problem of email archiving, this standard is important because it defines the basic format of messages that can be handled by SMTP servers and go through the delivery process. This is a very basic format, supporting only simple text messages in plain ASCII (also called 7-bit ASCII or US-ASCII) characters, which are sufficient only for English and a few other languages. This limitation is overcome by defining a special way to encode richer content in plain ASCII characters, allowing the use of a more general set of characters in the message text, and including formatted text and multimedia content in email messages, as we will discuss in section 2.7.
Standardization of Client-Server Communication
Email clients can retrieve email from servers in several ways, supported by both standard and proprietary protocols. This is relevant to email archiving because the process of storing email messages must deal with how they are downloaded and handled by different client applications, which may affect the process and determine the format of archived messages.
POP3
POP3 (Post Office Protocol version 3) is the protocol most commonly used by email clients to retrieve messages from servers. The official Internet Standard is defined in STD 53 and is based on RFC 1939, published in May 1996. This protocol is limited in scope and allows for the download of messages only. It does not include the management of mail folders on the server side (e.g., Inbox, Sent, Drafts) or any other advanced features like server-based search or access to metadata. This is a severe limitation, especially when dealing with multiple clients, such as a PC and a smartphone, where folders should be synchronized.
IMAP4
IMAP4 (Internet Message Access Protocol version 4) is the most advanced and feature-rich protocol, officially defined by STD 55 and based on RFC 3501, published in March 2003. IMAP4 supports advanced folder management, server-based search, access to metadata, and offline operations. This makes it much better suited for use with multiple clients. However, it is more complex and demanding in terms of computing and network resources.
Webmail Protocols
The Webmail interface uses an Internet browser as the client application and a Web server (or a special Webmail server) as an intermediary that connects to the email server. The protocols used by the browser to communicate with the Web server are HTTP (Hypertext Transfer Protocol) and HTTPS (Secure HTTP). These protocols are not email-specific; they are specified in RFC 2616 (June 1999) and RFC 2818 (May 2000), respectively.
The protocols used by the Web server to communicate with the email server are generally SMTP and IMAP, already discussed in previous sections.
This setup is highly relevant to email archiving, especially when it comes to ensuring that the archived message’s format includes all the information and content needed to faithfully reconstruct the message as seen by the user when accessing it via Webmail.
Standardization of Message Format
The Internet Standard format for email messages is defined by RFC 822 (August 1982), later superseded by RFC 2822 (April 2001), which specifies the format of the email header and body. The format defined by RFC 2822 is still the official standard, though it has been further refined by several other RFCs. The standard email format supports only plain text messages in US-ASCII encoding, which is a major limitation for modern email communication.
This limitation is overcome by the Multipurpose Internet Mail Extensions (MIME) standard, which is specified in a set of five RFCs (RFC 2045 to RFC 2049, published in November 1996). MIME allows for the use of various character sets and multimedia content (e.g., images, sound, video) in email messages. It also supports the encoding of binary content in a 7-bit ASCII format, which is essential for the correct transmission of non-ASCII content in email messages. MIME is fundamental to modern email communication and is supported by almost all email clients and servers.
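The idea of carrying binary content in 7-bit ASCII can be checked directly with base64, one of the encodings MIME defines. The sample bytes below are arbitrary and serve only to show the round trip.

```python
import base64

# Arbitrary binary data covering every possible byte value.
binary = bytes(range(256))

# encodebytes() produces base64 text broken into short lines, as used in mail.
encoded = base64.encodebytes(binary)

is_seven_bit = all(byte < 128 for byte in encoded)   # only ASCII characters remain
restored = base64.decodebytes(encoded)

print(encoded[:40])
print(is_seven_bit, restored == binary)
```

The encoded form is larger than the original (four ASCII characters for every three bytes, plus line breaks), which is the price paid for passing safely through transport that only guarantees 7-bit text.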
Conclusion
The email system is an essential part of modern communication, involving a complex and well-coordinated process of message exchange between various agents (clients and servers) over the Internet. The system is based on a set of standard protocols and message formats that ensure interoperability among heterogeneous systems and reliable communication across the network. These standards are defined and promoted by the IETF through a cooperative and evolving process of RFC publication and review.
The email system supports various access methods for end users, including traditional email clients, Webmail interfaces, and integrated corporate systems. Each method has its own strengths and weaknesses, but all rely on the same underlying protocols and message formats. The system’s success and widespread adoption are due to the interoperability guaranteed by these standards, which allow for seamless communication between different systems, platforms, and applications.
Understanding the standardization of email transmission, client-server communication, and message format is crucial for ensuring the long-term usability and accessibility of archived email messages. By following these standards, organizations can ensure that their archived email messages remain readable and accessible, even as technology evolves.
by Anil Jalela | Aug 19, 2024 | Linux
The first email was sent in 1971 between two computers sitting side by side in the same room, but it traveled through ARPAnet, the ancestor of the Internet. This marked the first time a message was systematically transmitted across a computer network.
The insightful remark by J.C.R. Licklider, quoted above, was made just a few years later when email was still confined to a limited circle within the scientific community, with widespread use at least a decade away. Licklider, a psychologist from MIT who conceived some of the earliest ideas of a global computer network and significantly contributed to ARPAnet, had a remarkably clear vision of what was to come and a prophetic sense of the role that this new medium would play in human communication.
Today, email is by far the most widely used form of written communication. It is estimated that more than 100 billion emails are sent daily, with that number projected to reach 300 billion by 2010. Additionally, over the last decade, it has become increasingly evident that in business, government, and even personal activities, a crucial share of relevant information is exchanged through email. In many cases, this information exists solely in email. For example, it has been estimated that email accounts for about 75% of corporate intellectual property.
Given this, the need to preserve and archive email has become clear. It would be unwise to preserve other documents while neglecting email, where the majority of information is concentrated. As a matter of fact, in recent years, many corporations and government agencies have dedicated significant resources to email archiving, triggering a market expected to reach half a billion dollars in software licenses and maintenance services by 2008.
A more detailed analysis reveals several motivations for email archiving:
Storage Concerns
The volume of email messages that corporations and large organizations must handle is vast and growing rapidly. However, email servers were not designed to store and manage large amounts of messages and attachments for extended periods. Consequently, most organizations enforce size limits on their employees’ mailboxes, often leading users to back up messages they consider important on their own PCs before they disappear from the servers. This process is informal, uncontrolled, and unreliable, with backed-up messages accessible only to the individual users who stored them—if they can still find them. Addressing storage concerns remains the primary motivation for email archiving and the strongest market driver.
Strategic Relevance
Email messages have become an increasingly important and strategic resource for organizations, and therefore should be centrally managed and archived according to precise and well-defined criteria. This approach automates and accelerates business processes, potentially leading to substantial savings by reducing the time spent locating and retrieving messages. Moreover, when an archival solution is implemented, email messages can be integrated with other organizational data and analyzed to monitor business processes and extract knowledge that can inform business strategies.
Regulatory Compliance
In recent years, many companies have faced substantial fines for failing to preserve corporate email records. For instance, in 2005, Morgan Stanley was fined $1.45 billion—an event some have dubbed a “legal Chernobyl”—for being unable to produce corporate email records during an investigation (due to lost or unrecoverable backup tapes). While the fines in other cases have been smaller, the total amount awarded in recent years has reached several billion dollars. In the U.S., new Federal Rules of Civil Procedure Amendments mandate that the production of electronic information is no longer optional. U.S. companies must be prepared to support electronic discovery and be able to quickly produce all records requested by the court, particularly emails, which have played a central role in many recent cases. Although the most prominent cases involve private organizations, government agencies must comply as well. Regulatory compliance has driven many organizations to implement email archiving systems in recent years, making it a significant market driver in the U.S.
Historical Preservation
Last but not least, in many situations, email messages should be archived and preserved as historical records for the benefit of future generations. This is especially important since, as noted earlier, email has become the most important form of communication between individuals, replacing paper-based correspondence and, in many cases, substituting or integrating telephone conversations. Historians of future generations may have a better chance of studying the Internet age than earlier parts of the 20th century when most rapid communication occurred via telephone, leaving almost no record in archives. We have a responsibility to preserve this valuable information.
The purpose of this document is to provide a concise but comprehensive account of the main issues related to email preservation and archiving, highlight the key challenges, and outline the basic policies and procedures. This is no trivial task, as email messages are a unique type of electronic document with a complex structure. Additionally, the specific infrastructure through which they are delivered—namely, the Internet—must be considered to some extent. Therefore, we have included a preliminary section on the email infrastructure and message format, issues that some users may view as technicalities, but which we believe are essential to understanding the challenges associated with preserving and archiving email messages.
by Anil Jalela | Jul 22, 2024 | Linux
Regions:-
Regions are independent geographic areas that consist of zones. They affect pricing, reliability, networking, and performance. Example of a zonal resource: VM.
Every region in GCP has at least three zones.
Zone:-
A zone is a deployment area for Google Cloud resources within a region. A zone should be considered a single failure domain within a region.
To deploy fault-tolerant applications with high availability and help protect against unexpected failures, deploy your application across multiple zones in a region. Example of a regional resource: App Engine.
Multi-region:-
Multi-regional services are designed to keep functioning after the loss of a single region.
Multi-regional resources:- BigQuery, Bigtable, Cloud Storage, Spanner, Datastore, Firestore, Artifact Registry
If a single region fails, only customers in that region are impacted; customers who use multi-region products are not impacted.
The fully qualified name for a zone is made up of <region>-<zone>.
Example: zone a in region us-central1 is us-central1-a.
Every region name ends with a number (1, 2, 3) and every zone name ends with a letter (a, b, c).
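The naming rule can be sketched in a few lines of Python. The helper names split_zone and zone_name are hypothetical and not part of any Google Cloud library. The sketch assumes only the <region>-<zone> pattern described here, so it splits on the last hyphen.

```python
def split_zone(zone: str) -> tuple[str, str]:
    """Split a fully qualified zone name into (region, zone suffix)."""
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a fully qualified zone name: {zone!r}")
    return region, suffix


def zone_name(region: str, suffix: str) -> str:
    """Build a fully qualified zone name from a region and a zone suffix."""
    return f"{region}-{suffix}"


region, suffix = split_zone("us-central1-a")
print(region, suffix)                 # us-central1 a
print(zone_name("us-central1", "b"))  # us-central1-b
```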
For more details, visit:-
https://cloud.google.com/about/locations
and
https://cloud.google.com/compute/docs/regions-zones
by Anil Jalela | Mar 30, 2024 | Email
As an email marketer, reaching your audience effectively is essential for the success of your campaigns. In today’s online world, many people use temporary email addresses provided by disposable email domains. This can be both good and bad for email marketers. Understanding how disposable email domains impact your email marketing efforts is crucial for optimizing engagement and maximizing results.
What are Disposable Email Domains?
Disposable email domains are temporary email addresses created for short-term use. Users often employ these addresses to sign up for online services, newsletters, or forums without giving out their main email address. Popular disposable email services include 10minutemail.com, Guerrilla Mail, and Temp Mail.
Challenges:-
Low Engagement Rates: Emails sent to disposable addresses may have lower open and click-through rates since users often use them for temporary purposes and may not engage with the content.
Spam Filtering: Many disposable email domains are flagged by spam filters, leading to your emails being automatically routed to the spam folder or rejected altogether.
Data Quality Concerns: Since disposable email addresses are temporary, maintaining accurate subscriber data becomes challenging, impacting the quality of your email list.
Deliverability Issues: Email service providers may view emails sent to disposable addresses as suspicious, affecting deliverability rates and sender reputation.
Strategies for Overcoming Challenges:-
Segmentation: Segment your email list to identify and exclude disposable email addresses. Focus your efforts on engaging with subscribers who are more likely to interact with your content.
Email Verification: Implement email verification processes during the signup phase to detect and block disposable email addresses. This ensures that your list comprises genuine subscribers who are interested in your content.
Content Personalization: Tailor your email content to resonate with your audience’s interests and preferences. Personalized emails are more likely to capture the attention of subscribers, regardless of the email address type.
Optimize Deliverability: Monitor your email deliverability metrics closely and address any issues promptly. Utilize best practices for email authentication, such as SPF, DKIM, and DMARC, to enhance deliverability and inbox placement.
Incentivize Engagement: Offer incentives or exclusive content to encourage subscribers to use their primary email addresses rather than disposable ones. This fosters a more meaningful connection with your audience and increases the likelihood of sustained engagement.
Block and verification:-
Block in PowerMTA:-
domain-macro Disposable_dom nitwings.com, 0-00.usa.cc, 001.igg.biz
<domain $Disposable_dom>
type discard
discard-as-bounce yes
</domain>
Block in Postfix:-
Add the following line to Postfix’s “/etc/postfix/main.cf”:
transport_maps = hash:/etc/postfix/transport
Way one:- deliver email only to Yahoo and Gmail; all other domains are discarded.
Write /etc/postfix/transport:
gmail.com :
yahoo.com :
* discard:
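Way two:- discard only specific domains; all other domains are allowed.
Create /etc/postfix/transport:
nitwings.com discard:
blackpost.net discard:
In both cases, run “postmap /etc/postfix/transport” after editing the file so that Postfix can read the hash table, then reload Postfix.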
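For a long blocklist, the “Way two” entries can be generated instead of typed by hand. The following is a minimal sketch, assuming the domains are already available as a plain list of strings. The function name transport_entries is hypothetical. It normalises case, skips blank lines and comments, and removes duplicates.

```python
def transport_entries(domains):
    """Return Postfix transport-map lines that discard mail for each domain."""
    seen = set()
    lines = []
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blank lines, comment lines, and domains already emitted.
        if not domain or domain.startswith("#") or domain in seen:
            continue
        seen.add(domain)
        lines.append(f"{domain} discard:")
    return lines


for line in transport_entries(["nitwings.com", "Blackpost.net ", "nitwings.com"]):
    print(line)
# nitwings.com discard:
# blackpost.net discard:
```

The printed lines can be written to /etc/postfix/transport.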
Where to get disposable domain lists:-
https://raw.githubusercontent.com/iocium/download.throwaway.cloud/main/list.txt
https://raw.githubusercontent.com/andreis/disposable-email-domains/master/domains.json
https://github.com/ivolo/disposable-email-domains/blob/master/wildcard.json
You can use the free open.kickbox.com API to automate checking whether a domain is disposable, for example:
https://open.kickbox.com/v1/disposable/yopmail.com
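A signup-time check can also run offline against a downloaded list. The sketch below assumes the list has already been loaded into a Python set. The function name is_disposable and the sample blocklist are hypothetical. Treating subdomains of a listed domain as disposable is an assumption of this sketch, not a rule stated by any of the lists.

```python
def is_disposable(address, disposable_domains):
    """Return True if the address's domain, or a parent domain, is listed."""
    domain = address.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    labels = domain.split(".")
    # Check the full domain, then each parent domain, stopping before the TLD.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in disposable_domains:
            return True
    return False


# Hypothetical blocklist; in practice, load it from one of the published lists.
blocklist = {"yopmail.com", "nitwings.com"}

print(is_disposable("user@yopmail.com", blocklist))       # True
print(is_disposable("user@mail.yopmail.com", blocklist))  # True
print(is_disposable("user@gmail.com", blocklist))         # False
```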
Small providers need to block disposable domains because spam filter providers use unused disposable mailboxes as spam traps, which generates a high number of spam complaints. Disposable addresses are also often involved in phishing and other suspicious activity.