GroupDocsGists/ExtractDataFromAttachments.cs

## README.md

      
Extract Text from ZIP/RAR Archives with GroupDocs.Parser for .NET

Learn how to pull text and metadata from documents stored inside ZIP and RAR archives, directly in memory, using GroupDocs.Parser for .NET.

📦 Prerequisites

- GroupDocs.Parser for .NET (see the documentation)
- Temporary license (obtain a free temporary license from the product page)
- Input files: ZIP or RAR archives containing any document type that GroupDocs.Parser can read
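
A temporary license is usually applied once at startup, before any `Parser` instance is created. The following is a minimal sketch, assuming the license has been saved to a local file; the class name `LicenseSetup`, the method `TryApplyLicense`, and the file path are hypothetical and not part of this gist.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

static class LicenseSetup
{
    // Hypothetical helper: applies the license if the file exists.
    // Returns false when the file is missing, so the caller can decide
    // whether to continue in evaluation mode.
    public static bool TryApplyLicense(string licensePath)
    {
        if (!File.Exists(licensePath))
        {
            Console.WriteLine($"License file not found: {licensePath}");
            return false;
        }

        License license = new License();
        license.SetLicense(licensePath);
        return true;
    }
}
```

A call such as `LicenseSetup.TryApplyLicense("GroupDocs.Parser.lic")` at the top of the program would then cover this prerequisite; the file name here is only a placeholder.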

🚀 Key Capabilities


- In-memory processing of archives without extracting files to disk
- Recursive handling of nested ZIP/RAR archives
- Extraction of document text and metadata
- Graceful handling of unsupported document formats
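
To illustrate the in-memory capability, here is a minimal sketch that opens an archive from a byte array instead of a file path and lists the entries it contains. It assumes the archive bytes are already loaded (for example, from a network response); the class and method names are hypothetical and do not appear in the gist.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class InMemoryArchiveExample
{
    // Hypothetical helper: prints the path of every entry in an archive
    // held entirely in memory. Nothing is written to disk.
    public static void ListArchiveEntries(byte[] archiveBytes)
    {
        using (MemoryStream stream = new MemoryStream(archiveBytes))
        using (Parser parser = new Parser(stream))
        {
            // GetContainer returns null when the input doesn't support
            // container extraction.
            IEnumerable<ContainerItem> items = parser.GetContainer();
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported for this input.");
                return;
            }

            foreach (ContainerItem item in items)
            {
                Console.WriteLine(item.FilePath);
            }
        }
    }
}
```

The gist's own snippets pass a file path to `Parser` instead; the stream-based constructor is shown here only as one way to avoid touching the file system.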

💻 Code Example

See the following examples:

- ExtractTextFromZipArchive.cs
- ExtractDataFromAttachments.cs

📋 How to Use


1. Install GroupDocs.Parser via NuGet.
2. Add the two .cs files to your project.
3. Provide the path to the archive you want to process.
4. Call the ExtractTextFromZipArchive method.
5. Review the text and metadata that the helper methods extract.

📎 Related Articles


- Extract ZIP Files Data in C#
- Extract ZIP Files Data in Java

🏁 Conclusion

These snippets show how to efficiently work with compressed documents using GroupDocs.Parser for .NET. For more details, explore the full documentation and try the library with a temporary license.

  
## ExtractDataFromAttachments.cs
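using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

// Extracts data from attachments: file path, metadata, and text from each document.
static void ExtractDataFromAttachments(IEnumerable<ContainerItem> attachments)
{
    foreach (ContainerItem item in attachments)
    {
        Console.WriteLine(item.FilePath);
        // PrintMetadata is a helper that is not included in this snippet.
        PrintMetadata(item.Metadata);

        try
        {
            using (Parser itemParser = item.OpenParser())
            {
                if (itemParser == null)
                {
                    continue;
                }

                bool isArchive = item.FilePath.EndsWith(".zip", StringComparison.OrdinalIgnoreCase) ||
                                 item.FilePath.EndsWith(".rar", StringComparison.OrdinalIgnoreCase);

                if (isArchive)
                {
                    // Nested archive: recurse into its entries.
                    IEnumerable<ContainerItem>? nestedAttachments = itemParser.GetContainer();
                    if (nestedAttachments != null)
                    {
                        ExtractDataFromAttachments(nestedAttachments);
                    }
                }
                else
                {
                    using (TextReader? reader = itemParser.GetText())
                    {
                        // GetText returns null when text extraction isn't supported.
                        if (reader != null)
                        {
                            Console.WriteLine(reader.ReadToEnd());
                        }
                    }
                }
            }
        }
        catch (UnsupportedDocumentFormatException)
        {
            // Skip entries whose format GroupDocs.Parser cannot read.
        }
    }
}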
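
The snippet above calls `PrintMetadata`, which is not defined anywhere in this gist. A minimal sketch of such a helper follows, assuming it only needs to write each metadata entry to the console; the containing class `MetadataPrinter` and the `name: value` output format are hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser.Data;

static class MetadataPrinter
{
    // Hypothetical helper: writes each metadata entry as "name: value".
    public static void PrintMetadata(IEnumerable<MetadataItem> metadata)
    {
        if (metadata == null)
        {
            return;
        }

        foreach (MetadataItem entry in metadata)
        {
            Console.WriteLine($"{entry.Name}: {entry.Value}");
        }
    }
}
```

If the helper lives in a separate class like this, the call in the loop becomes `MetadataPrinter.PrintMetadata(item.Metadata)`; alternatively, the method can be placed next to `ExtractDataFromAttachments` so the original call compiles unchanged.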

## ExtractTextFromZipArchive.cs
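using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Extracts text from all documents inside a ZIP/RAR archive.
static void ExtractTextFromZipArchive(string archivePath)
{
    using (Parser parser = new Parser(archivePath))
    {
        // GetContainer returns null when the file doesn't support container extraction.
        IEnumerable<ContainerItem> attachments = parser.GetContainer();
        if (attachments == null)
        {
            return;
        }

        ExtractDataFromAttachments(attachments);
    }
}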