@GroupDocsGists
Created March 5, 2026 17:27
Demonstrates in‑memory extraction of text and metadata from documents inside ZIP and RAR archives using GroupDocs.Parser for .NET.

Extract Text from ZIP/RAR Archives with GroupDocs.Parser for .NET

Learn how to pull text and metadata from documents stored inside ZIP and RAR archives directly in memory using GroupDocs.Parser for .NET.

📦 Prerequisites

  • GroupDocs.Parser for .NET (see the documentation)
  • Temporary license (obtain a free temporary license from the product page)
  • Input archives in ZIP or RAR format, containing documents of any type that GroupDocs.Parser can read

🚀 Key Capabilities

  • In‑memory processing of archives without extracting files to disk
  • Recursive handling of nested ZIP/RAR archives
  • Extraction of document text and metadata
  • Graceful handling of unsupported document formats
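
The in-memory capability can be sketched as follows, assuming the archive bytes are already available (for example, from an upload or a database field). The class and method names `InMemoryArchiveSketch` and `ListEntries` are hypothetical and only serve the illustration; the sketch relies on `Parser` accepting a `Stream` and on `GetContainer()` returning `null` when container extraction isn't supported.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

static class InMemoryArchiveSketch
{
    // Hypothetical helper: lists the entries of an archive held entirely in memory.
    public static void ListEntries(byte[] archiveBytes)
    {
        using (MemoryStream stream = new MemoryStream(archiveBytes))
        using (Parser parser = new Parser(stream))
        {
            IEnumerable<ContainerItem> items = parser.GetContainer();
            if (items == null)
            {
                Console.WriteLine("Container extraction isn't supported for this input.");
                return;
            }

            foreach (ContainerItem item in items)
            {
                // FilePath and Size describe the entry; nothing is written to disk.
                Console.WriteLine($"{item.FilePath} ({item.Size} bytes)");
            }
        }
    }
}
```

Because the parser reads from the stream, no temporary file is needed at any point; the same pattern should apply to nested archives opened through `item.OpenParser()`.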

💻 Code Example

See the following examples:

  • ExtractTextFromZipArchive.cs
  • ExtractDataFromAttachments.cs

📋 How to Use

  1. Install GroupDocs.Parser via NuGet.
  2. Add the two .cs files to your project.
  3. Provide the path to the archive you want to process.
  4. Call the ExtractTextFromZipArchive method.
  5. Review the extracted text and metadata printed by the helper methods.
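
Steps 3 and 4 above can be sketched as a small entry-point helper. This is only an illustration: `ArchiveCli` and `Run` are hypothetical names, and the extraction routine is passed in as a delegate so the sketch stays independent of the gist files.

```csharp
using System;
using System.IO;

static class ArchiveCli
{
    // Hypothetical entry-point helper: validates the archive path, then hands it
    // to the extraction routine supplied by the caller.
    public static int Run(string[] args, Action<string> extract)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: <program> <path-to-archive>");
            return 1;
        }

        string archivePath = args[0];
        if (!File.Exists(archivePath))
        {
            Console.Error.WriteLine($"Archive not found: {archivePath}");
            return 2;
        }

        extract(archivePath);
        return 0;
    }
}
```

From `Main`, this could be called as `return ArchiveCli.Run(args, ExtractTextFromZipArchive);`, assuming the method from ExtractTextFromZipArchive.cs is in scope.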

🏁 Conclusion

These snippets show how to read text and metadata from documents inside ZIP and RAR archives with GroupDocs.Parser for .NET, without unpacking them to disk. For more details, explore the full documentation and try the library with a temporary license.

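// Extracts data from attachments: file path, metadata, and text from each document.
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

static void ExtractDataFromAttachments(IEnumerable<ContainerItem> attachments)
{
    foreach (ContainerItem item in attachments)
    {
        // Print the item's path and metadata (PrintMetadata is a helper defined elsewhere).
        Console.WriteLine(item.FilePath);
        PrintMetadata(item.Metadata);
        try
        {
            using (Parser itemParser = item.OpenParser())
            {
                if (itemParser == null)
                {
                    continue;
                }

                bool isArchive = item.FilePath.EndsWith(".zip", StringComparison.OrdinalIgnoreCase) ||
                                 item.FilePath.EndsWith(".rar", StringComparison.OrdinalIgnoreCase);
                if (isArchive)
                {
                    // Nested archive: recurse into its items.
                    IEnumerable<ContainerItem>? nestedAttachments = itemParser.GetContainer();
                    if (nestedAttachments != null)
                    {
                        ExtractDataFromAttachments(nestedAttachments);
                    }
                }
                else
                {
                    using (TextReader reader = itemParser.GetText())
                    {
                        // GetText returns null when text extraction isn't supported for the document.
                        Console.WriteLine(reader == null ? "[Text extraction isn't supported]" : reader.ReadToEnd());
                    }
                }
            }
        }
        catch (UnsupportedDocumentFormatException)
        {
            // Skip items whose format the parser cannot read and continue with the rest.
            Console.WriteLine($"Skipped (unsupported format): {item.FilePath}");
        }
    }
}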
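
The `PrintMetadata` helper called above is not included in the listing. A minimal sketch of what it might look like, assuming `item.Metadata` is a sequence of `MetadataItem` objects exposing `Name` and `Value`; the wrapper class name `MetadataPrinter` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser.Data;

static class MetadataPrinter
{
    // Assumed shape of the helper: writes one "name: value" line per metadata entry.
    public static void PrintMetadata(IEnumerable<MetadataItem> metadata)
    {
        if (metadata == null)
        {
            return; // Nothing to print for items without metadata.
        }

        foreach (MetadataItem entry in metadata)
        {
            Console.WriteLine($"{entry.Name}: {entry.Value}");
        }
    }
}
```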
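
// Extracts text from all documents inside a ZIP/RAR archive.
static void ExtractTextFromZipArchive(string archivePath)
{
    using (Parser parser = new Parser(archivePath))
    {
        // GetContainer returns null when the format doesn't support container extraction.
        IEnumerable<ContainerItem>? attachments = parser.GetContainer();
        if (attachments == null)
        {
            return;
        }

        ExtractDataFromAttachments(attachments);
    }
}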
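
To try the recursive handling without hunting for a suitable file, a nested test archive can be built with the standard `System.IO.Compression` classes. This is a sketch for experimentation only and is not part of the GroupDocs API; `SampleArchive`, `BuildZip`, and `WriteNestedSample` are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class SampleArchive
{
    // Builds a ZIP archive in memory from (name, content) pairs.
    public static byte[] BuildZip(params (string Name, byte[] Content)[] entries)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // leaveOpen keeps the buffer usable after the archive is finalized on dispose.
            using (ZipArchive zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach ((string name, byte[] content) in entries)
                {
                    ZipArchiveEntry entry = zip.CreateEntry(name);
                    using (Stream target = entry.Open())
                    {
                        target.Write(content, 0, content.Length);
                    }
                }
            }

            return buffer.ToArray();
        }
    }

    // Writes an outer ZIP that holds a text file and an inner ZIP with its own text file.
    public static void WriteNestedSample(string path)
    {
        byte[] inner = BuildZip(("inner.txt", Encoding.UTF8.GetBytes("Text inside the nested archive.")));
        byte[] outer = BuildZip(
            ("readme.txt", Encoding.UTF8.GetBytes("Text at the top level.")),
            ("nested.zip", inner));
        File.WriteAllBytes(path, outer);
    }
}
```

Passing the resulting file path to `ExtractTextFromZipArchive` should exercise both the plain-document branch and the nested-archive branch.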