Sunday, 15 June 2014

How to Extract Text From PDF File Using C#.Net

6/15/2014 - By Pranav Singh

In this article I will show you how to read the text of a PDF file using iTextSharp in a C# application. The example retrieves the PDF text and displays it in a console application.


First download iTextSharp. Then create a new console application, add a reference to the iTextSharp library, and add the code below.

C#.Net
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text;
using iTextSharp.text.pdf.parser;

namespace DemoConsoleApplication
{
    class Program
    {
        /// <summary>
        /// How to Extract Text From PDF File Using C#.Net
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            /* Put your PDF in the Bin\Debug folder so that the file name
             * alone is enough. If the file is not in that folder, provide
             * the full path, including the drive name. */
            string pdfdata = ExtractTextFromPdf(@"report_grid.pdf");
            Console.WriteLine(pdfdata);
            System.Console.ReadLine();
        }
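        /// <summary>
        /// Reads every page of the PDF at the given path and returns the combined text.
        /// </summary>
        public static string ExtractTextFromPdf(string path)
        {
            // PdfReader is disposed automatically when the using block ends.
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();

                // Page numbers start at 1, not 0.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
                }

                return text.ToString();
            }
        }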
    }
}

VB.Net
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports System.IO
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports iTextSharp.text.pdf.parser

Namespace DemoConsoleApplication
    Class Program
        ''' <summary>
        ''' How to Extract Text From PDF File Using C#.Net
        ''' </summary>
        ''' <param name="args"></param>
        Private Shared Sub Main(ByVal args As String())
            ' Put your PDF in the Bin\Debug folder so that the file name
            ' alone is enough. If the file is not in that folder, provide
            ' the full path, including the drive name.

            Dim pdfdata As String = ExtractTextFromPdf("report_grid.pdf")
            Console.WriteLine(pdfdata)
            System.Console.ReadLine()
        End Sub
        Public Shared Function ExtractTextFromPdf(ByVal path As String) As String
            Using reader As New PdfReader(path)
                Dim text As New StringBuilder()

                For i As Integer = 1 To reader.NumberOfPages
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i))
                Next

                Return text.ToString()
            End Using
        End Function
    End Class
End Namespace

In the code above, the path of the PDF file is the most important part. A bare file name is resolved against the current working directory, which is the Bin\Debug folder when the application is started from Visual Studio. For testing, you can simply put the file in that folder and pass only its name. Otherwise, provide the full physical path of the PDF file.

For example:

 string pdfdata = ExtractTextFromPdf(@"report_grid.pdf");

 or

 string pdfdata = ExtractTextFromPdf(@"D:\report_grid.pdf");
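
The two calls above differ only in how the path is given. As a minimal sketch, that choice could be wrapped in a small helper. The names PdfPathHelper and ResolvePdfPath are hypothetical and are not part of the original code or of iTextSharp. The sketch uses only System.IO and assumes that a bare file name refers to a PDF stored next to the executable.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class PdfPathHelper
{
    // Returns an absolute path for the given PDF file name or path.
    public static string ResolvePdfPath(string fileNameOrPath)
    {
        // A rooted path such as D:\report_grid.pdf is returned unchanged.
        if (Path.IsPathRooted(fileNameOrPath))
        {
            return fileNameOrPath;
        }

        // Assumption: a bare file name means "next to the executable".
        // BaseDirectory is the folder the application was loaded from,
        // for example Bin\Debug when started from Visual Studio.
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileNameOrPath);
    }
}
```

With such a helper, the call in Main could become ExtractTextFromPdf(PdfPathHelper.ResolvePdfPath("report_grid.pdf")). Because the result is always an absolute path, it no longer depends on the current working directory, and File.Exists can be called on it first to report a missing file before PdfReader is created.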

Here is the PDF file:

[Image: the sample PDF file]

Now run the application to check the output.



About the Author

We are a group of people with expertise in different Microsoft technologies such as Asp.Net, MVC, C#.Net, VB.Net, Windows Application, WPF, jQuery, Javascript and HTML. This blog is designed to share that knowledge.


4 comments:

  1. how to display the same with table in webform instead of console

    1. Hi please try the below link

      http://stackoverflow.com/questions/6882098/how-can-i-get-text-formatting-with-itextsharp
      http://sourceforge.net/projects/pdftohtml/
      http://stackoverflow.com/questions/8123786/c-sharp-converting-pdf-to-html?lq=1

  2. I'm not a developer, i always use this free online pdf to text converter to extract text from pdf online free.

  3. The type or namespace name 'parser' does not exist in the namespace 'iTextSharp.text.pdf' (are you missing an assembly reference?)

