In this article I will show you how to read the text of a PDF
file using iTextSharp in a C# application. The example extracts the text from
the PDF and displays it in a console application.
Here are some of my other articles: How to Read a Text File Line by Line in C#
in Console Application; Multiple File Upload With Asp.Net MVC C# and HTML5 |
How to upload files to ASP.NET MVC application; Drag Drop Cells in GridView
Control Using Asp.net C# and jQuery.
For this article, first download iTextSharp. Then create a new console
application, add a reference to the iTextSharp assembly, and add the code below.
C#.Net
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text;
using iTextSharp.text.pdf.parser;

namespace DemoConsoleApplication
{
    class Program
    {
        /// <summary>
        /// How to Extract Text From PDF File Using C#.Net
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            /* Put your PDF in the bin\Debug folder.
             * If the file is not in the bin directory, you have to
             * provide the full path, including the drive name. */
            string pdfdata = ExtractTextFromPdf(@"report_grid.pdf");
            Console.WriteLine(pdfdata);
            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }

        public static string ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();
                // iTextSharp page numbers start at 1, not 0.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // AppendLine keeps the text of consecutive pages from running together.
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                }
                return text.ToString();
            }
        }
    }
}
VB.Net
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports System.IO
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports iTextSharp.text.pdf.parser

Namespace DemoConsoleApplication
    Class Program
        ''' <summary>
        ''' How to Extract Text From PDF File Using VB.Net
        ''' </summary>
        ''' <param name="args"></param>
        Private Shared Sub Main(ByVal args As String())
            ' Put your PDF in the bin\Debug folder.
            ' If the file is not in the bin directory, you have to
            ' provide the full path, including the drive name.
            Dim pdfdata As String = ExtractTextFromPdf("report_grid.pdf")
            Console.WriteLine(pdfdata)
            ' Keep the console window open until Enter is pressed.
            Console.ReadLine()
        End Sub

        Public Shared Function ExtractTextFromPdf(ByVal path As String) As String
            Using reader As New PdfReader(path)
                Dim text As New StringBuilder()
                ' iTextSharp page numbers start at 1, not 0.
                For i As Integer = 1 To reader.NumberOfPages
                    ' AppendLine keeps the text of consecutive pages from running together.
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i))
                Next
                Return text.ToString()
            End Using
        End Function
    End Class
End Namespace
In the above code, the path of the PDF file is the most important part. A bare
file name is a relative path, which is resolved against the application's
working directory. So for testing purposes you only need to put the file in the
bin\Debug directory and pass its name. Alternatively, you can provide the full
physical path of the PDF file.
For example:
string pdfdata = ExtractTextFromPdf(@"report_grid.pdf");
OR
string pdfdata = ExtractTextFromPdf(@"D:\report_grid.pdf");
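If you would rather not depend on the working directory, one option is to build the full path from the folder that contains the executable and check that the file exists before reading it. The following is only a minimal sketch using the standard System.IO classes. The class name PdfPathDemo and the helper ResolvePdfPath are hypothetical names introduced for illustration and are not part of iTextSharp or of the code above.

```csharp
using System;
using System.IO;

static class PdfPathDemo
{
    // Hypothetical helper (an assumption for this sketch, not part of iTextSharp):
    // a rooted path is returned unchanged, a bare file name is combined
    // with the folder that contains the running executable.
    public static string ResolvePdfPath(string fileNameOrPath)
    {
        if (Path.IsPathRooted(fileNameOrPath))
        {
            return fileNameOrPath;
        }
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileNameOrPath);
    }

    static void Main()
    {
        string path = ResolvePdfPath("report_grid.pdf");
        if (File.Exists(path))
        {
            Console.WriteLine("Found PDF at: " + path);
            // At this point you could pass 'path' to ExtractTextFromPdf.
        }
        else
        {
            Console.WriteLine("PDF not found at: " + path);
        }
    }
}
```

Checking File.Exists first lets the program print a readable message instead of failing with an exception when the PDF is missing.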
Here is the pdf file

Now run the application to check the output.
Comment: How can I display the same data in a table on a web form instead of the console?
Reply: Hi, please try the links below:
http://stackoverflow.com/questions/6882098/how-can-i-get-text-formatting-with-itextsharp
http://sourceforge.net/projects/pdftohtml/
http://stackoverflow.com/questions/8123786/c-sharp-converting-pdf-to-html?lq=1
Comment: The type or namespace name 'parser' does not exist in the namespace 'iTextSharp.text.pdf' (are you missing an assembly reference?)