Remove non-ASCII characters

kpassaur
Automation Wizard
Posts: 696
Joined: Wed Jul 07, 2004 1:55 pm

Post by kpassaur » Sat Jun 24, 2006 11:59 am

I am trying to parse a file that is not a text file. It is created by an application that stores each database record as a separate file. The company does not offer an SDK.

If I open a data file in Notepad I can see what I want to capture; however, when I read it using ReadLn it skips the data that I want. I am sure the file is compiled or something similar (when I open it in Word there are page breaks in it, which Notepad does not show).

This is where it gets confusing to me. If I open the file in Notepad, copy its contents to the clipboard, paste them into another instance of Notepad and save that, I can read the saved (copied and pasted) version.

I really don't want to write a script that opens each file in Notepad, copies the data into a new file, parses it and then deletes it, as there are thousands of files.

Does anyone know of a way to strip out all the non-ASCII characters and dump the data into the clipboard or something like that, or of a utility or DLL that accepts all types of characters and returns only the ASCII ones?
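
For reference, a minimal sketch of such a filter. It is written in Python rather than Macro Scheduler script, which is an assumption since the question names no language, and the names `KEEP`, `strip_non_ascii` and `read_ascii_only` are made up for illustration. It opens the file in binary mode and keeps only a whitelist of ASCII bytes:

```python
# Bytes to keep: printable ASCII plus tab, line feed and carriage return.
# This whitelist is an assumption; adjust it to whatever the records need.
KEEP = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def strip_non_ascii(data: bytes) -> str:
    """Return only the whitelisted ASCII characters found in data."""
    return bytes(b for b in data if b in KEEP).decode("ascii")


def read_ascii_only(path: str) -> str:
    """Read the whole file in binary mode, then filter it."""
    with open(path, "rb") as handle:
        return strip_non_ascii(handle.read())
```

Note that this simply drops the unwanted bytes, so any field boundaries they marked are lost.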

Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US

Post by Bob Hansen » Sun Jun 25, 2006 4:52 am

I use TextPad almost daily to parse files and manipulate text files.

Check at http://textpad.com/products/textpad/features.html for a partial list of features.

Note, this is a text editor, not a word processor. No special fonts, enhancements for fonts, size, bold, etc. But it is very powerful and has a strong Regular Expression Search/Replace feature which is probably the feature that you may find most useful for the problem you have described.

It also has syntax capabilities that can be used with scripts from Macro Scheduler.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

pgriffin
Automation Wizard
Posts: 460
Joined: Wed Apr 06, 2005 5:56 pm
Location: US and Europe

Post by pgriffin » Mon Jun 26, 2006 1:55 pm

Have you tried running your script from the editor with the Watch List open? I'm curious as to what value would be shown in the variable holding the ReadLn> result. Is the variable already "incorrect" at that point, or is it changed by the WriteLn> process?

I have done quite a bit of work filtering text. It is possible to write a very simple "Text Filter" in MacroScript, or you could probably invoke some VBScript to do the same.

But my main suggestion is to run the script while watching the values in the Watch List window. When is the ReadLn> result populated, when does it change, and what does it look like?

What ReadLn is finding

Post by kpassaur » Mon Jun 26, 2006 2:20 pm

What happens is that some parts are skipped, so something in the file is telling ReadLn there is no line there. Then it will read a line and put in characters that look like a |.

I have finally gotten to see the results with the characters removed, using a program called View.exe.

However, since this is a database file, with the characters removed and not replaced with something else I have no delimiters to read. The classic garbage in, garbage out. I have found a DLL that is supposed to read the file; however, it loads but does not read it. I have sent a message to the author of the DLL to see how that goes. What is surprising, though, is that one of the fields that I want is not listed as being available in his DLL. (This field is also skipped by ReadLn.)

I have found that on Rent a Coder someone took a project for 100 that reads all of these files, puts them in an SQL database, and updates the original files when they are changed in the SQL database. That's not what I was looking to do, but it is neat.

There is a sample file here if you want to take a look at it:

http://www.edocfile.com/downloads/AUT06060.brw

It has me puzzled.

Post by pgriffin » Mon Jun 26, 2006 3:40 pm

So, are you stating that once the data is in a MacroScript variable via ReadLn, it is already "too late" to further process the data since there are parts of the data missing?

...if you use ReadFile, is the data complete?

I downloaded your sample file. I'll take a look.

Read File gives me nothing

Post by kpassaur » Mon Jun 26, 2006 4:14 pm

ReadFile gave me nothing. ReadLn gives me some. If I open the file in Notepad and save it, I can get more of it. What I really want to do is read it as a delimited file so I can pull whatever I want out of it quickly.

For instance, one of the items I want is a name. I cannot get it with ReadLn, but I can see it in Notepad. What I need to figure out is what they are using to separate the fields and find a way to replace it with something special. That would give me most of the information.
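
One way to get a delimited view without knowing the exact separator is to treat every run of non-printable bytes as a single field boundary. The following is only a sketch in Python (not Macro Scheduler script); the function names are hypothetical, and the idea that the separators are non-printable bytes is an assumption, not a documented property of these files:

```python
import re

# Any run of bytes outside printable ASCII (0x20-0x7E) counts as one
# separator. This is an assumption about the file format, not a known fact.
NON_PRINTABLE_RUN = re.compile(rb"[^\x20-\x7e]+")


def split_fields(data: bytes):
    """Return the printable text found between runs of non-printable bytes."""
    parts = NON_PRINTABLE_RUN.split(data)
    return [p.decode("ascii").strip() for p in parts if p.strip()]


def to_delimited(data: bytes, delimiter: str = "\t") -> str:
    """Join the recovered fields with a delimiter of the caller's choice."""
    return delimiter.join(split_fields(data))
```

A tab is used as the default delimiter because tabs fall outside the printable range above and so can never appear inside a recovered field. Binary values stored in the record may still show up as stray short "fields" if their bytes happen to be printable.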

Post by Bob Hansen » Mon Jun 26, 2006 4:24 pm

I downloaded your file and looked at it with TextPad. This is what I saw:

[Image: the sample file as displayed in TextPad]

Not sure what you want to do with this, but I suspect that TextPad will be able to extract what you want.

TextPad's Regular Expressions can remove all non-alphanumeric characters.

Macro Scheduler also supports VBScript, which has Regular Expression capability as well.

Pulling it out with TextPad

Post by kpassaur » Mon Jun 26, 2006 4:40 pm

Bob, I looked at it with TextPad and, well, I don't want to do it manually, as there could be quite a few of these files. I may have missed it, but I didn't see a way to use it in batch mode from a command line.
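
For the batch side of this, a rough sketch of a script that walks a folder and writes an ASCII-only copy of each file. It is Python, not TextPad or Macro Scheduler; the function name `convert_folder` is hypothetical, and the `*.brw` pattern is an assumption taken from the sample file's extension:

```python
from pathlib import Path


def convert_folder(source_dir, target_dir, pattern="*.brw"):
    """Write an ASCII-only .txt copy of every matching file; return the count."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(source_dir).glob(pattern)):
        raw = path.read_bytes()
        # Keep printable ASCII and line breaks; drop everything else.
        kept = bytes(b for b in raw if 0x20 <= b <= 0x7E or b in (0x0A, 0x0D))
        (target / (path.stem + ".txt")).write_bytes(kept)
        count += 1
    return count
```

The originals are left untouched, so the copies can be read with ReadLn and deleted afterwards.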

With what is being displayed, it appears the little square boxes are some kind of delimiter. However, when I selected one in TextPad and copied and pasted it into the search function, it would find some but not all of the little boxes. So I replaced the ones it found by deleting them, tried again with one that was not found, and it found some more, which I deleted as well. The real key here for me would be to have a script that finds these boxes and allows me to read the content between them.

So I guess the real key here is to identify what those little boxes are and how to replace them or read what is between them.
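
One possible reason a copied box matches only some of the others is that an editor may draw the same placeholder box for several different byte values. That is a guess, not something confirmed for this file. A small Python sketch (hypothetical function names) that lists which non-printable byte values actually occur, and how often, would show whether that is the case:

```python
from collections import Counter


def non_printable_histogram(data: bytes) -> Counter:
    """Count every byte value outside printable ASCII (0x20-0x7E)."""
    return Counter(b for b in data if not 0x20 <= b <= 0x7E)


def report(data: bytes):
    """Return lines such as '0x00  3', most frequent byte value first."""
    counts = non_printable_histogram(data)
    return [f"0x{value:02X}  {n}" for value, n in counts.most_common()]
```

Running `report` on the bytes of one record (read with `open(path, "rb")`) gives the candidate delimiter values to search for or replace.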

Post by pgriffin » Mon Jun 26, 2006 4:44 pm

Keith,

I am working on a filter to allow you to pull out only the characters you would like. In my first pass, I will take the approach of selecting only the good characters as opposed to identifying the bad characters. These records are very quirky, as you well know.

Should be just a few minutes more...

Post by pgriffin » Tue Jun 27, 2006 1:40 am

Keith,

I have some VBScript to return the ASCII code of any character in a string, but some of the characters in your file still don't return a valid ASCII character. It was my hope that I could analyze the file to remove any ASCII chars that were not wanted, but that is proving difficult....

...stay tuned...

Post by Bob Hansen » Tue Jun 27, 2006 7:35 am

We could continue this, but we are getting off topic re Macro Scheduler.

I still think that TextPad can do this with Regular Expressions. I don't have time to look more closely right now, but I suggest submitting a request to their forum at http://www.textpad.info/forum/index.php

Providing a link to the image that I supplied earlier will be very helpful. People there are like folks here at Macro Scheduler: very willing to help out, with responses usually within a few hours or less.
