Remove non-ASCII characters
Moderators: JRL, Dorian (MJT support)
I am trying to parse a file that is not a text file. It is created by an application that stores each database record as a separate file. The company does not offer an SDK.
If I open a data file in Notepad I can see what I want to capture; however, when I read it using ReadLn it skips the data that I want. I am sure the file is compiled or something similar (when I open it in Word there are page breaks in it, which are not shown in Notepad).
This is where it gets confusing to me. If I open the file in Notepad, copy its contents to the clipboard, paste them into another instance of Notepad and save that, I can read the saved (copied and pasted) version.
I really don't want to write a script that opens each file in Notepad, copies the data into a new file, parses it and then deletes it, as there are thousands of files.
Does anyone know of a way to strip out all the non-ASCII characters and dump the data onto the clipboard or something like that, or of a utility or DLL that accepts all types of characters and returns only the ASCII ones?
- Bob Hansen
- Automation Wizard
- Posts: 2475
- Joined: Tue Sep 24, 2002 3:47 am
- Location: Salem, New Hampshire, US
I use TextPad almost daily to parse files and manipulate text files.
Check at http://textpad.com/products/textpad/features.html for a partial list of features.
Note that this is a text editor, not a word processor: no special fonts, font enhancements, sizes, bold, etc. But it is very powerful and has a strong Regular Expression Search/Replace feature, which is probably what you will find most useful for the problem you have described.
It also has syntax capabilities that can be used with scripts from Macro Scheduler.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!
Have you tried running your script from the editor with the Watch List open? I'm curious as to what value would be shown in the variable holding the ReadLn> result. Is the variable actually "incorrect" at that point, or is it changed by the WriteLn> process?
I have done quite a bit of work filtering text. It is possible to write a very simple "Text Filter" in macroscript, or you could probably invoke some VBScript to do the same.
But my main suggestion is to run the script while watching the values in the Watch List window. When is the ReadLn result populated, when does it change, and what does it look like?
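For illustration, here is a minimal sketch of that kind of text filter. It is written in Python rather than macroscript or VBScript, purely as an assumption to keep the example short. It reads the file as raw bytes, so no line-reading routine gets a chance to stop at a control character, and keeps only printable ASCII plus tab, CR and LF. The function names and the sample bytes are hypothetical and are not taken from the actual data file.

```python
def keep_printable_ascii(data: bytes) -> str:
    """Return only printable ASCII (0x20-0x7E) plus tab, CR and LF."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return bytes(b for b in data if b in allowed).decode("ascii")


def filter_file(path: str) -> str:
    """Read a file in binary mode and return its printable ASCII content."""
    with open(path, "rb") as handle:
        return keep_printable_ascii(handle.read())


# Hypothetical sample: two words surrounded by control and high bytes.
sample = b"\x00\x01Smith\x00\x1fJohn\xff\r\n"
print(repr(keep_printable_ascii(sample)))  # 'SmithJohn\r\n'
```

Note that simply dropping the unwanted bytes lets neighbouring values run together, so this is only a first pass.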
What ReadLn is finding
What happens is that some parts are skipped, so whatever is in the file is telling ReadLn there is no line there. Then it will read a line and put in characters that look like a |.
I have finally gotten to see the results with the characters removed, using a program called View.exe.
However, since this is a database file, with the characters removed and not replaced with something else I have no delimiters to read. The classic garbage in, garbage out. I have found a DLL that is supposed to read the file; however, it loads but does not read it. I have sent a message to the author of the DLL to see how that goes. What is surprising, though, is that one of the fields that I want is not listed as being available in his DLL. (This field is also skipped by ReadLn.)
I have found that on Rent a Coder someone took a project for 100 that reads all of these files, puts them in an SQL database and, when they are changed in the SQL database, updates the original files as well. That's not what I was looking to do, but it is neat.
There is a sample file here:
http://www.edocfile.com/downloads/AUT06060.brw
if you want to take a look at it. It has me puzzled.
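One way around losing the delimiters might be to replace each run of unprintable bytes with a single marker instead of deleting it. The following is a rough Python sketch of that idea, not a description of the real file format: it assumes the fields are plain ASCII separated by one or more non-printable bytes, and the names `to_delimited` and `fields` and the sample bytes are made up for the example. Since ReadLn already shows characters that look like a |, a different marker may be needed if | occurs in the data.

```python
import re

# One or more bytes outside printable ASCII (0x20-0x7E).
# Tab, CR and LF count as separators here too.
NON_PRINTABLE_RUN = re.compile(rb"[^\x20-\x7E]+")


def to_delimited(data: bytes, delimiter: bytes = b"|") -> str:
    """Replace each run of non-printable bytes with a single delimiter."""
    return NON_PRINTABLE_RUN.sub(lambda _match: delimiter, data).decode("ascii")


def fields(data: bytes) -> list[str]:
    """Split on runs of non-printable bytes and drop empty pieces."""
    return [part.decode("ascii") for part in NON_PRINTABLE_RUN.split(data) if part]


# Hypothetical record: three values separated by control and high bytes.
sample = b"\x00\x00Smith\x01\x02John\x00Salem\xff"
print(to_delimited(sample))  # |Smith|John|Salem|
print(fields(sample))        # ['Smith', 'John', 'Salem']
```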
Read File gives me nothing
Read File gave me nothing. ReadLn gives me some. If I open the file in Notepad and save it, I can get more of it. What I really want to do is read it as a delimited file so I can pull whatever I want out of it quickly.
For instance, one of the items is a name that I want to get. I cannot get it with ReadLn, but I can see it in Notepad. What I need to figure out is what they are using to separate the fields, and find a way to replace it with something special. That would give me most of the information.
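A possible first step toward finding the separator is to count which unprintable byte values occur in the file and how often; the most frequent ones are reasonable candidates. This is only a sketch in Python, with a hypothetical function name and made-up sample bytes, and a high count does not prove that a byte really is the field separator.

```python
from collections import Counter


def non_printable_histogram(data: bytes) -> list[tuple[str, int]]:
    """Count each byte value outside printable ASCII, most frequent first."""
    counts = Counter(b for b in data if not 0x20 <= b <= 0x7E)
    return [(f"0x{value:02X}", n) for value, n in counts.most_common()]


# Hypothetical sample: 0x00 appears three times, 0x1F once.
sample = b"Smith\x00John\x00Salem\x1f\x00"
print(non_printable_histogram(sample))  # [('0x00', 3), ('0x1F', 1)]
```

To run it on a real file, read the file with `open(path, "rb").read()` and pass the result in.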
- Bob Hansen
I downloaded your file and looked at it with TextPad. This is what I saw:
[screenshot of the file in TextPad, not preserved]
I'm not sure what you want to do with this, but I suspect that TextPad will be able to extract what you want.
TextPad's Regular Expressions can remove all non-alphanumeric characters.
Macro Scheduler also supports VBScript, which has Regular Expression capability as well.
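To show what such a Regular Expression replace amounts to, here is a small sketch in Python (chosen only for illustration; it is not TextPad or VBScript syntax). It deletes every byte that is not an ASCII letter, digit or space. The function name and sample bytes are hypothetical.

```python
import re


def strip_non_alphanumeric(data: bytes) -> str:
    """Delete every byte that is not an ASCII letter, digit, or space."""
    return re.sub(rb"[^A-Za-z0-9 ]", b"", data).decode("ascii")


# Hypothetical sample with control bytes, punctuation and a high byte.
sample = b"\x00Smith, John\x1f #42\xff"
print(strip_non_alphanumeric(sample))  # Smith John 42
```

Be aware that this also removes punctuation, and that values separated only by unprintable bytes end up joined together.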
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!
Pulling it out with TextPad
Bob, I looked at it with TextPad and, well, I don't want to do it manually, as there could be quite a few of these files. I may have missed it, but I didn't see a way to use it in batch mode from a command line.
From what is being displayed, it appears the little square boxes are some kind of delimiter. However, when I selected one in TextPad and copied and pasted it into the search function, it would find some but not all of the little boxes. So I deleted the ones it found, tried again with one that had not been found, and it found some more, which I deleted as well. What I would really like is a script that finds these boxes and allows me to read the content between them.
So I guess the real key here is to identify what those little boxes are and how to replace them or read what is between them.
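One possible explanation, which is an assumption and not something confirmed for this file, is that the editor draws the same box for several different unprintable byte values, so a search for one copied box only matches bytes with that exact value. A short Python sketch like the one below would make the difference visible by printing every unprintable byte as its hex code. The function name and sample bytes are hypothetical.

```python
def reveal(data: bytes) -> str:
    """Show printable ASCII as-is and every other byte as <XX> in hex."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else f"<{b:02X}>" for b in data
    )


# Hypothetical sample: three different bytes that might all display as a box.
sample = b"Smith\x00John\x01Salem\x1f"
print(reveal(sample))  # Smith<00>John<01>Salem<1F>
```

Once the actual byte values are known, they can be searched for or replaced by value instead of by copying the box from the editor.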
Keith,
I am working on a filter to allow you to pull out only the characters you would like. In my first pass, I will take the approach of selecting only the good characters, as opposed to identifying the bad ones. These records are very quirky, as you well know.
Should be just a few minutes more...
Keith,
I have some VBScript that returns the ASCII code of any character in a string, but some of the characters in your file still don't return a valid ASCII code. It was my hope that I could analyze the file and remove any ASCII characters that were not wanted, but that is proving difficult....
...stay tuned...
- Bob Hansen
We could continue this, but we are getting off topic re Macro Scheduler.
I still think that TextPad can do this with Regular Expressions. I don't have time to look more closely right now, but I suggest submitting a request to their forum at http://www.textpad.info/forum/index.php
Providing a link to the image that I supplied earlier will be very helpful. People there are like the folks here at Macro Scheduler, very willing to help out, and responses usually arrive within a few hours.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!