Unicode character - how to replace? XML validation error

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

Post Reply
User avatar
andrewbarbour
Newbie
Posts: 14
Joined: Thu Aug 26, 2010 7:09 pm

Unicode character - how to replace? XML validation error

Post by andrewbarbour » Mon Jun 22, 2015 2:59 pm

I am extracting the text from an Excel cell doing some manipulation and then writing it out as a line to an XML file.
When I try and validate it using http://xmlvalidation.com/ it tells me there is a problem with a character.... The validator tells me it is "Unicode: 0x1"

I have tried pasting the character into the script and just using StringReplace - but that did not work.

I can manually fix the file - but I would prefer to handle this in the script.

Anybody have any ideas??

User avatar
Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Re: Unicode character - how to replace? XML validation error

Post by Marcus Tettmar » Wed Jun 24, 2015 12:20 pm

How are you writing it out? Can you share your code?
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
andrewbarbour
Newbie
Posts: 14
Joined: Thu Aug 26, 2010 7:09 pm

Re: Unicode character - how to replace? XML validation error

Post by andrewbarbour » Wed Jun 24, 2015 2:22 pm

Thanks Marcus....

here is an extract of the relevant lines.

A few caveats:
- it is probably way to complicated - as I am a business guy - not a coder
- I know I should use subroutines - I tend to write my first proof of concept in a linear format - then start fresh and write it tight

The following are some of the character related issues I am having problem extracting from the Excel file.

RefID 404 - Authors
Non printing character between n and T of MartinT
accented character a in Fernandez
accented n in Penas
G. Bodes-Pardo, D. Pecos-MartinƒT. Gallego-Izquierdo, J. Salom-Moreno, C. Fernández-de-Las-Peñas, R. Ortega-Santiago

RefID 404 - Abstract
The +/- symbol does not get picked up properly
SD age, 39 ± 13 years),

RefID 505 - Authors
accented u and s
F. Büyükyılmaz, T. Aştı

Ref1077 - Adverse Events
Character between "P" and "0.73"
test p 0.73).

RefID 3571 - Authors
accented g
N. Yağci, F. Uygur and N. Bek

Code: Select all

Let>R=R-2
Let>ColToExtract=R

//Select extract file
Input>varDataExtract,Browse to the extracted data file,NA
Input>strURL,Which column contains a URL?  If none - put "0",0
If>varDataExtract=NA,END
XLOpen>%varDataExtract%,0,xlBook2
//Go through the data extract file and insert each row into the XML

Let>R=2
Let>ID=0
Label>DataExtractLoop
Let>ID=ID+1
Let>C=1
XLGetCell>xlBook2,Sheet1,R,C,strExt
Message>Processing line %R% of file.
If>strExt=##END##,ENDDataExtractLoop
LTrim>strExt,strExt
RTrim>strExt,strExt
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Reference>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%>%strExt%</Head%C%>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<id>%ID%</id>
Label>WithinItemLoop
Let>C=C+1
If>C>ColToExtract,NextRow
XLGetCell>xlBook2,Sheet1,R,C,strExt
//Get rid of leading and trailing spaces
LTrim>strExt,strExt
RTrim>strExt,strExt
//Clean up some characters that cause issues in XML
StringReplace>strExt,",",strExt
StringReplace>strExt,<,<,strExt
StringReplace>strExt,>,>,strExt
StringReplace>strExt,',',strExt
//Tried the following replace for non-printing character - did not work
//StringReplace>strExt,,,strExt

// Check to see if currently at the URL column
If>C=StrURL,WriteURL,WriteNonURL
Label>WriteNonURL
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%>%strExt%</Head%C%>

//ADDED THE FOLLOWING FOR URLS
goto>WithinItemLoop
Label>WriteURL
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%><![CDATA[<a href="%strExt%">Link</a>]]></Head%C%>
goto>WithinItemLoop

Label>NextRow
Let>R=R+1
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Reference>
goto>DataExtractLoop

Label>ENDDataExtractLoop

// Close off the XML Report
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Intro>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Refreport>
//rename file as XML
copyfile>%varPath%\%varShortName%.txt,%varPath%\%varShortName%.xml

Label>END

User avatar
Dorian (MJT support)
Automation Wizard
Posts: 1354
Joined: Sun Nov 03, 2002 3:19 am
Contact:

Re: Unicode character - how to replace? XML validation error

Post by Dorian (MJT support) » Fri Jun 26, 2015 9:37 pm

Hi Andrew,

I just replied to your ticket, but for the benefit of others, have attached a test script which is successfully using StringReplace to replace those Unicode characters, including the non printing ones. I simply copied and pasted the characters from your Excel file into the script editor.

Kind regards,

Dorian Ellis
MJT Support.
Yes, we have a Custom Scripting Service. Message me or go here

User avatar
andrewbarbour
Newbie
Posts: 14
Joined: Thu Aug 26, 2010 7:09 pm

Re: Unicode character - how to replace? XML validation error

Post by andrewbarbour » Sun Jun 28, 2015 3:43 pm

Works like a charm! Thanks.

Post Reply
Sign up to our newsletter for free automation tips, tricks & discounts