[SOLVED] CSV and UTF8 Encoding

This topic is closed.

  • [SOLVED] CSV and UTF8 Encoding

    Hi guys,

    We're trying to upload data containing Chinese characters, but it seems that CSV upload doesn't support UTF-8 encoding. Has anyone been able to upload double-byte (Asian-language) data into OTM using CSV, or must we use XML? Thanks!

    Kee

  • #2
    Re: CSV and UTF8 Encoding

    Hi Kee,

    If you are doing this on OTM 5.0, you should not be hitting this problem; if you are, Metalink is the next option. Otherwise, if you are loading into OTM 5.5, read on!

    Yes, you can load UTF-8 CSV files (in both 5.0 and 5.5). However, are you creating the file in Windows using Notepad or Excel? If you are, you will have problems loading it into OTM 5.5, because Windows automatically prepends the BOM (3 bytes: EF BB BF) to the beginning of the file.

    You need to strip the BOM characters first before loading the file via CSV upload.
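
    As a rough illustration of the stripping step, here is a minimal Python sketch. It is not the tool mentioned in this thread; the function names and file paths are hypothetical, and it assumes the file is otherwise valid UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(src_path: str, dst_path: str) -> bool:
    """Copy src_path to dst_path without the BOM.

    Returns True if a BOM was found and removed. Both paths are
    hypothetical placeholders supplied by the caller.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    cleaned = strip_utf8_bom(data)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(cleaned) != len(data)
```

    Working on raw bytes leaves the rest of the file untouched, so line endings and the Chinese characters themselves are not re-encoded.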

    I have been loading UTF-8 CSV files on 5.0 regularly, so it should not be an issue there, but when we started using 5.5 the BOM gave us quite a few headaches.

    P.S. I have passed a tool called UniCSVed to Simon. It lets you save Unicode text files (tab-delimited, etc.) in CSV format for loading.
    Last edited by ianlo; June 14, 2007, 12:33.


    • #3
      Re: CSV and UTF8 Encoding

      Hi Ian,

      Indeed we're trying to upload UTF-8 CSV in 5.5, thank you so much for your help!

      Kee


      • #4
        Re: CSV and UTF8 Encoding

        Hi Kee,

        No problem. This also applies to XML files btw. If you need a BOM stripper for XML, you can download Xerces from apache.org and compile their example programs. There is a DOMPrint program that can remove the BOM.


        • #5
          Re: CSV and UTF8 Encoding

          Hi Ian,

          Perhaps I'm not using the tool correctly, as I'm not able to preserve the Chinese characters despite encoding the file in UTF-8 without a BOM. Below is the process I went through:

          1. Save LOCATION.txt (exported from OTM) as a UTF-8 text file using Notepad.
          2. Open LOCATION.txt in UniCSVed, encode it as UTF-8 without BOM, and save it as a CSV file.
          3. Open the LOCATION.csv file in Excel to add/format the data to be uploaded.
          4. Save LOCATION.csv in Excel (all Chinese characters are corrupted after the save).
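
          One way to see at which of these steps the characters get lost is to check the raw bytes of the file after each save. The following Python sketch is only a rough diagnostic; the function name and the returned labels are hypothetical and not part of OTM or UniCSVed.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-bom"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def describe_encoding(data: bytes) -> str:
    """Return a rough label for how the given file contents are encoded."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    if data.isascii():
        # Pure ASCII: any Chinese characters have already been lost
        # (for example, replaced by '?').
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Non-ASCII bytes that are not valid UTF-8, e.g. a legacy code page.
        return "other"
    return "utf-8"
```

          A result of "ascii" or "other" for a file that should contain Chinese text suggests that the last program to save it did not write UTF-8.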

          Any thoughts/suggestions, thanks!

          kee


          • #6
            Re: CSV and UTF8 Encoding

            Hi Kee,

            The problem is that Excel cannot save CSV in UTF-8 format. It can only save Unicode data as a tab-delimited text file (Save As "Unicode Text").

            You should save your modified LOCATION.csv file from Excel as a tab-delimited Unicode text file and then use UniCSVed to save it as a UTF-8 CSV without BOM.
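
            For anyone without the tool, the same conversion can be sketched in Python. This is an illustration rather than what UniCSVed does internally: the function name is hypothetical, the column names in the usage note are made up, and it assumes Excel's "Unicode Text" export is tab-delimited UTF-16 with a BOM.

```python
import csv
import io


def unicode_text_to_utf8_csv(raw: bytes, source_encoding: str = "utf-16") -> bytes:
    """Convert tab-delimited Unicode text to comma-separated UTF-8 without a BOM.

    source_encoding defaults to "utf-16", which reads and drops the
    UTF-16 BOM; this default is an assumption about the Excel export.
    """
    text = raw.decode(source_encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    # Plain "utf-8" (not "utf-8-sig") writes no BOM.
    return out.getvalue().encode("utf-8")
```

            For example, passing in the bytes of a file containing the line "LOC1<TAB>上海" yields the UTF-8 bytes of "LOC1,上海" with no BOM in front. The line terminator here is "\n"; whether the upload expects a different one is not covered in this thread.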

            Hope this helps!

            You can call me if you need any help, or look me up on Skype.

            Ian
