
Better file encoding detection for Text File Splitter

A Text File Splitter user had a problem with file encodings. This is a known issue that, I'm happy to report, is almost resolved. He gave me a file for testing, and that has helped enormously. Thanks Zhou!

Work on Text File Splitter 3.0 has slowed to a crawl, but I had already converted the code from 2.2.1 over to .NET 4.0. I decided to just finish this work and release it as 2.5.0.

Here's a screenshot of Text File Splitter detecting a UTF-8 file without the Byte Order Mark (BOM):

Here's a screenshot of a file with a BOM:

I'm using a library called "ude", which is a C# port of the Mozilla Universal Charset Detector. I had to put in a bug fix to deal with very large files. At least the file encoding detection, the first half of this feature, is now done. Now I need to deal with the encoding of the file chunks. This has taken a lot more time and code than I expected. Hopefully, this will solve this nagging issue once and for all.
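To give a rough idea of the two halves, here is a minimal sketch in Python. It is not the ude library and not the Text File Splitter code. It only checks for a BOM, then tests whether a sample of the bytes is valid UTF-8, which is a much simpler stand-in for the statistical detection that the Mozilla detector does. The function names, the 64 KB sample size, the cp1252 fallback, and the choice to repeat the BOM on every chunk are all assumptions made for illustration.

```python
import codecs

# Known byte order marks. UTF-32 LE comes before UTF-16 LE because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data, fallback="cp1252"):
    """Return (encoding_name, bom_bytes) for a sample of raw file bytes."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, bom
    # No BOM: accept the sample as UTF-8 only if it decodes cleanly.
    # final=False tolerates a multi-byte character cut off at the end
    # of the sample, which matters when only part of a file is read.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
        return "utf-8", b""
    except UnicodeDecodeError:
        # Assumption: anything else is treated as the fallback code page.
        return fallback, b""


def detect_file_encoding(path, sample_size=64 * 1024):
    """Detect from the first sample_size bytes so huge files stay cheap."""
    with open(path, "rb") as handle:
        return detect_encoding(handle.read(sample_size))


def encode_chunk(lines, encoding, bom):
    """Encode one output chunk in the source encoding.

    Assumption: the source BOM (if any) is repeated on every chunk so
    that each piece can be opened on its own.
    """
    return bom + "".join(lines).encode(encoding)
```

A file that is pure ASCII comes back as "utf-8" here, since ASCII is valid UTF-8. The returned BOM is kept separate from the encoding name so that the caller can strip it from the first line when reading and decide whether to write it back on each chunk.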

I don't have a release date for this version yet. I still need to update pages in the new wiki (http://docs.systemwidgets.com). You guys will be able to start creating your own splitting strategies once I get all of this work done. The wiki talks about version 3.0, but you will be able to do this with version 2.5.

posted @ Friday, January 20, 2012 11:28 PM by Hector Sosa, Jr

Copyright 2002-2012 by SystemWidgets