EDIT: The correct solution was provided by Ulrich Eckhardt. The HTML report did not declare a charset in a meta tag, so the browser was interpreting the file with a different encoding. Putting this snippet into the HTML report’s head solved the issue.
<head> <meta charset="UTF-8"> </head>
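As a rough illustration of that fix, the sketch below writes a report whose head declares UTF-8 and whose bytes really are UTF-8. The function name `write_report`, the file name and the table layout are hypothetical; they are not taken from my script or from the library.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical minimal report layout; the real report is produced by a library.
TEMPLATE = (
    u'<html>\n'
    u'<head> <meta charset="UTF-8"> </head>\n'
    u'<body><table><tr><td>{0}</td></tr></table></body>\n'
    u'</html>\n'
)

def write_report(path, cell_text):
    """Write a one-cell HTML report, encoded as UTF-8 on disk."""
    # io.open accepts an encoding on both Python 2.7 and Python 3,
    # and encodes the unicode text when writing.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(TEMPLATE.format(cell_text))

street = u'\u0637\u0631\u064a\u0642 \u062f\u062e\u0627\u0646'  # طريق دخان
report_path = os.path.join(tempfile.mkdtemp(), 'report.html')
write_report(report_path, street)
```

The declared charset and the actual file encoding have to agree; the meta tag only tells the browser how to decode bytes that are already on disk.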
I’ve run into an issue where طريق دخان appears as Ø·Ø±ÙŠÙ‚ Ø¯Ø®Ø§Ù† in an HTML report generated with Python 2.7 running in a CentOS 7 Docker container. (Other non-ASCII letters are affected in the same way.)
The same script on my local machine displays the characters correctly, so the problem is probably some environment setting that I didn’t add to the Dockerfile.
I’d like to know either what Docker setting I’m missing, or what encoding issue causes طريق دخان to turn into Ø·Ø±ÙŠÙ‚ Ø¯Ø®Ø§Ù†.
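For the second possibility, here is a small check of how such garbling can arise. It assumes the text was correctly encoded as UTF-8 and then decoded as Windows-1252 (`cp1252`); that codec is my guess at what the viewer fell back to, not something I have confirmed.

```python
# -*- coding: utf-8 -*-
street = u'\u0637\u0631\u064a\u0642 \u062f\u062e\u0627\u0646'  # طريق دخان

# Correct bytes on disk: every Arabic letter becomes two UTF-8 bytes.
utf8_bytes = street.encode('utf-8')

# Assumption: the reader decodes those bytes as Windows-1252 instead of UTF-8,
# so each byte is shown as a separate Latin character.
mojibake = utf8_bytes.decode('cp1252')

# The garbling is reversible, which shows the underlying bytes are intact.
recovered = mojibake.encode('cp1252').decode('utf-8')
```

If `mojibake` matches what the report shows, the data itself is fine and only the decoding step on the viewing side is wrong.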
Quick overview of the script:
- The script downloads a JSON file that contains street names (such as
- On the raw JSON file, that name would appear like this:
- The JSON is fetched using requests.get(), which should auto-convert to unicode.
- The script would output the unicode strings into an HTML report
I’ve modified this library code slightly to work with unicode (otherwise it would fail with the error: ‘ascii’ codec can’t encode character: ordinal not in range(128)). Now it encodes the cell text to UTF-8 before converting it to a string.
    if isinstance(self.text, unicode):
        # Python 2: encode unicode explicitly; the result is already a byte str
        text = self.text.encode('utf-8')
    else:
        text = str(self.text)
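To show why the unmodified code failed, here is a minimal sketch. On Python 2, calling `str()` on a unicode object encodes it implicitly with the default codec, which is normally ASCII; the explicit `encode('ascii')` below stands in for that implicit step so the snippet behaves the same on Python 2 and 3.

```python
# -*- coding: utf-8 -*-
street = u'\u0637\u0631\u064a\u0642 \u062f\u062e\u0627\u0646'  # طريق دخان

try:
    # Stand-in for what str(street) attempts implicitly on Python 2.
    street.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # "'ascii' codec can't encode character ... ordinal not in range(128)"
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and yields plain bytes.
utf8_bytes = street.encode('utf-8')
```

The resulting bytes are valid UTF-8, so any remaining garbling has to come from how those bytes are decoded later.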
On my local machine, the HTML report has cells that display the non-ASCII letters correctly in Google Chrome.
When the same script is run in Docker, the HTML report has output that looks like this in Google Chrome: Ø·Ø±ÙŠÙ‚ Ø¯Ø®Ø§Ù†
I wish I could run this on Python 3, but I’m stuck with 2.7 :[
I’ve tried adding these to the Dockerfile, without success:
- ENV PYTHONIOENCODING=utf-8
- RUN yum -y -q reinstall glibc-common
- RUN locale-gen en_US.UTF-8
- ENV LANG en_US.UTF-8
- ENV LANGUAGE en_US:en
- ENV LC_ALL en_US.UTF-8