#include "stdafx.h"
#include <wtypes.h>
#include <comutil.h>
#pragma comment(lib,"comsuppw.lib")
#include <string>
#include <string.h>
#include <stdio.h>
using namespace std;
string bstr_to_str(BSTR source){
    //source = L"lol2inside";
    _bstr_t wrapped_bstr = _bstr_t(source);
    int length = wrapped_bstr.length();
    char* char_array = new char[length];
    strcpy_s(char_array, length+1, wrapped_bstr);
    return char_array;
}
int _tmain(int argc, _TCHAR* argv[]){
    BSTR bstr_var = SysAllocString(L"I am bstr");
    string str = bstr_to_str(bstr_var);
    printf("result: %s\n", str.c_str());//result: I am bstr
    getchar();
    return 0;
}
Hi,
this handling is problematic AFAICS.
While BSTR is wide-typed (thus Unicode-compliant: UTF-16 on Windows), the std::string content after conversion most likely is no longer Unicode-compliant, because the _bstr_t conversion to char* (at the strcpy_s() line) goes through the ACP (active code page), i.e. a legacy codepage. That line is where compliance decays: characters which the ACP cannot represent typically do not survive it.
However, any byte-typed string representation pretty much MUST use UTF-8 encoding (std::string means UTF-8), otherwise Unicode compliance is not preserved properly and consistently *) (a common weakness of byte-typed encodings) ===> DATA CORRUPTION bugs ensue very easily.
(Region-specific legacy codepage encodings MUST NOT be used, unless actually required in order to correctly serve an existing, established legacy protocol.)
https://utf8everywhere.org/
*) i.e. firmly and consistently, end-to-end(!)
The question then is how the transcoding (here: UTF-16 -> UTF-8) would ideally be done.
Perhaps by taking on an ATL dependency (atlconv.h).
HOWEVER, WARNING:
The atlconv.h transcoding is not protocol-consistent (c.f. the source comment // Codepage doesn't matter).
IOW several macro variants which depend on the Win32 T (TCHAR) setting, CA2CT etc., do NOT carry out the required transcoding. ACP to UTF-8 is a valid (representable!) **) transcoding, thus it MUST be carried out, but it isn't ===> again a DATA CORRUPTION bug.
**) well, for most ACP codepages (unless there are codepoint mapping support issues), I'd think...
So it is probably a much better idea to instead use transcoding functionality that is properly cross-platform, simple (supporting only the plain Unicode encodings) and more std::string-based, e.g. codecvt (note that the standard <codecvt> conversion facets are deprecated since C++17) or boost::locale::conv or so.
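To make the UTF-16 -> UTF-8 step concrete, here is a minimal hand-rolled sketch in portable C++. The function name utf16_to_utf8 is hypothetical, and feeding it the code units of a BSTR (e.g. by copying them into a std::u16string) is an assumption, not something the original sample does. On Windows, WideCharToMultiByte() with CP_UTF8 should serve the same purpose; a library such as boost::locale::conv is probably preferable to hand-written code in production.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (REPLACEMENT CHARACTER).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // upper bound: at most 3 bytes per UTF-16 unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // the pair has been consumed
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {            // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike the ACP route, the result does not depend on the machine's codepage configuration: the same input yields the same bytes everywhere.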
_bstr_t::length() usage might be problematic (consistency!):
as far as the MSDN docs let on, it returns the number of characters of the encapsulated BSTR, i.e. wide-typed elements, which often is NOT the number of chars that the _bstr_t::operator char*() side (the ACP-transcoded side) actually has.
===> get the char-typed side ***), then determine its actual length (strlen()).
***) ...but of course that one is still not Unicode-compliant (since it is ACP-based, except probably where the ACP is configured as UTF-8, which is possible in some(!) newer Windows 10/11 environments)
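The length mismatch can be illustrated without any Windows API. The sketch below uses UTF-8 as the narrow encoding purely for illustration (an assumption: the original sample goes through the ACP, where multi-byte codepages show the same effect). The function name utf8_byte_count is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes the UTF-8 form of a UTF-16 string needs
// (not counting a terminating NUL). Unpaired surrogates count as 3 bytes (U+FFFD).
std::size_t utf8_byte_count(const std::u16string& in) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t unit = in[i];
        if (unit < 0x80) {
            bytes += 1;
        } else if (unit < 0x800) {
            bytes += 2;
        } else if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size() &&
                   in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            bytes += 4;  // valid surrogate pair: one code point outside the BMP
            ++i;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```

For pure ASCII such as u"I am bstr" both counts agree, which is why the sample appears to work. For a string with four wide elements, three of them non-ASCII Latin letters (u"\u017C\u00F3\u0142w"), the narrow side needs 7 bytes, so a buffer sized from the wide count would be too small.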
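This strcpy_s() handling has an off-by-1 bug: the buffer is allocated as new char[length], yet strcpy_s() is told that the destination holds length+1 elements, so the terminating NUL ends up past the end of the allocation.
new char[length] is also a MEMORY LEAK: the raw buffer only serves to construct the returned std::string and is never freed.
===> probably should assign to an actually named std::string variable, then free the raw memory, then return that std::string.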
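The suggested fix can be sketched as follows. To keep it portable, the narrow (char*) side of the _bstr_t is represented by a plain const char* parameter; the function name copy_narrow is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the copy step: `narrow` plays the role of the
// char* obtained from the _bstr_t.
std::string copy_narrow(const char* narrow) {
    const std::size_t length = std::strlen(narrow);  // length of the char side itself
    char* buffer = new char[length + 1];             // +1 for the terminating NUL
    std::memcpy(buffer, narrow, length + 1);         // copies the NUL as well
    std::string result(buffer, length);              // named std::string holds its own copy
    delete[] buffer;                                 // free the raw memory before returning
    return result;
}
```

The buffer only exists here to mirror the structure of the original sample; `return std::string(narrow);` would do the same without any raw allocation.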
The annoying extra raw allocation, which is ALWAYS ****) broken since unsafe (often non-RAII; so why not std::vector at least?), can most likely be avoided anyway by using std::string directly:
see "Create a C++ string using printf-style formatting".
****) ample colorful personal experience...
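One common way to get printf-style formatting straight into a std::string, with no manually managed buffer, is to call vsnprintf() twice: once to measure, once to write. This is a sketch of that general technique (not necessarily what the referenced article shows); the name format_string is hypothetical.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: printf-style formatting into a std::string.
std::string format_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    // First pass: ask vsnprintf how many chars are needed (excluding the NUL).
    va_list args_copy;
    va_copy(args_copy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, args_copy);
    va_end(args_copy);

    if (needed < 0) {  // encoding error reported by vsnprintf
        va_end(args);
        return std::string();
    }

    // Second pass: format into the string's own storage (RAII, no new/delete).
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    std::vsnprintf(&out[0], out.size(), fmt, args);
    va_end(args);

    out.resize(static_cast<std::size_t>(needed));  // drop the NUL written by vsnprintf
    return out;
}
```

For example, format_string("result: %s", "I am bstr") yields the std::string "result: I am bstr", and the storage is released automatically when the string goes out of scope.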
This sample possibly should explicitly include <tchar.h> for its use of _tmain (etc.?): IWYU (include what you use).
using namespace is generally not recommended (scope pollution, even in relatively restricted/controlled scopes): there is a risk of symbol conflicts, potentially silent ones! Thus NOT Fail-Fast / Shift-Left.
File names (here: "BSTR to std-string.cpp") should preferably not contain special characters such as spaces:
whitespace is the default word separator both for the POSIX shell (IFS) and for xargs, which has "nice" effects with "simple" (not specially customized) shell command pipelines such as
find|xargs grep Foo
BTW there is a detailed article explaining various direct (i.e., non-transcoding!) conversions between string types: "How to convert between different types of counted-string string types".
Peripheral side note: the filesystem API (boost::filesystem, std::filesystem) is extremely problematic in interface behaviour/usability, due to its Windows-specific, ACP-hampered handling of narrow strings (c.f. the required u8path / u8string workaround helpers; alternative workaround: feeding Unicode-compliant wide-typed data). See e.g.:
- [std-discussion] std::filesystem support for UTF-8 encoded std::string(s)
- I'm a bit concerned about the direction of std::filesystem::path.
(and several other discussions on the Internet)
HTH and HAND!
Thank you