Language polyglots could be detected by scanning a program for the tokens of every recognized language and counting the occurrences per language. Esolangs often ignore any text outside their grammar, which makes them ideal for writing polyglots.
Multi-character tokens could be scanned for with Aho–Corasick, using overlapping search:
[Space]
[Tab]
[LF]
hoo
hooo
hoooo
hoos
hooos
hoooos
wraagh
wraaagh
wraaaagh
Ook.
Ook?
Ook!
func
let
if
elsif
else
while
return
==
!=
<=
>=
+=
-=
*=
/=
%=
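As a rough illustration of the multi-character scan, here is a minimal Aho–Corasick sketch in Python that reports overlapping matches and counts them per token. It is a sketch under assumptions, not a definitive implementation: the names `build_automaton`, `count_tokens`, `TOKENS` and `AUTOMATON` are made up for this example, and `[Space]`, `[Tab]` and `[LF]` are assumed to stand for the literal whitespace characters.

```python
from collections import Counter, deque

# Tokens from the list above; the three whitespace entries are assumed
# to mean the literal characters space, tab and line feed.
TOKENS = [
    " ", "\t", "\n",
    "hoo", "hooo", "hoooo", "hoos", "hooos", "hoooos",
    "wraagh", "wraaagh", "wraaaagh",
    "Ook.", "Ook?", "Ook!",
    "func", "let", "if", "elsif", "else", "while", "return",
    "==", "!=", "<=", ">=", "+=", "-=", "*=", "/=", "%=",
]

def build_automaton(tokens):
    """Build the trie, failure links and output lists for the tokens."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # tokens that end at each state
    for token in tokens:
        state = 0
        for ch in token:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(token)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit outputs so shorter tokens ending here are reported too.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def count_tokens(text, automaton):
    """Count every token occurrence in text, including overlapping ones."""
    goto, fail, out = automaton
    counts = Counter()
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        counts.update(out[state])
    return counts

AUTOMATON = build_automaton(TOKENS)
```

With overlapping search, `count_tokens("===", AUTOMATON)` counts `==` twice, and scanning `elsif` reports both `elsif` and the `if` it ends with, so a real detector would probably need to weigh or filter such nested matches.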
Including the colon after each keyword in its token would perhaps give fewer false positives.
Single-character tokens could be detected by simple UTF-8 codepoint iteration and lookups:
草
泥
马
河
蟹
ウ
ホ
ッ
ー
番
茄
干
水
沝
淼
>
<
+
-
.
,
[
]
0
1
~
#
@
$
[
]
{
}
-
|
:
;
\
/
(
)
>
<
^
v
*
`
+
-
*
/
÷
%
^
&
o
x
=
!
≠
>
G
≥
<
L
≤
i
d
s
o
(
)
{
}
[
]
<
>
=
:
;
,
+
-
*
/
%
!
&
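The single-character scan could look roughly like the following Python sketch, which decodes the input as UTF-8, walks it one codepoint at a time and looks each codepoint up in a table. The list above does not name its languages, so the group labels (`group_a` and so on), the choice of which tokens to include, and the names `TOKEN_GROUPS`, `LOOKUP` and `count_single_char_tokens` are all placeholders invented for this example.

```python
from collections import Counter

# Hypothetical grouping: labels and subset are placeholders, since the
# token list does not say which language each token belongs to.
TOKEN_GROUPS = {
    "group_a": "草泥马河蟹",
    "group_b": "ウホッー",
    "group_c": "><+-.,[]",
}

# Lookup table from a codepoint to every group that uses it, so a
# character shared by several languages counts for each of them.
LOOKUP = {}
for group, chars in TOKEN_GROUPS.items():
    for ch in chars:
        LOOKUP.setdefault(ch, []).append(group)

def count_single_char_tokens(data: bytes) -> Counter:
    """Decode UTF-8 bytes and count known codepoints per group."""
    counts = Counter()
    # Iterating a Python str yields one codepoint at a time;
    # invalid byte sequences are replaced rather than raising.
    for ch in data.decode("utf-8", errors="replace"):
        counts.update(LOOKUP.get(ch, ()))
    return counts
```

For example, `count_single_char_tokens("草泥马 +[->+<]".encode("utf-8"))` gives 3 for `group_a` and 7 for `group_c`. Because many ASCII tokens in the list are shared between languages, the per-group counts are only a hint and would likely need to be combined with the multi-character results.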