Question

I have a file containing the following data. I only want the ownerId number and the profileId value, separated by a colon.

My file:

ObjectId("57a046a06f858a9c73b3468a"), "ownerId" : "923003345778", "profileId" : "FreeBundles,LBCNorthParentOffer", "instanceId" : null, "queuedFor" : "unassigned", "state" : "active", "createDateTime" : 1470121632, "startDateTime" : 1470121632, "expireDateTime" : 1485673632, "removeDateTime" : 1487747232, "extensionDateTime" : null, "cancelled" : false, "mode" : "onceOff", "nextMode" : "none", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 } }
 ObjectId("57a046a06f858a9c73b34688"), "cancelled" : false, "createDateTime" : 1470121632, "expireDateTime" : 1557514799, "extensionDateTime" : null, "instanceId" : null, "mode" : "onceOff", "nextMode" : "none", "ownerId" : "923003345778", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 }, "profileId" : "Prov3G,HLRProv", "queuedFor" : "unassigned", "removeDateTime" : 1557514799, "startDateTime" : 1470121632, "state" : "active" }
 ObjectId("56d48bd38a8b93baa708fcfa"), "ownerId" : "923003309452", "profileId" : "DiscountOnUsage,Segment04", "instanceId" : null, "queuedFor" : "unassigned", "state" : "active", "createDateTime" : 1456770003, "startDateTime" : 1456770003, "expireDateTime" : null, "removeDateTime" : null, "extensionDateTime" : null, "cancelled" : false, "mode" : "onceOff", "nextMode" : "none", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 } }
 ObjectId("560ed95f6ca6e0703cf26fcc"), "cancelled" : false, "createDateTime" : 1443813727, "expireDateTime" : 1544381999, "extensionDateTime" : null, "instanceId" : null, "mode" : "onceOff", "nextMode" : "none", "ownerId" : "923003309452", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 }, "profileId" : "Prov3G,HLRProv", "queuedFor" : "unassigned", "removeDateTime" : 1544381999, "startDateTime" : 1443813727, "state" : "active" }

Desired output:

923003345778 : FreeBundles,LBCNorthParentOffer
923003345778 : Prov3G,HLRProv
923003309452 : DiscountOnUsage,Segment04
923003309452 : Prov3G,HLRProv

If anyone knows how, please explain the solution in detail.

Answer 1

$ sed 's/.*"ownerId" *: *"\([^"]*\)".*"profileId" *: *"\([^"]*\).*/\1 : \2/' file
923003345778 : FreeBundles,LBCNorthParentOffer
923003345778 : Prov3G,HLRProv
923003309452 : DiscountOnUsage,Segment04
923003309452 : Prov3G,HLRProv

I really don't think this needs any explanation, since it's straightforward, but let me know if you have any questions.
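Since the question asks for a detailed explanation, here is a sketch that builds a substitution of this shape from commented pieces, capturing the ownerId and profileId values, and runs it on a shortened line from the question's data. The variable names and the extract_pairs function are illustrative only.

```shell
#!/bin/sh
# Build the basic regular expression from named pieces so each part can be commented.
skip='.*'                              # greedily consume everything before "ownerId"
owner='"ownerId" *: *"\([^"]*\)"'      # group 1: the characters between the value's quotes
between='.*'                           # skip ahead to "profileId"
profile='"profileId" *: *"\([^"]*\)'   # group 2: the profileId value, up to its closing quote
rest='.*'                              # consume the rest of the line so it is discarded

# Replace the whole line with "ownerId : profileId". This assumes ownerId comes
# before profileId on the line; lines that do not match are printed unchanged.
extract_pairs() {
    sed "s/${skip}${owner}${between}${profile}${rest}/\\1 : \\2/"
}

line=' ObjectId("56d48bd38a8b93baa708fcfa"), "ownerId" : "923003309452", "profileId" : "DiscountOnUsage,Segment04", "instanceId" : null }'
printf '%s\n' "$line" | extract_pairs
# prints: 923003309452 : DiscountOnUsage,Segment04
```

Because sed prints every line by default, a line lacking either field passes through unchanged. Running `sed -n` with the `p` flag on the substitution (`s/.../\1 : \2/p`) would print only the lines that matched.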

Answer 2

This is an awkward situation you've gotten yourself into.

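Generally, you don't want to process structured data with plain-text tools such as sed. Any solution you come up with will be fragile when the format changes (for example, whitespace or newlines between JSON fields), and some corner cases (such as JSON strings containing quotes) are hard to handle. If you have JSON, you want to use JSON tools to process it.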

However, you don't quite have JSON. This is a textual representation of BSON (probably from MongoDB) with some parts already cut off.

What you really want to do

A sane way to solve this is to have MongoDB give you JSON and let something like jq do the formatting. Once you have a proper JSON file, it's as simple as

jq -r '"\(.ownerId) : \(.profileId)"' file.json

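mongoexport may be your friend, or wrapping JSON.stringify() around the query in the MongoDB shell¹; it depends on how you're getting this data. This approach requires access to the data before it was mangled, but I suspect that whatever made you chop the BSON into pieces should be replaced with something like this anyway, for reliability.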

¹ If you're getting the data from the MongoDB shell, you might want to consider doing the formatting there.

How to do it with sed (and dig yourself in deeper)

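But since you don't currently have proper JSON, you may want to try to solve this with sed. This is a terrible idea, and I cannot stress enough that you never want to do this in a production environment. If you do, you'll end up in a deeper mess than before, and that vicious circle is not a happy place.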

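So what I'm about to show you is something you do once, in a hurry, and never use again, because you promise to do it properly next time. You'll want to double-check the results. Here it is: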

sed 'h;/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;x;/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;G;s/\n/ : /' file.bsonish

This makes the following assumptions about the input data:

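One complete object per line. A line break in the wrong place will break this.
No escaped quotes (\") in the ownerId or profileId values, because the [^"]* capture stops at the first quote character.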

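It also won't recognize broken data, which is always a nice feature. On the plus side, it doesn't require the ownerId and profileId fields to appear in any particular order.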
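The claims above can be checked quickly. This sketch runs the same sed program on a few hypothetical, shortened records: one in the question's field order, one with the fields swapped, one without an ownerId, and one object split across two lines. The prog variable and sample function are illustrative names.

```shell
#!/bin/sh
# The sed program from the answer, stored in a variable so it can be reused.
prog='h;/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;x;/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;G;s/\n/ : /'

# Hypothetical sample lines (shortened versions of the question's records).
sample() {
    printf '%s\n' \
        '{ "ownerId" : "923003345778", "profileId" : "Prov3G,HLRProv" }' \
        '{ "profileId" : "DiscountOnUsage,Segment04", "ownerId" : "923003309452" }' \
        '{ "profileId" : "FreeBundles,LBCNorthParentOffer" }'
}

# The first two lines are converted regardless of field order;
# the third has no ownerId and is dropped silently.
sample | sed "$prog"

# An object broken across two lines: neither half has both fields, so nothing is printed.
printf '%s\n' '{ "ownerId" : "923003345778",' '"profileId" : "Prov3G,HLRProv" }' | sed "$prog"
```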

It works as follows:

# Save a copy of the input data; we'll isolate the fields separately.
h

# See if there's a profileId field. If not, the line is silently dropped.
/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
# Isolate that profileId field. // in this context means: reuse the last
# regex (the big one)
s//\1/

# Now swap in the saved input data. We'll get ownerId next.
x
# Isolate ownerId as before. If there is no ownerId field, drop line silently.
/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/

# Append the profileId value saved in the hold space to what we have.
G

# Replace the newline between the two with a colon and some spaces.
s/\n/ : /
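Because sed treats lines that start with # as comments, the annotated listing above also works as a script file for `sed -f`. A sketch, with the comments shortened and assuming the hypothetical file names extract.sed and file.bsonish in a temporary directory:

```shell
#!/bin/sh
# Work in a throwaway directory; extract.sed and file.bsonish are hypothetical names.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# The commented script, written to a file (sed ignores lines starting with #).
cat > "$dir/extract.sed" <<'EOF'
# Save a copy of the input line in the hold space.
h
# Keep only lines with a profileId field and reduce them to its value.
/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/
# Swap the saved line back in and reduce it to the ownerId value.
x
/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/
# Append the profileId from the hold space and join the two with " : ".
G
s/\n/ : /
EOF

# Two records from the question, trimmed to the fields that matter here.
cat > "$dir/file.bsonish" <<'EOF'
ObjectId("57a046a06f858a9c73b3468a"), "ownerId" : "923003345778", "profileId" : "FreeBundles,LBCNorthParentOffer", "state" : "active" }
 ObjectId("560ed95f6ca6e0703cf26fcc"), "cancelled" : false, "ownerId" : "923003309452", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 }, "profileId" : "Prov3G,HLRProv" }
EOF

sed -f "$dir/extract.sed" "$dir/file.bsonish"
```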
